How to choose the right psychometric test
There are many psychometric tests out there, and it can be hard to get your bearings. Choosing the right tool has a lasting impact on your organization, so it is important to be able to judge tests yourself rather than rely solely on the test designers' claims.
So how do you separate the wheat from the chaff?
Reliability tells you whether a psychometric test consistently measures what it claims to measure (e.g.: skills, personality traits, reasoning abilities). Think of a bathroom scale: a reliable scale won't fluctuate by 10 kilos between weighings. Without reliability, an instrument is neither valid nor useful.
There are two indexes that indicate whether a test is reliable.
- Stability over time: is the score the same over time?
When you test candidates, their results shouldn't fluctuate within a short timeframe, such as two weeks. To check whether the test is stable over time, look at the test-retest index, which most test designers will provide. Generally, an acceptable test-retest index is higher than 0.70. The closer the index is to 1, the more stable the instrument is over time. The closer it is to 0.50, the less reliably the instrument reproduces the same score from one administration to the next.
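To make the index concrete: the test-retest index is typically a Pearson correlation between two administrations of the same test to the same people. Here is a minimal sketch with made-up candidate scores (the data and the 0.70 threshold from above are illustrative):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical scores from the same five candidates, two weeks apart
first_pass = [72, 65, 80, 58, 90]
second_pass = [70, 68, 78, 60, 88]

r = pearson_r(first_pass, second_pass)
print(r > 0.70)  # True: the instrument looks stable over time
```

A designer reporting a test-retest index is reporting exactly this kind of correlation, computed on a much larger sample.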
- Internal consistency: is the score stable throughout the test?
When you evaluate a candidate's extraversion, for instance, the measured level of extraversion should be consistent from one item to the next. To ensure the entire test provides a stable measurement of extraversion, ask to see Cronbach's alpha, an index of how consistently the items measure the same construct. An acceptable Cronbach's alpha is greater than 0.70. The closer the index is to 1, the more consistent the instrument. The closer it is to 0.50, the less consistently its items measure the same thing.
There are a number of facets to validity, but when it comes to employment assessment tests, we are usually talking about predictive validity, i.e., whether the test effectively predicts a variable of interest, such as job performance. The variable of interest is not limited to performance, because you can also try to predict a drop in the number of accidents, reduced turnover or anything else your organization considers important.
To check whether a psychometric test has predictive validity, the index you want from the designers is the correlation between the test (or one of its sub-scales) and the variable of interest (e.g.: performance, turnover, etc.). Since performance and accidents are complex phenomena influenced by many factors, you can't hope for a perfect correlation (r = 1). To give you a few benchmarks, a meta-analysis (Schmidt and Hunter, 1998) reports that the strongest link between a test and job performance is r = 0.51, observed between cognitive aptitude tests and job performance. A correlation between a test and a variable of interest (e.g.: performance) is strong above the threshold of r = 0.50, moderate above r = 0.30, and weak below r = 0.30. The higher the correlation, the more predictive the test.
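The benchmarks above boil down to a simple rule of thumb, sketched here (the function name is illustrative, and the thresholds are the ones stated in the text):

```python
def validity_strength(r):
    """Classify a predictive-validity correlation using the article's benchmarks."""
    if r > 0.50:
        return "strong"
    if r > 0.30:
        return "moderate"
    return "weak"

print(validity_strength(0.51))  # strong (e.g. cognitive aptitude vs. job performance)
print(validity_strength(0.35))  # moderate
print(validity_strength(0.20))  # weak
```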
A test can establish significant links with variables of interest for an organization. But if reliability indices aren’t at the above thresholds, the test can’t be considered valid.
Imagine if researchers were to show a connection between weight and job performance: the higher the person’s weight, the better the job performance. You might be tempted to include this measurement in your selection process. But if you were told that the scale used for the study showed weight that varied by 10 kilos in a given day for a given person, would you have the same confidence in the study’s results?
This is why psychometric test reliability is a prerequisite for test validity.