Abstract
Reliability of test scores is one of the most pervasive psychometric concepts in measurement. Reliability coefficients based on a unifactor model for continuous indicators include the maximal reliability ρ and the unweighted sum-score-based ω, among many others. With the increasing popularity of item response theory, a parallel reliability measure, π, has been introduced using the information function. This article studies the relationships among the three reliability coefficients. By exploiting the equivalence between item factor analysis and the normal ogive model, the article shows that π(2), the coefficient π for dichotomous data, is always smaller than ρ. Additional results imply that ω is typically greater than π(2) under practical conditions, although mathematically neither π(2) nor ω dominates the other. Further results indicate that π can surpass ω as the number of response categories increases. The reasons why π and ω fall short of ρ are also explored from the perspective of information gain and loss. Implications of the findings for scale development and analysis are discussed.
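To make the three coefficients concrete, the sketch below computes them for a hypothetical four-item scale. It assumes a one-factor model x_i = λ_i f + e_i with Var(f) = 1 and unique variances ψ_i. Under that model, ω = (Σλ_i)² / ((Σλ_i)² + Σψ_i) and ρ = I / (1 + I) with I = Σλ_i²/ψ_i. For the dichotomous case it uses the threshold (normal ogive) formulation and takes the information-based coefficient to be Ī / (1 + Ī), where Ī is the test information averaged over θ ~ N(0, 1). That form of π(2), the function names, and all loadings and thresholds are illustrative assumptions, not the article's own definitions or data.

```python
import math


def omega(loadings, uniquenesses):
    """Reliability of the unweighted sum score under a one-factor model, Var(f) = 1."""
    s = sum(loadings)
    return s * s / (s * s + sum(uniquenesses))


def maximal_reliability(loadings, uniquenesses):
    """Reliability of the optimally weighted composite (weights lambda_i / psi_i)."""
    info = sum(l * l / p for l, p in zip(loadings, uniquenesses))
    return info / (1.0 + info)


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def item_information(theta, lam, tau):
    """Fisher information of one dichotomous item in the threshold (normal ogive) model.

    x* = lam * theta + e with e ~ N(0, 1 - lam^2), and x = 1 when x* > tau, so
    P(x = 1 | theta) = Phi(z) with z = (lam * theta - tau) / sqrt(1 - lam^2).
    """
    sd = math.sqrt(1.0 - lam * lam)
    a = lam / sd                              # slope on the IRT scale
    z = (lam * theta - tau) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # Phi(z), accurate in both tails
    q = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z)
    if p == 0.0 or q == 0.0:                  # information vanishes far in the tails
        return 0.0
    return (a * _pdf(z)) ** 2 / (p * q)


def average_information(loadings, thresholds, lo=-6.0, hi=6.0, n=2001):
    """Test information averaged over theta ~ N(0, 1), via the trapezoidal rule."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        theta = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        info = sum(item_information(theta, l, t) for l, t in zip(loadings, thresholds))
        total += weight * info * _pdf(theta)
    return total * h


def information_reliability(avg_info):
    """ASSUMED information-based form (hypothetical helper): avg_info / (1 + avg_info)."""
    return avg_info / (1.0 + avg_info)


# Hypothetical standardized loadings and thresholds for four items.
loadings = [0.5, 0.6, 0.7, 0.8]
uniquenesses = [1.0 - l * l for l in loadings]
thresholds = [-1.0, -0.5, 0.0, 0.5]

omega_sum = omega(loadings, uniquenesses)
rho = maximal_reliability(loadings, uniquenesses)
pi2 = information_reliability(average_information(loadings, thresholds))
print(f"omega = {omega_sum:.3f}, rho = {rho:.3f}, pi(2) = {pi2:.3f}")
```

Under the linear factor model the information about f is constant at Σλ_i²/ψ_i, so the same I / (1 + I) form returns ρ exactly. After dichotomization, each item's information is at most about 0.637 λ_i²/ψ_i (its peak, reached where z = 0), which is one way to see why the assumed π(2) stays below ρ in this example.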