Abstract
This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these statistics for measuring agreement with categorical data in studies of reliability and validity. Special consideration is given to assumptions about whether marginals are fixed a priori, or free to vary. In reliability studies, when marginals are fixed, coefficient kappa is found to be appropriate. When either or both of the marginals are free to vary, however, it is suggested that the "chance" term in kappa be replaced by 1/n, where n is the number of categories. In validity studies, we suggest considering whether one wants an index of improvement beyond "chance" or beyond the best a priori strategy employing base rates. In the former case, considerations are similar to those in reliability studies with the marginals for the criterion measure considered as fixed. In the latter case, it is suggested that the largest marginal proportion for the criterion measure be used in place of the "chance" term in kappa. Similarities and differences among these statistics are discussed and illustrated with synthetic data.
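As a rough illustration of the three "chance" terms described above, the following Python sketch computes a kappa-like index of the general form (p_o - p_c) / (1 - p_c) from a square table of counts, where p_o is the observed proportion of agreement. The function name, the dictionary keys, and the convention that columns hold the criterion measure are assumptions made for this example, not notation taken from the paper.

```python
from typing import Dict, Sequence


def agreement_indices(table: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Kappa-like indices for a square agreement table of counts.

    Assumption for this sketch: rows are one rater (or the predictor),
    columns are the other rater (or the criterion measure).
    """
    n = len(table)  # number of categories
    total = sum(sum(row) for row in table)

    # Observed proportion of agreement: the diagonal of the table.
    p_o = sum(table[i][i] for i in range(n)) / total

    # Marginal proportions for rows and columns.
    row_marg = [sum(row) / total for row in table]
    col_marg = [sum(table[i][j] for i in range(n)) / total for j in range(n)]

    def beyond(p_c: float) -> float:
        # Improvement over the "chance" term p_c, scaled by the maximum
        # possible improvement.
        return (p_o - p_c) / (1 - p_c)

    return {
        # Coefficient kappa: chance term from the product of the marginals.
        "kappa": beyond(sum(r * c for r, c in zip(row_marg, col_marg))),
        # Free marginals: chance term replaced by 1/n.
        "kappa_free": beyond(1 / n),
        # Best a priori strategy: largest criterion (column) marginal.
        "kappa_base_rate": beyond(max(col_marg)),
    }
```

For the synthetic table `[[60, 10], [10, 20]]`, the three indices come out to about 0.52, 0.60, and 0.33 respectively, which shows how much the choice of "chance" term can change the reported agreement for the same data.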
