Abstract
Problems of scale typically arise when comparing test score trends, gaps, and gap trends across different tests. To overcome some of these difficulties, test score distributions on the same score scale can be represented by nonparametric graphs or statistics that are invariant under monotone scale transformations. This article motivates and then develops a framework for the comparison of these nonparametric trend, gap, and gap trend representations across tests. The connections between this framework and other nonparametric tools, including probability–probability (PP) plots, the Mann-Whitney U test, and the statistic known as P(Y > X), are highlighted. The author describes the advantages of this framework over scale-dependent trend and gap statistics and demonstrates applications of these nonparametric methods to frequently asked policy questions.
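As an illustration of the invariance property described above, the following is a minimal sketch, not the author's implementation. It estimates P(Y > X) from two small samples, counting ties as one half, and checks that the estimate does not change under a monotone rescaling, here a logarithm. The function name `prob_y_exceeds_x` and the score values are hypothetical and were chosen only for illustration.

```python
import math
from itertools import product


def prob_y_exceeds_x(x_scores, y_scores):
    """Estimate P(Y > X) from two samples; ties count as one half.

    The numerator is the Mann-Whitney U count for the Y sample, so the
    result is U divided by the number of (x, y) pairs.
    """
    wins = 0.0
    for x, y in product(x_scores, y_scores):
        if y > x:
            wins += 1.0
        elif y == x:
            wins += 0.5
    return wins / (len(x_scores) * len(y_scores))


# Hypothetical scale scores for two groups (illustrative values only).
group_x = [210, 225, 231, 240, 252]
group_y = [220, 238, 245, 260, 271]

# Estimate on the original scale.
original = prob_y_exceeds_x(group_x, group_y)

# Estimate after a strictly increasing transformation of the score scale.
rescaled = prob_y_exceeds_x(
    [math.log(s) for s in group_x],
    [math.log(s) for s in group_y],
)

# A scale-dependent statistic, the mean difference, for contrast.
mean_gap_original = sum(group_y) / len(group_y) - sum(group_x) / len(group_x)
mean_gap_rescaled = (
    sum(math.log(s) for s in group_y) / len(group_y)
    - sum(math.log(s) for s in group_x) / len(group_x)
)

print(original, rescaled)
print(mean_gap_original, mean_gap_rescaled)
```

Because the estimate depends only on the ordering of scores across the two groups, any strictly increasing transformation leaves it unchanged, whereas the mean difference changes with the scale.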
