Abstract
Test score distributions of schools or demographic groups are often summarized by frequencies of students scoring in a small number of ordered proficiency categories. We show that heteroskedastic ordered probit (HETOP) models can be used to estimate means and standard deviations of multiple groups’ test score distributions from such data. Because the scale of HETOP estimates is indeterminate up to a linear transformation, we develop formulas for converting the HETOP parameter estimates and their standard errors to a scale in which the population distribution of scores is standardized. We demonstrate and evaluate this novel application of the HETOP model with a simulation study and using real test score data from two sources. We find that the HETOP model produces unbiased estimates of group means and standard deviations, except when group sample sizes are small. In such cases, we demonstrate that a “partially heteroskedastic” ordered probit (PHOP) model can produce estimates with a smaller root mean squared error than the fully heteroskedastic model.
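The scale indeterminacy and the standardization step described above can be illustrated with a minimal sketch. It is not the authors' estimator or their conversion formulas (which also cover standard errors). It assumes each group's scores are normal and that the group means, standard deviations, and cut scores are already known. The function names `category_probs` and `standardize` are hypothetical. The standardization uses only the law of total variance.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(mu, sigma, cutscores):
    """Probability of each ordered category for one group, assuming
    normally distributed scores with mean mu and SD sigma.
    cutscores must be sorted in increasing order."""
    cum = [0.0] + [normal_cdf((c - mu) / sigma) for c in cutscores] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


def standardize(mus, sigmas, weights):
    """Rescale group means and SDs so that the pooled population has
    mean 0 and SD 1 (illustrative; ignores sampling error)."""
    total = sum(weights)
    p = [w / total for w in weights]
    pop_mean = sum(pg * m for pg, m in zip(p, mus))
    # Law of total variance: within-group plus between-group variance.
    pop_var = sum(pg * (s ** 2 + (m - pop_mean) ** 2)
                  for pg, m, s in zip(p, mus, sigmas))
    pop_sd = math.sqrt(pop_var)
    new_mus = [(m - pop_mean) / pop_sd for m in mus]
    new_sigmas = [s / pop_sd for s in sigmas]
    return new_mus, new_sigmas


# A linear rescaling of means, SDs, and cut scores leaves the category
# probabilities unchanged, which is why the scale must be fixed by convention.
original = category_probs(0.0, 1.0, [-1.0, 0.0, 1.0])
rescaled = category_probs(3.0, 2.0, [1.0, 3.0, 5.0])  # scores mapped to 2x + 3

std_mus, std_sigmas = standardize([1.0, 3.0], [2.0, 2.0], [100, 100])
```

Here `original` and `rescaled` agree because only the ratios (cut score minus mean) divided by SD enter the probabilities, so the observed counts cannot distinguish the two scales.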

