The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for test scales with different item numbers and difficulties via an item response theory approach. We then modeled the norm scores based on random samples of varying sizes, using either a conventional ranking procedure or SPCN. The norms were cross-validated against an entirely representative sample of N = 840,000, for which different measures of norming error were computed. This process was repeated 90,000 times. Both approaches benefited from an increase in sample size, with SPCN reaching optimal results with much smaller samples. Conventional norming performed worse on data fit, age-related errors, and the number of missing values in the norm tables. With fixed subsample sizes, the data fit of conventional norming varied with the granularity of the age brackets, calling into question general recommendations for sample sizes in test norming. We recommend basing test norms on statistical models of the raw score distributions instead of simply compiling norm tables via conventional ranking procedures.
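
To make the conventional ranking procedure concrete, the following is a minimal sketch and not the simulation code used in the study. It assumes a one-parameter (Rasch) response model, a standard normal ability distribution, a single age group, mid-rank percentiles, and a T-score scale (M = 50, SD = 10); the function names `simulate_raw_scores` and `conventional_norm_table` are hypothetical. SPCN itself is not implemented here.

```python
import numpy as np
from statistics import NormalDist


def simulate_raw_scores(n_persons, item_difficulties, rng):
    """Simulate sum scores under an assumed Rasch model with theta ~ N(0, 1)."""
    theta = rng.standard_normal(n_persons)
    # P(correct) = logistic(theta - b) for every person-item pair
    logits = theta[:, None] - item_difficulties[None, :]
    p_correct = 1.0 / (1.0 + np.exp(-logits))
    responses = rng.random(p_correct.shape) < p_correct
    return responses.sum(axis=1)


def conventional_norm_table(raw_scores, max_score):
    """Map each possible raw score to a T-score via percentile ranks.

    Raw scores that never occur in the sample stay NaN, which is one
    source of missing entries in a conventionally compiled norm table.
    """
    n = len(raw_scores)
    counts = np.bincount(raw_scores, minlength=max_score + 1)
    below = np.cumsum(counts) - counts  # cases strictly below each score
    table = np.full(max_score + 1, np.nan)
    std_normal = NormalDist()
    for score in np.flatnonzero(counts):
        # Mid-rank percentile: cases below plus half of the ties
        pr = (below[score] + 0.5 * counts[score]) / n
        table[score] = 50.0 + 10.0 * std_normal.inv_cdf(pr)
    return table


rng = np.random.default_rng(0)
difficulties = np.linspace(-2.0, 2.0, 30)  # hypothetical 30-item scale
sample = simulate_raw_scores(100, difficulties, rng)
norms = conventional_norm_table(sample, max_score=len(difficulties))
```

With a norming sample of only 100 cases, some raw scores at the extremes of the scale are typically unobserved and remain undefined in `norms`. A model-based approach avoids such gaps by estimating the raw score distribution rather than tabulating it.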
