Diagnostic tools can help schools match instructional resources to the needs of their students more consistently and fairly. To ensure the best educational outcome for each child, diagnostic decision-making systems seek to balance time, clarity, and accuracy. Recent research notes, however, that many educational decisions are made using professional judgment alone, even though judgments grounded in data, statistical models, and even informal prediction models outperform those based on intuition alone. The purpose of this manuscript is to describe the theoretical basis for signal detection and methods for statistically evaluating diagnostic decisions in education. We make recommendations to help test developers and consumers apply this methodology to other diagnostic systems in education and interpret the results of signal detection analyses of educational screeners and diagnostic tests.
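The abstract does not spell out the statistics involved, so the following is only a minimal sketch of the standard signal detection quantities used to evaluate a screener, not the authors' own procedure. It assumes a screener on which lower scores indicate greater risk (as with oral reading fluency), a later outcome measure that defines which students were truly at risk, and a single cut score. The function and class names (`classify`, `auc`, `ClassificationStats`), the cut score of 45, and the ten-student data set are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClassificationStats:
    """Counts from a 2x2 screening decision table (hypothetical helper)."""
    tp: int  # flagged at risk, later failed the outcome measure (hit)
    fp: int  # flagged at risk, later passed (false alarm)
    fn: int  # not flagged, later failed (miss)
    tn: int  # not flagged, later passed (correct rejection)

    @property
    def sensitivity(self) -> float:
        # Share of truly at-risk students whom the screener flags (hit rate).
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of not-at-risk students whom the screener correctly passes.
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive power: share of flagged students truly at risk.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Negative predictive power: share of unflagged students truly not at risk.
        return self.tn / (self.tn + self.fn)


def classify(scores, at_risk, cut):
    """Flag a student when the score falls below `cut`.

    Assumes lower scores indicate greater risk.
    """
    tp = fp = fn = tn = 0
    for score, truth in zip(scores, at_risk):
        flagged = score < cut
        if flagged and truth:
            tp += 1
        elif flagged:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return ClassificationStats(tp, fp, fn, tn)


def auc(scores, at_risk):
    """Area under the ROC curve, computed pairwise.

    Returns the probability that a randomly chosen at-risk student scores
    lower than a randomly chosen not-at-risk student, counting ties as 1/2
    (the Mann-Whitney U statistic divided by the number of pairs).
    """
    pos = [s for s, t in zip(scores, at_risk) if t]
    neg = [s for s, t in zip(scores, at_risk) if not t]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical screening scores and later outcomes (True = failed outcome test).
scores = [12, 25, 30, 41, 44, 52, 58, 63, 70, 85]
at_risk = [True, True, False, True, False, False, True, False, False, False]
cut = 45  # arbitrary illustrative cut score

stats = classify(scores, at_risk, cut)
print(f"sensitivity={stats.sensitivity:.2f} specificity={stats.specificity:.2f}")
print(f"PPV={stats.ppv:.2f} NPV={stats.npv:.2f} AUC={auc(scores, at_risk):.2f}")
```

Sensitivity, specificity, and AUC describe the screener independently of how common risk is in the sample. Predictive power (PPV and NPV) changes with the base rate of at-risk students, so the same cut score can yield different PPV and NPV values in schools with different proportions of struggling readers.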
