Abstract
Diagnostic tools can help schools match instructional resources to the needs of their students more consistently and fairly. To ensure the best educational outcome for each child, diagnostic decision-making systems seek to balance time, clarity, and accuracy. However, recent research indicates that many educational decisions are made using professional judgment alone, even though judgments grounded in data, statistical models, or even informal prediction models outperform those based on intuition alone. This manuscript describes the theoretical basis of signal detection and methods for statistically evaluating diagnostic decisions in education. We offer recommendations to help test developers and consumers apply this methodology to other diagnostic systems in education and interpret signal detection analyses of educational screeners and diagnostic tests.
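The abstract does not spell out the evaluation indices, so the following is a minimal sketch that uses the standard signal detection definitions rather than any procedure confirmed by the manuscript. It cross-classifies screening decisions against a later outcome (hits, misses, false alarms, correct rejections), derives sensitivity, specificity, and positive and negative predictive values, and computes the area under the ROC curve through its Mann-Whitney formulation. The function names, the cut score, the convention that higher scores mean higher risk, and the data are all hypothetical.

```python
"""Signal detection summaries for a hypothetical screener (illustrative data only)."""
from typing import Dict, Sequence


def decision_table(at_risk: Sequence[int], risk_score: Sequence[float], cut: float) -> Dict[str, int]:
    """Cross-classify screening decisions (score >= cut flags a student) against
    the outcome (1 = student later failed the criterion, 0 = passed).
    Assumes higher scores indicate higher risk (a hypothetical convention)."""
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_rejection": 0}
    for outcome, score in zip(at_risk, risk_score):
        flagged = score >= cut
        if outcome == 1:
            counts["hit" if flagged else "miss"] += 1
        else:
            counts["false_alarm" if flagged else "correct_rejection"] += 1
    return counts


def classification_indices(t: Dict[str, int]) -> Dict[str, float]:
    """Standard 2x2 indices; assumes no denominator is zero."""
    return {
        # Proportion of truly at-risk students the screener flags (hit rate).
        "sensitivity": t["hit"] / (t["hit"] + t["miss"]),
        # Proportion of not-at-risk students correctly left unflagged.
        "specificity": t["correct_rejection"] / (t["correct_rejection"] + t["false_alarm"]),
        # Proportion of flagged students who truly are at risk.
        "ppv": t["hit"] / (t["hit"] + t["false_alarm"]),
        # Proportion of unflagged students who truly are not at risk.
        "npv": t["correct_rejection"] / (t["correct_rejection"] + t["miss"]),
    }


def auc(at_risk: Sequence[int], risk_score: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation: the proportion of
    (at-risk, not-at-risk) pairs in which the at-risk student has the higher risk
    score, counting ties as one half."""
    pos = [s for y, s in zip(at_risk, risk_score) if y == 1]
    neg = [s for y, s in zip(at_risk, risk_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: 1 = did not meet the year-end criterion.
    outcome = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    score = [9, 8, 7, 4, 6, 5, 3, 2, 2, 1]
    table = decision_table(outcome, score, cut=5)
    print(table)
    print(classification_indices(table))
    print(round(auc(outcome, score), 3))
```

With these made-up data and a cut score of 5, the screener yields 3 hits, 1 miss, 2 false alarms, and 4 correct rejections, giving a sensitivity of 0.75, a specificity of about 0.67, a PPV of 0.60, an NPV of 0.80, and an AUC of about 0.917. Sensitivity, specificity, and AUC do not depend on the base rate of risk in the sample, whereas the predictive values do. For measures such as oral reading fluency, where low scores signal risk, the score would be negated before being passed to these functions.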

