The articles presented in this Special Issue provide evidence for many statistically significant differences in error scores obtained from the Kaufman Test of Educational Achievement, Third Edition (KTEA-3) between various groups of students with and without disabilities. The data reinforce the importance of examiners looking beyond the standard scores when analyzing results. Although the data in these articles are powerful by themselves, this commentary explores the potential advantages of considering additional information to increase the practicality of these results. The statistical significance of the findings provides evidential validity, and the articles inform clinical practice and offer valuable leads for further research that should be pursued. Nonetheless, the present authors question whether the data as presented provide sufficient information to determine the predictive and practical utility of these initial results. The next step, we believe, is to extend these novel approaches to data from larger, carefully defined samples of students with specific learning challenges and disabilities.
As a society, we are most comfortable when we feel in control. To gain that feeling, we often group things together and package them neatly into categories in an attempt to better understand and predict. As psychologists, we follow suit by taking behavior and turning it into a hypothetical construct. For example, behaviors that are disorganized, inefficient for studying, or not attentive in the ways we expect are often labeled as deficits in “executive functioning” skills. Intelligent behavior is now known as “intelligence.” Taking it further, since the 19th century we have been quantifying this particular hypothetical construct (intelligence) in an attempt to better understand and even predict a person’s ability and potential.
Psychological testing (both cognitive and academic achievement) provides us with a means of packaging a person’s performance, analyzing it, and then inferring from it predictions about the person’s future performance or ability. One primary goal of testing may be to evaluate whether a child meets the criteria to be identified as needing special education services. When analyzing and interpreting the results of our assessments, we often look for “signs” indicative of a specific disorder or disability to aid in identifying that problem. These signs can be a set of behaviors or scores that appear more often in persons identified as having some specific disorder. For example, when we see lowered scores on memory tests and executive function scales, we might want to consider whether this person has an attention deficit disorder, as that accumulation of signs may be more commonly associated with that diagnosis than with some other diagnosis or with “normal” behavior.
Error analysis is another means of assessing performance to determine differences. By looking within individuals’ profiles to evaluate the specific errors made and comparing them with similar profiles, we can judge whether the patterns are similar or dissimilar, expected or unexpected. The normed error-analysis procedure provided by the Kaufman Test of Educational Achievement, Third Edition (KTEA-3; Kaufman & Kaufman, 2014) gives examiners more information about a student’s performance, over and above the standard scores obtained. Kaufman and Kaufman (2014) suggest that error analysis might be useful for several reasons, including providing a more precise level of diagnostic information, determining the location of a weak skill, gauging the severity of skill deficiencies, and identifying common sources of difficulty underlying several skill areas.
The several editions of the KTEA (KTEA/Normative Update [NU], KTEA-II, and KTEA-3; Kaufman & Kaufman, 1985, 1998, 2004, 2014) are unique among individually administered achievement tests in providing normative error-analysis scores for the subtests. Each subtest has defined categories of errors (e.g., Addition, Fractions, Literal Comprehension, Inferential Comprehension, Long Vowel, Consonant Digraph, Capitalization, Punctuation, Blending, Segmenting), so the examiner can count the number of errors in each category and compare those sums with the numbers of errors made by examinees in the normative sample who were in the same grade and working at the same level (defined by the examinee’s stopping point or item set for the subtest). This error analysis allows the examiner to rate each skill as a Weakness (lowest 25%), a Strength (highest 25%), or Average (middle 50%) compared with other students in the same grade who were working at the same level of difficulty. These norms allow data-based analysis of relative strengths and weaknesses within subtests for each student, regardless of the student’s total standard score on the subtest.
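The rating logic itself is straightforward to express. The following sketch is purely illustrative: the actual KTEA-3 norm tables are proprietary, so the quartile boundaries used here are invented, and the exact handling of counts that fall at a boundary is our assumption, not the published procedure.

```python
# Hypothetical illustration of KTEA-3-style skill ratings. The quartile
# boundaries (q25, q75) would come from normative tables for students in
# the same grade working at the same item set; the values below are invented.

def rate_skill(errors: int, q25: int, q75: int) -> str:
    """Rate an error-category count against normative quartile boundaries."""
    if errors > q75:          # more errors than ~75% of comparable peers
        return "Weakness"     # lowest 25% of performance
    if errors < q25:          # fewer errors than ~75% of comparable peers
        return "Strength"     # highest 25% of performance
    return "Average"          # middle 50%

# A student who made 7 Consonant Digraph errors, where the (invented)
# normative quartile boundaries are 2 and 5 errors:
print(rate_skill(7, q25=2, q75=5))  # -> Weakness
```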
One author of this commentary (JOW) and his students and colleagues have used the KTEA error analyses since 1985 and have found the analyses very helpful in understanding the academic achievement of some examinees and in making specific instructional recommendations. However, until now, we had seen no evidence regarding the meaning and potential utility of the scores beyond providing additional detail about an individual’s performance on a subtest.
The articles in this Special Issue explore possible interpretations and relationships of various error patterns on the KTEA-3. O’Brien, Pan, Courville, Bray, and Breaux (2017) conducted an exploratory factor analysis (a better choice than confirmatory factor analysis according to, for example, Carroll, 1998) of the error scores on five KTEA-3 subtests of students from the KTEA-3 norming and special samples, and a panel of experts in measurement and curriculum (including two of the authors of this commentary) suggested possible names and explanations for the obtained factors. The other articles in this Special Issue used those factors to investigate the relationships between error-score factors, within and across subtests, and a variety of other issues, such as disability categories and patterns of strengths and weaknesses. The studies in this Special Issue are a promising departure from the usual practice of analyzing examinees’ scores on achievement tests or comparing achievement-test scores with cognitive ability scores.
Based on the articles in this Special Issue, we have ample evidence that different groups of children, some identified as having specific learning difficulties, do show statistically significant differences from particular comparison groups on some aspects of error analysis. Tables 1 through 4 present a summary of the error-analysis findings from the various studies, separated by academic area (Reading, Writing, Math, and Listening Comprehension and Oral Expression). Even given the vast amount of information gleaned from all of these articles, we still wonder whether enough information is presently available to actually use the data in some diagnostic way. Would a certain score on a specific error analysis provide strong enough evidence to aid in our diagnostic interpretation of the test? Use of error analysis for diagnosis would be, of course, a step beyond the primary purpose of simply attempting to understand the nature and possible causes of an individual’s high or low scores on a subtest.
Table 1. Significant Reading Achievement Error Comparisons by Study Groups.

Table 2. Significant Writing Achievement Error Comparisons by Study Groups.

Table 3. Significant Math Achievement Error Comparisons by Study Groups.

Table 4. Significant Listening Comprehension and Oral Expression Achievement Error Comparisons by Study Groups.
Avitia, Pagirsky, et al. (2017) examined the relationship between errors made by students classified with a learning disability in reading/writing (LDRW) and those with a language impairment (LI). The authors found not only that these clinical groups made more errors than their controls but also that each group showed specific patterns of errors. They note that this information can be useful in choosing appropriate interventions for students, as students with LDRW may need more instruction in word reading and decoding, and those with LI may need more support in learning letter sounds. In another study, Avitia, DeBiase, et al. (2017) analyzed the error patterns of students with a specific learning disability in math (SLD-M) and students with a specific learning disability in reading and writing (SLD-RW). The labels alone suggest that the groups are very different, but are they? The authors found that “there were more similarities than differences between the SLD-R/W and SLD-M samples” (Avitia, DeBiase, et al., 2017, p. 16) when looking at error patterns on the achievement assessment. This finding is important because it reminds us that a specific label does not automatically justify a narrow approach to intervention and that error analysis may reveal information not apparent from traditional test scores. Although a diagnosis of SLD-M does not automatically necessitate reading interventions, perhaps more often than not it may, and it would at least be imprudent to ignore reading when assessing a student referred for math problems. Taken together, these articles not only highlight the importance of being aware of group differences but also justify the continued need to look within individual profiles when identifying appropriate interventions.
The articles presented in this issue suggest that analyzing the errors made throughout an assessment is a useful means of analysis and prediction and that normed error-analysis procedures have utility. Apart from these performance errors, we should also be mindful of our own errors when testing and interpreting. The definition of learning disability lends itself to varied interpretation across professionals, yet we assume that we have always labeled the specific learning disability accurately. We start by grouping behaviors and labeling them a “learning disability,” and then labeling errors as the “specific learning disability,” but can we be sure? Is it fair to label on the basis of our own judgment and then to ask whether the profile we have created matches what we would expect that profile to be? The patterns of strengths and weaknesses approach allows us to look at patterns of performance to understand individual needs. By removing the labels and studying a person’s profile, we can better tailor an effective intervention approach.
In our world of expectation and analysis, we aim to predict future behavior from current behavior so that we can intervene if needed. We compare performance to determine whether differences are expected or whether help is needed. In reading comprehension, a great deal of effort is spent determining which foundational skills are needed to read well. For example, phonological processing is a “strong predictor of emerging literacy skills” (Choi, Hatcher, Langley, Liu, & Bray, 2017). To determine how adept one’s phonological processing skills are, it has been most common to assess segmentation rather than deletion. However, the factor analysis identified two factors (Basic Phonological Awareness [BPA] and Advanced Phonological Processing [APP]), and the study reveals that, within the APP factor, deletion is just as important and should also be analyzed.
Assessing the statistical significance of score differences between groups is extremely important because it tells us whether the scores differ from what would be expected on the basis of chance alone. If a particular score is statistically different between groups, then we might validly assume that the difference provides a basis for useful information, either for diagnosing a symptom or a disorder or for developing interventions. Messick (1989) defines validity as “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (p. 13). The studies presented in this Special Issue certainly do provide us with one form of validity that we need when making diagnostic inferences. However, we suggest that this alone is not enough. Future research using the KTEA-3 error data might serve to provide evidence of predictive power, a form of predictive validity that refers to the accuracy of decisions made on the basis of a given measure (or score, or sign), that is, the extent to which the test (or measure, score, or sign) agrees with some outcome criterion.
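To make this idea concrete, here is a minimal sketch (our illustration only; the counts are invented and do not come from any of the studies) of the indices that commonly operationalize predictive power, computed from a 2 × 2 table of agreement between a diagnostic sign and an outcome criterion:

```python
# Agreement between a dichotomized "sign" (e.g., a low error-factor score)
# and an outcome criterion (e.g., an independently confirmed diagnosis).
# All counts below are invented for illustration.

def predictive_power(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Classification accuracy indices from a 2x2 sign-by-criterion table."""
    return {
        "sensitivity": tp / (tp + fn),   # P(sign present | disorder)
        "specificity": tn / (tn + fp),   # P(sign absent | no disorder)
        "ppv": tp / (tp + fp),           # P(disorder | sign present)
        "npv": tn / (tn + fn),           # P(no disorder | sign absent)
        "hit_rate": (tp + tn) / (tp + fp + fn + tn),
    }

print(predictive_power(tp=30, fp=20, fn=10, tn=140))
# -> sensitivity .75, specificity .875, ppv .60, npv .93, hit_rate .85
```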
While these articles provide data to suggest statistical significance, can we, without knowing how frequently these differences occur, be certain that they accurately predict a problem? What if these differences are just natural variations that we did not think to account for? As Sattler (2008) notes, tests of significance, although highly useful, do not tell us the complete story. Significance can be highly dependent on sample size, with large samples more likely to produce significant results. As seen in the studies in this Special Issue, many statistically significant results were indeed identified (see summary Tables 1 to 4). Unfortunately, none of the studies provided the additional information needed to judge the size and rarity of those differences: lacking from each study were both effect sizes (ESs) and base-rate data. This omission is not in any way a critique of the studies themselves, only a commentary on the practicality of the findings.
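A simple demonstration of Sattler’s point, using invented numbers and a normal approximation to the two-sample test, shows the same mean difference moving from clearly nonsignificant to highly significant purely as a function of sample size:

```python
from statistics import NormalDist

def two_sample_p(m1: float, m2: float, sd: float, n: int) -> float:
    """Two-sided p value for two equal-n groups with a common SD
    (normal approximation; purely for illustration)."""
    se = sd * (2 / n) ** 0.5
    z = abs(m1 - m2) / se
    return 2 * (1 - NormalDist().cdf(z))

# The same 3-point difference (invented means of 100 vs. 103, SD = 15):
for n in (10, 50, 500):
    print(n, round(two_sample_p(100, 103, 15, n), 4))
# n=10 -> p ~ .65;  n=50 -> p ~ .32;  n=500 -> p ~ .002
```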
ES is a statistical index, expressed in standard deviation units, that is independent of sample size. It assesses the magnitude of the difference between two or more groups’ means rather than the probability that the results are due to chance, and it thus provides a context for interpreting the “meaningfulness” of results independent of sample size or statistical significance. Cohen (1988) characterized ESs of .20, .50, and .80 as small, medium, and large, respectively. As an example of the application of ES to the current studies, we converted the actual means and standard deviations related to the significant findings from one study (Pagirsky et al., 2017). Table 5 presents those results. As can be seen, the ESs for the 13 statistically significant error comparisons ranged from medium (.52) to large (1.20).
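As a minimal sketch of how such a conversion can be done (using the means and standard deviations that Pagirsky et al., 2017, reported for the Spelling Factor 1 comparison discussed below), Cohen’s d is simply the mean difference divided by a pooled standard deviation. Note that pooling the two variances without weighting by sample size reproduces the 1.20 value, whereas the sample-size-weighted pooling yields 1.25:

```python
import math

def cohens_d(m1: float, s1: float, n1: int,
             m2: float, s2: float, n2: int) -> float:
    """Cohen's d using the sample-size-weighted pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m2 - m1) / pooled

# Spelling Factor 1: ADHD with reading problems vs. control (Pagirsky et al., 2017)
d_weighted = cohens_d(8.9, 2.6, 45, 11.5, 1.6, 63)               # ~1.25
d_unweighted = (11.5 - 8.9) / math.sqrt((2.6**2 + 1.6**2) / 2)   # ~1.20
print(f"weighted d = {d_weighted:.2f}, unweighted d = {d_unweighted:.2f}")
```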
Table 5. Effect Sizes Related to Significant Results Reported by Pagirsky et al. (2017).

Considering the base rates, or prevalence rates, of specific conditions may be very useful in determining the strength and accuracy of any interpretation made from the findings. These rates are important because they are the rates against which we judge the accuracy of our predictions. By looking at the frequency of similar patterns, we can more clearly determine whether statistically significant differences are in fact so uncommon as to warrant specific interpretations or are merely natural deviations from the norm. A difference may be too great to be likely to occur by chance but still be fairly common. For example, in the Pagirsky et al. (2017) study, the largest ES (1.20) was found between the attention-deficit/hyperactivity disorder (ADHD) with reading problems group (M = 8.9, SD = 2.6) and the control group (M = 11.5, SD = 1.6) on Spelling Factor 1 (spelling-to-letter mapping). These results suggest that, as a group, the sample of children with ADHD and reading problems performed significantly differently from the control group. Group information, however, may differ from individual information. What was the frequency of children in the ADHD groups (ADHD with reading problems, n = 45; ADHD with no reading problems, n = 46) who performed significantly low on Spelling Factor 1 compared with the frequency of the control group (n = 63) who performed low? While we know the total sample size of each group, identifying the frequency of students who performed low in each group would provide meaningful and practical information.
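As a purely hypothetical illustration of why those frequencies matter, assume for the sake of the sketch that the factor scores are roughly normally distributed within each group (the studies reported no such distributional information). One can then estimate how many students in each group would fall below a given “low performance” cutoff:

```python
from statistics import NormalDist

# Group parameters reported by Pagirsky et al. (2017) for Spelling Factor 1;
# the normality assumption and the choice of cutoff are ours, for illustration.
control = NormalDist(mu=11.5, sigma=1.6)   # control group, n = 63
adhd_rp = NormalDist(mu=8.9, sigma=2.6)    # ADHD with reading problems, n = 45

cutoff = control.inv_cdf(0.10)             # control group's 10th percentile, ~9.45
p_control = control.cdf(cutoff)            # .10 by construction
p_adhd_rp = adhd_rp.cdf(cutoff)            # ~.58: most, but far from all

print(f"cutoff = {cutoff:.2f}")
print(f"expected low scorers: control ~{63 * p_control:.0f} of 63, "
      f"ADHD with reading problems ~{45 * p_adhd_rp:.0f} of 45")
```

Under these assumptions, roughly 6 of 63 control students but only about 26 of 45 students with ADHD and reading problems would score below the cutoff, so even this very large group difference (ES = 1.20) would leave a substantial minority of the clinical group indistinguishable from controls on this sign.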
In sum, the articles presented in this Special Issue provide compelling information regarding a different approach to analyzing students’ test results, the statistical significance of the relationships, and, consequently, the interpretation of the data. Although this evidential validity is a crucial component of the analysis of assessment results, the present authors question whether, by itself, it is enough information to determine predictive validity.
The evidential validity, though not proof of predictive validity, is clearly a foundation for further research. The results of the studies also imply that examiners might do well to include error analysis in their achievement testing and, when possible, to use normed error-analysis procedures. There is also evidence to support the incorporation of normed error-analysis procedures in new editions of academic achievement tests.
Authors’ Note
Ron Dumont and John Willis reviewed and commented on the factor analysis of the KTEA-3 error scores and on prepublication drafts of several of the articles in this Special Issue.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Willis was paid for consultation with the authors and publishers of the KTEA-II and KTEA-3 during the test-development phases.
References

Avitia, M., DeBiase, E., Pagirsky, M., Root, M. M., Howell, M., Pan, X., Knupp, T. (2017). Achievement error differences of students with reading versus math disorders. Journal of Psychoeducational Assessment, 35(1-2), 111-123.

Avitia, M., Pagirsky, M., Courville, T., DeBiase, E., Knupp, T., Ottone-Cross, K. (2017). Differences in errors between students with language and reading disabilities. Journal of Psychoeducational Assessment, 35(1-2), 149-154.

Breaux, K. C., Avitia, M., Koriakin, T., Bray, M. A., DeBiase, E., Courville, T., . . . Grossman, S. (2017). Patterns of strengths and weaknesses (PSW) on the WISC-V, DAS-II, and KABC-II and their relationship to students’ errors in oral language, reading, writing, spelling, and math. Journal of Psychoeducational Assessment, 35(1-2), 168-185.

Carroll, J. B. (1998). Human cognitive abilities: A critique. In McArdle, J. J., Woodcock, R. W. (Eds.), Human cognitive abilities in theory and practice (pp. 5-23). Mahwah, NJ: Erlbaum.

Choi, D., Hatcher, R. C., Langley, S. D., Liu, X., Bray, M. A., Courville, T., . . . DeBiase, E. (2017). What do phonological processing errors tell about students’ skills in reading, writing, and oral language? Journal of Psychoeducational Assessment, 35(1-2), 24-46.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Kaufman, A. S., Kaufman, N. L. (1985). Kaufman Test of Educational Achievement. Circle Pines, MN: American Guidance Service.

Kaufman, A. S., Kaufman, N. L. (1998). Kaufman Test of Educational Achievement (Normative Update). Circle Pines, MN: American Guidance Service.

Kaufman, A. S., Kaufman, N. L. (2004). Kaufman Test of Educational Achievement (2nd ed.). San Antonio, TX: The Psychological Corporation.

Kaufman, A. S., Kaufman, N. L. (2014). Kaufman Test of Educational Achievement (3rd ed.). Bloomington, MN: NCS Pearson.

Koriakin, T. A., Kaufman, A. S. (2017). Investigating patterns of errors for specific comprehension and fluency difficulties. Journal of Psychoeducational Assessment, 35(1-2), 138-148.

Koriakin, T. A., White, E., Breaux, K. C., DeBiase, E., O’Brien, R., Howell, M., . . . Courville, T. (2017). Patterns of cognitive strengths and weaknesses and relationships to math errors. Journal of Psychoeducational Assessment, 35(1-2), 155-167.

Liu, X., Marchis, L., DeBiase, E., Breaux, K. C., Courville, T., Pan, X., . . . Kaufman, A. S. (2017). Do cognitive patterns of strengths and weaknesses differentially predict the errors on reading, writing, and spelling? Journal of Psychoeducational Assessment, 35(1-2), 186-205.

Messick, S. (1989). Validity. In Linn, R. L. (Ed.), Educational measurement (3rd ed., pp. 13-103). New York, NY: Macmillan.

O’Brien, R., Pan, X., Courville, T., Bray, M. A., Breaux, K. (2017). Exploratory factor analysis of reading, spelling, and math errors. Journal of Psychoeducational Assessment, 35(1-2), 7-23.

Ottone-Cross, K. L., Dulong-Langley, S., Root, M. M., Gelbar, N., Bray, M. A., Luria, S. R., . . . Pan, X. (2017). Beyond the mask: Analysis of error patterns on the KTEA-3 for students with giftedness and learning disabilities. Journal of Psychoeducational Assessment, 35(1-2), 74-93.

Pagirsky, M. S., Koriakin, T. A., Avitia, M., Costa, M., Marchis, L., Maykel, C., . . . Pan, X. (2017). Do the kinds of achievement errors made by students diagnosed with ADHD vary as a function of their reading ability? Journal of Psychoeducational Assessment, 35(1-2), 124-137.

Root, M. M., Marchis, L., White, E., Courville, T., Choi, D., Bray, M. A., Pan, X., . . . Wayte, J. (2017). How achievement error patterns of students with mild intellectual disability differ from low IQ and low achievement students without diagnoses. Journal of Psychoeducational Assessment, 35(1-2), 94-110.

Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th ed.). San Diego, CA: Jerome M. Sattler.

Stewart, C., Root, M. M., Koriakin, T., Choi, D., Hack, S., Bray, M. A., . . . Courville, T. (2017). Biological gender differences in students’ errors on mathematics achievement tests. Journal of Psychoeducational Assessment, 35(1-2), 47-56.

