This study investigated developmental gender differences in mathematics achievement, using the child and adolescent portion (ages 6-19 years) of the standardization sample of the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Participants were divided into two age categories: 6 to 11 and 12 to 19. Error categories within the Math Concepts & Applications and Math Computation subtests of the KTEA-3 were factor analyzed and revealed five error factors. Multivariate analyses of variance (MANOVAs) of the error factor scores showed that, in both age categories, female and male mean scores were not significantly different on four error factors: math calculation, geometric concepts, basic math concepts, and addition. They were significantly different on the complex math problems error factor, with males performing better at the p < .05 significance level for the 6 to 11 age group and at the p < .001 significance level for the 12 to 19 age group. Implications in light of gender stereotype threat are discussed.

Comparisons of mathematics achievement by biological gender have been made frequently, but with contradictory results. Traditionally, perceptions of male and female math performance in the United States have favored males. This viewpoint is often perpetuated by individuals who contribute to the development of youth, such as teachers, who have historically believed that boys are naturally better at math and have tended to overrate males’ abilities when compared with females’ (Ernest, 1976; Li, 1999). Parents follow this trend as well, presuming their sons have superior mathematical ability (Furnham, Reeves, & Budhani, 2002) and assuming that math is more difficult for their daughters (Yee & Eccles, 1988). Although the literature does not strongly support its validity, this viewpoint continues to have deleterious effects on female math performance through stereotype threat (Good, Aronson, & Harder, 2008).

The literature on gender differences in math performance, concerning both cognitive abilities and academic achievement, is broad and well documented. Research suggests that differences in math skills in elementary and middle school years are negligible, though findings are inconsistent (Camarata & Woodcock, 2006). One study investigating math performance on the Kaufman Test of Educational Achievement–Second Edition (KTEA-II; Kaufman & Kaufman, 2004) for ages 7 to 19 found that there were no gender differences in latent math ability but that females outperformed males on math computation (Reynolds, Scheiber, Hajovsky, Schwartz, & Kaufman, 2015). Another study found that female students generally receive higher grades in mathematics courses, but their scores on high-stakes standardized tests are often lower than those of male students (Kimball, 1989). Voyer and Voyer (2014) reported similar findings regarding grading, although the difference in grades was minor. Lindberg, Hyde, Linn, and Petersen (2010) conducted two analyses of large data sets of United States adolescents from the 1990s and 2000s. They found no significant differences in mathematics performance between males and females. However, findings from a meta-analysis conducted in the late 1980s demonstrated that males begin to outperform females in high school on mathematics problem solving and that the discrepancy continues into college (Hyde, Fennema, & Lamon, 1990). The magnitude of difference, though, was smaller than it was during the 1970s and 1980s (Hyde et al., 1990). Similarly, Reilly, Neumann, and Andrews (2015) reported a difference in math performance in Grade 12 favoring males that was stable over the 1990s and 2000s. Others found gender differences favoring males in a sample of individuals ages 22 to 90 (Kaufman, Kaufman, Liu, & Johnson, 2009).
Gender differences in mathematics performance have thus been studied extensively, but results do not support a ubiquitous viewpoint that males outperform females throughout development.

Marshall (1983) categorized types of mathematics errors by gender and found that both genders made language-related errors; however, boys were more likely to make errors in translation, and girls made more errors due to confusion of meaning. Girls were also more likely to make errors due to the use of irrelevant rules (such as always selecting the smallest number), misuse of spatial information (such as misreading units of scale), or choosing the incorrect operation, among other errors. In contrast, boys were more likely to make errors due to formula selection and failure to persevere to problem completion (Marshall, 1983). Manger and Eikeland (1998) assessed mathematical achievement and spatial visualization skills in Norwegian sixth graders and found no gender differences in spatial visualization skills. Boys and girls did not differ significantly in performance on easier tasks; however, boys had significantly higher math scores on the most difficult tasks.

Bench, Lench, Liew, Miner, and Flores (2015) found that, when completing a math test, men overestimated their performance more than women did, reflecting a positivity bias. However, women with more positive previous mathematics experiences also overestimated their own performance. Similarly, when achievement-related beliefs were reported at pretest and posttest in a sample of 8- to 9-year-olds and 13- to 14-year-olds, Stipek and Gralinski (1991) found that girls were less likely to believe that they could be successful. In addition, girls were more likely to attribute failure to lack of capacity, expected to perform poorly, and rated their ability lower than boys did (Stipek & Gralinski, 1991). They were also less likely than boys to attribute success to high ability. Although Vermeer, Boekaerts, and Seegers (2000) found no gender differences in sixth-grade students’ math computations, girls rated their confidence in applied problem solving lower than boys did and attributed failure to the difficulty of the task. Girls also showed higher persistence than boys, but only during applied problem solving.

Van de gaer, Pustjens, Van Damme, and De Munter (2008) studied the relationship between mathematics course selection and mathematics achievement among Flemish secondary school students in the United States–equivalent Grades 7, 8, 10, and 12. They found that, when achievement level and number of hours of math exposure were controlled for in Grade 8, boys and girls elected to take the same number of mathematics courses in Grade 10. However, when the same controls were in place for Grade 10, boys were significantly more likely than girls to select math courses in Grade 12.

Researchers have focused on the impact of gender stereotyping on female performance in math and found that gender differences in math performance, particularly at the higher levels, can be eliminated by accounting for stereotype threat. Spencer, Steele, and Quinn (1999) described stereotype threat in this context as follows: “When women perform math, unlike men, they risk being judged by the negative stereotype that women have weaker math ability” (p. 4). They found that men outperformed women on complex but not simple math. However, when the math exercise was presented as a task that does not show gender differences, women and men scored equally well. Finally, when the same task was specifically presented as one that does reveal gender differences, women scored significantly lower than men. Similarly, Good et al. (2008) studied stereotype threat in women in upper level college math courses, who were likely proficient and motivated math students. Women in the nonthreat condition performed better than those in the stereotype threat condition and also scored higher than men in both conditions. Stereotype threat therefore affects the performance of females even at the upper levels, and eliminating that threat can have a positive impact on female math performance.

The purpose of the present study was to investigate biological gender differences on math error factor scores on the Kaufman Test of Educational Achievement–Third Edition (KTEA-3; Kaufman & Kaufman, 2014) across two general age levels: 6 to 11 (lower) and 12 to 19 (upper). This research used the normative cohort data of the KTEA-3 to evaluate this comparison. The following research questions were used to guide this research:

  • Research Question 1: Are there gender differences in error factor scores of math achievement subtests of the KTEA-3?

  • Research Question 2: Do gender differences in error factor scores of math achievement subtests of the KTEA-3 exist at both the lower and upper age levels?

Participants

The participants in this study were students tested during the standardization of the KTEA-3 (Kaufman & Kaufman, 2014) between August 2012 and July 2013. The KTEA-3 Technical and Interpretive Manual (Kaufman, Kaufman, & Breaux, 2014) contains the demographic data for the full standardization sample. Both Forms A and B of the KTEA-3 were normed, with approximately half of the sample completing Form A and half Form B.

For this study, we selected participants between ages 6 and 19 from the full normative sample. This yielded a total study sample of 3,377, which included 1,758 females and 1,619 males in Grades 1 to 12 (median = Grade 4). Table 1 presents the sample size and demographic information for the total sample and by the lower and upper age bands. The gender samples within each age band had similar distributions on the demographic variables of sex, ethnicity, parent’s education, and geographic region. In addition, the full normative sample is reported in the KTEA-3 Technical and Interpretive Manual to be reflective of 2012 United States census data (Kaufman et al., 2014).

Table 1. Sample Demographics.

Measures

The KTEA-3 is an individually administered standardized test of academic achievement for prekindergarten through Grade 12, covering ages 4 through 25 years. The KTEA-3 provides an analysis of the student’s strengths and weaknesses in the areas of reading, mathematics, and written and oral language, and it includes an error analysis system within subtests. The Math Computation and Math Concepts & Applications subtests were selected for this study. In the Math Computation subtest, the individual solves mathematical problems with paper and pencil. In the Math Concepts & Applications subtest, the individual responds orally to real-life mathematical application questions in the areas of number concepts, arithmetic, time and money, measurement, and multiple-step problems. The problems are read aloud to the individual and are accompanied by a print copy of the problem or a pictorial illustration.

Analysis

A multistep process was used to investigate the relationship between a student’s gender and his or her corresponding KTEA-3 error scores in math. The first analytical step in this process was the derivation of factor scores.

The KTEA-3 contains a distinct error analysis system within 10 of the subtests. For those subtests, experts noted the error categories that students were likely to demonstrate. Within each error category, students received a grade-level normative designation of weakness, average, or strength. This designation, known as “skill status,” is based on a comparison between the student’s total error score and the average number of errors made by the normative sample (Kaufman et al., 2014). Within each subtest, students received several skill status error scores using this error analysis system. Exploratory factor analysis and principal components analysis were then used to create a reduced set of error score variables so that these skill status error scores could be used in further analyses. For this study, only the exploratory factor analysis results are pertinent (O’Brien et al., 2017).

To create the factor scores, polychoric correlation matrices were created for each subtest. Use of parallel analysis (PA; Horn, 1965), scree plot inspection (Cattell, 1966), and factor structure content review determined the number of factors to extract. For this study, two factors emerged from the Math Computation subtest and three from the Math Concepts & Applications subtest.
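
As an illustration of the retention rule described above, the following is a minimal Python sketch of Horn's (1965) parallel analysis. It is not the procedure run on the KTEA-3 data: it assumes continuous scores and Pearson correlations rather than the polychoric matrices used in this study, it uses the 95th percentile of the simulated eigenvalues as the threshold (a common variant; Horn's original proposal used the mean), and the function name `parallel_analysis` and the synthetic two-factor data are hypothetical.

```python
import numpy as np


def sorted_eigenvalues(data):
    """Eigenvalues of the Pearson correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Return the number of factors suggested by parallel analysis."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    observed = sorted_eigenvalues(data)

    # Eigenvalues expected from uncorrelated data of the same shape.
    simulated = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        noise = rng.standard_normal((n_obs, n_vars))
        simulated[i] = sorted_eigenvalues(noise)
    threshold = np.percentile(simulated, percentile, axis=0)

    # Keep leading factors until the first one that fails to beat chance.
    exceeds = observed > threshold
    return int(n_vars) if exceeds.all() else int(np.argmin(exceeds))


# Hypothetical data: six variables driven by two independent factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((1000, 2))
loadings = np.array(
    [[0.8, 0.0], [0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]]
)
data = factors @ loadings.T + 0.6 * rng.standard_normal((1000, 6))

n_factors = parallel_analysis(data)
print(n_factors)
```

In practice, the suggested count would still be checked against the scree plot and the content of the factors, as was done in this study.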

The final analytic step investigated whether the errors made on the KTEA-3 math subtests differed by gender. To address this question, one-way MANOVAs were conducted with math subtest error factor scores as dependent variables and gender as the independent variable. Furthermore, to examine developmental differences in this relationship, the MANOVAs were conducted separately for ages 6 to 11 and ages 12 to 19. Prior to conducting the analyses, the math factor scores were examined for univariate normality issues and outliers. Any extreme cases were analyzed to verify their impact on the distributional properties of each subtest. Using criteria of |2| for skewness and |6| for kurtosis (Lix, Keselman, & Keselman, 1996), no violations of normality were observed. To examine the assumption of homogeneity of within-group covariance matrices, a two-step analysis process was used (Huberty & Petoskey, 2000). First, for each analysis, the Box test was calculated. For the Math Concepts & Applications subtest, the Box tests were statistically significant for ages 6 to 11 (χ² = 31.93, p ≤ .0001) and ages 12 to 19 (χ² = 15.25, p = .018). The Box tests were not statistically significant for the Math Computation subtest. However, as noted by Huberty and Petoskey (2000), the Box test is an extremely powerful test, so a significant result does not by itself indicate a meaningful violation. Therefore, as a follow-up analysis, the natural log of the determinant of the covariance matrix for each level of the independent variable was compared with the natural log determinant of the pooled matrix (Huberty & Petoskey, 2000; Olejnik, 2010) for the Math Concepts & Applications subtest. In the judgment of the researchers, the values were relatively close, with the largest difference between a given group and the pooled natural log determinant equal to −.15.
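
The follow-up comparison of log determinants can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `log_det_gaps`, the group labels, and the simulated scores are hypothetical, and the pooled matrix is taken to be the degrees-of-freedom-weighted average of the group covariance matrices.

```python
import numpy as np


def log_det_gaps(groups):
    """Map each group label to ln|S_group| minus ln|S_pooled|.

    `groups` maps a label to an (n_cases, n_variables) array of scores.
    """
    n_total = sum(len(x) for x in groups.values())
    covs = {label: np.cov(x, rowvar=False) for label, x in groups.items()}

    # Pooled within-group covariance matrix (weights are n_group - 1).
    pooled = sum((len(groups[k]) - 1) * covs[k] for k in groups)
    pooled = pooled / (n_total - len(groups))

    # slogdet returns (sign, ln|det|); only the log magnitude is needed.
    pooled_logdet = np.linalg.slogdet(pooled)[1]
    return {
        k: float(np.linalg.slogdet(covs[k])[1] - pooled_logdet) for k in groups
    }


# Hypothetical data: three factor scores per student, same population.
rng = np.random.default_rng(0)
scores = {
    "female": rng.standard_normal((800, 3)),
    "male": rng.standard_normal((800, 3)),
}
gaps = log_det_gaps(scores)
for label, gap in gaps.items():
    print(f"{label}: {gap:+.3f}")
```

Gaps near zero indicate that each group's covariance matrix has roughly the same generalized variance as the pooled matrix; how close is close enough remains a judgment call, as noted above.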

Results

Statistical analysis yielded a total of five error factors within the two math subtests included in this study. Two error factors were identified within the Math Computation subtest. Factor 1 reflected basic mathematical computation beyond addition, such as subtraction, regrouping, multiplication facts, and logic. Factor 2 consisted of addition. In the Math Concepts & Applications subtest, three error factors emerged. Factor 1 was math calculation, which included single-step operations and computations, addition, subtraction, multiplication, division, word problems, fractions, and algebra. Factor 2 was related to measurement and geometric concepts that utilize visual-spatial information. Factor 3 included questions with complex processes such as tables and graphs, time and money, multistep problems, abstract concepts, and problems that rely more heavily on working memory. These factors were identified and named by curriculum and assessment experts in the field (J. Willis & R. Dumont, personal communication, March 26, 2016; N. Mather, personal communication, March 26, 2016).

Descriptive statistics on the math subtest and composite scores were computed for the lower and upper age groups across males and females. In the 6 to 11 age range, males scored slightly higher than females (means of 100.2 and 99.0, respectively) on the Math Concepts & Applications subtest, whereas females slightly outscored males on both the Math Computation subtest (99.4 female and 98.3 male means) and the Math Fluency subtest (99.2 female and 98.3 male means). However, in the 12 to 19 age range, the gap between males and females widened on the Math Concepts & Applications subtest (101.6 male and 98.3 female means), nearly disappeared on the Math Computation subtest (100.6 male and 100.5 female means), and reversed on the Math Fluency subtest (100.0 male and 98.8 female means). Table 2 contains these scores along with the composite index standard scores and standard deviations.

Table 2. Group Means and Standard Deviations on Subtest and Composite Scores of Interest.

Descriptive statistics were also completed on the factor scores within each subtest. The Math Concepts & Applications subtest had three error factors, and in the 6 to 11 age group, males slightly outperformed females on the Math Calculation (9.8 male and 9.6 female means), Geometric Concepts (10.8 male and 10.6 female means), and Complex Math Problems (10.7 male and 10.2 female means) error factors. In the 12 to 19 age group, males also outscored females on the Math Calculation (10.6 male and 10.4 female means), Geometric Concepts (9.4 male and 9.3 female means), and Complex Math Problems (9.9 male and 9.4 female means) error factors. On the Math Computation subtest, male and female scores were identical in the 6 to 11 age range on both the Basic Math Concepts (10.8 means) and the Addition (10.6 means) error factors. They remained equal on the Addition error factor in the 12 to 19 age range (10.1 means), but males slightly outperformed females on the Basic Math Concepts error factor (10.2 male and 10.0 female means). Table 3 illustrates these results for males and females across the two age groups.

Table 3. Mean Error Scores by Group.

MANOVAs were completed on the error factor scores within each subtest. In both age groups, the only significant difference between males and females occurred on the Complex Math Problems error factor (ages 6 to 11, F = 6.19, p < .05, mean difference = 0.15; ages 12 to 19, F = 10.01, p < .001, mean difference = 0.19). On all other error factors, differences between males and females were not significant. Tables 4 and 5 contain MANOVA results for both age groups.
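
For readers less familiar with the procedure, a one-way MANOVA with two groups can be sketched in Python as below. The sketch is illustrative only and was not used to produce the results reported here: the function name `wilks_lambda_two_groups` and the simulated scores are hypothetical, and the F conversion shown is the exact form that holds when there are exactly two groups.

```python
import numpy as np


def sscp(x):
    """Sums of squares and cross-products about the column means."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered


def wilks_lambda_two_groups(x_a, x_b):
    """Return (Wilks' lambda, F, df1, df2) for a two-group one-way MANOVA."""
    n_total = len(x_a) + len(x_b)
    n_vars = x_a.shape[1]

    within = sscp(x_a) + sscp(x_b)          # pooled within-group SSCP
    total = sscp(np.vstack([x_a, x_b]))     # SSCP ignoring group membership
    lam = float(np.linalg.det(within) / np.linalg.det(total))

    # With two groups, lambda converts exactly to an F statistic.
    df1, df2 = n_vars, n_total - n_vars - 1
    f_stat = (1.0 - lam) / lam * df2 / df1
    return lam, f_stat, df1, df2


# Hypothetical data: three error factor scores, one shifted for group_b.
rng = np.random.default_rng(0)
group_a = rng.standard_normal((300, 3))
group_b = rng.standard_normal((300, 3)) + np.array([0.5, 0.0, 0.0])

lam, f_stat, df1, df2 = wilks_lambda_two_groups(group_a, group_b)
print(f"Wilks' lambda = {lam:.3f}, F({df1}, {df2}) = {f_stat:.2f}")
```

A p value would be obtained from the F distribution with the returned degrees of freedom; that step is omitted to keep the sketch free of additional dependencies. Univariate follow-up tests on each error factor, such as those summarized in Tables 4 and 5, are a separate step.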

Table 4. MANOVA Result and Pairwise Comparisons by Error Factors for Ages 6 to 11.

Table 5. MANOVA Result and Pairwise Comparisons by Error Factors for Ages 12 to 19.

Discussion

The purpose of this study was to examine the differences in mathematics achievement between males and females on the KTEA-3, using a lower (6-11 years) and upper (12-19 years) age range grouping. The current findings are generally consistent with previous investigations of gender differences in math achievement, wherein the differences are negligible except on complex math problems. The previous research base has shown gender differences in math performance at various ages, although more pronounced from adolescence into adulthood, with males being favored.

The present study shows no significant gender differences in math performance in either the 6 to 11 or the 12 to 19 age group on the math calculation, geometric concepts, basic math concepts, and addition error factors. The exception is real-life complex math problems (Complex Math Problems factor), which contain multiple steps and require individuals to respond orally. On that error factor, males significantly outperformed females on the KTEA-3 at the p < .05 (6-11 age group) and p < .001 (12-19 age group) levels.

Previous research has found that teachers and parents perceive boys’ mathematics skills as generally superior to girls’. Furthermore, it has been suggested that females are less inclined to take mathematics courses at the higher levels. These judgments about female performance in mathematics have had a significant negative impact on female performance of complex math through a mechanism known as stereotype threat. This impact exists even at the upper echelon of math ability and, when removed, negates and even reverses the difference between male and female math performance. The current study supports the observation that differences between males and females across lower and upper age categories are negligible across most mathematics skills and that significant differences emerge only on more complex tasks, with a larger discrepancy in the older age band. This study also demonstrates that these differences persist today, even though the public has become aware of the stereotypes and of the danger of stereotype threat and has increased efforts in recent years to raise females’ interest and participation in science, technology, engineering, and math (STEM) fields. Overall, the research suggests that gender differences in math performance are more likely attributable to learned behaviors and the deleterious effects of stereotype threat than to innate skills or ability.

Limitations and Future Directions

There are several limitations to this study. First, there are no reliability measurements available, such as test–retest. However, that is inherent in research based on a one-time collection of normative data and cannot be rectified. Second, the two age ranges selected for study were fairly large and represent broad spans of development. Future studies that can create smaller, more developmentally appropriate age ranges may reveal distinctions between male and female performance that were not readily discernible with the current age ranges.

Of great importance to this field and to the study of the impact of gender-based stereotype threat on mathematics achievement, future studies should include demographic questions about both gender affiliation and biological gender to evaluate whether affiliating with the typical characteristics of a gender, regardless of biological gender, also exposes individuals to the effects of stereotype threat. Investigating potential differences in the impact of stereotype threat between individuals who do and do not identify with their biological gender would be useful. Studying this population would greatly enhance our understanding of gender-based mathematics achievement discrepancies as they relate to stereotype threat. In addition, the current findings suggest that greater efforts should be made to raise awareness of the legitimacy and potential impact of stereotype threat.

Furthermore, studies manipulating perceptions about the task, such as those conducted by Spencer et al. (1999), could provide additional support for the direct impact of beliefs about ability on math tasks and performance. Also of interest is that females and males may demonstrate different profiles of abilities in different math areas. Future studies that investigate these differences to connect assessment with intervention and instruction would be beneficial.

The authors wish to thank NCS Pearson for providing the standardization and validation data for the Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Copyrights by NCS Pearson, Inc., used with permission. They also wish to thank Alan and Nadeen Kaufman for their supervision of the comprehensive error analysis research program.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Bench, S. W., Lench, H. C., Liew, J., Miner, K., Flores, S. A. (2015). Gender gaps in overestimation of math performance. Sex Roles, 72, 536-546.
Camarata, S., Woodcock, R. (2006). Sex differences in processing speed: Developmental effects in males and females. Intelligence, 34, 231-252.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276. doi:10.1207/s15327906mbr0102_10
Ernest, J. (1976). Mathematics and sex. Berkeley: University of California Press.
Furnham, A., Reeves, E., Budhani, S. (2002). Parents think their sons are brighter than their daughters: Sex differences in parental self-estimations and estimations of their children’s multiple intelligences. Journal of Genetic Psychology, 163, 24-39. doi:10.1080/00221320209597966
Good, C., Aronson, J., Harder, J. A. (2008). Problems in the pipeline: Stereotype threat and women’s achievement in high-level math courses. Journal of Applied Developmental Psychology, 29, 17-28.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185. doi:10.1007/BF02289447
Huberty, C. J., Petoskey, M. D. (2000). Multivariate analysis of variance and covariance. In Tinsley, H., Brown, S. (Eds.), Handbook of applied multivariate statistics and mathematical modeling (pp. 183-208). New York, NY: Academic Press.
Hyde, J. S., Fennema, E., Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107, 139-155.
Kaufman, A. S., Kaufman, J. C., Liu, X., Johnson, C. K. (2009). How do educational attainment and gender relate to fluid intelligence, crystallized intelligence, and academic skills at ages 22–90 years? Archives of Clinical Neuropsychology, 24, 153-163.
Kaufman, A. S., Kaufman, N. L. (2004). Kaufman Test of Educational Achievement–Second Edition (KTEA-II). Circle Pines, MN: American Guidance Service.
Kaufman, A. S., Kaufman, N. L. (2014). Kaufman Test of Educational Achievement–Third Edition (KTEA-3). Bloomington, MN: Pearson.
Kaufman, A. S., Kaufman, N. L., Breaux, K. C. (2014). Kaufman Test of Educational Achievement–Third Edition (KTEA-3) technical & interpretive manual. Bloomington, MN: Pearson.
Kimball, M. M. (1989). A new perspective on women’s math achievement. Psychological Bulletin, 105, 198-214.
Li, Q. (1999). Teachers’ beliefs and gender differences in mathematics: A review. Educational Research, 41, 63-76.
Lindberg, S. M., Hyde, J. S., Linn, M. C., Petersen, J. L. (2010). New trends in gender and mathematics performance: A meta-analysis. Psychological Bulletin, 136, 1123-1135. doi:10.1037/a0021276
Lix, L. M., Keselman, J. C., Keselman, H. J. (1996). Consequences of assumption violations revisited: A quantitative review of alternatives to the one-way analysis of variance F test. Review of Educational Research, 66, 579-619. doi:10.3102/00346543066004579
Manger, T., Eikeland, O. (1998). The effects of spatial visualization and students’ sex on mathematical achievement. British Journal of Psychology, 89, 17-25.
Marshall, S. P. (1983). Sex differences in mathematical errors: An analysis of distracter choices. Journal for Research in Mathematics Education, 14, 325-336.
Olejnik, S. (2010). Multivariate analysis of variance. In Hancock, G., Mueller, R. (Eds.), The reviewer’s guide to quantitative methods in the social sciences (pp. 315-328). New York, NY: Routledge.
O’Brien, R., Pan, X., Courville, T., Bray, M. A., Breaux, K. C., Avitia, M., Choi, D. (2017). Exploratory factor analysis of reading, spelling, and math errors. Journal of Psychoeducational Assessment, 35(1-2), 8-24.
Reilly, D., Neumann, D. L., Andrews, G. (2015). Sex differences in mathematics and science achievement: A meta-analysis of national assessment of educational progress assessments. Journal of Educational Psychology, 107, 645-662. doi:10.1037/edu0000012
Reynolds, M. R., Scheiber, C., Hajovsky, D. B., Schwartz, B., Kaufman, A. S. (2015). Gender differences in academic achievement: Is writing an exception to the gender similarities hypothesis? The Journal of Genetic Psychology, 176, 211-234. doi:10.1080/00221325.2015.1036833
Spencer, S. J., Steele, C. M., Quinn, D. M. (1999). Stereotype threat and women’s math performance. Journal of Experimental Social Psychology, 35, 4-28.
Stipek, D. J., Gralinski, J. H. (1991). Gender differences in children’s achievement-related beliefs and emotional responses to success and failure in mathematics. Journal of Educational Psychology, 83, 361-371.
Van de gaer, E., Pustjens, H., Van Damme, J., De Munter, A. (2008). Mathematics participation and mathematics achievement across secondary school: The role of gender. Sex Roles, 59, 568-585. doi:10.1007/s11199-008-9455-x
Vermeer, H. J., Boekaerts, M., Seegers, G. (2000). Motivational and gender differences: Sixth-grade students’ mathematical problem-solving behavior. Journal of Educational Psychology, 92, 308-315. doi:10.1037//0022-0663.02.2.308
Voyer, D., Voyer, S. D. (2014). Gender differences in scholastic achievement: A meta-analysis. Psychological Bulletin, 140, 1174-1204. doi:10.1037/a0036620
Yee, D. K., Eccles, J. S. (1988). Parent perceptions and attributions for children’s math achievement. Sex Roles, 19, 317-333.