The “Big Two” in Hiring Discrimination: Evidence From a Cross-National Field Experiment

We tested whether signaling warmth and competence (“Big Two”) in job applications increases hiring chances. Drawing on field experimental data from five European countries, we analyzed the responses of employers (N = 13,162) to applications from fictitious candidates of different origins: native candidates and candidates of European, Asian, or Middle-Eastern/African descent. We found that competence signals slightly increased invitation rates, while warmth signals had no effect. We also found ethnic discrimination, a female premium, and differences in callbacks depending on job characteristics. Importantly, however, providing stereotype signals did not reduce the level of ethnic discrimination or the female premium. Likewise, we found little evidence for interactions between stereotype signals and job demands. While speaking against the importance of “Big Two” signals in application documents, our results highlight the importance of group membership and will hopefully stimulate further research on the role of stereotypes, ethnic stereotypes in particular, in hiring discrimination.


I. Supplementary Tables & Figures
N = 13,162. Standard errors are in parentheses. *** p<.01, ** p<.05, * p<.10 (two-tailed). All estimates result from linear regression models with robust standard errors and controls for religion, grades, and study country.

N = 13,162. Standard errors are in parentheses. *** p<.01, ** p<.05, * p<.10 (two-tailed). All estimates result from linear regression models with robust standard errors, analytical weights (that adjust for differences in the number of observations by origin group, country, and occupations), and controls for religion, grades, and study country.

Figure S1: Results of separate regressions with more fine-grained regional origin groups
These coefficient plots show the effects (with 90% and 95% confidence intervals) of warmth and competence signals on response rates in separate linear regressions for more fine-grained regional origin groups. The following origin countries belong to each group: Central and Northern Europe: France, Germany, Netherlands, Norway & United Kingdom (not for natives, respectively); Southern Europe: Greece, Italy & Spain (the latter not in Spain); Eastern Europe: Albania, Bulgaria, Poland, Romania & Russia; East Asia: China, Japan & South Korea; South-East Asia: Indonesia & Vietnam; South Asia: India & Pakistan; MENA: Egypt, Iran, Iraq, Lebanon, Morocco & Turkey; and Sub-Saharan Africa: Ethiopia, Nigeria & Uganda. All estimates result from linear regression models with robust standard errors and controls for job candidates' religion, grades, and the country of study.

Figure S2: Main effects for more fine-grained regional origin groups
This coefficient plot shows the main effects from the linear regression (with robust standard errors and controls for job candidates' religion, grades, and the country of study) of employer response on the independent variables (with 90% and 95% confidence intervals). The analysis considers more fine-grained regional origin groups. The following origin countries belong to each group: Central and Northern Europe: France, Germany, Netherlands, Norway & United Kingdom (not for natives, respectively); Southern Europe: Greece, Italy & Spain (the latter not in Spain); Eastern Europe:

These coefficient plots show the effects (with 90% and 95% confidence intervals) of warmth and competence signals on response rates in separate linear regressions for male and female job candidates of different origin in occupations with low or high levels of customer contact or qualification requirements, respectively. All estimates result from linear regression models with robust standard errors and controls for job candidates' religion, grades, and study country.

II. Validation study
As proposed by the anonymous reviewers of our manuscript and the editor, we conducted a post-hoc validation study to test whether the warmth and competence manipulation in the application documents increased, as intended, the perceived warmth and competence of the job applicant.

Design and method
The post-hoc validation study was conducted with a German convenience sample. The online survey was advertised on Facebook in groups that are predominantly used by German (psychology) students. Data were gathered between March and April 2020.¹ A total of 318 users participated in the validation study (M_age = 26.3, SD_age = 6.23; 71% female).
In this study, each respondent was randomly assigned to one job application. The application documents consisted of a cover letter and a CV with photo. In contrast to the main study, we did not show copies of school-leaving or job training certificates, in order to reduce the number of documents and the amount of information; in particular, we expected some people to fill in the survey on a mobile phone, where copies of certificates would not have been easily readable. However, we informed respondents that this information had also been provided by the job candidate. After reading the application documents, respondents were asked to evaluate the job candidate with respect to warmth and competence.

Experimental treatments
We used four profiles from the main study: male and female applicants (with typical German names and photos, i.e. white skin and brown hair) who applied for two medium-skilled jobs, one with little customer contact (payroll clerk) and one with rather frequent customer contact (hotel receptionist). In addition, we randomly assigned the warmth and competence treatments from the main study.
Each respondent was randomly assigned to one of the resulting 16 profiles (gender by job by warmth by competence: 2 × 2 × 2 × 2 design).
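The full factorial structure of this design can be illustrated with a short sketch. It is only an illustration under assumptions: the level labels, the fixed seed, and the use of simple (unblocked) random assignment are not taken from the study materials, which state only that respondents were randomly assigned to one of 16 profiles.

```python
import itertools
import random

# Factor levels as described in the text; the labels themselves are illustrative.
FACTORS = {
    "gender": ["male", "female"],
    "job": ["payroll clerk", "hotel receptionist"],
    "warmth_signal": [False, True],
    "competence_signal": [False, True],
}


def build_profiles(factors):
    """Return every combination of factor levels (full factorial design)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]


def assign_profile(profiles, rng):
    """Draw one profile uniformly at random for a single respondent.

    Assumption: simple random assignment; the text does not say whether
    assignment was blocked or balanced across profiles.
    """
    return rng.choice(profiles)


profiles = build_profiles(FACTORS)

# Fixed seed only so that the sketch is reproducible; 318 is the reported sample size.
rng = random.Random(2020)
assignments = [assign_profile(profiles, rng) for _ in range(318)]

print(len(profiles))  # 16 profiles from the 2 x 2 x 2 x 2 design
```

With simple random assignment, cell sizes vary by chance around 318 / 16, or roughly 20 respondents per profile.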

Dependent variables
We asked respondents to evaluate warmth and competence using the items from the four facets of agency and communion that had been proposed by Abele et al. (2016),² asking the participants to rate how much the respective traits applied to the applicant. A bipolar format with a 5-point Likert scale (e.g., from "little efficient" to "very efficient") was used for each of the 20 items. The scale considers more recent research on the Big Two suggesting two facets of communion (warmth and morality) and two facets of agency (competence and assertiveness). Table S6 further below lists all items in English and in German.

The scree test also yielded two factors with eigenvalues larger than one. Using R, we conducted a parallel analysis, which also suggested a two-factor solution.

To test whether the four dimensions proposed by
Next, we investigated the rotated factor solutions. An orthogonal varimax rotation of the two-factor solution yielded a very conclusive pattern of factor loadings (see Table S6), while the rotated factor loadings in the four-factor solution did not provide a meaningful pattern.³ All items from the two facets of agency, that is, assertiveness (AA1-5) and competence (AC1-5), and two items from the morality facet of communion (CM4 & CM5) loaded on the first factor with values larger than .40 (except for AA5: loading <.40). The second factor comprised all ten communion items with factor loadings larger than .40, including the warmth (CW1-5) and the morality (CM1-5) facets. We excluded from the two factors the two morality items that loaded highly on both factors (CM4 & CM5) and the assertiveness item (AA5) with a loading below .40. The rotated factors explained 51 and 49 percent of the variance.

In sum, our exploratory factor analyses recommended a two-factor solution, while confirmatory factor analyses were slightly more supportive of the four-factor solution, which ties in with the facets model that was developed by Abele et al. (2016). Both models fit the data quite well. Since the decision of whether the facets model or classic Big-Two models are better suited to describe fundamental stereotype content dimensions is beyond the scope of this validation study, we computed six scales in total to test the validity of our experimental material: four scales that reflect the facets model (i.e., Warmth, Morality, Assertiveness, and Competence) and two scales that combine the facets (i.e., Communion and Agency).⁵ The internal consistency of all six scales was high (see Table S7). On average, job applicants were rated positively on all four facets and on the two fundamental dimensions, with values slightly above 3.5 on response scales from 1 to 5 (see Table S7).
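The item-exclusion rule applied to the rotated two-factor solution (keep an item only if it loads above .40 on exactly one factor) can be written as a small sketch. The function name and the loadings below are hypothetical and are not the values from Table S6; the sketch also assumes that "loaded highly" means a loading above the same .40 cut-off on both factors.

```python
def retained_items(loadings, threshold=0.40):
    """Split items into kept and dropped according to a simple loading rule.

    loadings maps an item label to its (factor 1, factor 2) rotated loadings.
    An item is kept, and assigned to a factor, only if its absolute loading
    exceeds the threshold on exactly one of the two factors.
    """
    kept, dropped = {}, {}
    for item, (f1, f2) in loadings.items():
        high = [abs(f1) > threshold, abs(f2) > threshold]
        if high == [True, False]:
            kept[item] = 1
        elif high == [False, True]:
            kept[item] = 2
        elif all(high):
            dropped[item] = "cross-loading"
        else:
            dropped[item] = "weak loading"
    return kept, dropped


# Hypothetical loadings for illustration only, NOT the values from Table S6.
example_loadings = {
    "AC1": (0.72, 0.15),  # loads on factor 1 only
    "AA5": (0.31, 0.10),  # below the cut-off on both factors
    "CW1": (0.12, 0.78),  # loads on factor 2 only
    "CM4": (0.48, 0.55),  # above the cut-off on both factors
}

kept, dropped = retained_items(example_loadings)
print(kept)
print(dropped)
```

Applied to the 20 items, a rule of this kind would leave the 17 items that enter the Agency and Communion scales.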
⁴ When limiting the analyses to the 17 items from Table S6 with high loading and no cross-loading, the fit indices further improved, but the four-factor solution again had a slightly better fit (four-factor solution: RMSEA = .064, CFI = .939, SRMR = .046, Chi2(118) = 255.410, p<.001; two-factor solution: RMSEA = .067, CFI = .930, SRMR = .053, Chi2(118) = 283.023, p<.001).
⁵ In the main study, we termed these two scales perceived "warmth" and "competence" instead of "communion" and "agency" to be more consistent with the designation of the experimental treatments.
Exploratory factor analyses, by contrast, consistently yielded two factors with eigenvalues larger than one. In the rotated two-factor solution (orthogonal, varimax), two out of the five morality items had high cross-loadings (i.e. they loaded strongly on both factors) and the loading of one assertiveness item was weaker than .40 (see Table S6). We therefore excluded these three items (see the crossed items) and summarized the remaining 17 items into two scales: Communion and Agency. Because the results of confirmatory factor analyses were slightly in favor of a four-factor solution, we additionally provide the descriptive statistics for the four facet scales: Warmth, Morality, Assertiveness, and Competence.

Applicants who described themselves as warm were perceived as higher in communion (b=.13, p<.10) and warmer (b=.13, p<.10) than applicants who did not include this information in the application (see the first two columns in row A of Table S8). For the morality facet there was no effect (b=.05, ns), which makes intuitive sense, since the warmth treatment in the application documents did not reflect morality but a warm and social personality. Neither adding the competence treatment (row B) nor adding the competence treatment, applicants' gender, and the job (row C) changed the results. In a similar vein, applicants who described themselves as competent were also perceived as more agentic (b=.25, p<.01), assertive (b=.29, p<.001), and competent (b=.19, p<.05) than applicants who did not include this information in the application (see row A on the right-hand side of Table S8). Again, these effects hardly changed when simultaneously controlling for the warmth treatment (row B) or when adding the warmth treatment, applicants' gender, and the job to the models (row C). Importantly, the competence treatment had no effect on perceived communion, warmth, or morality, and the warmth treatment had no effect on perceived agency, assertiveness, or competence.
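To make the b coefficients easier to read: in a linear regression with a single 0/1 treatment indicator and no further covariates, the slope equals the difference between the mean rating of treated and untreated profiles. The sketch below illustrates this identity under the assumption that the row A models take this simple form; the function name and the ratings are invented for illustration and are not study data.

```python
from statistics import mean


def treatment_effect(ratings, treated):
    """OLS slope of rating on a 0/1 treatment dummy (no other covariates)."""
    x_bar = mean(treated)
    y_bar = mean(ratings)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(treated, ratings))
    var = sum((x - x_bar) ** 2 for x in treated)
    return cov / var


# Made-up ratings on the 1-to-5 response scale, for illustration only.
ratings = [3.4, 3.8, 3.6, 4.0, 3.2, 3.6]
treated = [0, 1, 0, 1, 0, 1]  # 1 = application contained the signal

b = treatment_effect(ratings, treated)

treated_ratings = [r for r, t in zip(ratings, treated) if t == 1]
control_ratings = [r for r, t in zip(ratings, treated) if t == 0]
difference = mean(treated_ratings) - mean(control_ratings)

# Both quantities are 0.4 up to floating-point rounding.
print(round(b, 3), round(difference, 3))
```

Once further predictors are added (as in rows B and C), the coefficient is no longer a raw difference in means, but with randomly assigned treatments it should change little, which matches the pattern reported above.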

In line with previous studies on gender stereotypes, we found that female job applicants were generally perceived as warmer and higher in communion than male job applicants, independent of the absence or presence of the warmth treatment. Warmth and competence ratings did not vary systematically between job types.

Summary and Discussion
In sum, the validation study confirmed that both the warmth and the competence treatments had the intended effects. However, the effect of the warmth treatment was only marginally significant (i.e., significant with a directional hypothesis), and both the warmth and the competence effects were rather small in terms of effect size.
The low power of our manipulation is of course a potential problem for the interpretation of null findings in the main study, because it might imply either that the hypothesis was false or that the manipulation was too weak. However, there was hardly any alternative. Our experimental material was created for a large-scale field experiment on discrimination against ethnic minorities in the labor market in five European countries, which means that the plausibility of the experimental material (the applicant had to be a real competitor for the advertised position and it was important to avoid the risk of detection) and the comparability across national contexts (e.g., while in all countries under study self-reports are a typical component of application documents, Otherwise the findings of such a study would be misleading. The use of excessively strong, artificial, or unusual experimental treatments could bias the observable between-group and between-country results in either direction. The priority of our study was to create carefully constructed, plausible, and comparable application documents, which comes at the cost of having rather unobtrusive warmth and competence signals and thus a rather weak experimental manipulation. The significant impact of the warmth and competence signals on candidates' assessment, however, suggests that we succeeded in finding the right balance between signal strength and plausibility.