Sticky Floors, Double-Binds, and Double Whammies: Adjusting for Research Performance Reveals Universities’ Gender Pay Gap is Not Disappearing

We use 12 years of holistic research performance scores for each academic in all New Zealand universities to ask whether gendered gaps in pay, age, research performance score, and performance-adjusted pay are narrowing with time. We find that the gender gaps in age and research performance score narrowed from 2006 to 2018, but the gender gaps in pay and performance-adjusted pay did not. Controlling for research performance score, age, and field, women’s odds of advancement to high ranks converged to around two-thirds of men’s advancement odds. Similarly, the lifetime performance-adjusted gender pay gap has converged to around one-third the price of an average home. Trends suggest that, no matter how much women improve their research scores and “lean-in” for promotions, the performance-adjusted pay gap has plateaued to an equilibrium. Data from an entire national workforce fail to support the most common explanations for academic gender pay gaps—child-bearing and demographic inertia. But results are consistent with a systemic dynamic in which double-binds and double-whammies create a sticky floor that starts women at lower pay and impedes their advancement over time. We conclude with some suggested remedies from the literature of behavioral science. Without concerted efforts to change the structures of hiring and promotion, the gender pay gap in universities will not disappear of its own inertia.

In asking whether the gender performance gap explains the gender pay gap, we explore several theoretical explanations offered for gender pay gaps around the world and across many labor fields (gendered differences in employment choices, child-bearing, and familial responsibilities) and several more that are specific to university workplaces (demographic inertia, the gender age gap, and cultural and structural dynamics within the university workplace). To this rich theoretical and empirical foundation, we add two theoretical explanations that have implications far beyond New Zealand: the double-whammy and the double-bind. Traditional explanations for an academic pay gap include universities expecting women to teach more (El-Alayli et al., 2018), coupled with gender bias in teaching evaluations (Boring, 2017; MacNell et al., 2015; Murray et al., 2020), and “asking women more often” (O'Meara et al., 2017) to perform administrative service and pastoral care roles (Guarino & Borden, 2017) with low “promotability” (Babcock et al., 2017). Recent ground-breaking research finds that under-represented groups (including women in science) create more impactful innovations, yet their higher impact gains lower recognition and lower rewards in academic appointments and tenure (Hofstra et al., 2020).
We track individual research performance scores assigned by government-appointed national panels (Brower & James, 2020), academic rank (Lecturer, Senior Lecturer, Associate Professor, Professor), and advertised pay for every ranked university academic in New Zealand (NZ) from 2006 to 2018 (2006: 1,993 women, 3,422 men; 2012: 2,017 women, 3,258 men; 2018: 2,388 women, 3,289 men). We analyze four gender gaps in university staff: age; research performance score; pay; and performance-adjusted pay. We use this globally unique dataset to ask whether the performance-adjusted pay gap (Brower & James, 2020), that portion of the gender pay gap that age and performance do not explain, is diminishing with time.
We find that, although women narrowed the performance gap from 2006 to 2012, performance improvements stabilized from 2012 to 2018. The lifetime performance-adjusted pay gap (also adjusted for age and inflation) has decreased only slightly, from just over to just under a third of the 2019 price of an average home.
The size, depth, and intricacy of our dataset make it a major contribution to the wealth of recent gender, diversity, and inequality research. Others have found gender pay and rank gaps in universities and departments (Chen & Crown, 2019; Cooray et al., 2014), but no one else has tracked research performance over more than a decade across all fields and all universities in an entire national workforce.
This paper expands on previous research on NZ universities finding that, in 2012, a man's odds of attaining the rank of Professor were roughly double the odds of a woman of the same research score and age (Brower & James, 2020). While the previous work captured a single point in time, the data for this paper span 12 years, allowing exploration of both changes over time and possible solutions. This paper is a globally unique longitudinal study tracking individuals through time, made possible only by the additional data from 2003/2006 and 2018. Indeed, tracking research performance over time makes us uniquely placed to speak to, and refute, three of the most common explanations for an academic gender pay gap (child-bearing, gender bias in research or teaching scores, and demographic inertia) and to comment on a fourth: gendered patterns in salary negotiations within hiring and promotions.

New Zealand's Performance Based Research Fund
Our dataset includes 12 years of anonymized research scores, for every university researcher at all of New Zealand's universities, obtained through the Official Information Act 1982. The scores arise from the globally unique Performance Based Research Fund (PBRF). While other countries evaluate research on a collective level, PBRF is the only assessment to score each and every researcher individually (Brower & James, 2020).
Once every 6 years (in 2003, 2006, 2012, 2018) PBRF appoints a panel of 2 to 4 subject experts in each of 42 subject areas (e.g., Human Geography, Earth Sciences) to review and score a portfolio for every researcher who self-assigns to that subject area. Panelists are generally professors and prominent researchers, both nationally and internationally. At least one of the members of each panel is usually based overseas. The first PBRF assessment was in 2003, but was problematic enough that every graded researcher had the option to be re-assessed in 2006. If an individual had a score for both 2003 and 2006, we used the 2006 score. If there was no 2006 score but the person was still employed by a New Zealand university, we used the 2003 score. To protect anonymity, we collated subjects into six fields (Arts, Business, Education, Engineering, Medicine, and Science).
In a portfolio, a researcher chooses their four best research outputs (books, articles, exhibitions), describes their significance and impact, and lists other publications during the 6 years. Research outputs comprise 70% of the score. The outputs and their impact vary greatly by field. Impact might range from a contribution to pharmaceutical development for a chemist, to a high citation count for a physicist or biologist, to inspiring legislative change for a scholar of law or politics. Hence the chemistry panel does not review political science portfolios. The remaining 30% assesses the individual's contribution to the research environment and peer esteem (e.g., journal editing, keynotes, conference organization, awards) during the assessment period. PBRF makes some allowance for part-time employment and extraordinary circumstances that impaired an individual for at least 3 of the 6 years. The 2018 round was the first in which panelists received unconscious bias training.
The PBRF panel scores every portfolio from 0 to 700 points. Scores then cluster into grades (600-700 = A; 400-599 = B; 200-399 = C; 0-200 = R (research inactive)). An individual who began doing independent (post-degree) academic research within the assessment period may be classified as new and emerging if they are in either the R or C grades, becoming R(NE) or C(NE). The Tertiary Education Commission allocates research funds to universities based on staff grades and fields. For example, an A grade chemist attracts more funding than a B grade chemist, who attracts more than a C grade chemist; R grades attract no funding. Research funds from PBRF accrue to the university, not to the individual researcher. Scores are known to the individual, Faculty dean, and Deputy Vice Chancellor Research, not to the individual's head of school or department.
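The banding described above can be expressed as a small lookup. The sketch below is illustrative only: the function name `pbrf_grade` and its `new_and_emerging` flag are hypothetical, and because the text lists 200 in both the C and R ranges, the code assumes that a score of exactly 200 falls in grade C.

```python
def pbrf_grade(score, new_and_emerging=False):
    """Map a 0-700 portfolio score to a grade band.

    Assumption: a score of exactly 200 is treated as C, since the
    published ranges (200-399 = C; 0-200 = R) overlap at 200.
    """
    if not 0 <= score <= 700:
        raise ValueError("score must lie between 0 and 700")
    if score >= 600:
        grade = "A"
    elif score >= 400:
        grade = "B"
    elif score >= 200:
        grade = "C"
    else:
        grade = "R"
    # Only the two lowest grades carry the new-and-emerging suffix.
    if new_and_emerging and grade in ("C", "R"):
        grade += "(NE)"
    return grade


print(pbrf_grade(650))        # A
print(pbrf_grade(433))        # B
print(pbrf_grade(150, True))  # R(NE)
```

The example score of 433 is the 2018 national average reported in the text, which sits in the B band.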
Between 2006 and 2012 the average research score across all individuals in NZ increased by almost 70 points, from 354 to 423. Between 2012 and 2018, scores were more stable, with the average score increasing by only 10 points to 433. This initial increase likely reflects the bedding in of a new system, as individuals, universities, and the governing body (Tertiary Education Commission or TEC) familiarized themselves with the scoring system (Buckle & Creedy, 2018).

Methods
We use a national dataset spanning 12 years to analyze the performance-adjusted gender pay gap in NZ universities. Borrowing from the methods of Brower and James (2020), we model score as a function of age using the linear model \(\text{Score} \sim \text{Age}^2 + \text{Gender} + \text{Field}\) to compare men's career trajectories to women's in each assessment round. Each individual's salary is estimated using publicly available salary bands for each university. All salaries are adjusted to 2018 levels to avoid biases due to inflation. We model salary using the linear model \(\text{Salary} \sim \text{Age}^2 + \text{Gender} + \text{Field} + \text{Score}\). Our aim is to find the Salary trajectory of an individual of either gender in a particular field from age 30 to 65. We first find this individual's expected Score every year from age 30 to 65. We then use the expected Score at each Age to predict their expected Salary. The lifetime gender pay gap is the predicted total lifetime pay difference from age 30 to 65 for individuals, assuming the predicted mean score in each year.
This lifetime pay gap is affected by gender in two ways: (1) explicitly through the Gender variable in the Salary model and (2) implicitly through the difference in the Score trajectories of men and women. In other words, women are paid differently because they are female and because they have lower research scores. To separate these effects, we use the performance-adjusted gender pay gap. This compares the predicted lifetime male salary to the equivalent female salary, assuming both were on the average male score trajectory (Brower & James, 2020).
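The two-step prediction and the adjustment can be sketched in a few lines. This is a minimal illustration, not the analysis code (which was written in Matlab): the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow, Field is held fixed and folded into the intercepts, and the lifetime sum is assumed to run over yearly ages 30 to 65 inclusive.

```python
AGES = range(30, 66)  # assumption: one prediction per year, ages 30..65 inclusive


def expected_score(age, female, coef):
    """Score ~ Age^2 + Gender, with Field absorbed into the intercept."""
    return coef["intercept"] + coef["age2"] * age**2 + coef["female"] * female


def expected_salary(age, female, score, coef):
    """Salary ~ Age^2 + Gender + Score, with Field absorbed into the intercept."""
    return (coef["intercept"] + coef["age2"] * age**2
            + coef["female"] * female + coef["score"] * score)


def lifetime_gaps(score_coef, salary_coef, ages=AGES):
    """Return (lifetime pay gap, performance-adjusted lifetime pay gap)."""
    raw = adjusted = 0.0
    for age in ages:
        male_score = expected_score(age, 0, score_coef)
        female_score = expected_score(age, 1, score_coef)
        male_pay = expected_salary(age, 0, male_score, salary_coef)
        # Raw gap: each gender follows its own expected score trajectory.
        raw += male_pay - expected_salary(age, 1, female_score, salary_coef)
        # Adjusted gap: the woman is placed on the male score trajectory.
        adjusted += male_pay - expected_salary(age, 1, male_score, salary_coef)
    return raw, adjusted


# Hypothetical coefficients for illustration only (not estimates from the data).
SCORE_COEF = {"intercept": 300.0, "age2": 0.04, "female": -30.0}
SALARY_COEF = {"intercept": 60000.0, "age2": 8.0, "female": -4000.0, "score": 50.0}

raw_gap, adjusted_gap = lifetime_gaps(SCORE_COEF, SALARY_COEF)
print(round(raw_gap), round(adjusted_gap))  # 198000 144000
```

With these made-up numbers, 144,000 of the 198,000 lifetime gap remains after placing women on the male score trajectory; the remainder is the part that runs through the score difference.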
In other words, the performance-adjusted pay gap removes the implicit effect of women's different research scores, and renders more obvious the gender effect (which could in turn be related to teaching differences, administrative duties, or many things other than gender bias). If the sole contributor to the gender pay gap were, say, time off for child-bearing or lower productivity due to child-rearing responsibilities, we would expect women to have much lower research scores. Hence if children were the primary driver of a performance-driven gender pay gap, we would see a very small performance-adjusted gender pay gap precisely because we have already corrected for the baby-induced score gap.
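This reasoning can be written out using the two linear models from the Methods. The derivation below is a sketch under stated assumptions: Gender enters both models additively with no interaction terms, Field is held fixed, and the lifetime sum runs over the 36 yearly ages from 30 to 65 inclusive. The coefficient symbols and the indicator G (1 for women, 0 for men) are notation introduced here for illustration.

```latex
\begin{align*}
\text{Score}(a, G) &= \alpha_0 + \alpha_A a^2 + \alpha_G G + \alpha_F, \\
\text{Salary}(a, G, s) &= \beta_0 + \beta_A a^2 + \beta_G G + \beta_F + \beta_S s, \\[4pt]
\text{Gap} &= \sum_{a=30}^{65} \bigl[\text{Salary}(a, 0, \text{Score}(a, 0)) - \text{Salary}(a, 1, \text{Score}(a, 1))\bigr]
  = -36\,(\beta_G + \beta_S \alpha_G), \\
\text{Gap}_{\text{adj}} &= \sum_{a=30}^{65} \bigl[\text{Salary}(a, 0, \text{Score}(a, 0)) - \text{Salary}(a, 1, \text{Score}(a, 0))\bigr]
  = -36\,\beta_G .
\end{align*}
```

Under these assumptions, a pay gap that arose only through lower research scores would correspond to \(\beta_G = 0\), and the performance-adjusted gap would vanish. A nonzero adjusted gap therefore isolates the direct Gender term in the Salary model.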
Odds ratios, comparing men's odds of attaining a certain rank to women's odds, were calculated from the Gender coefficient of the logistic model \(\operatorname{logit}(\text{Is Rank or above}) \sim \text{Age}^2 + \text{Gender} + \text{Field} + \text{Score}\). When too few records were available, Field was omitted.
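For readers less familiar with odds ratios, the short sketch below shows how a logistic Gender coefficient converts to an odds ratio, how a male-to-female odds ratio converts to women's odds as a fraction of men's, and why an odds ratio of 2 is not the same as double the probability. The function names and the 10% baseline probability are hypothetical, chosen only for illustration.

```python
import math


def odds_ratio(gender_coef):
    """Odds ratio implied by a logistic-regression coefficient."""
    return math.exp(gender_coef)


def womens_relative_odds(mens_odds_ratio):
    """Women's odds as a fraction of men's, given the male-to-female odds ratio."""
    return 1.0 / mens_odds_ratio


def apply_odds_ratio(baseline_prob, ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * ratio
    return odds / (1.0 + odds)


# A male-to-female odds ratio of 1.5 means women's odds are two-thirds of men's.
print(round(womens_relative_odds(1.5), 3))    # 0.667

# Hypothetical baseline: if a woman's chance of holding a rank were 10%,
# an odds ratio of 2.0 would put a comparable man's chance near 18%, not 20%.
print(round(apply_odds_ratio(0.10, 2.0), 3))  # 0.182
```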
Comparisons of means used two-sided t-tests; ratio comparisons used binomial tests. Significance was assessed at the .05 level throughout. All analysis was done in Matlab 2018b.

Gender Gaps by Field of Research
Representation Gap. Combining all fields, the proportion of female academic staff is approaching parity (Figure 1A). Arts and Medicine have reached parity; Business and Science have become less female dominated; Engineering and Education made minimal progress. Progress appears relatively steady when compared across the two time periods, 2006 to 2012 and 2012 to 2018.
Age Gap. Despite more women entering university employment, the gender age gap narrowed only slightly from 2006 to 2018, with the biggest shifts in Engineering and Science (Figure 1B). Among individuals who entered the dataset between 2012 and 2018, in any field, women were on average significantly, but only slightly, older than men (women 42.3 years old in 2018; men 41.6; \(p = 0.03\)); and women who left were younger (women 50.3 years in 2012; men 53.8; \(p = 10^{-10}\)). Hence, it is likely that men stay in academia longer than women, arriving younger and leaving older.
Score Gap. Using the linear model for predicting research score by age, we see that women's predicted scores at age 50 are still lower than men's (Figure 1C). Although this score gap across all fields narrowed markedly from 2006 to 2012 (from 15.1% to 8.3%), it changed little (from 8.3% to 7.9%) from 2012 to 2018. The same pattern, a large improvement between 2006 and 2012 followed by a much smaller improvement or plateau between 2012 and 2018, can be seen in the individual fields of Arts, Business, Education, Medicine, and, to a lesser extent, Science.
Pay Gap. Using the linear model to predict Salary for an individual with the expected Score at each age between 30 and 65, we can calculate the expected lifetime earnings of a man and a woman in each Field at each of the three time points (Figure 1D). The lifetime gender pay gap across all fields shows a steady narrowing from 2006 to 2018, from $NZ366,000 to $NZ302,000. However, when we consider fields individually, the improvement is very unevenly distributed. Again, Arts is approaching parity and there are clear improvements in Business and Science; but in Education, Medicine, and Engineering there is no clear pattern of improvement. The biggest pay gap is seen in medical research fields. This group is dominated by New Zealand's two large medical schools (Auckland and Otago) but also contains many researchers at other universities in related areas such as Health Sciences.
Performance-Adjusted Pay Gap. The performance-adjusted pay gap considers the lifetime salary of a woman with the expected (and usually higher) research performance score of a man (Figure 1E). Placing women on the average male score trajectory, while keeping them on the female pay trajectory, eliminates any effect of the gender score gap. When we consider all fields collectively while adjusting for score, we see no clear gains from 2006 to 2018. The performance-adjusted gap was $NZ172,000, $NZ186,000, and $NZ161,000 in 2006, 2012, and 2018 respectively (all in 2018 dollars). When we consider fields of research separately, only Science appears to have made any clear progress. By 2018, 15% of women scored A grades, compared with 23% of men.
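The all-field figures quoted above can be compared directly with a little arithmetic. The sketch below uses only the dollar amounts reported in the text; the variable and function names are hypothetical, and the resulting percentages are simple ratios of the reported values, not additional model output.

```python
def relative_change(start, end):
    """Fractional change from start to end (negative means the gap narrowed)."""
    return (end - start) / start


# Lifetime gaps across all fields, in 2018 $NZ, as reported in the text.
PAY_GAP = {2006: 366_000, 2018: 302_000}
ADJUSTED_GAP = {2006: 172_000, 2012: 186_000, 2018: 161_000}

raw_change = relative_change(PAY_GAP[2006], PAY_GAP[2018])
adjusted_change = relative_change(ADJUSTED_GAP[2006], ADJUSTED_GAP[2018])

# Share of the lifetime pay gap that remains after adjusting for score.
adjusted_share_2006 = ADJUSTED_GAP[2006] / PAY_GAP[2006]
adjusted_share_2018 = ADJUSTED_GAP[2018] / PAY_GAP[2018]

print(f"{raw_change:.1%}")           # -17.5%
print(f"{adjusted_change:.1%}")      # -6.4%
print(f"{adjusted_share_2006:.1%}")  # 47.0%
print(f"{adjusted_share_2018:.1%}")  # 53.3%
```

On these figures, the unadjusted gap narrowed by roughly 17% over the 12 years while the performance-adjusted gap narrowed by roughly 6%, having first risen in 2012.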

Gaps by Rank and PBRF Grade
Clearly women have made gains since 2006, but the question remains whether these gains are felt equally by women at every academic rank and PBRF grade. To explore this, we use the gender odds ratio, which compares a man's odds of achieving at least a particular rank with a woman's odds. This accounts for differences in performance scores and field between the genders. Starting with individuals with any research score (Figure 2B), we see that men's comparative odds of achieving a particular rank increase with rank. Accounting for differences in Score, Field, and Age, a man's odds of achieving Professor are approximately double those of a woman of the same age and score. By comparison, his odds of achieving a rank of at least Senior Lecturer are around 1.5 times hers. Looking across the years, men's comparative odds of being ranked Associate Professor (AP) or full Professor (P) decreased from just over double women's odds in 2006 (P: OR = 2.1, \(p = 10^{-8}\); AP: OR = 2.0, \(p = 10^{-11}\)) to just under double in 2018 (P: OR = 1.8, \(p = 10^{-9}\); AP: OR = 1.7, \(p = 10^{-10}\)) (Figure 2B). But the performance-adjusted gender odds ratio of making it out of the lowest rank, Lecturer (L), whilst lower overall, worsened between 2006 and 2018 (L (2006): OR = 1.5, \(p = 10^{-7}\); L (2018): OR = 1.7, \(p = 10^{-8}\)).
If we instead consider only A-grade individuals (Figure 2C), the picture is closer to parity (odds ratios closer to 1) at the middle ranks but not the top. In 2018 (green bars), the highest scoring women (A-grade, 600 ≤ Score < 700) have odds similar to those of A-grade men of attaining the middle to upper-middle ranks (SL or AP), but significantly worse odds of reaching the highest rank (P). From 2006 to 2018, A-grade women's odds of being a Professor changed very little, remaining at just over two-thirds of A-grade men's odds.
For B-grade (400 ≤ Score < 600) women (Figure 2D), the odds of achieving any rank are significantly worse than those of men with the same score, and worse than A-grade women's odds. Again we see the pattern that, as the rank increases, so does the gender odds ratio. Whilst there appears to have been some improvement between 2006 and 2018, there is no clear pattern.
Finally, women with the lowest scores (R/C grades: Score < 400, comprising 42% of women in 2018) (Figure 2E) have the biggest odds difference to men of achieving any rank above Lecturer. The pattern of odds increasing (getting worse for women) with rank is most apparent in this group, particularly in the 2006 and 2018 data. Figure 2A shows that the R/C grades are the most common grades for women in NZ universities.

Discussion
We analyze research performance scores for every university researcher in New Zealand and find that gender gaps in age, performance, pay, and performance-adjusted pay narrowed only slightly from 2012 to 2018. The odds ratios of men and women with similar scores attaining the same rank hover between 1.5 and 2, meaning men's odds are 1.5 to 2 times women's. The lower the research score, the more likely men are to outrank women of similar score and age. The higher the rank being considered, the less likely women with any score are to attain it. These patterns changed little in 12 years.
The observed pay gap patterns are different to pay gaps observed outside academia, where the pay gap is narrowing most quickly at the bottom and middle of the wage distribution, but widening at the top (Blau & Kahn, 2017). We find the floor has become “stickier” (Bjerk, 2008) for women in the lowest research scores and ranks, while the glass ceiling for the top female researchers remains unchanged.
We now compare our results to international trends in, and theoretical explanations for, gender pay gaps.

Child-bearing
We argue that our results do not support the most commonly touted explanation for the academic gender pay gap: child-bearing (Mason et al., 2013). We expect that parental leave for child-bearing and larger familial responsibilities could drive a large gender score gap. But if children were the primary explanatory variable for the pay gap, adjusting for research performance score would cause the performance-adjusted gender pay gap to all but disappear. Hence, child-bearing cannot explain the persistent performance-adjusted pay gap observed across a national workforce of university academics. Indeed, even A-grade women's odds of attaining a professorial rank and pay have stagnated well below the odds of men with the same score.

Elite Institutions
A possibly unique feature of the New Zealand university system is its egalitarian nature. With only eight institutions, and common student entry requirements to all of them, the hierarchy seen in other countries, in which elite institutions employ mainly men and lower-prestige institutions have more female staff, is largely absent. This puts our study in a unique position to examine the root causes of the gender pay gap without the confounding effects of institutional and employment choices.

Gender Bias in Research Evaluation
Similarly, gender bias in perceptions of research quality (Krawczyk & Smyk, 2016) and narrow views of what constitutes research impact or innovation (Hofstra et al., 2020), while well documented, cannot explain our results. Indeed, if research scores are gender biased, it is possible women are performing better than their scores suggest because their research quality (Krawczyk & Smyk, 2016) or innovations and research impact (Hofstra et al., 2020) are less recognized than those of majority groups, or that women's research topic choices are less highly regarded in the PBRF scoring process (Hoppe et al., 2019). If so, our observed performance-adjusted pay gap underestimates the lived experience of the gender pay gap in universities.

Demographic Inertia
Our results are also inconsistent with the demographic inertia prediction, that gender gaps are a hangover from the previous generation (Shaw & Stanton, 2012) that will remedy themselves with time. Indeed, our ability to measure research productivity over time adds to and confirms others' finding that the “pipeline” is an inadequate explanation for gender pay gaps in universities (Momani et al., 2019).

Gendered Negotiation Behaviors and Sticky Floors
Conversely, gendered negotiation patterns in hiring and/or promotions (Babcock et al., 2017) could explain some of the observed pay gap, both with and without adjusting for score. While some research finds women's choices against competitive activities can explain some of the gender pay gap (Heinz et al., 2016), the logical solution to that problem, that women should “lean in” (Sandberg, 2015), does not work. Indeed, “leaning in” with greater confidence works better for men than for women (Bowles et al., 2007; Risse, 2020). Further, recent research suggests that women are as likely as men to ask for promotions, but less likely to get what they ask for (Artz et al., 2018). Hence it seems unlikely that any amount of women applying for promotions sooner or behaving more confidently or competitively in the workplace will bridge the academic gender pay gap we observe.
Structural factors can influence salary negotiation behavior. When salary negotiation at time of hiring is normalized as an acceptable option, women ask as often as men for higher salaries. But when salary negotiation is not a stated option, women agree to work for less than men do (Leibbrandt & List, 2015). NZ universities have transparent pay scales for nearly all academic staff. This could inadvertently contribute to a pattern of women starting their careers at lower ranks, hence creating, or at least exacerbating, a “sticky floor” problem (Bjerk, 2008) for women in academia and contributing to the performance-adjusted pay gap we observe.

Double-Binds and Double-Whammies
Having tested the most common theoretical and empirical explanations, we now use our data to contribute to theory with implications beyond New Zealand. In an environment such as NZ universities, where collegiality is important in both promotions and hiring decisions, academic women might also suffer Bohnet's “double-bind” problem, in which women are “perceived as either likable or competent but not both” (Bohnet, 2016). If women are indeed “asked more often” to do administrative and pastoral care duties that take time and attention away from high-status and high-reward research (O'Meara et al., 2017), and they must be collegial to get promoted, refusing the tasks with low “promotability” (Babcock et al., 2017) risks hurting their promotion prospects as much as publishing less would.
Others have observed that students and organizations simultaneously expect more from women (Babcock et al., 2017; Card et al., 2019; El-Alayli et al., 2018; Guarino & Borden, 2017; O'Meara et al., 2017), while disadvantaging them in teaching evaluations (Boring et al., 2016; MacNell et al., 2015). If universities over-demand and under-reward teaching and service, they create a “double-whammy” (Brower & James, 2020). In other words, the “double-whammy” posits that women might simultaneously research less due to higher teaching and service expectations, while still failing to meet the burden of those higher expectations (Brower & James, 2020). The double-whammy is consistent with our observed persistent gaps in research score and performance-adjusted pay.
While unconscious bias might be at the root of the performance-adjusted pay gap patterns we observe, behavioral science suggests combatting bias is not the best way to fix the inequities it creates (Bohnet, 2016). Training people to try to consciously suppress unconscious bias has limited success (Chang et al., 2019). Instead, behavioral science suggests changing the organizational structures around the “choice architecture” (Thaler & Sunstein, 2009) of hiring and promotions will be more efficient and effective at creating a more equitable workplace. Interventions could include re-framing job advertisements and monitoring pay equity closely and transparently (Bohnet, 2016).

Conclusion
We conclude that, no matter how much women improve their scores, the performance-adjusted pay gap is beginning to plateau to an equilibrium in which a woman's odds of reaching the top ranks are two-thirds of a comparable man's odds, even after adjusting for age and research score. Neither improving women's research performance nor waiting out demographic inertia will make the academic gender pay gap disappear any time soon. “Leaning in” and applying for promotion more aggressively do not hold much prospect either. Interventions to change the “choice architecture” in hiring and promotions, as well as recognizing diverse forms of research impact in both processes, might meet more success. Though our methods to measure a gender pay gap across an entire national workforce are unique to New Zealand, the explanations and suggested solutions have international relevance, within universities and beyond.

Figure 2 considers the gender gaps in terms of academic rank and PBRF grade. In 2006, only 5% of women (compared to 14% of men) achieved the highest research grade of A (Score ≥ 600) (Figure 2A).

Figure 1. Five gender gaps in NZ universities measured in 2006, 2012, and 2018 by field of research: (A) the proportion of female academic staff (* connotes significant difference from parity (\(p < 0.05\))), (B) the age gap between the average man and the average woman, positive denotes men are older, (C) the gap in predicted research score at age 50, measured as a percentage of men's expected scores, (D) the predicted lifetime gender pay gap between the average man and the average woman, and (E) the lifetime performance-adjusted pay gap, also in $NZ 1000s, for men and women of the same score.

Figure 2. Gender gaps by research performance level and rank: (A) the distribution of research grade for each gender, (B-E) controlling for age, score, and field, the gender odds ratios of achieving a minimum rank from 2006 to 2018 (* connotes significant difference from parity (\(p < 0.05\))), (B) all researchers, (C) A grade researchers only, (D) B grade researchers, and (E) R and C grade researchers.