Female Underperformance Hypothesis Revisited: Methodological Review and Empirical Testing

Comparison between the performance of female- and male-managed firms has long been a subject of research interest. Although it is often argued that firms run by women perform worse than those run by men, there is no consensus on the effect of managerial gender on companies’ financial outcomes. This study conducts a methodological review of quantitative research published from 2010 to 2020 on the relationship between female business leadership and firm performance. The review identifies the dependent and explanatory variables and the econometric models most frequently used in the literature. Most studies have not considered endogeneity bias in their model specifications; therefore, their results could be biased and unreliable. We select empirical models to test the female underperformance hypothesis using a sample of Chilean firms. Our findings suggest that managers’ gender does not significantly affect business performance when endogeneity is addressed. Our methodological review reveals a significant gap in the research on female managers and firm performance in the Latin American context, and the empirical test provides new evidence in this vein.


Introduction
Many studies have examined the relationship between women's presence in corporate leadership positions and firm performance (Gipson et al., 2017; Lam et al., 2013; Mohan, 2014). For over 30 years, researchers have compared the financial reports of companies owned by men and women (Cuba et al., 1983; Fischer, 1992; Johnson & Storey, 1993). This research field began in the area of entrepreneurship and rapidly extended to the performance of companies with women in management positions (Fairlie & Robb, 2009; Kolev, 2012). Over time, the claim that companies owned or managed by women perform worse than those owned by men or led by male CEOs became entrenched in the literature (Du Rietz & Henrekson, 2000), and this assumption was labeled the Female Underperformance Hypothesis (FUH) (Dean et al., 2019).
Research on the FUH has generated mixed results, with evidence both supporting and rejecting the hypothesis (Bennouri et al., 2018; Martín-Ugedo et al., 2019; Singhathep & Pholphirul, 2015). Some studies have found that female management positively affects firm performance (Christiansen et al., 2016; Conyon & He, 2017; Moreno-Gómez et al., 2018; Noland et al., 2016), while others have found that female-managed companies perform worse (Lim et al., 2019; Singhathep & Pholphirul, 2015). For example, Menicucci et al. (2019) have found that women-managed hotels in Italy outperformed those managed by men in terms of hotel growth, whereas Kaur and Singh (2019) have suggested that female CEO leadership is negatively related to firm performance. A third group of studies has indicated that managers' gender has no significant effect on firm performance (Singh et al., 2019; Unite et al., 2019; Vu et al., 2019); Kristanti and Iswandi (2019), for instance, have shown that gender diversity in leadership positions is insignificant for business performance. Given the lack of agreement on the FUH in previous studies, we are motivated to take another look at this issue and gather new evidence. We test the FUH using state-of-the-art dependent and explanatory variables and econometric methods and apply them to new data (Brahma et al., 2021; Watson, 2020; Đặng et al., 2020).
The increasing number of women in business leadership positions in recent decades has fueled continued interest in evaluating the financial performance of women-led companies (Helfat et al., 2006; Noland et al., 2016). As many researchers have pointed out, accepting the FUH reinforces stereotypical social constructions of gender, in which management positions are interpreted as being more compatible with a masculine image (Carmona et al., 2018; Cook & Goodman, 2006; Foley et al., 2005; Metcalfe, 2007). Most studies on the FUH are quantitative and based on econometric methods that use only financial indicators as dependent variables (J. Chen et al., 2018; Isidro & Sobral, 2015; Moreno-Gómez et al., 2018), and research on this topic has typically used data from large publicly listed companies (Terjesen et al., 2009). This methodology cannot capture all dimensions of firm performance, and its focus on large firms limits the findings to a small portion of the business ecosystem (INE, 2017, 2022; Krishnan & Park, 2005; Rodríguez-Domínguez et al., 2012). Given these limitations, we believe there is room for new studies on the FUH that consider women's recent business participation and new business ecosystems.
This study conducts an extensive review of quantitative research methodologies that implement different estimations to test the FUH in female-managed firms. We identify the variables and econometric methods used in the literature and apply them to a sample of 2,323 Chilean firms. The firms in this study differ in many characteristics, such as legal form, sector, size, and age, which reflect the heterogeneity of the Chilean business system. Our findings do not support the FUH once endogeneity bias in the specification is addressed: under a correct specification of the econometric model, female-managed firms do not underperform businesses run by men. We show that a research design can lead to imprecise conclusions if endogeneity bias is not addressed or if non-representative samples are used. Some empirical studies have recognized the endogeneity problem, and their authors have explored techniques to address it. However, endogeneity bias is not the primary concern in most specifications that estimate the effect of CEO gender on firm performance. Although endogeneity bias may seem to be a non-critical issue in the exploration of the FUH, we demonstrate that specification problems should be considered, at least to ensure that the results are robust under alternative estimation methods. We contribute to the literature on female business leadership by suggesting a novel research agenda with methodological recommendations for future studies.
Section 1 presents the theoretical framework of this study. Section 2 describes the methodological design of the study. Section 3 presents the results, which are discussed in Section 4. Section 5 presents the conclusions, and Section 6 presents the implications of our study.

Theoretical Framework
Researchers have compared the financial data of male- and female-led firms for over 30 years (Cuba et al., 1983; Fischer, 1992; Johnson & Storey, 1993). This research began in entrepreneurship, comparing the performance of small firms started by men and by women (Fairlie & Robb, 2009; Hisrich & Brush, 2019; Loscocco et al., 1991). The topic was later expanded to the financial performance of companies managed by women (Kolev, 2012). More recently, researchers have focused on measuring the performance of companies in which women serve on the board of directors (Aggarwal et al., 2019; Rose, 2007). This research on women and firm performance rests on an underlying assumption that female-owned firms or firms managed by women perform worse than those that are male-owned or have a male CEO (Du Rietz & Henrekson, 2000). This assumption is known in the literature as the FUH (Dean et al., 2019; Watson, 2002; Westhead, 2003). The FUH has been studied in diverse contexts, including entrepreneurship, management, and corporate governance. Nevertheless, the research in these three fields can be understood as a single literature on female leadership and its relationship with business and financial performance (Kotiranta et al., 2010). Most studies have not explicitly mentioned the FUH as a motivation for their research, although their findings either support or reject this hypothesis (Zolin et al., 2013).
The first studies addressing women and firm performance appeared in the female entrepreneurship literature in the late 1970s and the 1980s (Moore, 1990). As more women started businesses and began working independently, interest grew in whether these women-run businesses performed as well as those created by men (Chaganti, 1986). Findings have shown that women-owned companies perform worse than male-owned companies (C. G. Brush, 1992; Hisrich & Brush, 1987; Loscocco et al., 1991), and several studies with similar results have reinforced this idea. There is now broad agreement that the business opportunities women pursue tend to be in industries with relatively low performance (e.g., retail and services) (Ahl, 2006; Ahl & Marlow, 2012; Gupta et al., 2019; Hundley, 2001; Sullivan & Meek, 2012).
Traditionally, the managerial role has been associated with a masculine image, a persistent stereotype of company leadership (Ellemers & Nadal, 2018). Male-dominated organizations are often reluctant to accept women in leadership roles because of the lingering perception that women lack the attributes necessary to succeed in roles and positions typically viewed as masculine (Grover, 2015; Heilman, 2012). The small number of women in corporate leadership positions reinforces these exclusionary tendencies (Nili, 2019). Some evidence has shown that gender bias gives women fewer opportunities to access managerial positions (Sarrió et al., 2002). The increased presence of women in management positions has renewed interest in understanding the distinct characteristics of women managers and their ability to meet or surpass the performance of their male counterparts (Helfat et al., 2006; Kanuri & Malm, 2018; Noland et al., 2016).
Studies of the FUH have yielded contradictory findings. Many studies have shown that female leadership positively affects firm performance (Christiansen et al., 2016; Conyon & He, 2017; Moreno-Gómez et al., 2018; Noland et al., 2016), whereas others have maintained that women-run companies perform worse (Lim et al., 2019; Singhathep & Pholphirul, 2015). Still others have claimed that a manager's gender has no significant effect on firm performance (Singh et al., 2019; Unite et al., 2019; Vu et al., 2019). Recently, some studies have explicitly challenged the FUH. Watson (2020) has indicated that the low performance of companies managed by women is a myth, and Ho et al. (2015) have argued that female leaders' contributions to firm performance cannot be measured solely in economic terms. Academic interest in women who succeed in business leadership positions has grown slowly, and some studies now focus on these women and their abilities instead of simply comparing performance across genders (C. Brush, 2019; Connell, 2019; Crittenden & Bliton, 2019).
In corporate governance, the representation of women on boards of directors is of interest to policymakers. Some countries have introduced gender diversity requirements or quotas for the boards of directors of publicly traded companies (Terjesen et al., 2009). For example, in 2003, Norway adopted a gender quota requiring 40% female board representation in publicly limited state-owned companies. In 2011, Italy introduced a requirement that at least 33% of the seats on listed companies' boards be held by the under-represented gender (ILO Bureau for Employers' Activities, 2020). Most studies on board gender diversity have focused on firm performance; only a few have looked beyond financial results to include women's demographic, human capital, and social capital contributions (Bear et al., 2010; Bennouri et al., 2018; Kirsch, 2018). Studying gender diversity on boards of directors involves many ethical considerations related to female participation in different social environments (Ntim, 2015; Porcena et al., 2021). As with female management, research on the gender diversity of boards of directors and its impact on firm performance has not been conclusive (Reddy & Jadhav, 2019). For example, Post and Byron (2015, p. 1546) have found that "female board representation is positively related to accounting returns and that this relationship is more positive in countries with stronger shareholder protections." Terjesen et al. (2016, p. 447) have suggested that "external independent directors do not contribute to firm performance unless the board is gender-diversified." Unite et al. (2019, p. 65) have found that female and male corporate leaders in Philippine firms have comparable competency levels and that increasing the presence of women on corporate boards has no discernible effect on firm performance.
Because boards' decision-making processes include contributions from most board members, both men and women, it is difficult to identify the specific contributions of female (or male) members and to understand how those contributions influence firm performance (Strydom et al., 2017). Recently, Nekhili et al. (2018) have suggested that, in family firms, the appointment of a female chair is significantly and positively related to firm performance, whereas in non-family firms, female chairs are negatively and significantly related to return on assets. Gradually, this conversation has shifted toward the active role of women on boards, with special attention to economies in which the continued dominance of patriarchy has delayed women's participation at the highest levels of the corporate hierarchy (Farhan & Nayan, 2018; Green & Homroy, 2018).
Our research explores in depth the methodological issues involved in testing the FUH in selected quantitative studies to answer the following research questions:
RQ1. What methods and variables have been used to test the FUH?
RQ2. What conclusions do the results of previous quantitative studies offer regarding the FUH?
RQ3. Do the findings and conclusions on the FUH hold when alternative methods and data are used?
The Roots of FUH: Foundations and Evidence
FUH research has drawn on several theoretical approaches. For example, Jayeola et al. (2020) have proposed upper echelons theory as a framework to examine whether female- and male-led informal businesses differ in performance. Lemma et al. (2023) have used liberal and social feminist theories to explain the performance gap between female-owned enterprises and their male-owned peers in Kenya and South Africa. Other perspectives include social constructionist feminist theory (Justo et al., 2015), goal theory, the theory of planned behavior, resource-based theory (Demartini, 2018; Watson et al., 2017), and entrepreneurship theory (Crane, 2022). The most commonly used theories are liberal and social feminist theories (Gottschalk & Niefert, 2013; Justo et al., 2015; Lemma et al., 2023; Westhead, 2003; Zolin et al., 2013). Liberal feminist theory suggests that women lack access to relevant resources, such as education, business experience, or financial capital. Social feminist theory suggests that women have different attitudes and values and adopt different approaches to business (Arráiz, 2018; Bardasi et al., 2011; Boateng, 2018). Most studies have focused on female-owned firms rather than women-led businesses; that is, their primary interest is the performance of women-owned start-ups. Liberal and social feminist theories are compatible because liberal feminism focuses on gender barriers, whereas social feminism focuses on female behavior. Both theories may be unified under social role theory (SRT), which considers two approaches to explaining gender roles in behavior and society.
A businessman's image is predominantly based on stereotypical notions of masculinity (Edelman et al., 2018). According to SRT, gender differences and similarities in the behaviors of women and men are determined by gender roles (A. Eagly, 1987; A. H. Eagly & Mladinic, 1989; A. Eagly & Wood, 2016). Masculine gender roles include attributes such as self-confidence, feeling superior, making decisions easily, being active, and being independent. Feminine gender roles include attributes such as kindness, helpfulness, emotionality, warmth toward others, awareness of others' feelings, and gentleness (Abele, 2003; Spence & Helmreich, 1979). Previous research has shown that feminine gender roles are strongly related to family roles, whereas masculine gender roles are associated with career success, business leadership, higher task orientation, and higher change potential (Abele, 2003; Kulich et al., 2018; Ramsey, 2017; Wille et al., 2018). These notions of expected female and male behavior underlie the premise that women are less likely to succeed in the business world. For example, the traits considered desirable for a manager/entrepreneur are autonomy, need for achievement, self-efficacy, and risk-taking propensity (Comeig & Lurbe, 2018; Edelman et al., 2018; Rucker et al., 2018). These traits are closer to male stereotypes than to female gender roles. Under this view, women would therefore be expected to achieve lower firm performance because they lack these desirable attributes. Moreover, a business owner's decision to hire a manager depends on the firm's goals and organizational design. Tondji (2022) has suggested that strategic delegation to overconfident managers, who overinvest in R&D and produce more output, induces higher profits and welfare under certain conditions. Because overconfidence is usually regarded as a male attribute, owners may prefer to hire male managers when R&D technology is less productive and the main goal is cost reduction.
Our methodological review provides extensive evidence on the development of quantitative research on the FUH. As noted, the empirical evidence on this topic is mixed, and we found several studies without a theoretical foundation. In a study of top managers' gender and firm performance in the Czech manufacturing sector, Egerová and Nosková (2019) have found that enterprises with women in their top management teams showed better financial performance than companies without women in top management. Conversely, Kristanti and Iswandi (2019), in a study of corporate firms in Indonesia, have suggested that although the influence of gender diversity is not significant, company performance differs depending on whether the CEO is female or male. Singhathep and Pholphirul (2015) have demonstrated that female general managers negatively affect short-term financial performance, including sales and annual profits, in manufacturing firms in Thailand. Following the literature, we state the FUH in Watson's (2002, p. 92) words:
H1. Female-controlled businesses (on average) will generate lower outputs measured in terms of Return on Equity (ROE), Return on Assets (ROA), and net income per employee than male-controlled businesses.

Methodology
A two-stage approach is adopted in this study. First, a methodological review of previous empirical studies on the relationship between female leadership and firm performance is conducted. This review aims to identify the quantitative models and variables used to test FUH in female-managed firms. In the second stage, we specify three econometric models for testing the FUH based on Chilean business data, using the findings from the methodological review.

Methodological Review
We conducted a methodological review and selected the most representative empirical research on the relationship between female managers and firm performance. Our method is similar to that of previous methodological reviews in the management literature (Ball & Foster, 1982; Bash et al., 2021). Figure 1 shows the flowchart of the research design. We searched the Web of Science (WoS) and Scopus databases for articles using the keywords "female underperformance hypothesis," "female CEO performance," "gender diversity board," and "female entrepreneur performance." We limited the results to publications from 2010 to the first half of 2020 because we detected increased research on this topic over these 10 years. In the first review, we discarded articles that did not contain keywords related to the FUH. In the second review, we read the abstracts and selected studies with empirical analyses. In the third review, we analyzed the introduction of each article. In the fourth review, we read each remaining article in full to make the final selection.
We selected a final sample of 66 articles for quantitative analysis. Using an information-gathering template, we grouped the quantitative elements of each article into the following categories: field of FUH, database of origin, journal quartile, subject category, sample selection criteria, methods, and main results. Of the 66 articles, we considered the 26 that focused only on the relationship between female managers (e.g., CEO, CFO, TMT, or executive director) and firm performance. From this process, we obtained the variables and methods used to test the FUH, and we used these outputs to select the methods and variables for empirical testing with our database.

Metrics in FUH Empirical Research
We selected a total of 66 quantitative articles testing the FUH, published in journals indexed in WoS and Scopus from 2010 to the first half of 2020. We organized the papers' metrics by year, field (female managers, female owners, and board gender diversity), database of origin, and journal quartile. The year with the highest number of publications was 2019 (11 from WoS and six from Scopus). Figure 2 shows the distribution of papers by year and database of origin.
Tables 1 and 2 present the number of articles by database of origin, journal quartile, scope of FUH, and sample type. Of all the articles, 24% were Scopus publications and 76% were available in the WoS database. The largest share of Scopus articles was in Q2 (41%), whereas the largest share of WoS articles was in Q1 (43%). Regarding scope, 39% of the articles focused on the relationship between female managers and firm performance, 12% analyzed the effect of female ownership on firm performance, and 48% examined the influence of board gender diversity on firms' financial outcomes. Regarding sample type, 68% of the articles used corporate firms, and 32% considered a mixed sample (corporate and non-corporate) from specific industries.
FUH research has been published in top journals, primarily in finance, management, business, economics, and entrepreneurship. Because FUH is a specialized field, we identified journals that have published two or more FUH-related papers across all themes (female manager, female owner, and board gender diversity). As Table 3 shows, the Journal of Business Ethics has published the largest number of these articles (five papers).

Methods and Measures in FUH Empirical Research
Our interest is to test the FUH in female-managed businesses; hereafter, our analysis excludes the FUH in the female business owner and board gender diversity contexts. We focus on female managers because this context allows a more comprehensive analysis of female leaders' decision-making, which is unclear in the other two contexts (Strydom et al., 2017). Therefore, we identify 26 articles (out of the original 66) in which the authors tested the relationship between female managers and firm performance. Table 4 presents the 26 papers, indicating the author names, publication year, sample type, geographic context, methods, and main results.
The studies listed in Table 4 have different motivations for using specific methods. For example, Kolev (2012) has used GLS to complement linear regression in studying CEO gender differences in corporate firm performance, because ordinary regression does not account for the possibility that one period contains a single firm led by a female CEO while another contains 100 such firms. Strøm et al. (2014) have built on the Heckman (1978) endogenous dummy variable model and followed its two-step procedure to estimate managers' gender differences in the financial performance of microfinance institutions in different countries. Recently, to reduce selection bias because female […] However, 23% of these studies have not stated the reasons for their choice of method.
The main results of FUH studies reveal that only 23% of the findings support the claim that firms with female managers perform worse than male-managed businesses. A positive relationship between female managers and firm performance has been observed in 42% of the studies. Finally, 35% of the studies have found mixed or non-significant results; that is, female-managed firms perform better or worse than male-managed businesses depending on specific conditions, or the relationship is insignificant.
The listed studies differ considerably in the number of observations and database construction. Half of the studies used samples of publicly listed corporations, which represent a small portion of enterprises in the business ecosystem. For example, Chile's Ministry of Economy has estimated that listed companies represent 2.5% of all Chilean companies (INE, 2017). A similar pattern holds in Spain, where less than 2% of companies are listed on stock exchanges (INE, 2022). Firms that are not publicly traded are very heterogeneous and therefore better embody a variety of potential organizational forms and leadership styles (Mangematin et al., 2003). Regarding geographic context, the lack of research on the FUH for female managers in Latin American countries is noteworthy.
Ordinary least squares (OLS) regression was the most frequently used method (35%). Only 23% of the studies used adequate methods to address endogeneity. Ten studies used panel data and specified panel regression models with fixed effects, or with both fixed and random effects; only two of these studies used instrumental variables for the endogenous regressors. The most common method to address endogeneity was instrumental variables (19%), specifically the two-stage least squares (2SLS) model. One study specified a propensity score matching (PSM) model, taking gender as the selection (treatment) variable to estimate differences in the performance of female- and male-managed firms. The most commonly used performance variables were ROE and ROA (65%); in four studies, the authors used an employee-based variable as a firm performance proxy (e.g., profit per employee, employee productivity, and value-added per employee). The main control variables were firm size and age, exports, family ownership, market share, and capital intensity. In all cases, the main explanatory variable was the manager's gender.

Testing Female Underperformance Hypothesis
Using the information collected in the methodological review, we selected three econometric methods, three dependent variables, and seven explanatory variables (including gender) from previous research on the FUH in female-managed firms. We used data from the Fifth Longitudinal Survey of Firms (ELE5, for its acronym in Spanish). The latest version, from 2017, contains information on 6,480 Chilean firms of different sizes and industries. We selected the 2,323 companies with complete observations for every variable. Most of these firms are not publicly traded, and they are representative of the Chilean business ecosystem. We analyzed the data using three econometric models: OLS, 2SLS, and matching estimators. OLS and 2SLS were included because they are the most frequently used methods, and matching estimators were selected because they are a modern approach to addressing endogeneity bias (see Diwisch et al., 2009; Simonsen & Skipper, 2006). The variables are those most commonly used in the FUH literature on female managers. The three dependent variables are ROE and ROA, the accounting-based measures most common in that literature, and net income per employee (NIE), an employee-based proxy of firm performance. The main explanatory variable is the gender of the business manager, and the six control variables are average wages, capital intensity, export status, firm size, firm age, and market share.
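As a sketch only, the baseline specification implied by this description can be written as follows for firm i. The symbol names are ours, not the authors', and each control enters as defined in the next paragraph of the text. Under this reading, H1 corresponds to a negative coefficient on the female-manager dummy, conditional on the controls.

```latex
\mathrm{Perf}_i = \beta_0 + \beta_1\,\mathrm{FemaleManager}_i + \beta_2\,\mathrm{AvgWage}_i + \beta_3\,\mathrm{CapIntensity}_i + \beta_4\,\mathrm{Exports}_i + \beta_5\,\mathrm{Size}_i + \beta_6\,\mathrm{Age}_i + \beta_7\,\mathrm{MarketShare}_i + \varepsilon_i,
\qquad \mathrm{Perf}_i \in \{\mathrm{ROE}_i,\ \mathrm{ROA}_i,\ \mathrm{NIE}_i\}

H_0\colon \beta_1 = 0 \qquad \text{versus} \qquad H_1\colon \beta_1 < 0
```

In the OLS version every regressor is treated as exogenous; in the 2SLS version, average wage, capital intensity, and size are the variables instrumented, as the text describes.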
The dependent variables for the three models were ROE, ROA, and NIE. ROE is net income divided by total equity, ROA is net income divided by total assets, and NIE is the natural logarithm of net income divided by the total number of employees. The main explanatory variable (treatment) is female manager, a dummy variable that takes the value one for companies with a female manager and zero otherwise. The control variables are defined as follows: average wages is the natural logarithm of total salaries paid divided by the total number of employees; capital intensity is the natural logarithm of total net assets divided by the total number of employees; exports is a dummy variable that takes the value one for firms that export and zero otherwise; size is the natural logarithm of the total number of workers; age is the number of years since the firm's founding; and market share is each firm's net income divided by the total income of all firms in the sample. The data for each variable are annual cross-sections at the end of 2017.
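The variable definitions above can be expressed as a short Python sketch. It assumes a hypothetical record layout (the field names are illustrative, not ELE5 variable codes) and reads "total income of all firms in the sample" as the sum of net income across the sample. The text does not say how firms with non-positive net income are handled for NIE; here the logarithm simply raises an error in that case.

```python
import math
from dataclasses import dataclass


@dataclass
class FirmRecord:
    # Hypothetical field names for illustration; NOT the actual ELE5 codes.
    net_income: float
    total_equity: float
    total_assets: float
    net_assets: float
    total_wages: float
    employees: int
    exports: bool
    founding_year: int
    female_manager: bool


def build_variables(firms, survey_year=2017):
    """Return outcome, treatment, and control variables for each firm."""
    # Market-share denominator: our reading of "total income of all firms".
    total_income = sum(f.net_income for f in firms)
    rows = []
    for f in firms:
        rows.append({
            "roe": f.net_income / f.total_equity,
            "roa": f.net_income / f.total_assets,
            # math.log raises ValueError if net income per employee <= 0.
            "nie": math.log(f.net_income / f.employees),
            "female_manager": int(f.female_manager),
            "avg_wage": math.log(f.total_wages / f.employees),
            "capital_intensity": math.log(f.net_assets / f.employees),
            "exports": int(f.exports),
            "size": math.log(f.employees),
            "age": survey_year - f.founding_year,
            "market_share": f.net_income / total_income,
        })
    return rows


# Two made-up firms, used only to exercise the definitions.
sample = [
    FirmRecord(100.0, 500.0, 1000.0, 800.0, 400.0, 10, True, 2007, True),
    FirmRecord(300.0, 1000.0, 2000.0, 1000.0, 1000.0, 20, False, 2012, False),
]
rows = build_variables(sample)
for row in rows:
    print(row)
```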
OLS is the reference model, that is, the specification that ignores endogeneity bias. We use 2SLS to obtain a consistent estimator in the presence of endogeneity (Maydeu-Olivares et al., 2019; Schmidt, 2020; Wooldridge, 2010). Endogenous explanatory variables can be instrumented with the values these variables took in past periods (D. Chen et al., 2021; Hall, 1988; Stock & Watson, 2012; Y. Wang & Bellemare, 2019; Yogo, 2004). Therefore, we obtain instruments for the proposed endogenous variables (average wage, capital intensity, and firm size) by observing these variables for the same firms in the Fourth Longitudinal Survey of Firms (ELE4, for its acronym in Spanish) from 2015 (INE, 2015, 2017).
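The logic of 2SLS with a lagged instrument can be illustrated with a minimal sketch on simulated data, not the ELE data. It uses one endogenous regressor x, an earlier value z that plays the role of the instrument, and an unobserved confounder u that affects both x and the outcome. The helpers ols_simple and two_stage_least_squares are hypothetical names written for this illustration, and the treatment dummy and other controls of the article's models are omitted for brevity.

```python
import random


def ols_simple(x, y):
    """Intercept and slope from regressing y on x (with a constant) by OLS."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(y, x, z):
    """Point estimates of a just-identified 2SLS (one endogenous x, one instrument z).

    Standard errors from the naive second-stage regression are not valid,
    so only the coefficients are returned.
    """
    # Stage 1: project the endogenous regressor on the instrument.
    a0, a1 = ols_simple(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted (exogenous) part of x.
    return ols_simple(x_hat, y)


# Simulated data: z stands in for the earlier (lagged) value of a firm
# characteristic, x for its current value, u for an unobserved confounder.
# The true effect of x on y is 0.5.
rng = random.Random(42)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

_, ols_slope = ols_simple(x, y)
_, tsls_slope = two_stage_least_squares(y, x, z)
print(f"OLS slope:  {ols_slope:.3f}")   # probability limit is 0.5 + 1/3 in this design
print(f"2SLS slope: {tsls_slope:.3f}")  # probability limit is the true 0.5
```

With a single instrument, the two-stage estimate reduces to Cov(z, y)/Cov(z, x). Valid standard errors require the 2SLS variance formula rather than the naive second-stage output, which is why the sketch returns point estimates only.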
Non-parametric matching estimators are frequently used in impact evaluation studies (Abadie & Cattaneo, 2018; Clarke et al., 2019). The general idea behind this methodology is to determine the impact of a treatment on outcomes using information on the treatment group and on subjects who are similar to those in the treatment group but did not receive the treatment. Using this information, we can construct a counterfactual for the absence of treatment (Lei & Candès, 2021; Vinha, 2006). We followed three approaches: nearest-neighbor matching with one neighbor, nearest-neighbor matching with five neighbors, and PSM. These approaches allow us to compare firms that are as similar as possible and whose main difference is the manager's gender, using the same control variables proposed for the OLS and 2SLS models. We report the average treatment effect on the general population (ATE) and the average treatment effect on the treated population (ATET). Overall, we regard the ATE estimator as more rigorous than the ATET because the assumptions for the ATET are less restrictive, and the standard error of the estimated ATET is generally larger than that of the estimated ATE (Abadie et al., 2004; Imbens, 2004; Wooldridge, 2020).
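A minimal sketch of nearest-neighbor covariate matching that returns both the ATE and the ATET is shown below. It assumes matching with replacement on covariates scaled by their sample standard deviations and applies no bias correction; established implementations offer other distance metrics and corrections. nn_matching is a hypothetical helper, and k=1 or k=5 corresponds to the one- and five-neighbor variants in the text. PSM would instead match on an estimated propensity score (for example, from a logit of the female-manager dummy on the controls), which this sketch does not implement.

```python
import statistics


def nn_matching(X, treated, y, k=1):
    """ATE and ATET by k-nearest-neighbor covariate matching (with replacement).

    X: list of covariate vectors; treated: list of 0/1; y: list of outcomes.
    Distance is Euclidean on covariates divided by their sample standard
    deviation (an assumption of this sketch). No bias correction is applied.
    """
    n_cov = len(X[0])
    sds = [statistics.stdev([row[j] for row in X]) for j in range(n_cov)]
    scale = [sd if sd > 0 else 1.0 for sd in sds]

    def dist(a, b):
        return sum(((a[j] - b[j]) / scale[j]) ** 2 for j in range(n_cov)) ** 0.5

    groups = {1: [i for i, t in enumerate(treated) if t == 1],
              0: [i for i, t in enumerate(treated) if t == 0]}
    if k > min(len(groups[0]), len(groups[1])):
        raise ValueError("k exceeds the size of the smaller group")

    effects = []  # unit-level imputed Y(1) - Y(0)
    for i, t in enumerate(treated):
        pool = groups[1 - t]
        # k closest units in the opposite group; sorted() is stable, so ties
        # are broken by index order.
        nearest = sorted(pool, key=lambda j: dist(X[i], X[j]))[:k]
        counterfactual = sum(y[j] for j in nearest) / k
        effects.append(y[i] - counterfactual if t == 1 else counterfactual - y[i])

    ate = sum(effects) / len(effects)
    treated_effects = [e for e, t in zip(effects, treated) if t == 1]
    atet = sum(treated_effects) / len(treated_effects)
    return ate, atet


# Toy data (hypothetical): one covariate, and a treatment effect of exactly 2.
X = [[1.0], [2.0], [3.0], [1.0], [2.0], [3.0]]
treated = [1, 1, 1, 0, 0, 0]
y = [3.0, 5.0, 7.0, 1.0, 3.0, 5.0]
ate, atet = nn_matching(X, treated, y, k=1)
print(ate, atet)  # 2.0 2.0
```

The ATE imputes a counterfactual for every unit, whereas the ATET averages only over treated units (here, firms with a female manager), so the two can differ when effects are heterogeneous.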

Strategy to Approach Endogeneity
"Technically, endogeneity occurs when a predictor variable (x) in a regression model [is] correlated with the error term (e) in the model" (Lynch & Brown, 2011, p. 112). Concerns about endogeneity bias have recently increased because this specification problem is frequently encountered in econometric models (Cameron & Trivedi, 2005; Davidson & MacKinnon, 1993; Wooldridge, 2010, 2020). Moreover, models that use instrumental variables to deal with endogeneity bias have been refined to achieve better efficiency in econometric estimation. Whether to treat a variable as endogenous is a decision for the researcher, based on their knowledge of the theory underlying the research problem (Hamilton & Nickerson, 2003; Nakamura & Nakamura, 1998).
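A standard one-regressor illustration of why this matters (textbook algebra, not taken from the article). With an intercept, the OLS slope equals the true coefficient plus a term that does not vanish when x and e are correlated. An instrument z that is correlated with x but not with e removes that term:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + e_i, \\
\hat\beta_1^{\mathrm{OLS}} &= \beta_1 + \frac{\sum_i (x_i - \bar x)(e_i - \bar e)}{\sum_i (x_i - \bar x)^2}
  \;\xrightarrow{p}\; \beta_1 + \frac{\operatorname{Cov}(x_i, e_i)}{\operatorname{Var}(x_i)}, \\
\hat\beta_1^{\mathrm{IV}} &= \frac{\sum_i (z_i - \bar z)(y_i - \bar y)}{\sum_i (z_i - \bar z)(x_i - \bar x)}
  \;\xrightarrow{p}\; \beta_1 + \frac{\operatorname{Cov}(z_i, e_i)}{\operatorname{Cov}(z_i, x_i)} = \beta_1
  \quad \text{if } \operatorname{Cov}(z_i, e_i) = 0,\ \operatorname{Cov}(z_i, x_i) \neq 0.
\end{aligned}
```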
Suppose there is suspicion that some of the variables included in the model are endogenous. In this case, a variety of statistics allow us to test for an endogeneity specification problem. Our strategy for identifying endogenous variables includes a set of statistics that test for endogeneity, weak instruments, and overidentifying restrictions. If no endogeneity problem is present in the model, the OLS estimator is more efficient than the instrumental variables estimator (Anatolyev & Skolkova, 2019). Therefore, it is essential to compare both estimators (OLS vs. 2SLS) to evaluate which is preferable. In this study, we use Hausman's specification test to determine whether there is a systematic difference between the estimates; the null hypothesis is that the OLS estimators are efficient (and consistent) estimators of the true parameters, so that their difference from the consistent 2SLS estimators is not systematic (Hausman, 1978).
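The Hausman statistic compares an estimator that is consistent under both hypotheses (here 2SLS) with one that is efficient under the null (here OLS). A small numpy sketch follows, assuming the coefficient vectors and covariance matrices of the compared coefficients are already available. A Moore-Penrose inverse is used because the variance difference can be singular in finite samples. The function name is illustrative.

```python
import numpy as np


def hausman(b_consistent, b_efficient, V_consistent, V_efficient):
    """Hausman (1978) statistic H = d' (V_c - V_e)^+ d, with d = b_c - b_e.

    Returns (H, df); under H0, H is approximately chi-square with df degrees
    of freedom.
    """
    d = np.atleast_1d(np.asarray(b_consistent, float) - np.asarray(b_efficient, float))
    V_diff = np.atleast_2d(np.asarray(V_consistent, float) - np.asarray(V_efficient, float))
    H = float(d @ np.linalg.pinv(V_diff) @ d)   # generalized inverse
    df = int(np.linalg.matrix_rank(V_diff))
    return H, df
```

For example, a single coefficient with estimates 1.5 (2SLS) and 1.0 (OLS) and variances 0.10 and 0.05 gives H = 0.25 / 0.05 = 5 with one degree of freedom. This exceeds the 5% chi-square critical value of 3.84, so the null hypothesis would be rejected.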
In line with the evidence from this methodological review, we test for endogeneity following the suggestions of previous studies and apply adequate statistics to assess this issue. We use the Durbin and Wu-Hausman statistics to test whether the suggested endogenous variables are exogenous. Both statistics test the null hypothesis that the proposed endogenous variables can be treated as exogenous. If both tests are significant, we reject the null hypothesis of exogeneity and treat the variables under consideration as endogenous (Durbin, 1954; Hausman, 1978; Wu, 1974). When a 2SLS estimator is used to deal with endogeneity bias, finding valid instruments is crucial; that is, variables that are sufficiently correlated with the included endogenous regressors but uncorrelated with the error term (Bound et al., 1995). Accordingly, we follow the suggestions of Hall (1988), Stock and Watson (2012), and Yogo (2004) and use the endogenous variables lagged by one period as instruments.
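One common way to compute a Wu-Hausman-type statistic is the regression-based (control-function) form. The first-stage residuals of the suspected endogenous regressors are added to the structural OLS equation, and an F-test is applied to their coefficients. The sketch below assumes that form and homoskedastic errors. It is not necessarily the exact routine the authors' software used, and the helper names are hypothetical.

```python
import numpy as np


def _rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)


def wu_hausman_F(y, X_exog, X_endog, Z_excl):
    """Regression-based Wu-Hausman F test of H0: X_endog is exogenous."""
    Z = np.column_stack([X_exog, Z_excl])
    # First stage: residuals of each suspected endogenous regressor on all instruments.
    V = X_endog - Z @ np.linalg.lstsq(Z, X_endog, rcond=None)[0]
    X = np.column_stack([X_exog, X_endog])
    X_aug = np.column_stack([X, V])   # structural equation plus control functions
    n, q = len(y), V.shape[1]
    df_resid = n - X_aug.shape[1]
    rss_r, rss_u = _rss(y, X), _rss(y, X_aug)
    F = ((rss_r - rss_u) / q) / (rss_u / df_resid)
    return F, (q, df_resid)
```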
We use the Anderson-Rubin test to test the hypothesis that the coefficients of the endogenous regressors in the structural equation are jointly equal to zero and that the overidentifying restrictions are valid (Anderson & Rubin, 1949; Baum et al., 2010). Because this test is robust to weak instruments, a significant Anderson-Rubin statistic indicates that the endogenous regressors are jointly significant even if the instruments are weak. We also report an F version of the Cragg-Donald Wald statistic to test the hypothesis of weak instruments. If the Cragg-Donald test is significant, we can reject the null hypothesis of weak instruments (Cragg & Donald, 1993).
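The Anderson-Rubin idea can be sketched in a few lines. Under H0 (coefficients on the endogenous regressors equal `beta0`, zero by default), the adjusted outcome y - X_endog @ beta0 should be unrelated to the excluded instruments once the exogenous regressors are controlled for. This is a homoskedastic F version written for illustration. The function and helper names are hypothetical.

```python
import numpy as np


def _rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)


def anderson_rubin_F(y, X_exog, X_endog, Z_excl, beta0=None):
    """Anderson-Rubin F test of H0: coefficients on X_endog equal beta0."""
    k2 = X_endog.shape[1]
    beta0 = np.zeros(k2) if beta0 is None else np.asarray(beta0, float)
    y_tilde = y - X_endog @ beta0          # outcome net of the hypothesized effect
    Z = np.column_stack([X_exog, Z_excl])
    n, l2 = len(y), Z_excl.shape[1]
    df_resid = n - Z.shape[1]
    rss_r = _rss(y_tilde, X_exog)          # instruments excluded
    rss_u = _rss(y_tilde, Z)               # instruments included
    F = ((rss_r - rss_u) / l2) / (rss_u / df_resid)
    return F, (l2, df_resid)
```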
We report Sargan's overidentification test of all instruments. However, our 2SLS equation is exactly identified (that is, we have the same number of instruments as endogenous variables), so the test only confirms exact identification and cannot assess instrument validity (Sargan, 1958, 1988). We also consider Anderson's canonical correlation test to assess whether the instruments are correlated with the endogenous regressors, that is, whether the equation satisfies the rank condition that the correlation or covariance between the endogenous regressors and the instruments is nonzero. If Anderson's canonical correlation test is significant, we reject the null hypothesis of underidentification (Anderson, 1984; Baum et al., 2010).
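A sketch of the canonical correlation underidentification test. The exogenous regressors are partialled out of both the endogenous regressors and the excluded instruments, and the smallest squared canonical correlation between them is computed. We assume the LM form n times that value, compared with a chi-square with (l2 - k2 + 1) degrees of freedom, as reported in the ivreg2 framework of Baum et al. (2010). Treat the exact scaling as an assumption to check against that documentation.

```python
import numpy as np


def anderson_canon_corr_LM(X_exog, X_endog, Z_excl):
    """Anderson canonical-correlation LM test of underidentification.

    Returns (LM, df) with LM = n * r2_min, where r2_min is the smallest squared
    canonical correlation between X_endog and Z_excl after partialling out X_exog.
    """
    def partial_out(A):
        return A - X_exog @ np.linalg.lstsq(X_exog, A, rcond=None)[0]

    Y, Z = partial_out(X_endog), partial_out(Z_excl)
    Syy, Szz, Syz = Y.T @ Y, Z.T @ Z, Y.T @ Z
    # Squared canonical correlations: eigenvalues of Syy^-1 Syz Szz^-1 Szy.
    M = np.linalg.solve(Syy, Syz @ np.linalg.solve(Szz, Syz.T))
    r2_min = float(np.min(np.linalg.eigvals(M).real))
    n, l2, k2 = X_endog.shape[0], Z_excl.shape[1], X_endog.shape[1]
    return n * r2_min, l2 - k2 + 1
```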
Matching estimators "address the issue of self-selection bias and allow for a decomposition of treatment effects on outcomes" (Titus, 2007, p. 487). The FCEO variable may induce an endogeneity problem rooted in selection bias (self-selection) when estimating the differences in the performance of female- and male-led firms. Matching estimators (nearest-neighbor matching and PSM) deal with endogeneity bias because they allow treatment effects to be estimated by matching firms based on their similarities (Abadie et al., 2004; Abadie & Imbens, 2016). Therefore, matching estimators isolate the treatment effect from the influence of third variables (the observed covariates). However, a simple matching estimator is biased in finite samples when matching is not exact (Abadie & Imbens, 2006, 2011). To reduce this bias, we include a bias adjustment based on the following covariates: average wages, capital intensity, exports, size, age, and market share. We follow the procedure proposed by Abadie et al. (2004) to estimate a bias-corrected matching estimator that adjusts the difference within matches for differences in their covariate values. We use the Hausman specification test to determine which matching method generates consistent and efficient estimators, taking nearest-neighbor matching with one neighbor as the reference model.
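The bias adjustment can be sketched for the ATET as follows. Each matched control outcome is shifted by the difference in a regression prediction between the treated unit's covariates and the control's covariates, with the regression fitted on controls only. This follows the general regression-adjustment idea of Abadie and Imbens under a linear-model assumption. The function name and the scaled Euclidean metric are illustrative choices, not the authors' exact settings.

```python
import numpy as np


def bias_corrected_atet(y, d, X, m=1):
    """ATET from nearest-neighbor matching with a linear bias adjustment."""
    y = np.asarray(y, float)
    d = np.asarray(d, int)
    X = np.asarray(X, float)
    treated, controls = np.flatnonzero(d == 1), np.flatnonzero(d == 0)
    # mu0: linear regression of y on X, fitted on control units only.
    Xc = np.column_stack([np.ones(len(controls)), X[controls]])
    gamma = np.linalg.lstsq(Xc, y[controls], rcond=None)[0]
    mu0 = np.column_stack([np.ones(len(y)), X]) @ gamma
    scale = X.std(axis=0)
    effects = []
    for i in treated:
        dist = np.linalg.norm((X[controls] - X[i]) / scale, axis=1)
        js = controls[np.argsort(dist, kind="stable")[:m]]
        # Shift each matched control outcome to the treated unit's covariates.
        y0_hat = np.mean(y[js] + mu0[i] - mu0[js])
        effects.append(y[i] - y0_hat)
    return float(np.mean(effects))
```

In the test data below, the matches are inexact, so simple matching would overstate the effect. Because the outcome is linear in the covariate, the adjustment recovers the true effect of 3 exactly.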

Results
Table 5 presents the overall sample descriptive statistics for each category of manager gender (men and women).Women run fewer than 20% of the firms in our sample.
The statistics by gender show that women-run firms have lower values for each of the outcome variables, and the remaining variables mirror this trend. Notable differences are observed in capital intensity and size. Women-led firms are smaller and less capital-intensive; that is, they use more labor than capital in their operations (Recio, 1997). Low capital intensity could be motivated by female managers' attitude toward risk, as women tend to be more conservative in their decisions to invest capital (Loukil & Yousfi, 2016). Moreover, women-led businesses are concentrated among small and medium-sized enterprises (SMEs); therefore, women-led firms have fewer resources to invest in innovation, infrastructure, and business growth strategies (Guerrero et al., 2020; Ibáñez et al., 2020).
We use an OLS regression with robust standard errors for each of the performance variables (ROE, ROA, and NIE) as a reference model. The results are summarized in Table 6. Concerning the gender variable, the OLS regressions show that the presence of a female manager has a negative and significant effect on NIE, whereas the relationship between gender and ROA/ROE is non-significant. To deal with the suspected endogeneity problem, we implemented an instrumental variable model estimated by two-stage least squares. If the variables proposed as endogenous are in fact exogenous, our reference model (OLS) produces more efficient estimators than the alternative model that uses instrumental variables.
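A sketch of OLS with heteroskedasticity-robust standard errors. We assume the HC1 variant (the White sandwich rescaled by n/(n - k)), which is a common default for "robust" errors. The article does not state which variant was used, and the function name is illustrative.

```python
import numpy as np


def ols_robust(y, X):
    """OLS coefficients with HC1 heteroskedasticity-robust standard errors.

    X must include a constant column.
    """
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    meat = (X * (e ** 2)[:, None]).T @ X         # sum of e_i^2 x_i x_i'
    V = n / (n - k) * XtX_inv @ meat @ XtX_inv   # HC1 sandwich
    return beta, np.sqrt(np.diag(V))
```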
Tables 7 and 8 report the 2SLS results (first and second stages), including instrumental variables for dealing with endogeneity bias. In contrast to the OLS model, the coefficients of the gender variable in 2SLS are non-significant for all three dependent variables (at the 0.05 significance level). The Anderson canonical correlation test from the first-stage regressions is highly significant, so we reject the hypothesis that the matrix of reduced-form coefficients has rank K − 1 (underidentification); the instruments are sufficiently correlated with the endogenous variables, and the rank condition is satisfied. The Cragg-Donald Wald statistic is significant (at the 0.05 significance level); therefore, we can reject the hypothesis of weak instruments. These first-stage results are identical across the three proposed models (ROE, ROA, and NIE) because we use the same variables in all specifications. Because we included the same number of instruments as endogenous regressors, the Sargan test of overidentification indicates that the equation is exactly identified in all three models. The Anderson-Rubin test of joint significance of the endogenous regressors was significant only in the model with NIE as the dependent variable; in the models with ROE and ROA as dependent variables, the endogenous regressors do not (jointly) differ significantly from zero. Similarly, the Durbin score and Wu-Hausman tests are significant only in the NIE model; therefore, the proposed endogenous variables should be treated as exogenous in the ROE and ROA models. We compared the OLS and 2SLS estimators using the Hausman specification test. Consistent with the absence of endogeneity in the ROE and ROA models, we cannot reject the hypothesis that the OLS estimators are efficient (and consistent) estimators of the true parameters (ROE: χ² = 2.76, p > .05; ROA: χ² = 0.42, p > .05); a comparison of the two estimators therefore does not suggest substantial differences. In the NIE model, we reject the hypothesis that the differences between the OLS and 2SLS estimators are not systematic; thus, the 2SLS estimators are preferred as consistent estimators of the true parameters (NIE: χ² = 12.69, p < .05).
In brief, a manager's gender does not influence firm performance: the relationship between a female CEO and ROE/ROA is non-significant under both the OLS and 2SLS estimators. Moreover, female-managed firms do not underperform compared with male-managed firms in the NIE model once the estimations account for endogeneity bias.
The PSM and nearest-neighbor (with one and five neighbors) methods are used to estimate the difference in firm performance between female- and male-managed firms (Table 9). No significant differences are observed between the ROE of firms run by female and male managers under the ATE and ATET estimators of the three proposed models. The same results are observed for the models with ROA as the dependent variable. For NIE, however, the results differ across the three models. When the PSM ATE estimator is used, female-managed firms show significantly lower performance (NIE) than male-managed firms. However, when the nearest-neighbor ATE and ATET estimators with one neighbor are considered, we do not find significant differences in NIE between female- and male-led businesses. Moreover, the PSM and nearest-neighbor (one neighbor) ATET estimators show non-significant differences between female- and male-led firms. Female-managed firms show significantly lower performance (NIE) than male-managed firms when the nearest-neighbor ATET estimator with five neighbors is used.
We implement the Hausman specification test using nearest-neighbor matching with one neighbor as the consistent estimator (reference model) to determine the best model for estimating the differences between female- and male-led business performance (Table 10). We test the null hypothesis that the difference in the coefficients is not systematic and reject it in all comparisons: the nearest-neighbor estimator with five neighbors and the PSM estimator differ systematically from the nearest-neighbor estimator with one neighbor. Thus, we retain nearest-neighbor matching with one neighbor as a consistent estimator of the true parameters. Therefore, female-managed firms do not underperform compared with male-managed firms.
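For comparison with the nearest-neighbor estimators, a minimal PSM sketch is shown below. A logit propensity score is fitted by Newton-Raphson, and each treated firm is matched to the control firm with the closest score to obtain the ATET. The logit specification, the one-to-one matching with replacement, and the function names are assumptions for illustration. Newton-Raphson also assumes the treatment is not perfectly separated by the covariates.

```python
import numpy as np


def propensity_scores(d, X, iters=25):
    """Logit estimate of P(d = 1 | X) by Newton-Raphson (X without a constant)."""
    Xc = np.column_stack([np.ones(len(d)), X])
    b = np.zeros(Xc.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xc @ b))
        grad = Xc.T @ (d - p)                            # score of the log-likelihood
        hess = (Xc * (p * (1.0 - p))[:, None]).T @ Xc    # information matrix
        b = b + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xc @ b))


def psm_atet(y, d, X):
    """ATET by one-to-one nearest-neighbor matching on the estimated score."""
    y = np.asarray(y, float)
    d = np.asarray(d, int)
    ps = propensity_scores(d, np.asarray(X, float))
    controls = np.flatnonzero(d == 0)
    diffs = [y[i] - y[controls[np.argmin(np.abs(ps[controls] - ps[i]))]]
             for i in np.flatnonzero(d == 1)]
    return float(np.mean(diffs))
```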

Discussion
This research had two objectives: (1) to identify the empirical methods and the most frequent variables used in the literature to test the relationship between manager gender and firm performance and (2) to apply selected methods and variables to test FUH using a sample of Chilean firms. We conducted a methodological review to assess and select empirical studies using quantitative methods. We then conducted an empirical analysis to test FUH using OLS, 2SLS, nearest-neighbor matching, and PSM. We identified three dependent variables, the explanatory variable of manager gender, and the six control variables most commonly used in the literature. Additionally, our review revealed a significant gap in research on female leadership in the Latin American context. We examined a sample that is more representative of the business ecosystem than the samples used in previous studies. This study finds no significant relationship between firms' financial performance and female business leadership once endogeneity bias is addressed.
Conversely, other empirical research on FUH shows mixed results; several studies support the idea that a business run by women has lower financial performance than a firm managed by men (e.g., Lim et al., 2019; Singhathep & Pholphirul, 2015). Other studies have demonstrated a positive relationship between female managers and firm performance (e.g., Conyon & He, 2017; Moreno-Gómez & Calleja-Blanco, 2018). Additionally, some authors have argued that the manager's gender has no significant effect on firm performance or that this relationship depends on specific conditions (e.g., Singh et al., 2019; Unite et al., 2019).
There are several explanations for the variety of conclusions drawn from FUH empirical research, principally concerning methodological issues. Several studies have limited their samples to firms that trade on stock markets. However, such firms are a small portion of the business ecosystem and do not embody the variety of organizational forms and leadership structures (see Valls & Cruz, 2019; Vu et al., 2019). Additionally, our methodological review showed that several studies did not correct for endogeneity or other specification problems: only 23% of the reviewed articles recognized endogeneity bias and adopted adequate techniques to address it when testing FUH in the female management context. Even the studies that implemented models addressing endogeneity bias have reached no consensus on the effects of female managers on firm performance, reporting positive, negative, or non-significant relationships.
Economic models frequently suffer from endogeneity bias, in which the model's explanatory variables are correlated with the error term (Franzese, 2009; Nakamura & Nakamura, 1998). The instrumental variables approach (2SLS) is the most commonly used method for controlling endogeneity; however, obtaining valid instruments is difficult. An alternative solution is to instrument the endogenous variables with their lagged values. Although this technique is not exempt from criticism, it is the most commonly used approach when better instruments are not available (see Hall, 1988; Yogo, 2004). Matching estimators are another suitable method for addressing endogeneity. An important question when comparing two groups (women vs. men) is how to account for firms' specific characteristics. There is reason to believe that women-run firms differ from male-run firms in both observable and non-observable ways. In this case, non-parametric matching estimators (such as PSM) are helpful for comparing firms that are as similar as possible and whose main difference is the gender of their managers (e.g., Diwisch et al., 2009; Simonsen & Skipper, 2006). Previous research findings are inconclusive regarding the relationship between female management and firm performance (Brahma et al., 2021; Watson, 2020). According to Watson (2020), FUH is a myth. Our findings contribute to the theoretical and empirical confirmation of this claim, at least in relation to female management. We do not find relevant empirical evidence to support FUH when endogeneity bias is addressed and a sample that is more representative of the business ecosystem is used.
We propose a research agenda that focuses on the positive role of women in business instead of the assumption that female leaders negatively affect firm performance. This new direction creates exciting opportunities for future research as we leave behind the study of FUH. From a career development perspective, in which individuals try to climb the corporate ladder, it would be interesting to better understand how women perceive professional goals and the extent to which those goals represent their professional and personal expectations. Considering the differences in cultural contexts across countries, future research can explore how cultural stereotypes of masculinity might affect women's professional expectations and the ways in which female managers' performance is assessed. This is an opportunity to conduct research in the Latin American context and in emerging economies.
The extent to which female directors have decision-making autonomy is also a potential area of research. The mere presence of women on boards does not guarantee that they have the power to make decisions within the board; this is a weakness in the design of policies that impose gender quotas on boards of directors. Women's participation in board decision-making depends on the social interactions among board members and on how women negotiate, acquire, and exercise power. Qualitative studies can provide new insights into the intricate dynamics of female leadership.

Conclusions
Our findings highlight the importance of using adequate econometric methods to establish the reliability and validity of interpretations arising from quantitative analyses. Using an OLS model, we find that female managers have a significantly negative effect on NIE, but this result does not hold when 2SLS or matching estimators are implemented. We contribute to research on female leadership in business by highlighting the methodological issues that must be considered in empirical studies. Additionally, our review reveals a significant gap in Latin American empirical research on female-led business performance and on management overall, as many authors have pointed out (Aguinis et al., 2020; Fritz & Silva, 2018; Nicholls-Nixon et al., 2011; Perez-Batres et al., 2012). Thus, we provide new evidence on the performance of female-managed firms in the Latin American context.
This study has a few limitations. These include the lack of longitudinal data, the absence of variables representing managers' professional capacities, and the lack of structural aspects, such as competition and business dynamics. Business results are intertemporal and probably do not reflect short-term decisions alone; instead, they result from strategies that combine multidimensional objectives over different periods. Likewise, managers' levels of education and career trajectories can influence their ability to make decisions conducive to specific financial results (G. Wang et al., 2018). The composition of the industry, the intensity of competition, and the firms' position in the business context also determine firm performance. Finally, the benefits of gender diversity and female participation in leadership positions should not be evaluated solely from an economic perspective (Hossain et al., 2017; McGuinness et al., 2017). Instead, we must ask how diversifying human capital, for example by gender, racial-ethnic identification, and disability status, can affect business outcomes beyond financial indicators and profits.

Implications
This study has several theoretical, methodological, and practical implications. Based on an analysis of previous research, we trace the roots of FUH testing and the evidence it has produced, considering the most frequently used approaches. Theoretically, our results imply that the relationship between manager gender and firm performance is context-dependent because it depends on several factors, which may explain why different results are observed in different countries. From a methodological perspective, we show the relevance of a well-specified model for testing the relationship between manager gender and firm performance, because the consistency and efficiency of the estimators differ across methods. Therefore, we recommend verifying whether endogeneity bias exists, and assessing the quality and adequacy of the sample, before running the model.
FUH is also present in common belief: in practice, people tend to believe that women are less capable of successfully running a business. This notion is rooted in social gender hierarchies that may differ across countries. Therefore, public policymakers should work to reduce gender gaps in business and other social spheres, emphasizing the legitimacy of women in top leadership positions. Because the negative relationship between female managers and firm performance does not always hold, practitioners may consider hiring more women in top management positions, given the other benefits that women's leadership may bring to overall firm performance beyond financial results.

Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Table 1. Article Classification by Metrics and Source Per Year.

Table 2. Article Classification by Field and Sample Type Per Year.

Table 3. Journals With Two or More Articles in FUH.

Table 4. Articles That Tested FUH in the Female Manager (CEO, CFO, or TMT) Context.

Table 9. Estimation of Differences in Results by the Matching Methods.