On the Nuisance of Control Variables in Causal Regression Analysis

Control variables are included in regression analyses to estimate the causal eﬀect of a treatment on an outcome. In this paper, we argue that the estimated eﬀect sizes of controls are unlikely to have a causal interpretation themselves, though. This is because even valid controls are possibly endogenous and represent a combination of several diﬀerent causal mechanisms operating jointly on the outcome, which is hard to interpret theoretically. Therefore, we recommend refraining from interpreting marginal eﬀects of controls and focusing on the main variables of interest, for which a plausible identiﬁcation argument can be established. To prevent erroneous managerial or policy implications, coeﬃcients of control variables should be clearly marked as not having a causal interpretation or omitted from regression tables altogether. Moreover, we advise against using control variable estimates for subsequent theory building and meta-analyses.


On the Nuisance of Control Variables in Causal Regression Analysis
Multivariate regression is an important tool for empirical research in organization studies, management, and economics.Beyond settings in which regression analysis is used to statistically predict a left-hand side variable given a set of explanatory variables, the main purposes of these methods is to control for confounding influence factors between a treatment and an outcome in order to obtain consistent causal effect estimates.† However, in practice scholars often overstate the role of control variables in regressions.In this paper we argue that, while essential for the identification of causal effects, control variables do not necessarily have a causal interpretation themselves.This is because even valid controls are often correlated with other unobserved factors, which render their marginal effects uninterpretable from a causal inference perspective ().Consequently, researchers need to be careful with attaching too much meaning to control variables and should consider to ignore them when interpreting the results of their analyses.
Drawing substantive conclusions from control variable estimates is common, however.Authors frequently make use of formulations such as: "control variables have expected signs" or "it is worth noting the coefficients ‡ of our control variables".Below, we present the results of a literature review of papers published in Organization Science and Strategic Management Journal between 2015 and 2020, in which we found that 47 percent of manuscripts using regression methods also explicitly discussed the estimated effect sizes of controls.This is in line with Carlson and Wu (2012), who identified 48 percent of papers published in the Academy of Management Journal, Journal of Applied Psychology and Strategic Management Journal in 2007 that interpreted and discussed the effects of controls.Moreover, in our own experience as authors of quantitative research papers, we † We refer to consistency in the statistical sense that the estimator converges in probability to the true underlying parameter (Wooldridge, 2002, Def. 3.8).‡ In linear regression models, marginal effects of explanatory variables are constant and can be represented by a single parameter.In the following we will therefore mainly refer to regression coefficients of control variables although our arguments also apply to marginal effect estimates obtained from nonlinear models.frequently encountered instances in which reviewers asked us to provide an interpretation of control variable coefficients.The justification that was often given was that, although they were not the main focus of the analysis, controls could still provide valuable information for other researchers in the field who are investigating related research questions.
The methodological literature in organizational research usually highlights that control variables should carry the same importance in an empirical analysis as the main independent variables of interest ().To increase rigor and improve transparency of published research articles, Becker (2005) recommends to report all regression coefficients of control variables as well as their significance levels.Similarly, Spector and Brannick (2011) advocate that controls should be given equal status to the main treatment variable in the analysis.Atinc et al. (2012) consider it to be best practice to provide an ex-ante prediction of the sign of the relationship between the controls and outcome variable based on theory, which should subsequently be checked against the empirical evidence.In a recent paper, Becker et al. (2015) provide a more cautionary recommendation regarding generalizing from control variable estimates if it involves out-of-sample extrapolation, but otherwise considers it to be appropriate.Overall, the general consensus in the organizational literature thus seems to be that interpreting control variable estimates is safe, as it adds to the body of cumulative evidence regarding a particular effect size.This paper builds on the graphical framework to causality ().Causal diagrams have already been established as a powerful tool for determining which control variables are relevant to a given regression model ().In addition, the approach offers a distinct perspective on the proper interpretation and communication of control variable results, which differs from prior methodological practices found in the organizational and management literature.In the following, we will explicate the view that control variables, while certainly an important ingredient in many causal research designs, do not have the same status as the main variables of interest in an empirical analysis.In particular, we will argue that in many situations valid controls can nonetheless be endogenous.Therefore, interpreting their estimated effect sizes in light of prior theory could lead to potentially misleading conclusions.A valid causal interpretation of control variables rests on strong assumptions and usually requires accounting for all influence factors of the outcome variable under study.Since this is unlikely to be fulfilled in many research contexts, we recommend authors to exercise caution when interpreting control variables and consider omitting estimated coefficients of control variables from regression tables, or relegating them to an appendix.Finally, we discuss what our recommendations imply for the practice of meta-analysis, which has recently gained traction in many fields including organizational research (Aguinis et al., 2010).

Do researchers attach substantive meaning to control variables?
To assess the degree to which researchers interpret control variable estimates in their studies, we conducted a review of all articles published in Organization Science and Strategic Management Journal between January 2015 and December 2020.We chose these two journals because of their high prestige in the management and organization field as well as their reputation for high-quality empirical research.Our sample includes all quantitative articles that employed parametric regression models such as OLS, logit, probit, Poisson, etc.This choice was made because effect sizes of control variables can usually not be summarized by a single coefficient (or marginal effect) in non-and semiparametric models.The use of such methods is rare in our sample anyway though.
We manually categorized papers according to whether they interpret or draw substantive insights from the coefficients or marginal effect estimates of control variables.
Examples of such an interpretation range from "the control variable CEO tenure is positively related to performance" to "the effect sizes of of control variables are in line with previous studies".The latter interpretation is thereby of relevance because authors of future research papers might be tempted to develop theory based on this seemingly accumulating empirical evidence.The result of our review shows that interpreting control variables was common practice in the analyzed journals during our period of observation.For the Strategic Management Journal, we identified a total of 497 quantitative research articles, of which 233 (47 percent) proceeded to interpret the effects of controls variables.For Organization Science, out of a total of 275 quantitative articles, 131 (47 percent) provided an interpretation of control variable estimates.Detailed results of the literature review and example quotes demonstrating the different ways in which authors interpret control variable estimates in practice are reported in the supplemental material to this paper.

The causal interpretation of control variables
The relationship between the main explanatory variables and controls in a regression can be complex, therefore it is beneficial to explicitly depict them in a causal diagram (Pearl, 2000).Durand and Vaara (2009) were the first to introduce causal graphs as a tool to management researchers.Cinelli et al. (2022) provide a useful overview of the different functions of control variables in regression analyses, explicitly leveraging the graphical framework.Here, we focus on the causal interpretation and reporting of control variable estimates, which has been a topic of ongoing debate in the organizational literature.
Figure 1a presents a simple model with a treatment variable X and an outcome variable Y .Both variables are connected by an arrow, denoting the direction of causal influence between them.In addition, there are two confounding variables, Z 1 and Z 2 , that are affecting the treatment and the outcome.Z 1 and Z 2 are correlated, as a result of a common influence factor they share, which is denoted by the dashed bidirected arc in the graph.The fact that Z 1 and Z 2 are correlated creates what is known as a backdoor path between the treatment and the outcome (Pearl, 2000).X and Y are not only connected by the direct causal path X → Y , but also by a second path, X ← Z 1 Z 2 → Y , which creates a spurious, non-causal correlation between them.
--Insert Figure 1 about here --Backdoor paths are defined as any sequence of arrows connecting the treatment and outcome variable (irrespective of their orientation) that remains if arrows emitted by the treatment are deleted from the graph (Pearl, 2000).Because of the latter requirement they are easy to spot in the causal diagram.Since all the arrows emitted by X are deleted, backdoor paths have to point into X instead; i.e., they enter "through the backdoor", which is where the name comes from.
Control variables in a multivariate regression model are invoked to block such backdoor paths and obtain a consistent estimate of the causal effect of X on Y , in which case one speaks of an effect to be causally identified.For this purpose, it is sufficient to control for any variable that lies on the open path.* Thus, in the example of Figure 1a, the researcher has the choice between either controlling for Z 1 or Z 2 , since both would allow to identify the causal effect of interest.The choice between different admissible sets of control variables is thereby of high practical relevance.Researchers often have fairly detailed knowledge about the treatment assignment mechanism Z 1 → X; e.g., because there are organizational or administrative rules that determine individual treatment status, which can be exploited for identification purposes ().At the same time, the set of variables Z 2 that are direct influence factors of Y will likely be large.Thus, in practical applications it might be much easier to control the treatment assignment mechanism instead of trying to include all variables that have an effect on the outcome in a regression.
--Insert Table 1 about here --Nevertheless, although controlling for Z 1 is sufficient to obtain a consistent estimate for X, its marginal effect will itself not correspond to any causal effect of Z 1 on Y .That is because Z 1 is correlated with Z 2 and will thus partially pick up an effect of Z 2 on Y too (Cinelli & Hazlett, 2020).To illustrate this phenomenon quantitatively, we parameterize * Technical note: Requiring the path to be previously unblocked rules out that the variable which is adjusted for is a collider (Hünermund & Bareinboim, 2023).A discussion of colliders and bad controls goes beyond the scope of this note ().
the causal graph in Figure 1a in the following way: with n = 10, 000, and U, ε i being standard normal.True effect sizes are set equal to one.
Note that U is assumed to be unobserved and appears in the functions assigning values to Z 1 and Z 2 .This creates an error correlation between the two variables.We then run a regression of Y on X and Z 1 , which gives a consistent coefficient estimate for X ( βX = 1.017), while the effect of Z 1 ( βZ 1 = 0.499) turns out to be biased.By contrast, if we also include Z 2 in the regression, the coefficient of Z 1 drops to zero ( βZ 1 = -0.019),which corresponds to its actual causal effect on Y in this example (since Z 1 does not appear in line 4 of eq. 1).Detailed simulation results together with associated standard errors (bootstrapped with 1,000 replications) are reported in Table 1.
Figures 1b and 1c highlight under which conditions effect estimates of control variables can be interpreted causally.In Figure 1b there are two backdoor paths: Both paths can be intercepted by Z 1 , which is thus a valid control variable.Data simulated according to the following system: with coefficients again set equal to one, confirm that the causal effect of X can be consistently estimated in a regression of Y on X and Z 1 ( βX = 0.993).However, once again the coefficient estimate for Z 1 ( βZ is a valid control variable in eq. 2, it is nonetheless endogenous (Frölich, 2008).Note that the unobserved variable U enters the combined error term ν = u + ε 3 in line 3 of eq. 2. At the same time, U is an argument of the function assigning values to Z 1 (line 1 of eq. 2), which lets Z 1 become correlated with the error term.§ This is different in Figure 1c.Here the two backdoor paths are X ← Z 1 → Y and When we simulate data according to: we now find that a regression of Y on X and Z 1 provides a consistent estimate for both X as well as for Z 1 ( βX = 1.001; βZ 1 = 1.004;Table 1, col. 5).In this situation, the regression coefficient for the control variable Z 1 has a causal interpretation.This is because, unlike in the previous situations, we are able to account for all influence factors of Y , apart from the exogenous error term ε 3 .In particular, there is no unobserved variable U jointly affecting the outcome Y and (at least one of) the regressors (X, Z 1 ) anymore.
Finally, Figure 1d depicts a more complex setting, with several admissible sets of controls, each sufficient to identify the causal effect of X on Y (Textor & Liśkiewicz, 2011).
One possibility in this situation is to simply control for Z 1 , which is the only direct influence factor of X, and thus blocks all paths entering X through the backdoor.To § For ease of exposition, all coefficients in our simulations are set equal to one.However, it is easy to construct examples where the sign of the control variable Z 1 changes; e.g., by setting the coefficient of U equal to −3 in line 3 of eq. 2, which results in βZ1 = −0.518(std.err.= 0.029).
witness, we simulate data from the system: and regress Y on X and Z 1 , which gives a consistent estimate for the effect of X ( βX = 0.991; Table 1, col. 6).Similarly, controlling for the direct influence factors of Y (Z 3 , Z 4 , and Z 5 ) also blocks all backdoor paths and leads to a consistent effect estimate for X ( βX = 1.006;Table 1, col. 7).A third alternative is to control for the entire set of covariates and Z 5 ) which also leads to a consistent estimate for X ( βX = 1.003;Table 1, col.8), although this would be the most data-intensive identification strategy leading to slightly less precise estimates compared to the previous specification, due to fewer degrees of freedom.This example illustrates that the minimally sufficient set of controls (here: Z 1 ) for identifying the causal effect of X is often much smaller than the total number of confounding variables in a model.At the same time, the estimated marginal effects for the control variables only have a causal interpretation if all the direct influence factors of Y (here: Z 3 , Z 4 , and Z 5 ) are accounted for in the regression.As we argued above, this is unlikely to be the case, since in many real-world settings the number of causal factors determining Y might be prohibitively large.¶

Examples
In the following section we present practical applications that illustrate our previous theoretical points.To start as simply as possible, we were looking for examples that employ standard regression models instead of more advanced empirical techniques.One challenge in this regard is that papers with simple OLS regressions often refrain from ¶ Alternatively, if omitted factors are unrelated to all other regressors, a causal interpretation of control variable coefficients would be possible too.This seems likewise implausible in many applications though.
making causal claims and instead resort to alternative formulations to describe effect sizes, such as "association", "pattern", or "link" (Hernán, 2018).However, a recent paper by Hoffman and Strezhnev (2023) provides a fitting example.The authors estimate the causal effect of longer travel time on the probability of a default judgement being made in eviction cases as a result of defendants not showing up or being late to court.Using OLS in a sample of more than 200,000 eviction proceedings in the city of Philadelphia between 2005 and 2021, they find that an increase of one hour in estimated travel time raises the likelihood of a default judgement by 3.8 to 8.6 percent.This effect is meaningful because defaults are difficult to reopen and tenants who fail to show up in court cannot benefit from "Civil Gideon" protections offered in major urban areas in the U.S.
In their models, Hoffman and Strezhnev control for neighborhood characteristics such as census tract income levels, as well as race and ethnicity.Interestingly, for our discussion, they also control for building characteristics and find a positive and statistically significant effect size of multi-unit apartment buildings (compared to row houses or single family dwelling) on the probability of default judgements.However, as the authors discuss in their paper, this relationship is unlikely to have a causal interpretation, since building characteristics might be correlated with other influence factors such as unfavorable terms in residential leases or the geographical distribution of dwellings within the city.We now turn to an example closer to the organizational research context.(1965)(1966)(1967)(1968)(1969)(1970)(1971)(1972)(1973)(1974)(1975).

Early research exposure and career choices
After a first screening round, applicants were invited to an interview on NIH campus to determine who would eventually be selected to participate in the program.
Selection criteria were related to applicants' prior research activities (which Azoulay et al. measure by their number of pre-ATP publications), their academic achievements (proxied by whether they were elected to the AΩA Honor Medical Society), experience (i.e., whether they held a Ph.D. at the time of application and the number of internships they had completed), and the reputation of the institutions where applicants received their training (measured by NIH grants for applicants' medical school and internship hospital).
Importantly, Azoulay et al. argue that although the pool of applicants to the ATP was indeed a highly selected group, selection at the (second) interview stage was based entirely on these observable characteristics.Applicants were early in their career and rather homogeneous in their characteristics.It was therefore hard to select them based on their future research potential beyond a few observable markers.This feature of the particular institutional setting allows Azoulay et al. to employ a selection-on-observables design.
Based on that they estimate that ATP participants were twice as likely to pursue a research-focused career later on compared to unsuccessful applicants.As a result, trainees accumulated more publications, citations and grant funding over their life-cycle.
Furthermore, they were significantly more likely to receive prestigious career awards, including the Nobel Prize, and to become elected members of the National Academy of Sciences.
--Insert Figure 2  prior research activities as well as future career choices.Thus, while prior research activity is a valid control for the effect of ATP participation, it is also endogenous, similar to the situation in Figure 1b.Consequently, even if we were to find a positive correlation between prior research activities and pursuing a research career (which is not reported in Azoulay et al.), it would be premature to conclude that, e.g., early publication success during medical school is a significant driver of subsequent career choices, since both of these variables are likely confounded by an applicant's overall ability.The research design only allows to draw policy conclusions for the treatment variable ATP participation.Researchers should therefore be careful not to overinterpret their empirical results, even if that promises to offer interesting additional perspectives on a given research topic.
For simplicity, we focus on career choice as the primary outcome and disregard several other dimensions of scientific career success that Azoulay et al. investigate.

Analyst coverage and innovation
In applied empirical research, it is not uncommon for estimated treatment effects to change significantly when more advanced identification strategies are employed compared to standard OLS regression.For example, Hopp et al. (2020) find that CEO appearance is no longer related to company performance once firm fixed effects are incorporated in the analysis.Using a discrete choice experiment, Mas and Pallais (2017) show that on average workers value flexible work arrangements much less than a simple compensating wage differentials regression would indicate.Furthermore, because of simultaneity bias, policing levels and crime rates are often positively related, while Mello (2019) demonstrates a negative causal effect of policing on crime, exploiting a natural experiment in a difference-and-difference design.
If subsequent research uses the same variables as controls, however, it becomes immediately clear that their estimated regression coefficients should not be interpreted in a causal way.One case in point comes from the literature on analyst coverage and innovation.He and Tian (2013) find a negative relationship between analyst coverage and patenting in a study of U.S. public firms from 1993 to 2005, using difference-and-differences and an instrumental variable approach.It has been theorized that this result arises because external stock market analysts following a firm often exert excessive pressure on executives, which can worsen managerial myopia and impede investment in long-term innovation projects.For these reasons, analyst coverage is used as a control variable in other studies of R&D activities in publicly listed firms.However, since it is not the main variable of interest in these studies, often less stringent identification strategies are employed, possibly leading to unexpected results.For example, C. Chen et al. (2016) and Huang et al. (2022) consistently find positive effects of analyst coverage in regressions with the natural logarithm of patents as the dependent variable, which seemingly contradicts He and Tian (2013).
Analyst coverage can be a valid control variable even though it is endogenous (if the underlying causal structure, e.g., corresponds to Figure 1b).Nevertheless, interpreting positive regression coefficients as evidence against He and Tian (2013) would be a mistake.
To use the analogy of Bayesian updating, control variable estimates from studies such as C. Chen et al. (2016) and Huang et al. (2022) should not alter the posterior probability of analyst coverage having a negative effect on innovation.Consequently, in another study by S.-S.Chen et al. (2021), in which analyst coverage is also included as a control variable in a patent count regression, the regression coefficients of the controls are not reported.This is in accordance with the recommendations that we will discuss in the following.

Discussion and Recommendations
Beyond pure prediction tasks, the purpose of regression analysis in organizational research is typically to build and test theories that explain the causal mechanisms underlying a studied phenomenon (Sutton & Staw, 1995).In this paper, we argued that attaching substantive meaning to the marginal effects of biased control variables is problematic, however, as researchers could develop false intuitions or draw erroneous managerial and policy conclusions.Therefore, we think it is advisable to not discuss the results obtained for control variables in quantitative papers, unless the researchers can be sure that they have accounted for all relevant influence factors of the outcome in a regression (all-causes regression).Since in many practical settings this is unlikely to be the case, we recommend to treat controls as nuisance parameters, which are included in the analysis for identification purposes (and discussed as such) but their effects are not interpreted ().This corresponds to the way control variables are treated by non-parametric matching estimators (Heckman et al., 1998) and modern machine learning techniques for high-dimensional settings ().These methods do not report estimation results related to controls, either because there would be simply too many covariates in the analysis (which is the primary use-case for machine learning) or marginal effects of control variables are not returned by the estimation protocol (as in the matching case).
Our recommendations thereby depart from prior literature insofar as control variable should not be promoted to have equal status with the other variables in the study (Spector & Brannick, 2011, p. 297).Research designs based on control variables are employed to estimate the causal effect of a treatment variable on an outcome.As such, the treatment variable cannot be endogenous, otherwise estimates would be biased and other, more suitable research designs (such as instrumental variables, regression discontinuity designs, etc.) should be applied.By contrast, control variables can be endogenous (Frölich, 2008) and, as we argued in the preceding theoretical discussion, will likely be so in practice.* * Controls should be chosen to close all backdoor paths between a treatment and outcome, based on a theoretical model of the context under study (Bono & McNamara, 2011).As we have demonstrated previously, it is thereby not necessary to include all causal influence factors of the outcome variable in a regression.Our example (Azoulay et al., 2021) illustrates that in many cases it might actually be easier to control the treatment assignment mechanism instead, if institutional knowledge is richer about what determines treatment take-up compared to the potentially long list of variables that affect the outcome.Moreover, in many situations, researchers have the choice between different valid adjustment sets (see Figure 1d), which highlights their auxiliary nature for the analysis.
--Insert Table 2 about here --Since accounting for all influence factors of the outcome might be unrealistic in many contexts and control variables are therefore likely to be endogenous, interpreting their effect sizes in light of theory is potentially dangerous.Authors could infer wrong conclusions for managerial advice and subsequent studies might be inclined to build theory based on biased empirical results.To avoid this, we therefore recommend to refrain from interpreting control variables in published papers.Moreover, predicting the sign of control * * Similarly, the widespread notion that one endogenous variable in the regression would bias all other coefficients is also incomplete, as our simulation of the causal model in Figure 1b (Table 1, col. 4) demonstrates.
variable estimates ex-ante (Atinc et al., 2012) is difficult if endogenous control variables can pick up the effect of a multitude of other influence factors.Therefore, formulations such as "estimates of control variables have expected signs" should be avoided.As a "nudge" to stir the research community away from overinterpreting control variables, we find it appropriate that authors omit their coefficients entirely from regression tables or relegate them to an appendix.This corresponds to the way how estimation results are presented, e.g., in papers using nearest neighbor or propensity score matching.We acknowledge that omitting control variable coefficients in regression tables constitutes a trade-off with respect to transparency.
However, we believe that this suggestion is justified by the lower risk of drawing incorrect theoretical inferences from empirical studies.When authors have important reasons for reporting regression coefficients of control variables, the format of Table 1 constitutes a viable compromise, in our view.Compared to the standard format of regression table, Table 1 clearly separates controls from the treatment variable and includes a note that the control variable estimates should not be interpreted causally.
We emphasize that we agree with Becker (2005) in that control variables should be carefully discussed and authors need to justify their validity based on prior theory.However, their estimated coefficients are less relevant.It suffices to discuss the rationale for selecting specific control variables in the empirical section and to clearly indicate their inclusion in the table notes.Since the proper justification of a regression design can only come from theory, we caution against deciding about the inclusion of control variables based on their incremental contribution to the R 2 of the model (Carlson & Wu, 2012).This is the celebrated "no causes in, no causes out" principle (Cartwright, 1989), which states that the validity of causal inferences must ultimately be supported by theoretical considerations external to the data.For example, bad controls (as discussed in Cinelli et al. (2022)) often have a lot of predictive power but nonetheless lead to invalid causal inferences.Therefore, we also do not see a reason why authors should report models with and without control variables and compare the share of explained variance between them (as suggested, e.g., by Atinc et al. (2012) and Becker et al. (2015)).
Our recommendations are in line with Westreich and Greenland (2013) who discuss a similar problem with respect to the interpretation of potentially endogenous controls in epidemiology.Because epidemiological studies usually present the results of multivariate regression analyses right after a table with descriptive statistics of the data, they coined the term table 2 fallacy.Keele et al. (2020) discuss related examples from the field of political science.They emphasize that for estimates of control variables to be given a causal interpretation, their effects need to be themselves causally identified.Since this is only plausible if there are no omitted variables (or the controls are unrelated to the omitted variables), we recommend researchers to focus attention on one causal factor (or a small set) at a time, for which backdoor paths can realistically be enumerated, and treat control variables as nuisance parameters instead.
Finally, we caution against including estimates of potentially biased controls in meta-analyses (Aguinis et al., 2010).Such studies pool the effects of a focal variable on an outcome across several papers.According to Becker (2005), systematic reporting of control variables facilitates cumulative science and knowledge aggregation by significantly increasing the pool of studies from which effect sizes for meta-analyses can be drawn from: "Nonreporting of control variable findings hinder any meta-analyses that would have otherwise included the controls.For instance, in a study of the relationship between employee commitment and organizational citizenship behavior, a researcher might control for extraversion and agreeableness but not report the findings for the controls.As a result, later meta-analyses cannot include these findings in the assessment of connections between personality and organizational citizenship behavior."(Becker, 2005, p. 285) This recommendation refers to meta-analyses of partial correlations and marginal effects ("meta-regression"), which are increasingly common in organizational research and economics (Stanley & Doucouliagos, 2012).Compared to zero-order correlations, they have the advantage of being able to filter out other potential confounding influence factors in settings when randomized control trials are not feasible.† † However, the quoted passage fails to mention that control variables (here: extraversion and agreeableness) are unlikely to have a causal interpretation themselves and therefore add little to the evidence base regarding a certain effect size.As their coefficients may represent a combination of several different causal mechanisms jointly operating on the outcome (here: citizenship behavior), they do not provide accurate information about a theoretically meaningful quantity.
Moreover, coefficients can vary substantially depending on which admissible adjustment sets are used (e.g., compare columns 6-8 in Table 1).Consequently, meta-analyses should be restricted to the main treatment variable(s), for which a plausible identification argument can be established, which highlights once again the unequal status of treatment and control variables in regression analysis.
To conclude, there is no reason to be worried if the estimated coefficients of control variables do not have expected signs, since they are likely to be biased anyway in practical applications.Instead, researchers should rather focus on interpreting the marginal effects of the main variables of interest in their manuscripts.The estimation results obtained for controls, by contrast, have little substantive meaning and can therefore safely be omitted-or relegated to an appendix.This approach will not only prevent researchers from drawing wrong causal conclusions based on endogenous controls, but will furthermore allow to streamline the discussion sections of quantitative research papers and save on valuable manuscript space.

Figure 1a
Figure 1b Figure 1c Figure 1d (1) Treatment Variable: 1.011 1.009 (0.006) (0.008) Note: Simulation results (as discussed in the main text) using different backdoor-admissible adjustment sets for the causal models depicted in Figure 1a-d.Bootstrapped standard errors (with 1, 000 replications and n = 10, 000) in parentheses.True effect sizes for all variables equal to one.

Figure 1a
Figure 1b Figure 1c Figure 1d (1)     "Facility experience has a significantly negative influence on waste change (i.e., the longer a facility has been reporting on a particular chemical, the greater the improvement in operational performance).One year of facility experience improves operational performance by 1.49%; this translates to a real waste reduction of about 7,300 pounds.This finding is consistent with existing research that links experience and problem-solving outcomes positively while indicating that the facilities in the study have not reached diminishing marginal returns."(Berchicci, Dutt & Mitchell, 2019, p. 1041), Organization Science "we ran the regression only with the control variables and found that inventor productivity, tenure age, and firm concentration in the focal patent class tended to have statistically significant effects and expected signs.The positive effect of patenting productivity was consistent with results reported in prior work (Hoisl 2007, Palomeras andMelero 2010).The negative effect of tenure also was consistent, suggesting that more experienced employees tended to stay (Braguinsky et al. 2012).The positive effect of age was to be expected in our sample of employees with high human capital, as transitioning to entrepreneurship may require extensive prior work experience."(Gambardella, Ganco, and Honoré, 2015, p. 466), Organization Science "The sign and direction of control variables align with what has already been documented extensively (Clement, 1999;Clement, Hales, & Xue, 2011;Hong & Kubik, 2003;Hong, Kubik, & Solomon, 2000;Irvine, 2004;Loh & Mian, 2006)."(Uribe, 2020(Uribe, , p. 1920)), Strategic Management Journal Ex-post rationalizations based on theory "Of the control variables in Models ( 1)-( 3), we see that greater lagged count of posted reviews has a negative effect on dispensaries' focus on medical use in their identity statements.One possibility for this negative coefficient is that as a dispensary's customer base increases, it tends to move to broaden its appeal by decreasing its earlier medical-use orientation."(Hsu, Koçak, and Kovács, 2018, p. 184), Organization Science "Interestingly, we did not find a relationship between our dependent variable and the passage of a state RPS or state-level incentives.There are a variety of explanations for these results.First, while many states in our sample have passed an RPS, most of these policies were enacted on or after 2004, and have set goals for or after the year 2020.Since our data sample ends in 2009, it is likely that there is insufficient time lapse to observe substantial effects.In addition, provisions behind the PTC might prevent some projects from taking full advantage of state-level incentives and diminish their overall value (Wiser, Bolinger, and Gagliano, 2002)."(Pacheco & Dean, 2015, p. 1098), Strategic Management Journal "Model 1 is the baseline formulation with all the controls.Older banks were more likely to enter dormancy.Perhaps these banks have learned from their experience under previous detrimental policies that were later reversed.Strong recent performance under the detrimental public policy encourages a bank to enter dormancy.This is perhaps a factor of the overconfidence bred by recent good performance.Banks with a higher deposits-to-liability ratio did not prefer dormancy.This indicates that, as expected, a greater reliance on branch banking reduces a bank's capacity to enter dormancy."(Kozhikode, 2016, p. 199), Organization Science Azoulay et al. (2021) investigate the effect of early career exposure to frontier research on the career trajectory of potential innovators.Their specific empirical setting is the Associate Training Program (ATP) of the National Institutes of Health (NIH) in the U.S. The ATP was started in 1953 as a training program for recent MD graduates.Participants were sent to the NIH intramural campus in Bethesda, Maryland, to receive a two to three years research training under the supervision of NIH investigators.Since the NIH was originally established within the Marine Hospital Service, participation in the program fulfilled a draftee's military service obligation.Therefore, applying to the ATP became particularly popular among young physicians during the Vietnam War period

Figure 1 .Figure 2 .
Figure 1.Examples of causal diagrams with valid control variable Z 1 about here --Figure 2 synthesizes the assumptions leading to the empirical strategy in Azoulay et al. in form of a causal diagram.Controlling for applicants' prior research activities, academic achievements, experience and school reputation (the authors incorporate several covariates for each of these dimensions, including medical school and internship hospital fixed effects) is a valid backdoor adjustment set for estimating the causal effect of ATP participation on the choice of pursuing a research career in this graph.The analysis depends crucially on the assumption that the unobserved (latent) variable research potential does not directly affect program participation (ATP Participation Research Potential); i.e., interviewers are not able to select applicants based on private information.
(Austin & Stuart, 2015) inverse probability weighting estimator(Austin & Stuart, 2015).As such, covariates are only used to estimate the propensity score of receiving treatment and do not appear in an outcome regression.However, in this setting it would also not be advisable to interpret the effect of control variables such as prior research activities on career choice.The latent node research potential jointly affects an applicant's

Table 2 (
which is an adapted version of Table1) illustrates such a regression table format, in which check marks are included to indicate which variables were controlled for.

Table A1 .
Strategic Management Journal

Table A1 .
Strategic Management Journal

Table A1 .
Strategic Management Journal

Table A1 .
Strategic Management Journal

Table A1 .
Strategic Management Journal

Table A1 .
Strategic Management Journal

Table A1 .
Strategic Management Journal

Table A2 .
Organization Science

Table A2 .
Organization Science

Table A2 .
Organization Science

Table A2 .
Organization Science which gave the percentage change in the hazard associated with a one-unit increase in covariate.The results suggested that if the Number of competitors increased by one, then there was a 41% increase in the focal technology's hazard of technology exit."(Chen,Qian, and Narayanan, 2017, p. 2588), Strategic Management Journal "A number of results for the control variables are noteworthy.Business line diversification negatively affects the diversification and dissimilarity of technology-sourcing portfolios.The reason may be that business line diversification diverts financial and human resources from investments in technologysourcing vehicles.We note that the size of the top management team is also negatively related to both diversification and dissimilarity (although not significant for the latter)."(Lungeanu,SternandZajac,2016, p. 865), Strategic Management Journal "A few results from Model 1 are worth noting.First, the estimates of the controls appear aligned with our expectations: Population age exhibits a negative and significant effect on growth rates....The effects of MU density and MU density2 are consistent with existing evidence on the consequences of densitydependent legitimation and competition on organizational growth rates.C4 market share exhibits a positive and significant effect on MU growth-a result aligned with the current understanding of resource partitioning."(Liu&Wezel,  2015, p. 301), Organization Science