Abstract
An essential feature common to all empirical social research is variability across units of analysis. Individuals differ not only in background characteristics but also in how they respond to a particular treatment, intervention, or stimulus. Moreover, individuals may self-select into treatment on the basis of anticipated treatment effects. To study heterogeneous treatment effects in the presence of self-selection, Heckman and Vytlacil developed a structural approach that builds on the marginal treatment effect (MTE). In this article, we extend the MTE-based approach through a redefinition of MTE. Specifically, we redefine MTE as the expected treatment effect conditional on the propensity score (rather than all observed covariates) as well as a latent variable representing unobserved resistance to treatment. As with the original MTE, the new MTE can be used as a building block for evaluating standard causal estimands. However, the weights associated with the new MTE are simpler, more intuitive, and easier to compute. Moreover, the new MTE is a bivariate function and thus easier to visualize than the original MTE. Finally, the redefined MTE immediately reveals treatment-effect heterogeneity among individuals who are at the margin of treatment. As a result, it can be used to evaluate a wide range of policy changes with minimal additional analysis and to design policy interventions that optimize the marginal benefits of treatment. We illustrate the proposed method by estimating heterogeneous economic returns to college with data from the National Longitudinal Survey of Youth 1979.
1. Introduction
An essential feature common to all empirical social research is variability across units of analysis. Individuals differ not only in background characteristics but also in how they respond to a particular treatment, intervention, or stimulus. In the language of causal inference, the second type of variability is called treatment-effect heterogeneity. Due to the ubiquity of treatment-effect heterogeneity, all statistical methods designed for drawing causal inferences can identify causal effects only at an aggregate level; they overlook within-group, individual-level heterogeneity (Holland 1986; Xie 2013). Moreover, when treatment effects vary systematically by treatment status, the average difference in outcome between the treated and untreated units is a biased estimate of the average treatment effect in the population (Winship and Morgan 1999).
Depending on data and assumptions about how individuals select into treatment, three major approaches have been proposed to study heterogeneous treatment effects. First, we can simply include interaction terms between treatment status and a set of effect modifiers in a standard regression model. A drawback of this approach is that the results may be sensitive to the functional form specifying how treatment and covariates jointly influence the outcome of interest. Fortunately, recent developments in nonparametric modeling have allowed the idea to be implemented without strong functional form restrictions (e.g., Hill 2011). Second, recent sociological studies have focused on how treatment effect varies by the propensity score, that is, the probability of treatment given a set of observed covariates (e.g., Brand and Xie 2010; Xie, Brand, and Jann 2012). The methodological rationale for this approach is that under the assumption of ignorability, the interaction between treatment status and the propensity score captures all the treatment-effect heterogeneity that is consequential for selection bias (Rosenbaum and Rubin 1983). Treatment-effect heterogeneity along the propensity score also has profound policy implications. For instance, if the benefits of a job training program are greater among individuals who are more likely to enroll in the program, expanding the size of the program may reduce its average effectiveness.
The aforementioned two approaches for studying heterogeneous treatment effects both rely on the assumption of ignorability, that is, after controlling for a set of observed confounders, treatment status is independent of potential outcomes. This assumption is strong, unverifiable, and unlikely to be true in most observational studies. Two types of unobserved selection may invalidate the ignorability assumption. On the one hand, if treatment status is correlated with some fixed unobserved characteristics such that treated units would have different outcomes from untreated units even without treatment, traditional regression and matching methods would lead to biased estimates of average causal effects. This bias is usually called pretreatment heterogeneity bias or Type I selection bias (Xie et al. 2012). As Breen, Choi, and Holm (2015) show, this type of selection could easily contaminate estimates of heterogeneous treatment effects by observed covariates or the propensity score. A variety of statistical and econometric methods, such as instrumental variables (IV), fixed-effects models, and regression discontinuity (RD) designs, have been developed to address pretreatment heterogeneity bias.
The second type of unobserved selection arises when treatment status is correlated with treatment effect in a way that is not captured by observed covariates. This is likely when individuals (or their agents) possess more knowledge than the researcher about their individual-specific gains (or losses) from treatment and act on it (Bjorklund and Moffitt 1987; Heckman and Vytlacil 2005; Roy 1951). The bias associated with this type of selection has been termed treatment-effect heterogeneity bias or Type II selection bias (Xie et al. 2012). For example, research considering heterogeneous returns to schooling has argued that college education is selective because it disproportionately attracts young persons who would gain more from attending college (e.g., Carneiro, Heckman, and Vytlacil 2011; Moffitt 2008; Willis and Rosen 1979). Similar patterns of self-selection have been observed in a variety of contexts, such as migration (Borjas 1987), secondary-school tracking (Gamoran and Mare 1989), career choice (Sakamoto and Chen 1991), and marriage dissolution (Smock, Manning, and Gupta 1999).
The third approach, developed by Heckman and Vytlacil (1999, 2001a, 2005, 2007b), accommodates both types of unobserved selection through the use of a latent index model for treatment assignment. Under this model, all the treatment-effect heterogeneity relevant for selection bias is captured in the marginal treatment effect (MTE), a function defined as the conditional expectation of treatment effect given observed covariates and a latent variable representing unobserved, individual-specific resistance to treatment. This approach has been called the MTE-based approach (Zhou and Xie 2016). As Heckman, Urzua, and Vytlacil (2006) show, a wide range of causal estimands, such as the average treatment effect (ATE) and the treatment effect of the treated (TT), can be expressed as weighted averages of MTE. Moreover, MTE can be used to evaluate average treatment effects among individuals at the margin of indifference to treatment, thus allowing researchers to assess the efficacy of marginal policy changes (Carneiro et al. 2010). For example, using data from the 1979 National Longitudinal Survey of Youth (NLSY79), Carneiro and colleagues (2011) found that if a policy change expanded each individual’s probability of attending college by the same proportion, the estimated return to one year of college education among marginal entrants to college would be only 1.5 percent, far lower than the estimated population average of 6.7 percent.
In the MTE framework, the latent index model ensures that all unobserved determinants of treatment status are summarized by a single latent variable and that the variation of treatment effect by this latent variable captures all the treatment-effect heterogeneity that may cause selection bias. Our basic intuition is that under this model, treatment-effect heterogeneity that is consequential for selection bias occurs only along two dimensions: (1) the observed probability of treatment (i.e., the propensity score) and (2) the latent variable for unobserved resistance to treatment. In other words, after unobserved selection is factored in through the latent variable, the propensity score is the only dimension along which treatment effect may be correlated with treatment status. Therefore, to identify population-level and subpopulation-level causal effects such as ATE and TT, it would be sufficient to model treatment effect as a bivariate function of the propensity score and the latent variable. In this article, we show that such a bivariate function is not only analytically sufficient but also essential to the evaluation of policy effects.
Specifically, we redefine MTE as the expected treatment effect conditional on the propensity score (rather than the entire vector of observed covariates) and the latent variable representing unobserved resistance to treatment. This redefinition offers a novel perspective from which to interpret and analyze MTE, one that supplements the current approach. First, although projected onto a unidimensional summary of covariates, the redefined MTE is sufficient to capture all the treatment-effect heterogeneity that is consequential for selection bias. Thus, as with the original MTE, it can be used as a building block for constructing standard causal estimands such as ATE and TT. The weights associated with the new MTE, however, are simpler, more intuitive, and easier to compute. Second, by discarding treatment-effect variation that is orthogonal to the two-dimensional space spanned by the propensity score and the latent variable, the redefined MTE is a bivariate function, thus easier to visualize than the original MTE. Finally, and perhaps most importantly, the redefined MTE immediately reveals treatment-effect heterogeneity among individuals who are at the margin of treatment. It can thus be used to evaluate a wide range of policy effects with minimal additional analysis and to design policy interventions that optimize the marginal benefits of treatment. To facilitate practice, we also provide an R package, localIV, for estimating the redefined MTE as well as the original MTE via local instrumental variables (Zhou 2019); it is available from the Comprehensive R Archive Network (CRAN).
This article is clearly not the first to characterize the problem of selection bias using the propensity score. Since the seminal work of Rosenbaum and Rubin (1983), propensity score–based methods, such as matching, weighting, and regression adjustment, have been a mainstay strategy for drawing causal inferences in the social sciences. In a series of articles, Heckman and colleagues established the key roles of the propensity score in a variety of econometric methods, including control functions, instrumental variables, and the MTE approach (Heckman 2010; Heckman and Hotz 1989; Heckman and Navarro-Lozano 2004; Heckman and Robb 1986).1 In the MTE approach, for example, incremental changes in the propensity score serve as “local instrumental variables” that identify the MTE at various values of the unobserved resistance to treatment. Moreover, the weights with which MTE can be aggregated up to standard causal estimands depend solely on the conditional distribution of the propensity score given covariates. We show that the propensity score offers not only a tool for identification but also a perspective from which we can better summarize, interpret, and analyze treatment-effect heterogeneity due to both observed and unobserved characteristics.
The rest of this article is organized as follows. Section 2 reviews the MTE-based approach for studying heterogeneous treatment effects. Specifically, we discuss the generalized Roy model for treatment selection, the definition and properties of MTE, and the estimation of MTE and related weights. Section 3 presents our new approach that builds on the redefinition of MTE. The redefined MTE enables us to directly examine the variation of ATE, TT, and policy-relevant causal effects across individuals with different values of the propensity score. In this framework, designing a policy intervention boils down to weighting individuals with different propensities of treatment. Section 4 illustrates our new approach by estimating heterogeneous economic returns to college with NLSY79 data. Section 5 discusses our conclusions.
2. The MTE-Based Approach: A Review
2.1. The Generalized Roy Model
The MTE approach builds on the generalized Roy model for discrete choices (Heckman and Vytlacil 2007a; Roy 1951). Consider two potential outcomes, $Y_1$ and $Y_0$, a binary indicator $D$ for treatment status, and a vector of pretreatment covariates $X$. $Y_1$ denotes the potential outcome if the individual were treated ($D = 1$), and $Y_0$ denotes the potential outcome if the individual were not treated ($D = 0$). We specify the outcome equations as
$$Y_1 = \mu_1(X) + \epsilon + \eta \tag{1}$$

$$Y_0 = \mu_0(X) + \epsilon \tag{2}$$
where $E(\epsilon \mid X) = 0$, $E(\eta \mid X) = 0$, the error term $\epsilon$ captures all unobserved factors that affect the baseline outcome ($Y_0$), and the error term $\eta$ captures all unobserved factors that affect the treatment effect ($Y_1 - Y_0$). In general, the error terms $\epsilon$ and $\eta$ need not be statistically independent of $X$, although they have zero conditional means by construction. The observed outcome $Y$ can be linked to the potential outcomes through the switching regression model (Quandt 1958, 1972):
$$Y = (1 - D)Y_0 + DY_1 = \mu_0(X) + (\mu_1(X) - \mu_0(X) + \eta)D + \epsilon \tag{3}$$
Treatment assignment is represented by a latent index model. Let $D^*$ be a latent tendency for treatment, which depends on both observed ($Z$) and unobserved ($V$) factors:
$$D^* = \mu_D(Z) - V \tag{4}$$

$$D = I(D^* > 0) \tag{5}$$
where $\mu_D(\cdot)$ is an unspecified function and $V$ is a latent random variable representing unobserved, individual-specific resistance to treatment, assumed to be continuous with a strictly increasing distribution function. The vector $Z$ includes all the components of $X$, but it also includes some instrumental variables (IV) that affect only the treatment status $D$. The key assumptions associated with Equations 1, 2, 4, and 5 are
Assumption 1. $(\epsilon, \eta, V)$ are statistically independent of $Z$ given $X$ (independence).
Assumption 2. $\mu_D(Z)$ is a nontrivial function of $Z$ given $X$ (rank condition).
The latent index model characterized by Equations 4 and 5 combined with Assumptions 1 and 2 is equivalent to the Imbens-Angrist (Imbens and Angrist 1994) assumptions of independence and monotonicity for the interpretation of IV estimands as local average treatment effects (LATE; Vytlacil 2002). Given Assumptions 1 and 2, the latent resistance $V$ is allowed to be correlated with $\epsilon$ and $\eta$ in a general way. For example, research considering heterogeneous returns to schooling has argued that individuals may self-select into college on the basis of their anticipated gains. In this case, $\eta$ will be negatively correlated with $V$ because individuals with higher values of $\eta$ tend to have lower levels of unobserved resistance $V$.2
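The selection mechanism described here can be illustrated with a small simulation. The sketch below (Python; all functional forms and coefficients are hypothetical, chosen only for illustration) generates data from a generalized Roy model in which the gain $\eta$ is negatively correlated with the resistance $V$, and shows that the naive treated-untreated contrast then overstates the true average treatment effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical generalized Roy model: all coefficients are illustrative.
x = rng.normal(size=n)                    # observed covariate
z = rng.normal(size=n)                    # instrument (enters treatment only)
eta = rng.normal(size=n)                  # unobserved gain from treatment
v = -0.5 * eta + rng.normal(size=n)       # resistance, negatively correlated with eta
eps = rng.normal(size=n)                  # baseline error

d = (0.5 * x + z - v > 0)                 # D = 1(mu_D(Z) - V > 0)
y0 = x + eps                              # Y0 = mu_0(X) + eps
y1 = x + 1.0 + eps + eta                  # Y1 = mu_1(X) + eps + eta; true ATE = 1
y = np.where(d, y1, y0)

# Individuals with larger gains face lower resistance, so the treated group
# has systematically higher eta, and the naive contrast exceeds the ATE of 1.
eta_gap = eta[d].mean() - eta[~d].mean()
naive = y[d].mean() - y[~d].mean()
print(eta_gap, naive)
```

Because $\eta$ enters $V$ with a negative sign, treated units are positively selected on unobserved gains, which is exactly the Type II selection bias discussed above.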
2.2. Marginal Treatment Effects
To define the MTE, it is best to rewrite the treatment assignment Equations 4 and 5 as
$$D = I\big(F_{V|X}(\mu_D(Z)) > F_{V|X}(V)\big) = I\big(P(Z) > U\big) \tag{6}$$
where $F_{V|X}(\cdot)$ is the cumulative distribution function of $V$ given $X$, and $P(Z) = F_{V|X}(\mu_D(Z)) = \Pr(D = 1 \mid Z)$ denotes the propensity score given $Z$. $U = F_{V|X}(V)$ is the quantile of $V$ given $X$, which by definition follows a standard uniform distribution. From Equation 6, we can see that $Z$ affects treatment status only through the propensity score $P(Z)$.3
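The normalization in Equation 6 is simply the probability integral transform: whatever the distribution of $V$, $U = F_V(V)$ is standard uniform, so $\Pr(D = 1 \mid P(Z) = p) = \Pr(U < p) = p$. A quick numerical check (Python; $V$ is drawn from a logistic distribution purely because its CDF has a convenient closed form):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# V can follow any continuous distribution; the logistic CDF is closed-form.
v = rng.logistic(size=n)
u = 1.0 / (1.0 + np.exp(-v))      # U = F_V(V): probability integral transform

p = 0.7                           # a fixed propensity score
d = p > u                         # Equation 6: D = 1(P(Z) > U)
print(u.mean(), d.mean())         # U is uniform on (0, 1); E(D) equals p
```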
The MTE is defined as the expected treatment effect conditional on pretreatment covariates $X$ and the normalized latent variable $U$:
$$\mathrm{MTE}(x, u) = E(Y_1 - Y_0 \mid X = x, U = u) \tag{7}$$
Because $U$ is the quantile of $V$, the variation of $\mathrm{MTE}(x, u)$ over values of $u$ reflects how the treatment effect varies across different quantiles of the unobserved resistance to treatment. Alternatively, $\mathrm{MTE}(x, u)$ can be interpreted as the average treatment effect among individuals with covariates $X = x$ who would be indifferent between treatment and nontreatment if the propensity score $P(Z)$ were set to $u$.
A wide range of causal estimands, such as ATE and TT, can be expressed as weighted averages of $\mathrm{MTE}(x, u)$ (Heckman et al. 2006). To obtain population-level causal effects, $\mathrm{MTE}(x, u)$ needs to be integrated twice, first over $u$ given $X = x$ and then over $X$. The weights for integrating over $u$ are shown in Table 1. Note that these weights are conditional on $X = x$. To estimate the overall ATE, TT, and treatment effect of the untreated (TUT), we need to further integrate estimates of $\mathrm{ATE}(x)$, $\mathrm{TT}(x)$, and $\mathrm{TUT}(x)$ against appropriate marginal distributions of $X$.
Table 1. Weights for Constructing $\mathrm{ATE}(x)$, $\mathrm{TT}(x)$, and $\mathrm{TUT}(x)$ from $\mathrm{MTE}(x, u)$

| Estimand | Weight $h(x, u)$ |
| --- | --- |
| $\mathrm{ATE}(x)$ | $1$ |
| $\mathrm{TT}(x)$ | $\big[1 - F_{P|X=x}(u)\big] \big/ E(P \mid X = x)$ |
| $\mathrm{TUT}(x)$ | $F_{P|X=x}(u) \big/ E(1 - P \mid X = x)$ |
The estimation of $\mathrm{MTE}(x, u)$, however, is not straightforward because neither the counterfactual outcome nor the latent variable $U$ is observed. Moreover, the estimation of the weights can be practically challenging (except in the ATE case) because it involves estimating the conditional density of $P(Z)$ given $X$, and the latter is usually a high-dimensional vector. We turn to these estimation issues now.
2.3. Estimation of MTE and Weights in Practice
Given Assumptions 1 and 2, $\mathrm{MTE}(x, u)$ can be nonparametrically identified using the method of local instrumental variables (LIV).4 To see how it works, let us first write out the expectation of the observed outcome $Y$ given the covariates $X$ and the propensity score $P(Z)$. According to Equation 3, we have
$$E(Y \mid X = x, P(Z) = p) = E(Y_0 \mid X = x) + \int_0^p \mathrm{MTE}(x, u)\, du \tag{8}$$
Taking the partial derivative of Equation 8 with respect to $p$, we have

$$\frac{\partial E(Y \mid X = x, P(Z) = p)}{\partial p} = \mathrm{MTE}(x, p).$$

Because the left-hand side is a function of observed (or estimable) quantities, the previous equation means $\mathrm{MTE}(x, u)$ is identified as long as $u$ falls within $\mathrm{supp}(P(Z) \mid X = x)$, the conditional support of $P(Z)$ given $X = x$. In other words, $\mathrm{MTE}(x, u)$ is nonparametrically identified over $\mathrm{supp}(X, P(Z))$, the support of the joint distribution of $X$ and $P(Z)$.
In practice, however, it is difficult to condition on $X$ nonparametrically, especially when $X$ is high-dimensional. Therefore, in most empirical work using LIV, it is assumed that $(\epsilon, \eta, V)$ is jointly independent of $(X, Z)$ (e.g., Carneiro et al. 2011; Carneiro and Lee 2009; Maestas, Mullen, and Strand 2013). Under this assumption, the MTE is additively separable in $x$ and $u$:
$$\mathrm{MTE}(x, u) = \mu_1(x) - \mu_0(x) + E(\eta \mid U = u) \tag{9}$$
The additive separability not only simplifies estimation but also allows $\mathrm{MTE}(x, u)$ to be identified over $\mathrm{supp}(X) \times \mathrm{supp}(P(Z))$ (instead of $\mathrm{supp}(X, P(Z))$). The previous equation also suggests a necessary and sufficient condition for the MTE to be additively separable:
Assumption 3. $E(\eta \mid X = x, U = u)$ does not depend on $x$ (additive separability).
This assumption is implied by (but does not imply) the full independence of $(\epsilon, \eta, V)$ from $(X, Z)$ (for a similar discussion, see Brinch, Mogstad, and Wiswall 2017).
In most applied work, the conditional means of $Y_0$ and $Y_1$ given $X$ are further specified as linear in parameters: $\mu_0(X) = X^\intercal \beta_0$ and $\mu_1(X) = X^\intercal \beta_1$. Given the linear specification and Assumptions 1, 2, and 3, $\mathrm{MTE}(x, u)$ can be written as
$$\mathrm{MTE}(x, u) = x^\intercal(\beta_1 - \beta_0) + E(\eta \mid U = u) \tag{10}$$
where $E(\eta \mid U = u)$ is an unknown function of $u$ that can be estimated either parametrically or nonparametrically.5
First, in the special case where the error terms $(\epsilon, \eta, V)$ are assumed to be jointly normal with zero means and an unknown covariance matrix $\Sigma$, the generalized Roy model characterized by Equations 1, 2, 4, and 5 is fully parameterized, and the unknown parameters can be jointly estimated via maximum likelihood.6 This model specification has a long history in econometrics and is usually called the “normal switching regression model” (Heckman 1978; for a review, see Winship and Mare 1992). With the joint normality assumption, Equation 10 reduces to
$$\mathrm{MTE}(x, u) = x^\intercal(\beta_1 - \beta_0) + \frac{\sigma_{\eta V}}{\sigma_V}\, \Phi^{-1}(u) \tag{11}$$
where $\sigma_{\eta V}$ is the covariance between $\eta$ and $V$, $\sigma_V$ is the standard deviation of $V$, and $\Phi^{-1}(\cdot)$ denotes the inverse of the standard normal distribution function.7 By plugging in the maximum likelihood estimates (MLE) of $\beta_0$, $\beta_1$, $\sigma_{\eta V}$, and $\sigma_V$, we obtain an estimate of $\mathrm{MTE}(x, u)$ for any combination of $x$ and $u$.
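Under joint normality, Equation 11 is trivial to evaluate once the MLEs are in hand. A minimal sketch in Python (the parameter values below are hypothetical, not estimates from any real data):

```python
import numpy as np
from statistics import NormalDist

def mte_normal(x, beta_diff, sigma_eta_v, sigma_v, u):
    """Equation 11: MTE(x, u) = x'(beta1 - beta0) + (sigma_eta_v / sigma_v) Phi^{-1}(u)."""
    return float(np.dot(x, beta_diff)) + (sigma_eta_v / sigma_v) * NormalDist().inv_cdf(u)

# Hypothetical estimates: an intercept plus years of schooling.
x = np.array([1.0, 12.0])
beta_diff = np.array([0.05, 0.01])        # beta1 - beta0
sigma_eta_v, sigma_v = -0.3, 1.0          # negative covariance: positive selection

# At the median resistance (u = 0.5), Phi^{-1}(u) = 0, so only the
# observed-covariate component remains: 0.05 + 12 * 0.01 = 0.17.
print(mte_normal(x, beta_diff, sigma_eta_v, sigma_v, 0.5))
```

Because $\sigma_{\eta V} < 0$ here, the MTE declines across quantiles of the unobserved resistance, the signature of selection on gains.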
The joint normality of error terms is a strong and restrictive assumption. When errors are not normally distributed, imposition of normality may lead to substantial bias in estimates of the model parameters (Arabmazar and Schmidt 1982). To avoid this problem, Heckman and colleagues (2006) proposed to fit Equation 10 semiparametrically using a double residual procedure (Robinson 1988). In this case, the estimation of $\mathrm{MTE}(x, u)$ can be summarized in four steps:
1. Estimate the propensity scores using a standard logit/probit model, and denote them as $\hat{p}$.8
2. Fit local linear regressions of $Y$, $X$, and $X\hat{p}$ on $\hat{p}$ and extract their residuals $e_Y$, $e_X$, and $e_{X\hat{p}}$.
3. Fit a simple linear regression of $e_Y$ on $e_X$ and $e_{X\hat{p}}$ (with no intercept) to estimate the parametric part of Equation 10, that is, $\beta_0$ and $\beta_1 - \beta_0$, and store the remaining variation of $Y$ as $\tilde{Y} = Y - X^\intercal\hat\beta_0 - \hat{p}\, X^\intercal(\hat\beta_1 - \hat\beta_0)$.
4. Fit a local quadratic regression (Fan and Gijbels 1996) of $\tilde{Y}$ on $\hat{p}$ to estimate $K(p) = \int_0^p E(\eta \mid U = u)\, du$ and its derivative $K'(p)$.
The MTE is then estimated as
$$\widehat{\mathrm{MTE}}(x, u) = x^\intercal(\hat\beta_1 - \hat\beta_0) + \hat{K}'(u) \tag{12}$$
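The four steps can be sketched end to end on simulated data. The following is a bare-bones illustration, not a substitute for the localIV package: it uses a hypothetical data-generating process, skips step 1 by plugging in the true propensity score where a probit estimate would go, omits outcome intercepts (which are absorbed into $K(p)$ in this stripped-down setup), and implements the local polynomial fits with simple Gaussian-kernel weighted least squares:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
Phi = np.vectorize(NormalDist().cdf)

# --- Hypothetical data-generating process (a generalized Roy model) ---
n = 6000
x1 = rng.normal(size=n)                    # baseline covariate
z1 = rng.normal(size=n)                    # instrument
v = rng.normal(size=n)                     # latent resistance
eta = -0.5 * v + 0.3 * rng.normal(size=n)  # gain, negatively related to resistance
eps = 0.5 * rng.normal(size=n)

p = Phi(0.5 * x1 + z1)     # true propensity score; step 1 would fit a probit here
d = (0.5 * x1 + z1 > v)    # equivalent to D = 1(P(Z) > U) with U = Phi(V)
y = 0.5 * x1 + eps + d * (0.5 + 0.2 * x1 + eta)   # beta0 = 0.5, beta1 - beta0 = 0.2

def loclin(p_obs, t_grid, resp, h=0.1):
    """Gaussian-kernel local linear regression of resp on p_obs, fitted at t_grid."""
    out = np.empty(len(t_grid))
    for j, t in enumerate(t_grid):
        dt = p_obs - t
        w = np.exp(-0.5 * (dt / h) ** 2)
        s0, s1, s2 = w.sum(), (w * dt).sum(), (w * dt ** 2).sum()
        t0, t1 = (w * resp).sum(), (w * resp * dt).sum()
        out[j] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)
    return out

# --- Step 2: residualize Y, X, and X*p on the propensity score ---
e_y = y - loclin(p, p, y)
e_x = x1 - loclin(p, p, x1)
e_xp = x1 * p - loclin(p, p, x1 * p)

# --- Step 3: no-intercept OLS of e_y on (e_x, e_xp) gives beta0 and beta1 - beta0 ---
b0_hat, bdiff_hat = np.linalg.lstsq(np.column_stack([e_x, e_xp]), e_y, rcond=None)[0]
y_tilde = y - b0_hat * x1 - bdiff_hat * x1 * p

# --- Step 4: local quadratic regression of y_tilde on p; the slope at u
# estimates K'(u), i.e., E(eta | U = u) up to the absorbed intercepts ---
def locquad_deriv(p_obs, t_grid, resp, h=0.2):
    out = np.empty(len(t_grid))
    for j, t in enumerate(t_grid):
        dt = p_obs - t
        sw = np.sqrt(np.exp(-0.5 * (dt / h) ** 2))
        A = np.column_stack([np.ones_like(dt), dt, dt ** 2]) * sw[:, None]
        out[j] = np.linalg.lstsq(A, resp * sw, rcond=None)[0][1]
    return out

u_grid = np.linspace(0.15, 0.85, 8)
k_prime = locquad_deriv(p, u_grid, y_tilde)

# Equation 12 then gives MTE(x, u) = x (beta1 - beta0) + K'(u); here K'(u)
# declines in u because of the negative eta-V covariance built into the DGP.
print(round(float(b0_hat), 2), round(float(bdiff_hat), 2))
```

With a negative $\eta$-$V$ covariance, the estimated $\hat{K}'(u)$ slopes downward in $u$, reproducing the positive-selection pattern discussed in the text.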
With estimates of $\mathrm{MTE}(x, u)$, we still need appropriate weights to estimate aggregate causal effects such as ATE and TT. As shown in Table 1, most weights involve the conditional density of $P(Z)$ given $X = x$. Because $X$ is often a high-dimensional vector, direct estimation of these weights is challenging. In their empirical application, Carneiro and colleagues (2011) conditioned on an index of $X$, $X^\intercal(\beta_1 - \beta_0)$, instead of $X$ itself. In other words, they used the density of $P(Z)$ given $X^\intercal(\beta_1 - \beta_0)$ as an approximation to the density of $P(Z)$ given $X$. To estimate the former, we can first estimate the bivariate density of $P(Z)$ and $X^\intercal(\beta_1 - \beta_0)$ using kernel methods and then divide the estimated bivariate density by the marginal density of $X^\intercal(\beta_1 - \beta_0)$. As we will see, these ad hoc methods for estimating weights are no longer needed with our new approach.
3. A Propensity Score Perspective
3.1. A Redefinition of MTE
Under the generalized Roy model, a single latent variable $U$ not only summarizes all unobserved determinants of treatment status but also captures all the treatment-effect heterogeneity by unobserved characteristics that may cause selection bias. In fact, the latent index structure implies that all the treatment-effect heterogeneity that is consequential for selection bias exists only along two dimensions: (1) the propensity score $P(Z)$ and (2) the latent variable $U$ representing unobserved resistance to treatment. This is directly reflected in Equation 6: a person is treated if and only if his or her propensity score $P(Z)$ exceeds his or her (realized) latent resistance $U$. Therefore, given both $P(Z)$ and $U$, treatment status is fixed (either 0 or 1) and thus independent of treatment effect:

$$(Y_1 - Y_0) \perp\!\!\!\perp D \mid P(Z), U.$$
This expression resembles the Rosenbaum and Rubin (1983) result on the sufficiency of the propensity score except that we now condition on $U$ in addition to $P(Z)$. Thus, to characterize selection bias, it would be sufficient to model treatment effect as a bivariate function of the propensity score $P(Z)$ (rather than the entire vector of covariates $X$) and the latent variable $U$. We redefine MTE as the expected treatment effect given $P(Z)$ and $U$:
$$\widetilde{\mathrm{MTE}}(p, u) = E(Y_1 - Y_0 \mid P(Z) = p, U = u) \tag{13}$$
Compared with the original MTE, $\widetilde{\mathrm{MTE}}(p, u)$ has two immediate advantages. First, because it conditions on the propensity score rather than the whole vector of covariates $X$, it captures all the treatment-effect heterogeneity that is relevant for selection bias in a more parsimonious way. Second, by discarding treatment-effect variation that is orthogonal to the two-dimensional space spanned by $P(Z)$ and $U$, $\widetilde{\mathrm{MTE}}(p, u)$ is a bivariate function and thus easier to visualize than $\mathrm{MTE}(x, u)$.
As with $\mathrm{MTE}(x, u)$, $\widetilde{\mathrm{MTE}}(p, u)$ also can be used as a building block for constructing standard causal estimands such as ATE and TT. However, compared with the weights on $\mathrm{MTE}(x, u)$, the weights on $\widetilde{\mathrm{MTE}}(p, u)$ are simpler, more intuitive, and easier to compute. The weights for ATE, TT, and TUT are shown in the first three rows of Table 2. To construct $\mathrm{ATE}(p)$, we simply integrate $\widetilde{\mathrm{MTE}}(p, u)$ against the marginal distribution of $U$—a standard uniform distribution. To construct $\mathrm{TT}(p)$, we integrate $\widetilde{\mathrm{MTE}}(p, u)$ against the truncated distribution of $U$ given $U < p$. Likewise, to construct $\mathrm{TUT}(p)$, we integrate $\widetilde{\mathrm{MTE}}(p, u)$ against the truncated distribution of $U$ given $U \geq p$. To obtain the population-level ATE, TT, and TUT, we further integrate $\mathrm{ATE}(p)$, $\mathrm{TT}(p)$, and $\mathrm{TUT}(p)$ against appropriate marginal distributions of $P(Z)$. For example, to construct TT, we integrate $\mathrm{TT}(p)$ against the marginal distribution of the propensity score among treated units.
Table 2. Weights for Constructing ATE, TT, TUT, PRTE, and MPRTE from $\widetilde{\mathrm{MTE}}(p, u)$

| Estimand | Weight $\omega(p, u)$ |
| --- | --- |
| $\mathrm{ATE}(p)$ | $1$ |
| $\mathrm{TT}(p)$ | $I(u < p)/p$ |
| $\mathrm{TUT}(p)$ | $I(u \geq p)/(1 - p)$ |
| $\mathrm{PRTE}_\lambda(p)$ | $I(p < u \leq p + \lambda(p))/\lambda(p)$ |
| $\mathrm{MPRTE}(p)$ | $\delta(u - p)$ |
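These weighting rules can be verified numerically. The sketch below (Python) uses a hypothetical linear $\widetilde{\mathrm{MTE}}$ surface, $0.2 + 0.3p - 0.4u$, so each estimand has a closed form: at $p = 0.6$, $\mathrm{ATE}(p) = 0.18$, $\mathrm{TT}(p) = 0.26$, $\mathrm{TUT}(p) = 0.06$, and $\mathrm{MPRTE}(p) = 0.14$.

```python
import numpy as np

# Hypothetical redefined-MTE surface, linear so every estimand is closed-form.
def mte(p, u):
    return 0.2 + 0.3 * p - 0.4 * u

u = np.linspace(0.0, 1.0, 200_001)   # fine grid over the uniform variable U

def weighted_avg(p, weight):
    """Average mte(p, u) over u under the given weight function."""
    w = weight(u)
    return np.sum(mte(p, u) * w) / np.sum(w)

p = 0.6
ate_p = weighted_avg(p, lambda u: np.ones_like(u))         # uniform over U
tt_p = weighted_avg(p, lambda u: (u < p).astype(float))    # treated: U < p
tut_p = weighted_avg(p, lambda u: (u >= p).astype(float))  # untreated: U >= p
mprte_p = mte(p, p)                                        # margin: U = p

print(ate_p, tt_p, tut_p, mprte_p)
```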
In practice, $\widetilde{\mathrm{MTE}}(p, u)$ can be estimated as a byproduct of $\mathrm{MTE}(x, u)$. Under Assumptions 1, 2, and 3,9 $\widetilde{\mathrm{MTE}}(p, u)$ can be written as
$$\widetilde{\mathrm{MTE}}(p, u) = E(\mu_1(X) - \mu_0(X) \mid P(Z) = p) + E(\eta \mid U = u) \tag{14}$$
A proof of Equation 14 is given in Appendix A. Comparing Equation 14 with Equation 9, we see that the only difference between the original MTE and $\widetilde{\mathrm{MTE}}(p, u)$ is that the first component is now the conditional expectation of $\mu_1(X) - \mu_0(X)$ given the propensity score $P(Z)$ rather than $\mu_1(x) - \mu_0(x)$ itself. Therefore, to estimate $\widetilde{\mathrm{MTE}}(p, u)$, we need only one more step after implementing the procedures described in Section 2.3: fit a nonparametric curve of $x^\intercal(\hat\beta_1 - \hat\beta_0)$ with respect to $\hat{p}$ (e.g., using a local linear regression) and combine it with existing estimates of $E(\eta \mid U = u)$.
3.2. Policy-Relevant Causal Effects
The redefined MTE can be used not only to construct traditional causal estimands but also, in the context of program evaluation, to draw implications for how the program should be revised in the future. To predict the impact of an expansion (or contraction) in program participation, one needs to examine treatment effects for individuals who would be affected by such an expansion (or contraction). To formalize this idea, Heckman and Vytlacil (2001b, 2005) define the policy-relevant treatment effect (PRTE) as the mean effect of moving from a baseline policy to an alternative policy per net person shifted into treatment, that is,

$$\mathrm{PRTE} = \frac{E(Y \mid \text{alternative policy}) - E(Y \mid \text{baseline policy})}{E(D \mid \text{alternative policy}) - E(D \mid \text{baseline policy})}.$$
They further show that under the generalized Roy model, the PRTE depends on a policy change only through its effects on the distribution of the propensity score $P(Z)$. Specifically, conditional on $X = x$, the PRTE can be written as a weighted average of $\mathrm{MTE}(x, u)$, where the weights depend only on the distribution of $P(Z)$ before and after the policy change. Within this framework, Carneiro and colleagues (2010) further define the marginal policy-relevant treatment effect (MPRTE) as a directional limit of the PRTE as the alternative policy converges to the baseline policy. Denoting by $F_{P|X}$ the cumulative distribution function of $P(Z)$ given $X$, they consider a set of alternative policies indexed by a scalar $\alpha$, $\{F^\alpha_{P|X} : \alpha \in \mathbb{R}\}$, such that $\alpha = 0$ corresponds to the baseline policy. The MPRTE is defined as

$$\mathrm{MPRTE}(x) = \lim_{\alpha \to 0} \mathrm{PRTE}(x, \alpha).$$
We follow their approach to analyzing policy effects but without conditioning on $X$. Whereas Carneiro and colleagues (2010) assume that the effects of all policy changes operate through shifts in the conditional distribution of $P(Z)$ given $X$, we focus on policy changes that shift the marginal distribution of $P(Z)$ directly. In other words, we consider policy interventions that incorporate individual-level treatment-effect heterogeneity by values of $P(Z)$, whether individuals' differences in $P(Z)$ are induced by their baseline characteristics $X$ or by the instrumental variables in $Z$. In Section 3.5, we compare these two approaches in more detail and discuss some major advantages of our new approach.
Specifically, let us consider a class of policy changes under which the $i$th individual's propensity of treatment is boosted by $\lambda(p_i)$ (in a way that does not change his or her treatment effect), where $p_i$ denotes the individual's propensity score and $\lambda(\cdot)$ is a positive, real-valued function such that $p + \lambda(p) \leq 1$ for all $p$. The policy change thus nudges everyone in the same direction, and two persons with the same baseline probability of treatment share an inducement of the same size. For such a policy change, the PRTE given $P(Z) = p$ and $\lambda(\cdot)$ becomes

$$\mathrm{PRTE}_\lambda(p) = E\big(Y_1 - Y_0 \mid P(Z) = p,\; p < U \leq p + \lambda(p)\big).$$
As with standard causal estimands, $\mathrm{PRTE}_\lambda(p)$ can be expressed as a weighted average of $\widetilde{\mathrm{MTE}}(p, u)$:

$$\mathrm{PRTE}_\lambda(p) = \int_0^1 \widetilde{\mathrm{MTE}}(p, u)\, \frac{I(p < u \leq p + \lambda(p))}{\lambda(p)}\, du.$$
Here, the weight on $\widetilde{\mathrm{MTE}}(p, u)$ is constant (i.e., $1/\lambda(p)$) within the interval $(p, p + \lambda(p)]$ and zero elsewhere.
To examine the effects of marginal policy changes, let us consider a sequence of policy changes indexed by a real-valued scalar $\alpha$, $\lambda_\alpha(p) = \alpha\lambda(p)$. Given $p$, we define the MPRTE as the limit of $\mathrm{PRTE}_{\lambda_\alpha}(p)$ as $\alpha$ approaches zero:
$$\mathrm{MPRTE}(p) = \lim_{\alpha \to 0} \mathrm{PRTE}_{\lambda_\alpha}(p) = \widetilde{\mathrm{MTE}}(p, p) \tag{15}$$
Hence, we have established a direct link between $\mathrm{MPRTE}(p)$ and $\widetilde{\mathrm{MTE}}(p, u)$: at each level of the propensity score, the MPRTE is simply the $\widetilde{\mathrm{MTE}}(p, u)$ at the margin where $u = p$. As shown in the last row of Table 2, $\mathrm{MPRTE}(p)$ can also be expressed as a weighted average of $\widetilde{\mathrm{MTE}}(p, u)$ using the Dirac delta function.
Figure 1 illustrates the relationships between ATE, TT, TUT, and MPRTE. Panel a shows a shaded gray plot of $\widetilde{\mathrm{MTE}}(p, u)$ for heterogeneous treatment effects in a hypothetical setup. In this plot, both the propensity score $P(Z)$ and the latent resistance $U$ (both ranging from 0 to 1) are divided into 10 equally spaced strata, yielding 100 grids, and a darker grid indicates a higher treatment effect. The advantage of such a shaded gray plot is that we can use subsets of the 100 grids to represent meaningful subpopulations. For example, we present the grids for treated units in Panel b, untreated units in Panel c, and marginal units in Panel d. Thus, evaluating ATE, TT, TUT, and MPRTE simply means taking weighted averages of $\widetilde{\mathrm{MTE}}(p, u)$ over the corresponding subsets of grids.
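The grid logic of Figure 1 can be reproduced numerically. The example below (Python) uses a hypothetical surface, $\widetilde{\mathrm{MTE}}(p, u) = p - u$, on a 10 × 10 grid; averaging over the relevant subsets of grids yields the four estimands:

```python
import numpy as np

mid = np.arange(0.05, 1.0, 0.1)             # midpoints of the 10 strata
p, u = np.meshgrid(mid, mid, indexing="ij")
effect = p - u                              # hypothetical MTE-tilde on the grid

ate = effect.mean()                         # Panel a: all 100 grids
tt = effect[p > u].mean()                   # Panel b: treated (p exceeds u)
tut = effect[p < u].mean()                  # Panel c: untreated
mprte = effect[np.isclose(p, u)].mean()     # Panel d: marginal (diagonal) grids

print(ate, tt, tut, mprte)
```

By the symmetry of this particular surface, ATE and MPRTE are zero while TT and TUT are equal in magnitude and opposite in sign.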
3.3. Treatment-Effect Heterogeneity among Marginal Entrants
For policymakers, a key question of interest would be how $\mathrm{MPRTE}(p)$ varies with the propensity score $p$. From Equations 14 and 15, we see that $\mathrm{MPRTE}(p)$ consists of two components:
$$\mathrm{MPRTE}(p) = E(\mu_1(X) - \mu_0(X) \mid P(Z) = p) + E(\eta \mid U = p) \tag{16}$$
The first component reflects how treatment effect varies by the propensity score, and the second component reflects how treatment effect varies by the latent resistance $U$. Among marginal entrants, $U$ is equal to $p$, so these two components fall on the same dimension.
To see how the two components combine to shape $\mathrm{MPRTE}(p)$, let us revisit the classic example of economic returns to college. In the labor economics literature, researchers often have found a negative association between $\eta$ and $U$, suggesting a pattern of positive selection, that is, individuals who benefit more from college are more motivated than their peers to attend college in the first place (e.g., Blundell, Dearden, and Sianesi 2005; Carneiro et al. 2011; Heckman, Humphries, and Veramendi 2018; Moffitt 2008; Willis and Rosen 1979). In this case, the second component of Equation 16 would be a decreasing function of $p$. On the other hand, the literature has not paid much attention to the first component, that is, whether individuals who by observed characteristics are more likely to attend college also benefit more from college. A number of observational studies suggest that nontraditional students, such as racial and ethnic minorities or students from less educated families, experience higher returns to college than do traditional students, although interpretation of such findings remains controversial due to potential unobserved selection biases (e.g., Attewell and Lavin 2007; Bowen and Bok 1998; Dale and Krueger 2011; Maurin and McNally 2008; for a review, see Hout 2012).10 However, if the downward slope in the second component were sufficiently strong, $\mathrm{MPRTE}(p)$ would also decline with $p$. In this case, we would, paradoxically, observe a pattern of negative selection (Brand and Xie 2010): among students who are at the margin of attending college, those who by observed characteristics are less likely to attend college would actually benefit more from college.
To better understand the paradoxical implication of self-selection, let us revisit Figure 1. From Panel a, we see that in the hypothetical data, treatment effect declines with $u$ at each level of the propensity score, suggesting unobserved self-selection. In other words, individuals may have self-selected into treatment on the basis of their anticipated gains. On the other hand, at each level of the latent variable $U$, treatment effect increases with the propensity score $p$, indicating that individuals who by observed characteristics are more likely to be treated also benefit more from the treatment. This relationship, however, is reversed among the marginal entrants. As shown in Panel d, among the marginal entrants, individuals who appear less likely to be treated (bottom left grids) have higher treatment effects. This pattern of negative selection at the margin, interestingly, is due precisely to an unobserved positive selection into treatment.
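A tiny numerical example (Python, with a hypothetical surface) makes the reversal concrete: with $\widetilde{\mathrm{MTE}}(p, u) = 0.1 + 0.2p - 0.5u$, effects rise in $p$ at any fixed $u$, yet along the margin $u = p$ the unobserved-selection slope (here $-0.5$) overwhelms the observed gradient (here $+0.2$), so $\mathrm{MPRTE}(p) = 0.1 - 0.3p$ declines in $p$:

```python
def mte(p, u):
    # Hypothetical surface: increasing in p, decreasing in u (positive selection).
    return 0.1 + 0.2 * p - 0.5 * u

def mprte(p):
    # At the margin, u = p, so the two slopes collapse onto one dimension.
    return mte(p, p)

p_lo, p_hi = 0.2, 0.8
print(mte(p_hi, 0.5) - mte(p_lo, 0.5))   # positive: high-p units gain more at fixed u
print(mprte(p_lo) - mprte(p_hi))         # positive: the ranking reverses at the margin
```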
3.4. Policy as a Weighting Problem
In Section 3.2, we used $\lambda(p)$ to denote the increment in treatment probability at each level of the propensity score $p$. Because $\mathrm{MPRTE}(p)$ is defined as the pointwise limit of $\mathrm{PRTE}_{\lambda_\alpha}(p)$ as $\alpha$ approaches zero, the mathematical form of $\lambda(\cdot)$ does not affect $\mathrm{MPRTE}(p)$. However, in deriving the population-level (i.e., unconditional) MPRTE, we need to use $\lambda(p)$ as the appropriate weight, that is,
$$\mathrm{MPRTE} = \frac{1}{c}\int_0^1 \widetilde{\mathrm{MTE}}(p, p)\, \lambda(p)\, dF_P(p) \tag{17}$$
Here $F_P(\cdot)$ is the marginal distribution function of the propensity score, and $c = \int_0^1 \lambda(p)\, dF_P(p)$ is a normalizing constant (see Appendix B for a derivation). Thus, given the estimates of $\widetilde{\mathrm{MTE}}(p, p)$, a policymaker could use the previous equation to design a formula for $\lambda(\cdot)$ to boost the population-level MPRTE. This is especially useful if $\widetilde{\mathrm{MTE}}(p, p)$ varies systematically with the propensity score $p$. For example, if one found that the marginal return to college declines with the propensity score $p$, a college expansion program targeted at students with relatively low values of $p$ (e.g., a means-tested financial aid program) would yield higher average marginal returns than would a universal expansion of college enrollment regardless of student characteristics.11
In practice, for a given policy λ(·), we can evaluate the aforementioned integral directly from sample data using
$$\widehat{\mathrm{MPRTE}} \;=\; \frac{\sum_{i=1}^{n}\lambda(\hat p_i)\,\widehat{\mathrm{MTE}}(\hat p_i,\hat p_i)}{\sum_{i=1}^{n}\lambda(\hat p_i)} \qquad (18)$$
where p̂ᵢ is the estimated propensity score for unit i in the sample. When the sample is not representative of the population by itself, sampling weights need to be incorporated into these summations.
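This sample analogue can be sketched as follows. The inputs are made up for illustration: `p_hat` stands in for first-stage propensity score estimates and `mte_diag` for an estimated MTE(p, p) curve; neither is taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up estimated propensity scores for n units (in practice,
# fitted values from the treatment assignment model).
p_hat = rng.beta(2.0, 2.0, size=5_000)

def mte_diag(p):
    """Made-up estimate of MTE(p, p): declines with p."""
    return 0.15 - 0.10 * p

def lam(p):
    """Policy increment lambda(p): a means-tested expansion whose
    inducement shrinks as the propensity score rises."""
    return 1.0 - p

# Equation 18: lambda-weighted sample average of MTE(p_hat_i, p_hat_i).
w = lam(p_hat)
mprte_hat = np.sum(w * mte_diag(p_hat)) / np.sum(w)

# With a non-representative sample, multiply w by sampling weights here.
print(round(mprte_hat, 3))
```

Because the weights enter both numerator and denominator, any common scale factor in λ(·) cancels, which is why only the shape of the increment function matters.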
3.5. Comparison with Carneiro and Colleagues (2010)
In the previous discussion, PRTE and MPRTE are defined for a class of policy changes in which the intensity of policy intervention depends on the individual's propensity score P. In other words, inducements are differentiated between individuals with different values of P, whether their differences in P are determined by the baseline covariates X or the instrumental variables Z. This approach to defining MPRTE contrasts sharply with the approach taken by Carneiro and colleagues (2010, 2011), who stipulate that all policy changes have to be "conditioned on X." In their approach, inducements are allowed to vary across individuals with different values of Z but not across individuals with different values of X. For convenience, we call Carneiro et al.'s approach the conditional approach and our approach the unconditional approach. Compared with the conditional approach, the unconditional approach to studying policy effects has several major advantages.
First, as noted earlier, preferential policies under the conditional approach distinguish individuals with different instrumental variables (Z) but not individuals with different baseline characteristics (X). To see the limitation of such policies, let us revisit the college education example and consider a simplistic model where the only baseline covariate is family income and the only instrumental variable is the presence of four-year colleges in the county of residence. In this case, an "affirmative" policy—a policy that favors students with lower values of Z—would be a policy that induces students who happen to live in a county with no four-year colleges, regardless of family income. Given that p equals u at the margin, this policy benefits students with relatively low u's at all levels of family income. To the extent that there is self-selection into college (i.e., MTE(p, u) declines with u), this policy would yield a larger MPRTE than would a neutral policy. However, if P is largely determined by family income rather than the local presence of four-year colleges (a plausible scenario), the variation of P conditional on X would be very limited, as would the gain in MPRTE from a preferential policy. In contrast, the unconditional approach distinguishes individuals with different values of P, most of which may be driven by X rather than Z. Because p equals u at the margin, this approach can sort out marginal entrants with different levels of u effectively. Therefore, preferential policies under the unconditional approach are more effective in exploiting unobserved heterogeneity in treatment effects.
Second, because treatment effect in general depends on the observed covariates X as well as the latent resistance U, an ideal policy intervention should exploit the variation of treatment effect along both dimensions. The conditional approach, however, differentiates individuals with different u's but not individuals with different observed characteristics (at least in practice). In contrast, by focusing on the propensity score P, the unconditional approach accounts for treatment-effect heterogeneity in both observed and unobserved dimensions. Because p equals u at the margin, the bivariate function MTE(p, u) degenerates into a univariate function of p among marginal entrants (see Equation 16). Thus, by weighting individuals with different values of p, the unconditional approach captures the "collision" of observed and unobserved heterogeneity at the margin. To see why the latter is more effective, consider an extreme scenario where there is no unobserved sorting (i.e., treatment effect does not vary with u) but treatment effect varies considerably by p. In this case, the unconditional approach can partly exploit the variation of treatment effect by p (through the first component of Equation 16), whereas the conditional approach cannot (because it focuses exclusively on the second component of Equation 16).
Finally, the unconditional approach is computationally simpler. The unconditional MPRTE is simply a λ(p)-weighted average of MTE(p, p), so no further step is needed to estimate it once we have estimates of MTE(p, u). The conditional approach, by contrast, needs to build on the original MTE(x, u) using policy-specific weights. As shown in Table 3, these policy-specific weights generally involve the conditional density of P given X. Because X is usually a high-dimensional vector, estimation of these weights is difficult and often tackled with ad hoc methods (see Section 2.3).
Table 3. Weights for Constructing MPRTE from the Original MTE(x, u)
4. Illustration with NLSY Data
To illustrate the new approach, we reanalyze the data from Carneiro and colleagues' (2011) study on economic returns to college education. We first describe the data, then demonstrate treatment-effect heterogeneity using the newly defined MTE(p, u), and finally, evaluate the effects of different marginal policy changes.
4.1. Data Description
We reanalyze a sample of white males (N = 1,747) who were 16 to 22 years old in 1979, drawn from the NLSY 1979. Treatment (D) is college attendance, defined as having attained any postsecondary education by 1991. Under this definition, the treated group consists of 865 individuals, and the comparison group consists of 882 individuals. The outcome (Y) is the natural logarithm of hourly wage in 1991.12 Following the original study, we include as pretreatment covariates (appearing in both the treatment and outcome equations) linear and quadratic terms of mother's years of schooling, number of siblings, the Armed Forces Qualification Test (AFQT) score adjusted by years of schooling, permanent local log earnings at age 17 (county log earnings averaged between 1973 and 2000), and permanent local unemployment rate at age 17 (state unemployment rate averaged between 1973 and 2000) as well as a dummy variable indicating urban residence at age 14 and cohort dummies. Also following Carneiro and colleagues (2011), we use the following instrumental variables (Z): (1) the presence of a four-year college in the county of residence at age 14, (2) local wage in the county of residence at age 17, (3) local unemployment rate in the state of residence at age 17, and (4) average tuition in public four-year colleges in the county of residence at age 17 as well as their interactions with mother's years of schooling, number of siblings, and the adjusted AFQT score. In addition, four variables are included in the outcome equations but not in the treatment equation: years of experience in 1991, years of experience in 1991 squared, local log earnings in 1991, and local unemployment rate in 1991. More details about the data can be found in Carneiro and colleagues' (2011) online appendix.
4.2. Heterogeneity in Treatment Effects
To estimate the bivariate function MTE(p, u), we first need estimates of the propensity score P. In Section 2, we discussed a parametric and a semiparametric method for estimating MTE(p, u). Here, we examine treatment-effect heterogeneity using the semiparametric estimates of MTE(p, u) (Equation 12).13 Figure 2 presents our key results for the estimated MTE(p, u), with a shaded gray plot in which a darker grid indicates a higher treatment effect. The effect heterogeneity by the two dimensions—the propensity score and the latent resistance to treatment—exhibits an easy-to-interpret but surprising pattern. On the one hand, at each level of the propensity score, a higher level of the latent variable u is associated with a lower treatment effect, indicating the presence of self-selection based on idiosyncratic returns to college. This pattern of "sorting on gain" echoes the classic findings reported in Willis and Rosen (1979) and Carneiro and colleagues (2011). On the other hand, the color gradient along the propensity score suggests that given the latent resistance to attending college, students who by observed characteristics are more likely to go to college also tend to benefit more from attending college.
If we read along the diagonal of Figure 2, however, we find that among students who are at the margin of indifference for attending college, those who appear less likely to attend college would benefit more from a college education; that is, MTE(p, p) declines with the propensity score p. Figure 3 shows smoothed estimates of MTE(p, p) as well as its two components (see Equation 16). The negative association between treatment effect and the latent resistance u more than offsets the positive association between treatment effect and the propensity score p, resulting in the downward slope of MTE(p, p). Echoing our discussion in Section 3.3, it is unobserved "sorting on gain" that leads to the negative association between the propensity score and returns to college among students at the margin.
We use the weights given in Table 2 to estimate ATE, TT, and TUT at each level of the propensity score. Figure 4 shows smoothed estimates of ATE(p), TT(p), TUT(p), and MTE(p, p). Several patterns are worth noting. First, there is a sharp contrast between ATE(p) and MTE(p, p): A higher propensity of attending college is associated with a higher return to college on average (solid line) but a lower return to college among marginal entrants (dot-dash line). Second, TT(p) (dashed line) is always larger than TUT(p) (dotted line), suggesting that at each level of the propensity score, individuals are positively self-selected into college based on their idiosyncratic returns to college. Finally, TT(p) and TUT(p) converge to MTE(p, p) and ATE(p) at the extremes of the propensity score. When p approaches 0, TT(p) converges to MTE(p, p) and TUT(p) converges to ATE(p). At the other extreme, when p approaches 1, TT(p) converges to ATE(p) and TUT(p) converges to MTE(p, p). Looking back at Figure 1, we see that these relationships simply reflect compositional shifts in the treated and untreated groups as the propensity score changes from 0 to 1.
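These patterns follow mechanically from the selection rule D = 1(P > U) with U uniform: at P = p, the treated have u < p and the untreated have u > p. The following toy check uses a made-up MTE surface (not the NLSY estimates) that increases in p and decreases in u:

```python
import numpy as np

def mte(p, u):
    """Made-up MTE(p, u): increases in p, decreases in u."""
    return 0.05 + 0.10 * p - 0.20 * u

N = 100_000  # subintervals for midpoint-rule integration

def ate(p):
    """ATE(p): average of MTE(p, u) over all u in (0, 1)."""
    u = (np.arange(N) + 0.5) / N
    return mte(p, u).mean()

def tt(p):
    """TT(p): average over the treated at P = p, i.e., u in (0, p)."""
    u = p * (np.arange(N) + 0.5) / N
    return mte(p, u).mean()

def tut(p):
    """TUT(p): average over the untreated at P = p, i.e., u in (p, 1)."""
    u = p + (1.0 - p) * (np.arange(N) + 0.5) / N
    return mte(p, u).mean()

for p in (0.01, 0.5, 0.99):
    print(p, round(ate(p), 4), round(tt(p), 4), round(tut(p), 4),
          round(mte(p, p), 4))
```

With this declining-in-u surface, TT(p) > ATE(p) > TUT(p) at every p; TT(p) approaches MTE(p, p) as p → 0 and ATE(p) as p → 1, mirroring the convergence described above.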
Figure 4. Heterogeneity in ATE, TT, TUT, and MTE(p, p) by propensity score, based on semiparametric estimates of MTE(p, u).
Note: ATE = average treatment effect; TT = treatment effect of the treated; TUT = treatment effect of the untreated; MPRTE = marginal policy-relevant treatment effect; MTE = marginal treatment effect.
4.3. Evaluation of Policy Effects
Given the estimates of ATE(p), TT(p), and TUT(p), we construct their population averages using appropriate weights across the propensity score. For example, to estimate TT, we simply integrate TT(p) against the marginal distribution of the propensity score among individuals who attended college. The estimates of MTE(p, p) allow us to construct different versions of MPRTE, depending on how the policy change weights students with different propensities of attending college (see Equation 18). Table 4 reports our estimates of ATE, TT, TUT, and MPRTE under different policy changes from the parametric and semiparametric estimates of MTE(p, u). To compare our new approach with Carneiro and colleagues' (2011) original approach, we show estimates built on the original MTE(x, u) and those built on the redefined MTE(p, u). Following Carneiro and colleagues (2011), we annualize the returns to college by dividing all parameter estimates by four, which is the average difference in years of schooling between the treated and untreated groups.
Table 4. Estimated Returns to One Year of College
The first three rows of Table 4 indicate that TT > ATE > TUT ≈ 0. That is, returns to college are higher among individuals who actually attended college than among those who attended only high school, for whom the average returns to college are virtually zero. Using either the parametric or semiparametric estimates of the MTE, our new approach and the original approach yield nearly identical point estimates and bootstrapped standard errors. This consistency affirms our argument that MTE(p, u) preserves all of the treatment-effect heterogeneity that is consequential for selection bias. Although the redefined MTE seems to contain less information than the original MTE (as it projects MTE(x, u) onto the dimension of the propensity score), the discarded information does not contribute to identification of average causal effects.
The last four rows of Table 4 present our estimates of MPRTE under four stylized policy changes: (1) λ(p) = λ, (2) λ(p) = λp, (3) λ(p) = λ(1 − p), and (4) λ(p) = λ·1(p < 0.3). Put in words, the first policy change increases everyone's probability of attending college by the same amount, the second policy change favors students who appear more likely to go to college, the third policy change favors students who appear less likely to go to college, and the last policy change only targets students whose observed likelihood of attending college is less than 30 percent. For each policy change, the MPRTE is defined as the limit of the corresponding PRTE as λ goes to zero. The first policy change is also the first policy change considered by Carneiro and colleagues (2011:2760), that is, a uniform increment to everyone's propensity score (see also the first row of Table 3). For this case, we estimated the MPRTE using both the original approach and our new approach. As expected, the two approaches yield the same results. However, the other three policy changes considered here cannot be readily accommodated within the original framework (see Section 3.5). Thus, we evaluate their effects using only our new approach, that is, via Equation 18.
Although the estimates of TUT are close to zero, all four policy changes imply substantial marginal returns to college. For example, under the first policy change, the semiparametric estimate of MPRTE is .083, suggesting that one year of college would translate into an 8.3 percent increase in hourly wages among marginal entrants. However, the exact magnitude of MPRTE depends heavily on the form of the policy change, especially under the semiparametric model. Whereas the marginal return to a year of college is about 5 percent if we expand everyone’s probability of attending college proportionally (policy change two), it can be as high as 15.5 percent if we only increase enrollment among students whose observed likelihood of attending college is less than 30 percent (policy change four). Figure 5 shows how different policy changes produce different compositions of marginal college entrants. Because students who benefit the most from college are located at the low end of the propensity score, a college expansion program targeted at those students will yield the highest marginal returns to college. Fortuitously, earlier research that did not account for the presence of unobserved selection reached similar policy implications (Brand and Xie 2010).
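The contrast among the four stylized policy changes can be sketched numerically. All inputs here are hypothetical (a uniform grid stands in for the estimated propensity scores, and `mte_diag` is a made-up declining MTE(p, p) curve, not the NLSY estimates):

```python
import numpy as np

# Stand-in for estimated propensity scores (uniform grid for illustration).
p = np.linspace(0.05, 0.95, 9001)

# Hypothetical MTE(p, p): marginal returns decline with p.
mte_diag = 0.15 - 0.10 * p

# Increment functions lambda(p) for the four stylized policy changes;
# the common scale factor lambda cancels in the weighted average.
policies = {
    "uniform":      np.ones_like(p),         # same increment for everyone
    "proportional": p,                       # favors high-propensity students
    "affirmative":  1.0 - p,                 # favors low-propensity students
    "targeted":     (p < 0.3).astype(float), # only students with p < 0.3
}

# Equation-18-style weighted averages of MTE(p, p).
mprte = {name: np.sum(w * mte_diag) / np.sum(w)
         for name, w in policies.items()}
for name, value in mprte.items():
    print(f"{name}: {value:.3f}")
```

Because the hypothetical MTE(p, p) declines in p, the targeted expansion yields the largest MPRTE and the proportional expansion the smallest, matching the qualitative ordering discussed above.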
5. Discussion And Conclusion
Due to the ubiquity of population heterogeneity in social phenomena, it is impossible to evaluate causal effects at the individual level. All efforts to draw causal inferences in social science must be at the group level. Yet with observational data, even group-level inference is plagued by two types of selection bias: Individuals in the treated and comparison groups may differ systematically not only in their baseline outcomes but also in their treatment effects. Depending on whether unobserved selection is assumed away, traditional methods for causal inference from observational data can be divided into two classes, as shown in the first row of Table 5. The first class, including regression adjustment, matching, and inverse probability of treatment weighting (Robins, Hernan, and Brumback 2000), rests on the assumption of ignorability: After controlling for a set of observed covariates, treatment status is independent of both baseline outcomes and treatment effects. The second class of methods, including instrumental variables (IV), regression discontinuity (RD) designs (Hahn, Todd, and Van der Klaauw 2001; Thistlethwaite and Campbell 1960), and fixed-effects models, allows for unobserved selection into treatment but requires exogenous variation in treatment status—either between or within units—to identify causal effects.
Table 5. Methods for Identifying and Estimating Causal Effects from Observational Data
Both classes of methods allow treatment effects to vary in the population, but in common practice neither systematically models treatment-effect heterogeneity.14 When treatment effects are heterogeneous, some of these methods estimate quantities that are not of primary interest to the researcher. For example, when treatment effect varies according to the level of a covariate, main-effects-only regression models cannot recover standard causal estimands such as ATE or TT but instead estimate a conditional-variance-weighted causal effect that has little substantive meaning (Angrist and Pischke 2008; Elwert and Winship 2010). Moreover, when treatment effect is heterogeneous, IV and RD designs can identify only the average causal effect among individuals whose treatment status is influenced by the IV (Imbens and Angrist 1994) or, in the case of fuzzy RD designs, by whether the running variable surpasses the "cutoff point" (Hahn, Todd, and Van der Klaauw 2001). Similarly, fixed-effects models can identify only the average causal effect among individuals who change their treatment status over the study period.
The second row of Table 5 summarizes the four approaches that can be used to systematically study treatment-effect heterogeneity, especially treatment-effect heterogeneity by pretreatment characteristics. The first approach includes the long-standing practice of adding interaction terms between treatment status and covariates in conventional regression models as well as recent proposals to fit nonparametric surfaces of potential outcomes and their difference (e.g., Hill 2011). The second approach models treatment effect as a univariate function of the propensity score (e.g., Xie et al. 2012; Zhou and Xie 2016). Because the propensity score is the only dimension along which treatment effect may be correlated with treatment status, this approach not only provides a useful solution to data sparseness, but it also facilitates projection of treatment effects beyond particular study settings (Stuart et al. 2011; Xie 2013). However, as noted earlier, these two approaches rely on the assumption of ignorability. When ignorability breaks down, interpretation of the observed heterogeneity in treatment effects becomes ambiguous (Breen et al. 2015).
The latter two approaches, that is, the MTE-based approach and our extension of it, accommodate unobserved selection through use of a latent index model for treatment assignment. In this model, a scalar error term is used to capture all the unobserved factors that may induce or impede treatment. As a result, treatment status is determined by the "competition" between the propensity score P and the latent variable U representing unobserved resistance to treatment. Therefore, the propensity score and the latent variable are the only two dimensions along which treatment status may be correlated with treatment effects. The MTE, as in Heckman and Vytlacil's (1999, 2001a, 2005, 2007b) original formulation, is asymmetrical with respect to these two dimensions because it conditions on the entire vector of observed covariates X as well as the latent variable U. Because of this asymmetry, the original MTE-based approach has a number of drawbacks, including (1) exclusive attention (in practice) to unobserved rather than observed heterogeneity in treatment effects, (2) difficulty of implementation due to unwieldy weight formulas, and (3) inflexibility in the modeling of policy effects (see Section 3.5).
To overcome these limitations, we presented an extension of the MTE framework through a redefinition of MTE. By conditioning on the propensity score P and the latent variable U, the redefined MTE not only treats observed and unobserved selection symmetrically but also more parsimoniously summarizes all the treatment-effect heterogeneity that is consequential for selection bias. As a bivariate function, it can be easily visualized. As with the original MTE, the redefined MTE also can be used as a building block in evaluating aggregate causal effects. Yet the weights associated with the new MTE are simpler, more intuitive, and easier to compute (compare Table 2 with Tables 1 and 3). Finally, the new MTE immediately reveals heterogeneous treatment effects among individuals who are at the margin of treatment, thus allowing us to design more cost-effective policy interventions.
Our extension of the MTE approach is not a panacea. Like the original approach, it hinges on credible estimates of MTE(p, u). Identification of MTE(p, u) requires at least one valid IV in the treatment assignment equation. Moreover, under either the parametric or semiparametric model, the statistical efficiency of the estimates depends heavily on the strength of the IVs (Zhou and Xie 2016). When the IVs are relatively weak in determining treatment status, MTE-based estimates of aggregate causal effects can be imprecise. Nonetheless, as long as valid instruments are present, more precise estimates can always be achieved with a larger sample size.
Appendix A: Identification of MTE(p, u) under Assumptions 1, 2, and 3
From Assumption 1, we know (U₁, U₀, V) ⫫ Z | X. Because Y₁ − Y₀ and U are functions of (X, U₁, U₀) and (X, V), respectively, (Y₁ − Y₀, U) ⫫ Z | X. U follows a standard uniform distribution for each x, so we also have U ⫫ X. By the rules of conditional independence, we have (Y₁ − Y₀) ⫫ P | (X, U). Using this fact and the law of total expectation, we can write MTE(p, u) as
$$\mathrm{MTE}(p,u) \;=\; E(Y_1 - Y_0 \mid P = p,\, U = u) \;=\; E\{E(Y_1 - Y_0 \mid X,\, U = u) \mid P = p\} \qquad (19)$$
Thus MTE(p, u) is simply the conditional expectation of E(Y₁ − Y₀ | X, U = u) given P = p. Given Assumption 3, E(Y₁ − Y₀ | X, U = u) can be written as Equation 14. Substituting it into Equation 19 then yields an expression for MTE(p, u) in terms of identified quantities.
Appendix B: Derivation of Equation 17
To see why Equation 17 holds, consider the overall PRTE for a given increment function λ(·). Given that D = 1(P > U) with U uniform, the size of inducement λ(p) reflects the share of individuals at P = p who are induced into treatment ("compliers"), and the overall PRTE is a weighted average of PRTE_λ(p) with weights proportional to λ(p):

$$\mathrm{PRTE}_\lambda \;=\; \frac{\int_0^1 \mathrm{PRTE}_\lambda(p)\,\lambda(p)\,dF_P(p)}{\int_0^1 \lambda(p)\,dF_P(p)},$$
where F_P(·) denotes the marginal distribution function of the propensity score. We then define the population-level MPRTE as the limit of PRTE_λ as λ approaches zero. Under some regularity conditions,15 we can take the limit inside the integral:

$$\mathrm{MPRTE} \;=\; \lim_{\lambda\to 0} \mathrm{PRTE}_\lambda \;=\; \frac{\int_0^1 \mathrm{MTE}(p,p)\,\lambda(p)\,dF_P(p)}{\int_0^1 \lambda(p)\,dF_P(p)}.$$
Denoting c = ∫₀¹ λ(p) dF_P(p), we obtain Equation 17.
Acknowledgements
The authors benefited from communications with Daniel Almirall, Matthew Blackwell, Jennie Brand, James Heckman, Jeffrey Smith, and Edward Vytlacil.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Grant No. R01-HD-074603-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Notes
1.
Heckman and Robb (1986) also framed propensity score matching as a special case of control function methods.
2.
In the classic Roy model (Roy 1951), individuals select into treatment if and only if Y₁ > Y₀. In that case, the latent error reduces to U₀ − U₁, and selection operates directly on the idiosyncratic gains.
3.
The property that Z affects treatment status only through the propensity score P(Z) in an additively separable latent index model is called index sufficiency (Heckman and Vytlacil 2005).
4.
An alternative method to identify the MTE nonparametrically is based on separate estimation of the outcome equations for the treated and the untreated (see Brinch, Mogstad, and Wiswall 2017; Heckman and Vytlacil 2007b).
5.
In estimating , we need to impose constraints on and such that . is from its definition. .
6.
The maximum likelihood estimation can be easily implemented in R using the sampleSelection package (see Toomet and Henningsen 2008).
7.
Because the treatment assignment model is now a probit model, the standard deviation of its error term is usually normalized to 1.
8.
More flexible methods, such as generalized additive models and boosted regression trees, also can be used to estimate propensity scores (e.g., McCaffrey, Ridgeway, and Morral 2004).
9.
In a companion paper (Zhou and Xie forthcoming), we discuss the regions over which MTE(p, u) can be nonparametrically identified with and without the assumption of additive separability.
10.
Studies that use compulsory schooling laws, differences in the accessibility of schools, or similar features as instrumental variables also find larger economic returns to college than do least squares estimates (Card 2001). However, this comparison does not reveal how returns to college vary by covariates or the propensity score.
11.
Admittedly, the discussion here provides no more than a theoretical guide to practice. In the real world, designing specific policy instruments to produce a target form of λ(p) can be a challenging task.
12.
Hourly wage in 1991 is defined as an average of deflated (to 1983 constant dollars) nonmissing hourly wages reported between 1989 and 1993.
13.
Results based on parametric estimates of MTE(p, u) (Equation 11) are substantively similar.
14.
Although matching and weighting methods are well equipped to estimate ATE, TT, and TUT under the assumption of ignorability, they are seldom used to study treatment-effect heterogeneity by individual characteristics.
15.
A sufficient (but not necessary) condition is that MTE(p, u) is bounded over the unit square. By the mean value theorem, PRTE_λ(p) can be written as MTE(p, u*) where u* ∈ [p, p + λ(p)]. PRTE_λ(p) is thus also bounded. By the dominated convergence theorem, the limit can be taken inside the integral.
References
Angrist, Joshua D., Pischke, Jörn-Steffen. 2008. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.
Arabmazar, Abbas, Schmidt, Peter. 1982. "An Investigation of the Robustness of the Tobit Estimator to Non-normality." Econometrica 50(4):1055–63.
Attewell, Paul, Lavin, David. 2007. Passing the Torch: Does Higher Education for the Disadvantaged Pay Off across the Generations? New York: Russell Sage Foundation.
Bjorklund, Anders, Moffitt, Robert. 1987. "The Estimation of Wage Gains and Welfare Gains in Self-Selection." The Review of Economics and Statistics 69(1):42–49.
Blundell, Richard, Dearden, Lorraine, Sianesi, Barbara. 2005. "Evaluating the Effect of Education on Earnings: Models, Methods and Results from the National Child Development Survey." Journal of the Royal Statistical Society: Series A (Statistics in Society) 168(3):473–512.
Borjas, George J. 1987. "Self-Selection and the Earnings of Immigrants." The American Economic Review 77(4):531–53.
Bowen, William G., Bok, Derek. 1998. The Shape of the River: Long-Term Consequences of Considering Race in College and University Admissions. Princeton, NJ: Princeton University Press.
Brand, Jennie E., Xie, Yu. 2010. "Who Benefits Most from College? Evidence for Negative Selection in Heterogeneous Economic Returns to Higher Education." American Sociological Review 75(2):273–302.
Breen, Richard, Choi, Seong-soo, Holm, Anders. 2015. "Heterogeneous Causal Effects and Sample Selection Bias." Sociological Science 2:351–69.
Brinch, Christian N., Mogstad, Magne, Wiswall, Matthew. 2017. "Beyond LATE with a Discrete Instrument." Journal of Political Economy 125(4):985–1039.
Card, David. 2001. "Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems." Econometrica 69(5):1127–60.
Carneiro, Pedro, Heckman, James J., Vytlacil, Edward J. 2010. "Evaluating Marginal Policy Changes and the Average Effect of Treatment for Individuals at the Margin." Econometrica 78(1):377–94.
Carneiro, Pedro, Heckman, James J., Vytlacil, Edward J. 2011. "Estimating Marginal Returns to Education." American Economic Review 101(6):2754–81.
Carneiro, Pedro, Lee, Sokbae. 2009. "Estimating Distributions of Potential Outcomes Using Local Instrumental Variables with an Application to Changes in College Enrollment and Wage Inequality." Journal of Econometrics 149(2):191–208.
Dale, Stacy, Krueger, Alan B. 2011. "Estimating the Return to College Selectivity over the Career Using Administrative Earnings Data." Technical report, National Bureau of Economic Research, Cambridge, MA.
Elwert, Felix, Winship, Christopher. 2010. "Effect Heterogeneity and Bias in Main-Effects-Only Regression Models." Pp. 327–36 in Heuristics, Probability and Causality: A Tribute to Judea Pearl, edited by Dechter, R., Geffner, H., Halpern, J. Y. London: College Publications.
Fan, Jianqing, Gijbels, Irene. 1996. Local Polynomial Modelling and Its Applications. Vol. 66. London: Chapman and Hall.
Gamoran, Adam, Mare, Robert D. 1989. "Secondary School Tracking and Educational Inequality: Compensation, Reinforcement, or Neutrality?" American Journal of Sociology 94(5):1146–83.
Hahn, Jinyong, Todd, Petra, Van der Klaauw, Wilbert. 2001. "Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design." Econometrica 69(1):201–209.
Heckman, James J. 1978. "Dummy Endogenous Variables in a Simultaneous Equation System." Econometrica 46(4):931–59.
Heckman, James J. 2010. "Building Bridges between Structural and Program Evaluation Approaches to Evaluating Policy." Journal of Economic Literature 48(2):356–98.
Heckman, James J., Hotz, V. Joseph. 1989. "Choosing among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training." Journal of the American Statistical Association 84(408):862–74.
Heckman, James J., Humphries, John Eric, Veramendi, Gregory. 2018. "Returns to Education: The Causal Effects of Education on Earnings, Health and Smoking." Journal of Political Economy 126(S1):S197–S246.
Heckman, James J., Navarro-Lozano, Salvador. 2004. "Using Matching, Instrumental Variables, and Control Functions to Estimate Economic Choice Models." The Review of Economics and Statistics 86(1):30–57.
Heckman, James J., Robb, Richard. 1986. "Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes." Pp. 63–107 in Drawing Inferences from Self-selected Samples, edited by Wainer, H. New York: Springer.
Heckman, James J., Urzua, Sergio, Vytlacil, Edward J. 2006. "Understanding Instrumental Variables in Models with Essential Heterogeneity." The Review of Economics and Statistics 88(3):389–432.
Heckman, James J., Vytlacil, Edward J. 1999. "Local Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatment Effects." Proceedings of the National Academy of Sciences of the United States of America 96(8):4730–34.
Heckman, James J., Vytlacil, Edward J. 2001a. "Local Instrumental Variables." Pp. 1–46 in Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in Honor of Takeshi Amemiya, edited by Hsiao, C., Morimune, K., Powell, J. L. New York: Cambridge University Press.
Heckman, James J., Vytlacil, Edward J. 2001b. "Policy-Relevant Treatment Effects." American Economic Review 91(2):107–11.
Heckman, James J., Vytlacil, Edward J. 2005. "Structural Equations, Treatment Effects, and Econometric Policy Evaluation." Econometrica 73(3):669–738.
Heckman, James J., Vytlacil, Edward J. 2007a. "Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation." Chapter 70 in Handbook of Econometrics, Vol. 6, edited by Heckman, J. J., Leamer, E. E. Elsevier.
Heckman, James J., Vytlacil, Edward J. 2007b. "Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast Their Effects in New Environments." Chapter 71 in Handbook of Econometrics, Vol. 6, edited by Heckman, J. J., Leamer, E. E. Elsevier.
Hill, Jennifer L. 2011. "Bayesian Nonparametric Modeling for Causal Inference." Journal of Computational and Graphical Statistics 20(1):217–40.
Holland, Paul W. 1986. "Statistics and Causal Inference." Journal of the American Statistical Association 81(396):945–60.
Hout, Michael. 2012. "Social and Economic Returns to College Education in the United States." Annual Review of Sociology 38:379–400.
Imbens, Guido W., Angrist, Joshua D. 1994. "Identification and Estimation of Local Average Treatment Effects." Econometrica 62(2):467–75.
Maestas, Nicole, Mullen, Kathleen J., Strand, Alexander. 2013. "Does Disability Insurance Receipt Discourage Work? Using Examiner Assignment to Estimate Causal Effects of SSDI Receipt." The American Economic Review 103(5):1797–829.
Maurin, Eric, McNally, Sandra. 2008. "Vive la Révolution! Long-Term Educational Returns of 1968 to the Angry Students." Journal of Labor Economics 26(1):1–33.
McCaffrey, Daniel F., Ridgeway, Greg, Morral, Andrew R. 2004. "Propensity Score Estimation with Boosted Regression for Evaluating Causal Effects in Observational Studies." Psychological Methods 9(4):403–25.
Moffitt, Robert. 2008. "Estimating Marginal Treatment Effects in Heterogeneous Populations." Annales d'Economie et de Statistique (91/92):239–61.
Quandt, Richard E. 1958. "The Estimation of the Parameters of a Linear Regression System Obeying Two Separate Regimes." Journal of the American Statistical Association 53(284):873–80.
Quandt, Richard E. 1972. "A New Approach to Estimating Switching Regressions." Journal of the American Statistical Association 67(338):306–10.
Robins, James M., Hernan, Miguel Angel, Brumback, Babette. 2000. "Marginal Structural Models and Causal Inference in Epidemiology." Epidemiology 11(5):550–60.
|
Robinson, Peter M. 1988. “Root-N-Consistent Semiparametric Regression.”Econometrica 56(4):931–54. Google Scholar | Crossref | |
|
Rosenbaum, Paul R., Rubin, Donald B. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.”Biometrika 70(1):41–55. Google Scholar | Crossref | ISI | |
|
Roy, Andrew Donald . 1951. “Some Thoughts on the Distribution of Earnings.”Oxford Economic Papers 3(2):135–46. Google Scholar | Crossref | |
|
Sakamoto, Arthur, Chen, Meichu D. 1991. “Inequality and Attainment in a Dual Labor Market.”American Sociological Review 56(3):295–308. Google Scholar | Crossref | |
|
Smock, Pamela J., Manning, Wendy D., Gupta, Sanjiv. 1999. “The Effect of Marriage and Divorce on Women’s Economic Well-Being.”American Sociological Review 64(6):794–812. Google Scholar | Crossref | ISI | |
|
Stuart, Elizabeth A., Cole, Stephen R., Bradshaw, Catherine P., Leaf, Philip J. 2011. “The Use of Propensity Scores to Assess the Generalizability of Results from Randomized Trials.”Journal of the Royal Statistical Society: Series A (Statistics in Society) 174(2):369–86. Google Scholar | Crossref | |
|
Thistlethwaite, Donald L., Campbell, Donald T. 1960. “Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment.”Journal of Educational Psychology 51(6):309–317. Google Scholar | Crossref | ISI | |
|
Toomet, Ott, Henningsen, Arne. 2008. “Sample Selection Models in R: Package sampleSelection.”Journal of Statistical Software 27(7):1–23. Google Scholar | Crossref | ISI | |
|
Vytlacil, Edward . 2002. “Independence, Monotonicity, and Latent Index Models: An Equivalence Result.”Econometrica 70(1):331–41. Google Scholar | Crossref | |
|
Willis, Robert J., Rosen, Sherwin. 1979. “Education and Self-Selection.”Journal of Political Economy 87(5):S7–S36. Google Scholar | Crossref | ISI | |
|
Winship, Chris, Mare, Robert D. 1992. “Models for Sample Selection Bias.”Annual Review of Sociology 18:327–50. Google Scholar | Crossref | |
|
Winship, Chris, Morgan, Stephen. 1999. “The Estimation of Causal Effects from Observational Data.”Annual Review of Sociology 25:659–706. Google Scholar | Crossref | ISI | |
|
Xie, Yu . 2013. “Population Heterogeneity and Causal Inference.”Proceedings of the National Academy of Sciences 110(16):6262–68. Google Scholar | Crossref | |
|
Xie, Yu, Brand, Jennie, Jann, Ben. 2012. “Estimating Heterogeneous Treatment Effects with Observational Data.”Sociological Methodology 42(1):314–47. Google Scholar | SAGE Journals | |
|
Zhou, Xiang . 2019. localIV: Estimation of Marginal Treatment Effects using Local Instrumental Variables. R package version 0.2.1, available at the Comprehensive R Archive Network (CRAN). Google Scholar | |
|
Zhou, Xiang, Xie, Yu. 2016. “Propensity Score-Based Methods Versus MTE-Based Methods in Causal Inference: Identification, Estimation, and Application.”Sociological Methods & Research 45(1):3–40. Google Scholar | SAGE Journals | ISI | |
|
Zhou, Xiang, Xie, Yu. Forthcoming. “Marginal Treatment Effects from a Propensity Score Perspective.”Journal of Political Economy. Google Scholar |
Author Biographies
Xiang Zhou is an assistant professor in the Department of Government at Harvard University. He received a PhD in sociology and statistics from the University of Michigan. His research broadly concerns quantitative methodology, economic inequality and mobility, and contemporary Chinese society. His work has appeared in American Sociological Review, American Journal of Sociology, Journal of Political Economy, and Proceedings of the National Academy of Sciences, among other peer-reviewed journals.
Yu Xie is Bert G. Kerstetter ’66 University Professor of Sociology and director of the Paul and Marcia Wythes Center on Contemporary China at Princeton University. He is also a Visiting Chair Professor at the Center for Social Research, Peking University. His main areas of interest are social stratification, demography, statistical methods, Chinese studies, and the sociology of science. His recently published books include Marriage and Cohabitation (University of Chicago Press 2007) with Arland Thornton and William Axinn, Statistical Methods for Categorical Data Analysis (Emerald 2008, second edition) with Daniel Powers, and Is American Science in Decline? (Harvard University Press 2012) with Alexandra Killewald. His methodological work is on categorical data analysis, causal inference, and survey research. Xie is also a former editor of Sociological Methodology.