Instrumental-variable estimation of large-T panel-data models with common factors

In this article, we introduce the xtivdfreg command, which implements a general instrumental-variables (IV) approach for fitting panel-data models with many time-series observations, T, and unobserved common factors or interactive effects, as developed by Norkute et al. (2021, Journal of Econometrics 220: 416–446) and Cui et al. (2020a, ISER Discussion Paper 1101). The underlying idea of this approach is to project out the common factors from exogenous covariates using principal-components analysis and to run IV regression in two stages, using the defactored covariates as instruments. The resulting two-stage IV estimator is valid for models with homogeneous or heterogeneous slope coefficients and has several advantages relative to existing popular approaches. In addition, the xtivdfreg command extends the two-stage IV approach in two major ways. First, the algorithm accommodates estimation of unbalanced panels. Second, the algorithm permits a flexible specification of instruments. We show that when one imposes zero factors, the xtivdfreg command can replicate the results of the popular Stata ivregress command. Notably, unlike ivregress, xtivdfreg permits estimation of the two-way error-components panel-data model with heterogeneous slope coefficients.


Introduction
The common factor approach is highly popular among panel-data practitioners because it offers a wide scope for controlling for omitted variables and rich sources of unobserved heterogeneity, including models with cross-sectional dependence; see, for example, Chudik and Pesaran (2015), Juodis and Sarafidis (2018), and Wansbeek (2012, 2021).
For panels where both the cross-sectional and time-series dimensions (N and T, respectively) tend to be large, popular estimation approaches have been developed by Pesaran (2006) and Bai (2009), known in the literature as common correlated effects (CCE) and iterative principal components (IPC), respectively. The resulting two-stage instrumental-variables (2SIV) approach combines features from both Pesaran (2006) and Bai (2009). In particular, following Pesaran (2006), the covariates of the model are assumed to be subject to a linear common factor structure. However, following Bai (2009), the common factors are projected out using principal-components analysis (PCA) rather than cross-sectional averages. A major distinctive feature of 2SIV is that it eliminates the common factors from the error term and the regressors separately in two stages. In comparison, CCE eliminates the factors from the error and the regressors jointly, whereas IPC eliminates only the factors in the error.
2SIV is appealing for several reasons. First, CCE and IPC suffer from incidental parameters bias because an increasing number of parameters needs to be estimated as either T or N grows; see Westerlund and Urbain (2015) and Juodis, Karabiyik, and Westerlund (2021). Therefore, bias correction is required to ensure that inferences remain valid asymptotically. In contrast, 2SIV does not require bias correction in either dimension. This property is important because approximate procedures aiming to recenter the limiting distribution of particular estimators may not be able to fully eliminate all bias terms, especially those of high order; in such cases, substantial size distortions can occur in finite samples. Second, the CCE approach requires the so-called rank condition, which assumes that the number of factors does not exceed the rank of the (unknown) matrix of cross-sectional averages of the unobserved factor loadings. 2SIV does not require such a condition because the factors are estimated using PCA rather than cross-sectional averages. Third, the 2SIV objective function is linear in the parameters, and therefore the method is robust and computationally inexpensive. In comparison, IPC relies on nonlinear optimization, and therefore convergence to the global optimum might not be guaranteed (Jiang et al., forthcoming). Fourth, 2SIV shares a major attractive feature of CCE over IPC because it permits estimation of panels with heterogeneous slope coefficients. Last, 2SIV allows for endogenous regressors, so long as external instruments are available.
In this article, we introduce a new command, xtivdfreg, that implements the 2SIV approach and extends it in two major ways. First, the algorithm accommodates estimation of unbalanced panels. To achieve this, we use a variant of the expectation-maximization approach proposed by Stock and Watson (1998) and Bai, Liao, and Yang (2015). Second, the algorithm permits a flexible specification of instruments. In particular, it accommodates cases where 1) the covariates are driven by entirely different factors; 2) the covariates have a different number of factors, including no factors at all; and 3) different lags of defactored covariates are used as instruments.
We show that when one imposes zero factors and requests the first-stage IV (1SIV) estimator, the xtivdfreg command can replicate the results of the popular ivregress command. Essentially, the two-stage least-squares (2SLS) estimator of the two-way error-components panel-data model can be viewed as a special case of the proposed 2SIV approach in that the former does not defactor the instruments. Notably, unlike ivregress, xtivdfreg permits estimation of the two-way error-components panel-data model with heterogeneous slope coefficients.
We illustrate the method with two examples. First, we use a panel dataset consisting of 300 U.S. financial institutions, each one observed over 56 time periods. We attempt to shed some light on the determinants of banks' capital adequacy ratios. The results are compared with those obtained by using popular panel methods, such as the fixed-effects and 2SLS estimators, as well as the CCE estimator of Pesaran (2006). In the second example, we use macrodata used by Eberhardt and Teal (2010) for the estimation of cross-country production functions in the manufacturing sector. The dataset is unbalanced, containing observations on 48 developing and developed countries during the period 1970 to 2002.
The remainder of the article is organized as follows. Section 2 outlines the 2SIV approach developed by Norkute et al. (2021) and Cui et al. (2020) and discusses implementation with unbalanced panel data. Section 3 describes the syntax of the xtivdfreg command. Section 4 illustrates the command using real datasets. Section 5 concludes.

Models with homogeneous coefficients
We consider the following autoregressive distributed lag panel-data model with homogeneous slopes and a multifactor error structure: 2

y_it = α y_{i,t−1} + β′x_it + u_it;  i = 1, 2, …, N; t = 1, 2, …, T   (1)

and

u_it = γ′_{y,i} f_{y,t} + ε_it

2. The estimation procedures described in this article apply also to static panels that arise by imposing α = 0 or to models with higher-order lags of y_it and x_it. Models with heterogeneous slopes are considered in section 2.2.
where |α| < 1, β = (β_1, β_2, …, β_K)′ such that at least one of {β_k}_{k=1}^K is nonzero, and x_it = (x_{1,it}, …, x_{K,it})′ is a K×1 vector of regressors. The error term of the model is composite, where f_{y,t} and γ_{y,i} denote m_y×1 vectors of true unobserved factors and factor loadings, respectively, and ε_it is an idiosyncratic error.
The vector of regressors x_it is assumed to be subject to the following data-generating process: 3

x_it = Γ′_{x,i} f_{x,t} + v_it   (2)

where f_{x,t} is an m_x×1 vector of true unobserved factors, Γ_{x,i} is the corresponding m_x×K matrix of factor loadings, and v_it is an idiosyncratic error term that is assumed to be independent from ε_it. 4 Thus, x_it satisfies strict exogeneity with respect to ε_it, although it can be endogenous with respect to the total error term, u_it, via the factor component. This assumption ensures that one does not need to seek external instruments. However, as discussed in remark 4, endogeneity with respect to ε_it can be allowed straightforwardly, provided there are valid external instruments available for estimation.
Stacking the T observations for each i yields

y_i = α y_{i,−1} + X_i β + u_i,  u_i = F_y γ_{y,i} + ε_i

where y_i = (y_{i1}, y_{i2}, …, y_{iT})′, y_{i,−1} = (y_{i0}, y_{i1}, …, y_{i,T−1})′, X_i = (x_{i1}, x_{i2}, …, x_{iT})′ is T×K, F_y = (f_{y,1}, f_{y,2}, …, f_{y,T})′ is T×m_y, and ε_i = (ε_{i1}, ε_{i2}, …, ε_{iT})′. Similarly, stacking (2) yields X_i = F_x Γ_{x,i} + V_i, where F_x = (f_{x,1}, f_{x,2}, …, f_{x,T})′ is T×m_x and V_i = (v_{i1}, v_{i2}, …, v_{iT})′. 5 Let W_i = (y_{i,−1}, X_i) and θ = (α, β′)′. The model can be written more succinctly as

y_i = W_i θ + u_i

The 2SIV approach involves two stages. In the first stage, the common factors in X_i are asymptotically eliminated using PCA, and the defactored regressors are used as instruments to obtain consistent estimates of the structural parameters of the model, θ.

3. As in Pesaran (2006), (2) implies a genuine restriction on the data-generating process, which is not actually required for the IPC estimator of Bai (2009). However, while this assumption is typically taken for granted by practitioners when using CCE, it is testable within the 2SIV framework, based on the overidentifying restrictions test statistic that is readily available in overidentified models. This issue is discussed in more detail at the end of section 2.1.
4. Individual-specific and time-specific effects can be easily accommodated by replacing y_it, x_it with the transformed variables ẏ_it, ẋ_it, where ẏ_it = y_it − ȳ_i − ȳ_t + ȳ, and ẋ_it is defined analogously.
5. In practice, it is not necessary that all regressors be subject to a common factor structure and thus correlated with the factor component of the error term, u_it. We discuss one such situation in section 4.1, remark 7.
In the second stage, the entire model is defactored based on estimated factors extracted from the first-stage residuals, and another IV regression is implemented using the same instruments as in stage one.

First-stage IV estimator
Define F̂_x as √T times the eigenvectors corresponding to the m_x largest eigenvalues of the T×T matrix Σ_{i=1}^N X_i X_i′/NT, and define F̂_{x,−1} analogously from the one-period-lagged regressors X_{i,−1}. Consider the following empirical projection matrices:

M_{F̂_x} = I_T − F̂_x(F̂_x′F̂_x)^{−1}F̂_x′,  M_{F̂_{x,−1}} = I_T − F̂_{x,−1}(F̂_{x,−1}′F̂_{x,−1})^{−1}F̂_{x,−1}′   (3)

In this case, the matrix of instruments can be formulated as

Z_i = (M_{F̂_x}X_i, M_{F̂_{x,−1}}X_{i,−1})   (4)

which is of dimension T×2K. Thus, the degree of overidentification of the model is 2K − (K + 1) = K − 1.
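The PCA step above is easy to sketch in code. The following is a minimal numpy illustration — not the xtivdfreg implementation; the function names and the toy data-generating process are ours — of extracting F̂_x as √T times the leading eigenvectors and forming the annihilator M_{F̂_x} and one block of defactored instruments:

```python
import numpy as np

def extract_factors(X_list, m):
    # Estimated factors: sqrt(T) times the eigenvectors of
    # sum_i X_i X_i' / (N T) associated with its m largest eigenvalues.
    N, T = len(X_list), X_list[0].shape[0]
    S = sum(X @ X.T for X in X_list) / (N * T)      # T x T
    _, vecs = np.linalg.eigh(S)                     # eigenvalues in ascending order
    return np.sqrt(T) * vecs[:, -m:]                # T x m, normalised so F'F/T = I_m

def annihilator(F):
    # M_F = I_T - F (F'F)^{-1} F', the projection off the factor space
    T = F.shape[0]
    return np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)

# Toy panel: N units, T periods, K regressors sharing one common factor
rng = np.random.default_rng(0)
N, T, K, m_x = 50, 40, 2, 1
f = rng.normal(size=(T, m_x))
X_list = [f @ rng.normal(size=(m_x, K)) + 0.5 * rng.normal(size=(T, K))
          for _ in range(N)]

F_x = extract_factors(X_list, m_x)
M_Fx = annihilator(F_x)
Z_0 = M_Fx @ X_list[0]     # defactored regressors of unit 0 (one instrument block)
```

Because the columns of F̂_x are orthonormal up to the √T scaling, M_{F̂_x} reduces to I_T − F̂_x F̂_x′/T.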

Remark 1.
Further lags of X_i can be used as instruments straightforwardly. To illustrate, let q_z denote the total number of lags of X_i used as instruments, and define F̂_{x,−τ} as √T times the eigenvectors corresponding to the m_x largest eigenvalues of the T×T matrix Σ_{i=1}^N X_{i,−τ}X_{i,−τ}′/NT, where X_{i,−τ} = L^τ X_i for τ = 1, …, q_z. The corresponding empirical projection matrices are of the same form as in (3) with F̂_{x,−1} replaced by F̂_{x,−τ}. Moreover, in the case where the covariates are strictly exogenous, leads of X_i can also be used as instruments; see remark 8 in section 4.1 for more details. In the absence of any lags of X_i (and further lags of y_i) included in the model as regressors, the degree of overidentification is equal to q_z K − (K + 1).
The 1SIV estimator of θ is defined as

θ̂_1SIV = (Â′B̂^{−1}Â)^{−1}Â′B̂^{−1}b̂   (5)

where Â = (NT)^{−1}Σ_{i=1}^N Z_i′W_i, B̂ = (NT)^{−1}Σ_{i=1}^N Z_i′Z_i, and b̂ = (NT)^{−1}Σ_{i=1}^N Z_i′y_i. 6 This estimator is consistent as N and T grow jointly to infinity, that is, (N, T) →_j ∞, such that N/T → c, 0 < c < ∞. However, θ̂_1SIV is asymptotically biased. Rather than bias correcting this estimator, Norkute et al. (2021) and Cui et al. (2020) put forward a second-stage estimator, which is free from asymptotic bias and is potentially more efficient. For this purpose, the first-stage estimator is useful because it provides a consistent estimate of the error term of the model, which is required to implement the second-stage IV estimator.

6. In this section, both m_y and m_x are treated as known. In practice, these quantities can be estimated consistently using standard methods proposed in the literature, such as the information criteria proposed by Bai and Ng (2002) or the eigenvalue ratio test of Ahn and Horenstein (2013). The xtivdfreg command uses the latter.
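In code, the first-stage estimator is just the familiar overidentified IV formula applied to the pooled moments. A minimal numpy sketch follows (our own notation; the common 1/NT scaling cancels, so it is omitted):

```python
import numpy as np

def iv_pooled(W_list, y_list, Z_list):
    # theta = (A' B^{-1} A)^{-1} A' B^{-1} b, the overidentified IV formula with
    # A = sum_i Z_i'W_i, B = sum_i Z_i'Z_i, b = sum_i Z_i'y_i
    A = sum(Z.T @ W for Z, W in zip(Z_list, W_list))
    B = sum(Z.T @ Z for Z in Z_list)
    b = sum(Z.T @ y for Z, y in zip(Z_list, y_list))
    BinvA = np.linalg.solve(B, A)                  # B^{-1} A; B is symmetric
    return np.linalg.solve(A.T @ BinvA, BinvA.T @ b)

# Sanity check on noiseless toy data: with valid instruments the formula
# recovers the coefficient vector exactly.
rng = np.random.default_rng(1)
theta = np.array([0.4, 1.5])
W_list = [rng.normal(size=(30, 2)) for _ in range(20)]
y_list = [W @ theta for W in W_list]
Z_list = [np.hstack([W, rng.normal(size=(30, 1))]) for W in W_list]  # 3 instruments
theta_hat = iv_pooled(W_list, y_list, Z_list)
```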

Remark 2.
In the static panel case, where no lags of y_i are included on the right-hand side and the model is exactly identified (that is, no lags of the regressors are used as instruments), the 1SIV estimator reduces to

θ̂_1SIV = (Σ_{i=1}^N X_i′M_{F̂_x}X_i)^{−1} Σ_{i=1}^N X_i′M_{F̂_x}y_i

Second-stage IV estimator
To implement the second stage, extract estimates of the space spanned by F_y using residuals from the first stage; that is,

û_i = y_i − W_i θ̂_1SIV

Subsequently, the entire model is defactored, and a second IV regression is run using the same instruments as in stage one.
In particular, let

M_{F̂_y} = I_T − F̂_y(F̂_y′F̂_y)^{−1}F̂_y′

where F̂_y is defined as √T times the eigenvectors corresponding to the m_y largest eigenvalues of the T×T matrix Σ_{i=1}^N û_i û_i′/NT. The (optimal) second-stage IV estimator is defined as

θ̂_2SIV = (Â′Ω̂_NT^{−1}Â)^{−1}Â′Ω̂_NT^{−1}b̂   (6)

where Â = (NT)^{−1}Σ_{i=1}^N Z_i′M_{F̂_y}W_i, b̂ = (NT)^{−1}Σ_{i=1}^N Z_i′M_{F̂_y}y_i, and

Ω̂_NT = (NT)^{−1}Σ_{i=1}^N Z_i′M_{F̂_y}û_i û_i′M_{F̂_y}Z_i   (7)

As shown by Norkute et al. (2021), θ̂_2SIV is √(NT) consistent and asymptotically normally distributed, such that

√(NT)(θ̂_2SIV − θ) →_d N(0, Ψ)   (8)

as (N, T) →_j ∞ with N/T → c, 0 < c < ∞. Notice that the limiting distribution of θ̂_2SIV is correctly centered, and thus no bias correction is required. As demonstrated by Cui et al. (2020), the main intuition for this result lies in the fact that F_x Γ_{x,i} is estimated from X_i, whereas F_y γ_{y,i} is estimated from û_i. Because V_i, F_y γ_{y,i}, and ε_i are independent from one another, any correlations that arise because of the estimation error of F̂_y and F̂_x are asymptotically negligible.
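The second stage can be sketched by chaining the same building blocks: residuals from a first-stage estimate, PCA on their outer products, and a second IV regression on the defactored model. In this toy check — our own simplified code, with an unweighted B̂ in place of Ω̂_NT, and with the true θ fed in as the "first-stage" estimate so the algebra can be verified exactly:

```python
import numpy as np

def top_eigvecs(S, m):
    # sqrt(T) times the eigenvectors for the m largest eigenvalues of S
    T = S.shape[0]
    _, vecs = np.linalg.eigh(S)
    return np.sqrt(T) * vecs[:, -m:]

def annihilator(F):
    T = F.shape[0]
    return np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)

def iv_pooled(W_list, y_list, Z_list):
    A = sum(Z.T @ W for Z, W in zip(Z_list, W_list))
    B = sum(Z.T @ Z for Z in Z_list)
    b = sum(Z.T @ y for Z, y in zip(Z_list, y_list))
    BinvA = np.linalg.solve(B, A)
    return np.linalg.solve(A.T @ BinvA, BinvA.T @ b)

def second_stage(W_list, y_list, Z_list, theta_1, m_y):
    # (i) residuals from the first-stage estimate, (ii) F_y from their outer
    # products, (iii) IV on the defactored model with the same instruments Z_i
    N, T = len(y_list), len(y_list[0])
    u_list = [y - W @ theta_1 for y, W in zip(y_list, W_list)]
    S = sum(np.outer(u, u) for u in u_list) / (N * T)
    M_Fy = annihilator(top_eigvecs(S, m_y))
    return iv_pooled([M_Fy @ W for W in W_list],
                     [M_Fy @ y for y in y_list], Z_list)

# Purely factor-driven error: defactoring removes it and theta is recovered
rng = np.random.default_rng(2)
theta = np.array([0.5, -1.0])
f_y = rng.normal(size=40)
W_list = [rng.normal(size=(40, 2)) for _ in range(30)]
y_list = [W @ theta + f_y * rng.normal() for W in W_list]
theta_2 = second_stage(W_list, y_list, W_list, theta, m_y=1)
```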

Remark 3.
In the static panel case, where no lags of y_i are included on the right-hand side and the model is exactly identified, the second-stage IV estimator can be expressed as

θ̂_2SIV = (Σ_{i=1}^N X_i′M_{F̂_x}M_{F̂_y}X_i)^{−1} Σ_{i=1}^N X_i′M_{F̂_x}M_{F̂_y}y_i

In this case, proposition 3.2 in Cui et al. (2020) reveals that the second-stage estimator is asymptotically equivalent to a least-squares estimator obtained by regressing y_i − F̂_y γ̂_{y,i} on X_i − F̂_x Γ̂_{x,i}. Moreover, the authors show that θ̂_2SIV is asymptotically as efficient as the bias-corrected CCE and IPC estimators.
Remark 4. The assumptions imposed thus far imply that X_i satisfies strict exogeneity with respect to ε_i because otherwise extracting principal components from X_i may be invalid. When some of the regressors are endogenous (or weakly exogenous) with respect to ε_it, 2SIV requires using external exogenous instruments. 8 To illustrate, let X_i^(exog) and X_i^(endog) refer to the strictly exogenous and endogenous regressors, respectively, which are of dimension T×K^(exog) and T×K^(endog). Furthermore, let X_i^(ext) denote the matrix of external exogenous covariates. X_i^(ext) can still be correlated with the factor component; that is, it may be subject to a similar data-generating process as in (2). The corresponding projection matrices are defined in the same way as in (3). In this case, the matrix of instruments collects the defactored strictly exogenous regressors and the defactored external instruments (and, where applicable, their lags).

7. One could extend the 1SIV estimator of θ defined in (5) by defactoring the entire model based on M_{F̂_x}, that is, by using M_{F̂_x}Z_i instead of Z_i. However, in this case, when the space of F_y spans the space of F_x, the resulting estimator would be asymptotically equivalent to the existing one defined in (6).
8. If external instruments cannot be found, identification requires that 1) the number of strictly exogenous regressors within X_i be sufficiently large; and 2) these exogenous regressors be correlated with the endogenous ones so that they (and their lags) serve as informative instruments.
The overidentifying restrictions J-test statistic associated with the second-stage IV estimator is given by

J = NT ĝ′Ω̂_NT^{−1}ĝ,  ĝ = (NT)^{−1}Σ_{i=1}^N Z_i′M_{F̂_y}(y_i − W_i θ̂_2SIV)

where Ω̂_NT is defined in (7). Under the null hypothesis that the overidentifying restrictions are valid, J is asymptotically χ² distributed with degrees of freedom equal to the degree of overidentification.
The overidentifying restrictions test is particularly useful in this approach. First, it is expected to pick up a violation of the exogeneity of the defactored covariates with respect to the idiosyncratic error in the equation for y i . Second, the orthogonality condition of the instruments is violated if the slope vector, θ, is cross-sectionally heterogeneous. In this case, the estimators proposed in this section may become inconsistent, and the J test is expected to reject the null hypothesis asymptotically.
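A textbook Sargan–Hansen form of such a statistic can be sketched as follows; note that this is a generic GMM version with our own scaling and weight matrix, not necessarily the exact Ω̂_NT-based statistic computed by xtivdfreg:

```python
import numpy as np

def hansen_j(Z_list, u_list):
    # Textbook Sargan-Hansen statistic: J = N * gbar' Omega^{-1} gbar with
    # g_i = Z_i'u_i the per-unit moment vector and Omega its sample
    # second-moment matrix; under valid instruments, J ~ chi2(q - p) approx.
    g = np.stack([Z.T @ u for Z, u in zip(Z_list, u_list)])   # N x q
    N = g.shape[0]
    gbar = g.mean(axis=0)
    Omega = g.T @ g / N
    return N * gbar @ np.linalg.solve(Omega, gbar)

# Toy check: residuals uncorrelated with the instruments give a small,
# nonnegative statistic (here q = 3 moment conditions)
rng = np.random.default_rng(3)
Z_list = [rng.normal(size=(40, 3)) for _ in range(200)]
u_list = [rng.normal(size=40) for _ in range(200)]
J = hansen_j(Z_list, u_list)
```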

Models with heterogeneous coefficients
We now turn our focus to models with heterogeneous coefficients. Let

y_i = W_i θ_i + u_i,  θ_i = θ + η_i

where η_i denotes random deviations from the mean slope vector θ. The IV estimator of θ_i is defined as

θ̂_IV,i = (Â_i′B̂_i^{−1}Â_i)^{−1}Â_i′B̂_i^{−1}b̂_i   (9)

where

Â_i = T^{−1}Z_i′W_i,  B̂_i = T^{−1}Z_i′Z_i,  b̂_i = T^{−1}Z_i′y_i   (10)

The mean-group IV (MGIV) estimator is then given by

θ̂_MG = N^{−1}Σ_{i=1}^N θ̂_IV,i   (11)

As shown by Norkute et al. (2021), θ̂_MG is √N consistent and asymptotically normally distributed. Note that the overidentifying restrictions test statistic is not valid for the model with heterogeneous coefficients. 9

Remark 5. In the static panel case, where no lags of y_i are included on the right-hand side and the model is exactly identified, the individual-specific IV estimator reduces to

θ̂_IV,i = (X_i′M_{F̂_x}X_i)^{−1}X_i′M_{F̂_x}y_i

Remark 6. When the model contains endogenous regressors, the matrices listed in (10) are given by the analogous expressions with Z_i replaced by the instrument set described in remark 4.
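The mean-group logic is easy to sketch: estimate θ_i unit by unit, then average, with standard errors computed from the cross-sectional dispersion of the unit-level estimates. A minimal numpy illustration for the exactly identified case (our own toy setup, not the xtivdfreg implementation):

```python
import numpy as np

def iv_unit(W, y, Z):
    # Exactly identified unit-level IV: theta_i = (Z'W)^{-1} Z'y
    return np.linalg.solve(Z.T @ W, Z.T @ y)

def mean_group(theta_list):
    # Mean-group estimate; standard errors from the cross-sectional
    # dispersion of the unit-level estimates
    Theta = np.stack(theta_list)                 # N x p
    N = Theta.shape[0]
    theta_mg = Theta.mean(axis=0)
    D = Theta - theta_mg
    V = D.T @ D / (N * (N - 1))
    return theta_mg, np.sqrt(np.diag(V))

# Noiseless heterogeneous panel: unit-level IV recovers each theta_i
# exactly, and the MG estimator is their cross-sectional average
rng = np.random.default_rng(4)
theta_bar = np.array([0.3, 1.0])
theta_is = [theta_bar + 0.1 * rng.normal(size=2) for _ in range(25)]
W_list = [rng.normal(size=(60, 2)) for _ in range(25)]
y_list = [W @ t for W, t in zip(W_list, theta_is)]
est_is = [iv_unit(W, y, W) for W, y in zip(W_list, y_list)]
theta_mg, se = mean_group(est_is)
```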

Unbalanced panels
When the panel-data model is unbalanced, that is, some observations are missing at random, our procedure needs to be modified to control for the unobserved common factors. Following Stock and Watson (1998) and Bai, Liao, and Yang (2015), we distinguish between X_i and X*_i, where X*_i is a T×K matrix containing the true values of the regressors, defined as in (2). Let x*^(k)_{i,t} denote the (t, k)th entry of X*_i, and let ι^(k)_{i,t} denote a binary indicator that takes the value unity if the kth variable for individual i at time t is observed and zero otherwise. Thus, we set x^(k)_{i,t} = x*^(k)_{i,t} when ι^(k)_{i,t} = 1, while x^(k)_{i,t} is unobserved otherwise. 10 Let f^(1)_{x,t} and γ^(1)_{ki} denote some initial values for the factors and factor loadings, respectively. Also, let T = max{T_1, T_2, …, T_N}, where T_i denotes the maximum number of observations for individual i.

9. Using a similar line of argument as that in section 2.1, one could also consider a second-stage estimator by projecting F_y out from the model asymptotically, that is, formulating M_{F̂_y}y_i = M_{F̂_y}W_iθ_i + M_{F̂_y}u_i and then estimating θ_i. However, the need to deal with heterogeneous slopes here implies that (the space spanned by) F_y should be estimated using the residuals from the time-series IV regression, û_i = y_i − W_iθ̂_IV,i. Because θ̂_IV,i is √T consistent rather than √(NT) consistent, the estimation of F_y may become very inefficient. Note that the estimation of F_x required for the IV estimator defined in (9) does not suffer from a similar problem, because it can be estimated using the raw data {X_i}_{i=1}^N.
10. When individual-specific and time-specific effects are included, x^(k)_{i,t} is replaced by its within-transformed counterpart ẋ^(k)_{i,t}, which is defined similarly to footnote 4. Thus, we set ẋ^(k)_{i,t} = ẋ*^(k)_{i,t} when ι^(k)_{i,t} = 1, while ẋ^(k)_{i,t} is unobserved otherwise.
In the first iteration, the values of the regressors are set such that

x^(k),(1)_{i,t} = ι^(k)_{i,t} x^(k)_{i,t} + {1 − ι^(k)_{i,t}} f^(1)′_{x,t} γ^(1)_{ki}

The factors in the first iteration, f^(1)_{x,t}, are extracted as √T times the eigenvectors corresponding to the m_x largest eigenvalues of the matrix Σ_{i=1}^N x^(1)_i x^(1)′_i /NT. The corresponding factor loadings, γ^(1)_{ki}, are the estimated individual-specific coefficients obtained by regressing x^(k),(1)_i on f^(1)_x.

Subsequent iterations are based on

x^(k),(ℓ)_{i,t} = ι^(k)_{i,t} x^(k)_{i,t} + {1 − ι^(k)_{i,t}} f^(ℓ−1)′_{x,t} γ^(ℓ−1)_{ki}

The convergence criterion is defined with respect to the change in the objective function across iterations, where x^(k),(ℓ)_{i,t} denotes the estimated value of the kth regressor corresponding to the ℓth iteration for individual i at time t, while f^(ℓ)_{x,t} and γ^(ℓ)_{ki} are defined similarly as before. The initial factor values are determined using a similar eigenvalue problem as outlined previously, this time based on x^(k)_i, a column vector of length T with missing values replaced by zeros. That is, f^(1)_{x,t} is computed as √T times the eigenvectors corresponding to the m_x largest eigenvalues of the matrix Σ_{i=1}^N x_i x_i′/NT, with the (j_1, j_2) entry being divided by the number of summands used when this number is larger than zero.
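The iterations above amount to an SVD-style "hard imputation" scheme. The following numpy sketch — our own single-variable simplification; xtivdfreg's exact objective function and convergence rule may differ — alternates between factor extraction on the completed matrix and replacing missing entries by their common-component fit:

```python
import numpy as np

def em_impute(X, mask, m, n_iter=100, tol=1e-6):
    # X: T x N matrix of one variable across units; mask: True where observed.
    # Alternate between PCA factor extraction on the completed matrix and
    # replacing missing entries by their common-component fit f_t' gamma_i.
    T = X.shape[0]
    Xc = np.where(mask, X, 0.0)                  # initialisation: zeros
    for _ in range(n_iter):
        S = Xc @ Xc.T / Xc.size                  # ~ sum_i x_i x_i' / (N T)
        _, vecs = np.linalg.eigh(S)
        F = np.sqrt(T) * vecs[:, -m:]            # T x m estimated factors
        G = np.linalg.solve(F.T @ F, F.T @ Xc)   # m x N loadings (OLS on F)
        fit = F @ G                              # common-component fit
        new = np.where(mask, X, fit)             # refill only the missing cells
        delta = np.max(np.abs(new - Xc))
        Xc = new
        if delta < tol:
            break
    return Xc

# Exactly rank-one data with ~10% of entries missing: the iterations
# recover the missing values up to small numerical error
rng = np.random.default_rng(5)
T, N = 40, 30
X_true = np.outer(rng.normal(size=T), rng.normal(size=N))
mask = rng.random(size=(T, N)) > 0.1
X_completed = em_impute(X_true, mask, m=1)
```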
The same procedure is followed when extracting factors from lagged values of X_i or from the residuals obtained from the first-stage estimation. 11

11. In practice, it is possible that the estimated number of factors in the regressors varies across different lags. In this case, we set m_x equal to the maximum estimated value obtained across different lags.

Options
absorb(absvars) specifies categorical variables that identify the fixed effects to be absorbed. Typical use is absorb(panelvar) or absorb(panelvar timevar) for one-way or two-way fixed effects, respectively.

iv(varlist [, fvar(fvars) lags(#) factmax(#) noeigratio nodoubledefact]) specifies the IV. One can specify as many sets of instruments as required. Variables in the same set are defactored jointly. External variables that are not part of the regression model can also be used as instruments in varlist.

noeigratio requests that a fixed number of factors be used, as specified with the option factmax(#). By default, the eigenvalue ratio test of Ahn and Horenstein (2013) is used to compute the number of factors for each estimation stage and each set of instruments.
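The eigenvalue ratio criterion mentioned above is straightforward to sketch: estimate the number of factors as the maximizer of the ratio of consecutive ordered eigenvalues. A minimal numpy illustration (our own simplified version of the Ahn–Horenstein idea, not the xtivdfreg implementation):

```python
import numpy as np

def eigenvalue_ratio(S, k_max):
    # Ahn-Horenstein style estimator: pick the k in 1..k_max maximising
    # lambda_k / lambda_{k+1}, with eigenvalues sorted in decreasing order
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]
    ratios = lam[:k_max] / lam[1:k_max + 1]
    return int(np.argmax(ratios)) + 1

# Toy data with two strong factors: the ratio spikes at k = 2
rng = np.random.default_rng(6)
T, m, N = 40, 2, 200
F = rng.normal(size=(T, m)) * 5.0                # strong factors
X = F @ rng.normal(size=(m, N)) + rng.normal(size=(T, N))
S = X @ X.T / (T * N)
m_hat = eigenvalue_ratio(S, k_max=8)
```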
doubledefact requests to use a further defactorization stage of the entire model for the first-stage estimator, as, for example, described in footnote 7. nodoubledefact requests to avoid implementing this further defactorization stage. doubledefact is the default when the option mg is specified, and nodoubledefact is the default when the option mg is omitted.
fstage requests the 1SIV estimator to be computed instead of the second-stage IV estimator.
mg requests the mean-group estimator to be computed, which allows for heterogeneous slopes.
iterate(#) specifies the maximum number of iterations for the extraction of factors; iteration stops earlier if convergence is declared. The default is the number set using set maxiter. This option has no effect with strongly balanced panel data, in which case any iterations are redundant.
ltolerance(#) specifies the convergence tolerance for the objective function; see [R] Maximize. The default is ltolerance(1e-4). This option has no effect with strongly balanced panel data.
nodots requests not to display dots for the iteration steps. By default, one dot character is displayed for each iteration step. This option has no effect with strongly balanced panel data.

Stored results
xtivdfreg stores the following in e():

Example 1: Estimation of the determinants of banks' capital adequacy ratios
In this example, we illustrate the xtivdfreg command by estimating the main drivers of capital adequacy ratios for banking institutions. We make use of panel data from a random sample of 300 U.S. banks, each one observed over 56 time periods, namely, 2006:Q1-2019:Q4.
We focus on the model

CAR_it = α CAR_{i,t−1} + β_1 size_it + β_2 ROA_it + β_3 liquidity_it + u_it,  u_it = η_i + τ_t + γ′_{y,i}f_{y,t} + ε_it   (12)

where i = 1, …, 300 and t = 2, …, 56. All data are publicly available, and they have been downloaded from the Federal Deposit Insurance Corporation website. 13

• CAR_it stands for "capital adequacy ratio", which is proxied by the ratio of tier 1 (core) capital over risk-weighted assets.
• size it is proxied by the natural logarithm of banks' total assets.
• ROA it stands for the "return on assets", defined as annualized net income expressed as a percentage of average total assets. ROA is used as a measure of profitability.
• liquidity it is proxied by the loan-to-deposit ratio. Note that higher values of this variable imply a lower level of liquidity.
Finally, the error term is composite; η i and τ t capture bank-specific and time-specific effects, f y,t is an m y ×1 vector of unobserved common shocks with corresponding loadings given by γ y,i , and ε it is a purely idiosyncratic error. Note that m y is unknown.
Some discussion on the interpretation of the parameters that characterize (12) is useful. The autoregressive coefficient, α, reflects costs of adjustment that prevent banks from achieving optimal levels of capital adequacy instantaneously. β_k, for k = 1, …, K(= 3), denote the slope coefficients of the model. β_1 measures the effect of size on capital adequacy behavior. Under the "too-big-to-fail hypothesis", large banks may count on public bailout during periods of financial distress, knowing that they are systemically important (for example, Cui, Sarafidis, and Yamagata [2020b]). Essentially, this hypothesis reflects the classic moral hazard problem, where one party takes on excessive risk, knowing that it is protected against the risk and that another party will incur the cost. Under such a scenario, β_1 is expected to be negative. β_2 measures the effect of profitability on capital adequacy. Standard theory suggests that higher bank profitability dissuades a bank's risk taking, and thus it is associated with larger capital reserves because profitable banks stand to lose more shareholder value if downside risks realize (Keeley 1990). On the other hand, more profitable banks can borrow more and engage in risky activities on a larger scale under the presence of leverage constraints (Martynova, Ratnovski, and Vlahu 2020). A positive (negative) value of β_2 is consistent with the former (latter) interpretation. Lastly, the direction of the effect of liquidity, β_3, is ultimately an empirical question as well. For instance, a positive value indicates that lower liquidity levels force banks to increase their capital reserves, arguably to reduce risk exposure.
We start by running the xtivdfreg command using two lags of the covariates as defactored instruments and up to a maximum of three factors. Thus, we use nine instruments in total, three for each covariate. There are four parameters, which implies that the degree of overidentification equals five. We control for bank-specific and time-specific effects by eliminating them prior to estimation. This baseline regression is obtained as follows:

. use xtivdfreg_example
(Capital Adequacy Ratios of U.S. Banking Institutions; Source: FDIC)
. xtivdfreg l(0/1).CAR size ROA liquidity, absorb(id t) > iv(size ROA liquidity, lags(2) factmax(3))
(output omitted)

To illustrate the specification of the command in terms of the notation used in the article, let X_i = (X^(1)_i, X^(2)_i, X^(3)_i), where X^(k)_i denotes the regressor corresponding to the coefficient β_k in (12), for k = 1, 2, 3. The matrix of instruments is given by

Z_i = (M_{F̂_x}X_i, M_{F̂_{x,−1}}X_{i,−1}, M_{F̂_{x,−2}}X_{i,−2})

The second-stage IV estimator is defined in (6).
All coefficients are statistically significant at the 1% level. Moreover, the p-value of the J-test statistic suggests that the overidentifying restrictions (instruments) are valid. The estimated number of factors in the first and second stages equals 1 in both cases; that is, m x = m y = 1.
xtivdfreg also reports the fraction of the variance of u it that is explained by the factor component, denoted as rho. Because the value of rho is roughly equal to 3/4 in the present sample, it appears that most of the variation in the composite error term is due to the single unobserved factor, conditional on bank-specific and time-specific effects. Therefore, estimators that fail to control for common shocks are likely to be severely biased.
The estimated autoregressive coefficient equals about 0.373, which suggests medium persistence in the CAR time series. The estimated coefficient of size is highly negative, which is consistent with the "too-big-to-fail hypothesis", providing evidence of moral hazard-type behavior of banking institutions. Profitability (ROA) appears to have a positive effect on capital adequacy, which is in line with Keeley (1990). The positive estimate for β_3 shows that lower levels of bank asset liquidity (that is, higher values of liquidity) lead to an increase in capital reserves, all other things being equal. This implies that banking institutions suffering from a liquidity crunch tend to respond by raising their equity.
Finally, note that xtivdfreg reports an estimate of a constant term (intercept). This is obtained as the mean of the residuals in a separate step after computing the slope coefficients. 14 Whether a constant term is estimated has no effect on the computation of the slope coefficients because the latter are computed for the demeaned model with or without the absorption of fixed effects. The standard error of the constant term is computed with the influence-function approach of Kripfganz and Schwarz (2019).
Next we fit the same model, except that the slope coefficients are allowed to be heterogeneous; u_it has the same structure as before. This regression is computed by adding the option mg. The results correspond to the MGIV estimator defined in (11):

. xtivdfreg l(0/1).CAR size ROA liquidity, absorb(id t) > iv(size ROA liquidity, lags(2)) factmax(3) mg
(output omitted)

As we can see, the estimated coefficients are similar to those obtained from the model that pools the data and imposes slope parameter homogeneity. This is not surprising, because otherwise failure to account for slope parameter heterogeneity would invalidate the overidentifying restrictions, thus likely leading to a rejection of the null hypothesis for the J statistic. Thus, conditional on common factors and bank-specific and time-specific effects, slope parameter heterogeneity does not appear to be relevant in the present sample.

14. For the model with heterogeneous slopes, the intercept is also treated as heterogeneous.
In what follows, we examine alternative specifications for xtivdfreg and use other estimators. For exposition, table 1 below includes the results for the previous two baseline specifications (columns 1-2).

notes: Standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

Columns 3-5 illustrate examples of IV estimators that allow for a more flexible specification of instruments than the baseline regression. In particular, column 3 shows results for a second-stage IV estimator that involves dropping ROA from the set of instruments and using an external variable instead, namely, ROE. 15

. xtivdfreg l(0/1).CAR size ROA liquidity, absorb(id t) > iv(size ROE liquidity, lags(2)) factmax(3)
(output omitted)

The results in column 3 are similar to the baseline specification in column 1, except for the coefficient of ROA, which is statistically different at the 5% level. Note also that in this case the J-test statistic rejects the null hypothesis because the p-value equals 0.020. This implies that ROE may not form a valid instrument.
15. ROE stands for the "return on equity", defined as annualized net income expressed as a percentage of total equity on a consolidated basis. ROE represents an alternative measure of bank profitability.
Column 4 corresponds to a second-stage IV estimator that we can compute by typing

. xtivdfreg l(0/1).CAR size ROA liquidity, absorb(id t) > iv(size ROA, lags(2) factmax(3)) iv(liquidity, lags(1) factmax(2))
(output omitted)

In this specification, {size, ROA} are defactored based on a common set of factors estimated jointly, whereas liquidity is defactored separately, based on its own estimated factors. Such an instrumentation strategy can be particularly useful under three circumstances: first, when size and ROA are driven by entirely different factors than liquidity; second, when size and ROA have a different number of factors than liquidity; and third, when different lags of the covariates are used as instruments. Column 5 corresponds to the same specification as in column 4, although it refers to its MGIV version:

. xtivdfreg l(0/1).CAR size ROA liquidity, absorb(id t) > iv(size ROA, lags(2) factmax(3)) iv(liquidity, lags(1) factmax(2)) mg
(output omitted)

As we can see, the output of columns 4-5 is similar to that reported in columns 1-2, respectively. Therefore, the estimates appear to be fairly robust to different choices of instruments.
In terms of the notation used in the article, the choice of instruments corresponding to columns 4-5 is given by

Z_i = (M_{F̂_x^(1,2)}X_i^(1,2), M_{F̂_{x,−1}^(1,2)}X_{i,−1}^(1,2), M_{F̂_{x,−2}^(1,2)}X_{i,−2}^(1,2), M_{F̂_x^(3)}X_i^(3), M_{F̂_{x,−1}^(3)}X_{i,−1}^(3))

Remark 7. For the MGIV estimator, although the matrix Z_i above is formulated by defactoring X^(1,2)_i and X^(3)_i separately, the empirical projection matrix M_{F̂_x} used to defactor the entire model is computed by extracting factors jointly from the matrix of all covariates; that is, X_i = (X^(1,2)_i, X^(3)_i). In practice, users can avoid extracting factors jointly from the matrix of all covariates. For motivation, suppose that X^(3)_i were a binary regressor that is not subject to a common factor structure. In that case, one may wish to 1) instrument X^(3)_i by itself without defactoring it and 2) exclude X^(3)_i from the joint extraction of factors:

. xtivdfreg l(0/1).CAR size ROA liquidity, absorb(id t) > iv(size ROA, lags(2) factmax(3)) > iv(liquidity, lags(0) factmax(0))
(output omitted)

Defactoring of X^(3)_i is then skipped entirely.

The columns in table 2 report results for several alternative popular estimators. To begin with, columns 1-2 correspond to the standard fixed-effects and 2SLS estimators, both of which accommodate a two-way error-components model, but they do not allow for common shocks:

. xtreg l(0/1).CAR size ROA liquidity i.t, fe vce(cluster id)
(output omitted)
. ivregress 2sls CAR size ROA liquidity (L.CAR = l(0/2).(size ROA liquidity)) > i.id i.t, vce(cluster id)
(output omitted)

It is apparent that the estimated coefficients differ substantially compared with those obtained based on the 2SIV approach. In particular, the autoregressive coefficient appears to be biased upward, and, for the case of 2SLS (column 2), the standard error of the estimate is much larger compared with the second-stage IV (see column 1 of table 1). On the other hand, the coefficients of ROA and liquidity seem to be biased in the opposite direction. Moreover, in three out of four cases, these coefficients are not statistically significant. This outcome is indicative of the importance of controlling for common shocks in the present example.
Column 3 reproduces the results of 2SLS using the xtivdfreg command. This is achieved by setting the number of factors equal to 0 and requesting the first-stage estimator:

. xtivdfreg l(0/1).CAR size ROA liquidity, absorb(id t)
> iv(size ROA liquidity, lags(2)) factmax(0) fstage
(output omitted)

Thus, the popular 2SLS estimator can be viewed as a special case of the 2SIV approach and arises by imposing a zero number of factors (that is, setting factmax(0)) and fitting the model in a single stage (fstage). Column 4 yields 2SLS-type results for a model with heterogeneous slopes. Note that this option is not allowed in ivregress.
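The zero-factor special case can also be seen mechanically: with factmax(0) the annihilator matrix reduces to the identity, so the instruments are the raw (non-defactored) covariates, and the first-stage estimator collapses to the textbook 2SLS formula. The following Python sketch applies that formula to simulated data; all names and data are hypothetical and unrelated to the empirical example:

```python
import numpy as np

# Hypothetical simulated data: one regressor instrumented by two external
# instruments (standing in for lagged covariates); true slope = 2
rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=(n, 2))                          # instruments
x = z @ np.array([1.0, -0.5]) + rng.normal(size=n)   # regressor
y = 2.0 * x + rng.normal(size=n)

Z = np.column_stack([np.ones(n), z])                 # instrument matrix (with constant)
X = np.column_stack([np.ones(n), x])                 # regressor matrix
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)              # projection onto span(Z)
# 2SLS: beta = (X' P_Z X)^{-1} X' P_Z y
beta_2sls = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)
```

With zero factors, replacing Z by its "defactored" version changes nothing, which is why xtivdfreg with factmax(0) and fstage reproduces ivregress 2sls.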
The remaining columns report results for the CCE pooled (CCEP) and CCE mean-group (CCEMG) estimators, obtained using the community-contributed xtdcce2 command:

. xtdcce2 CAR L.CAR size ROA liquidity,
> crosssectional(CAR L.CAR size ROA liquidity)
> cr_lags(2) pooled(L.CAR size ROA liquidity) pooledvce(nw)
(output omitted)
. xtdcce2 CAR L.CAR size ROA liquidity,
> crosssectional(CAR L.CAR size ROA liquidity)
> cr_lags(2)
(output omitted)

As we can see, the estimates of CCEP are smaller than those obtained by the second-stage IV estimator (columns 1 and 4 of table 1), and the differences are statistically significant. On the other hand, the estimates of CCEMG are fairly close to those of the MGIV estimator in most cases. The main exception is the coefficient of size, which appears to be much smaller and less precise for CCEMG.

Example 2: Estimation of cross-country production functions
To illustrate additional features of the xtivdfreg command for unbalanced panels, we use the macro-panel dataset of Eberhardt and Teal (2010) for estimating cross-country production functions in the manufacturing sector. The dataset contains observations on 48 developing and developed countries during the period 1970 to 2002. These data are available as an ancillary file for the xtmg package, developed by Eberhardt (2012):

. use http://www.stata-journal.com/software/sj12-1/st0246/manu_prod
(Manufacturing productivity analysis (1970-2002))

Following Eberhardt and Teal (2010), we focus on the following model, which imposes constant returns to scale:

ln(Y_it / L_it) = beta ln(K_it / L_it) + u_it    (13)

The dependent and independent variables denote the log value added per worker and the log capital stock per worker, respectively, for i = 1, . . . , 48, with each country observed over T_i observations.
We start by running the xtivdfreg command using two lags of the covariates as defactored instruments and up to a maximum of three factors. We use three instruments, and the degree of overidentification equals two. We control for country-specific and time-specific effects by eliminating them prior to estimation. This baseline regression is computed by typing

. xtivdfreg ly lk, absorb(list year) iv(lk, lags(2)) factmax(3)
(output omitted)

Because the panel is unbalanced, the xtivdfreg command estimates the factors based on the iterative procedure described in section 2.3. The first line of dots reports the number of iterations required to estimate the factors in ln(K/L). The second (third) line of dots reports the number of iterations required to estimate the factors in the first (second) lag of ln(K/L). Finally, the last line corresponds to factor estimation from the first-stage residuals, which is relevant for the 2SIV estimator. In all cases, three iterations turn out to be sufficient for convergence.
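The iterative factor-estimation procedure for unbalanced panels can be sketched as an EM-style loop: missing cells are filled in, factors are re-extracted by principal components, and the fill-ins are updated with the common-component fit until convergence. The following Python sketch is an illustration under simplifying assumptions (exact low-rank data, hypothetical dimensions), not the command's actual implementation:

```python
import numpy as np

def factors_unbalanced(X, mask, n_factors, tol=1e-6, max_iter=200):
    """EM-style principal components for an unbalanced (T x N) panel.
    mask is True where X is observed; missing cells start at zero and are
    iteratively replaced by the common-component fit F @ L' until the
    filled-in panel converges."""
    T = X.shape[0]
    Xf = np.where(mask, X, 0.0)
    for _ in range(max_iter):
        _, eigvec = np.linalg.eigh(Xf @ Xf.T)
        F = np.sqrt(T) * eigvec[:, -n_factors:]      # sqrt(T)-normalized factors
        L = Xf.T @ F / T                             # loadings, given F'F/T = I
        Xnew = np.where(mask, X, F @ L.T)            # observed cells stay fixed
        if np.max(np.abs(Xnew - Xf)) < tol:
            Xf = Xnew
            break
        Xf = Xnew
    return F, L

# Hypothetical unbalanced panel with an exact one-factor structure
rng = np.random.default_rng(1)
T, N = 20, 10
X_true = rng.normal(size=(T, 1)) @ rng.normal(size=(1, N))
mask = rng.uniform(size=(T, N)) > 0.1                # roughly 10% of cells missing
F, L = factors_unbalanced(np.where(mask, X_true, np.nan), mask, n_factors=1)
fit = F @ L.T
```

Each pass re-estimates the factors from the completed panel, so the number of dotted lines reported by xtivdfreg corresponds to how many such passes each variable requires before the filled-in values stop changing.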
The estimated coefficient of ln(K/L) is approximately equal to 0.5 and is statistically significant. The p-value of the J-test statistic indicates that the overidentifying restrictions are supported by the data. Moreover, m_x = m_y = 1, whereas the fraction of the variance of u_it that is explained by the factor component appears to be around 0.3.
Note that the number of lines of dots is not only a function of the number of lags used as instruments but also depends on whether factors are extracted jointly or individually for each regressor. For illustration, consider a model similar to (13) but without imposing constant returns to scale:

ln(Y_it) = beta_1 ln(L_it) + beta_2 ln(K_it) + u_it

We specify the iv() option twice, once for each individual regressor. This yields

. xtivdfreg lY lL lK, absorb(list year) iv(lL, lags(2)) iv(lK, lags(2))
(output omitted)

This time, the number of dotted lines corresponding to factor estimation from the covariates has doubled. This is because the factors are extracted separately for ln(L) and ln(K), and therefore the algorithm performs twice the number of iteration loops.
MGIV estimation of the baseline regression in (13) is computed by typing

. xtivdfreg ly lk, absorb(list year) iv(lk, lags(2)) factmax(3) mg
(output omitted)

As we can see, the estimate of the coefficient of ln(K/L) is similar to that of the homogeneous model. For further analysis using this example, see Eberhardt and Teal (2010).

Conclusion
xtivdfreg is useful for estimating large panel-data models with unobserved common factors or interactive effects. The slope coefficients can be either homogeneous or heterogeneous. The command accommodates a flexible specification of instruments and incorporates the two-way error-components model as a special case. Results obtained from the popular ivregress command can be reproduced using xtivdfreg by imposing zero factors.

Programs and supplemental materials
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type

. net sj 21-3
. net install st0650    (to install program files, if available)
. net get st0650        (to install ancillary files, if available)

To update the xtivdfreg package to the latest version, type

. net install xtivdfreg, from("http://www.kripfganz.de/stata/") replace

References