Generalized Linear Factor Score Regression: A Comparison of Four Methods

Factor score regression has recently received growing interest as an alternative to structural equation modeling. However, because the literature has focused on normally distributed outcomes, many applications are left without guidance. We perform a simulation study to examine how a selection of factor scoring methods compare when estimating regression coefficients in generalized linear factor score regression. The current study evaluates the regression method and the correlation-preserving method, as well as two sum score methods, in ordinary, logistic, and Poisson factor score regression. Our results show that scoring method performance can differ notably across the considered regression models. In addition, the results indicate that the choice of scoring method can substantially influence research conclusions. The regression method generally performs best in terms of coefficient and standard error bias, accuracy, and empirical Type I error rates. Moreover, the regression method and the correlation-preserving method mostly outperform the sum score methods.


Introduction
In the social and psychological sciences, interest often lies in modeling relationships that involve factors. Factors are latent variables that can be inferred by the use of manifest variables, for example, questionnaire items. Common examples of factors are the Big Five personality traits, quality of life, and socioeconomic status. Structural equation modeling (SEM) is a popular tool for evaluating factor structures. The strength of SEM is its ability to consistently estimate latent variable relationships. However, because of simultaneous estimation, SEM is sensitive to bias from model misspecification and may require large samples. These drawbacks have motivated the development of alternative approaches. One such alternative is factor score regression (FSR), which has recently seen growing research interest (Hayes & Usami, 2020).
Unlike SEM, FSR is a stepwise approach to estimating factor structures. First, relationships between manifest variables and factors are estimated in a measurement model. Factors are then related to a latent or observed outcome in a regression model. Numerical values that serve as input to the regression model must be assigned to the factors. Factor score estimates are calculated for this purpose, indicating the estimated relative position of an observation on the factors (Grice, 2007). Estimated factor scores can be obtained in mainstream statistical software as they are routinely used in scale construction and as input to other statistical methods.
A sometimes overlooked fact, which is known as factor score indeterminacy, is that theoretical factor scores are not unique. In short, infinitely many sets of factor scores are consistent with a given measurement model because it is algebraically underdetermined. Various factor scoring methods, which handle the indeterminate part of factor scores differently, have been proposed. Unfortunately, recommendations on the choice of scoring method when estimating FSR models are still limited.
Another limitation of the FSR literature is the exclusive consideration of normally distributed outcomes. This is problematic because applications with other outcome types are left without theoretical guidance and justification. The generalized linear modeling framework can account for many outcome types encountered in practice. Results on the performance of scoring methods in generalized linear FSR (GLFSR) have not yet been presented in the literature.
This article compares the performance of four factor scoring methods when estimating GLFSR models. Focus is restricted to the case of normally distributed manifest variables, factor score estimates as independent variables, and an observed outcome that is either continuous (normally distributed), binary, or a count variable. In contrast to previous research, both homogeneous and heterogeneous loading structures as well as different degrees of interfactor correlation are considered. A selection of frequently applied scoring methods is evaluated, namely the regression method (Thomson, 1934; Thurstone, 1935), the correlation-preserving method of Ten Berge et al. (1999), and the total sum method and the weighted sum method described in, for example, DiStefano et al. (2009). The performance of the correlation-preserving method and the sum score methods has received only limited attention in the literature on FSR methods, warranting an explicit evaluation in the current study. Evaluations are done by performing a meta-analysis on simulated data.
The rest of the article is organized as follows. First, we define the measurement model, factor scoring methods, and the regression model. Second, the simulation study design is delineated. Third, a presentation of the simulation results is provided. A discussion of the results concludes the article.

Factor Measurement
The Measurement Model
Confirmatory factor analysis can be used to define factors through a set of manifest variables. The measurement model is defined in this article as $x = \Lambda F + \delta$ (Equation 1), where $x$ is a vector of $p$ standardized manifest variables, $F$ is a vector of $k$ factors, and $\Lambda$ is a $p \times k$ matrix of loading coefficients. $\delta$ is assumed to be a vector of $p$ uncorrelated measurement errors. Moreover, factors and measurement errors are independent and normally distributed such that $F \sim N_k(0, \Phi)$ and $\delta \sim N_p(0, \Theta)$. For convenience, factor variances are fixed at unity so that $\Phi$ is a correlation matrix. Following Yang-Wallentin et al. (2010), the measurement error covariance matrix is defined as $\Theta = I - \mathrm{diag}[\Lambda\Phi\Lambda']$ when standardized manifest variables are analyzed, where $I$ denotes the identity matrix. It follows from the measurement model setup that $x \sim N_p(0, \Sigma = \Lambda\Phi\Lambda' + \Theta)$, where $\Sigma$ is a correlation matrix. A major motivation for analyzing a completely standardized measurement model is that this enables the interpretation of each loading as the correlation coefficient between a given manifest variable and factor, assuming no cross-loadings. Because factor score indeterminacy is a function of the loadings, the restriction of each loading coefficient to the interval $[-1, 1]$ also facilitates the control of its extent. The matrices $\Sigma$, $\Lambda$, and $\Phi$ play a central role in factor score estimation and need to be estimated. $\Sigma$ can be estimated by the sample correlation matrix. Estimates of $\Lambda$ and $\Phi$ are obtained in the estimation of Equation 1. It is well known that using maximum likelihood (ML) estimation to apply a covariance structure to a sample correlation matrix can produce biased model parameter estimates (Cudeck, 1989). An alternative approach, suggested by Jöreskog et al. (2016), is to use unweighted least squares (ULS) estimation when $\Sigma$ is a correlation matrix.
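As a concrete illustration of the completely standardized setup, the implied correlation matrix $\Sigma = \Lambda\Phi\Lambda' + \Theta$ with $\Theta = I - \mathrm{diag}[\Lambda\Phi\Lambda']$ can be sketched numerically. The loading and interfactor-correlation values below are illustrative, not taken from the article, and the sketch uses Python/NumPy rather than the authors' R code:

```python
import numpy as np

# Hypothetical two-factor loading matrix Lambda and interfactor
# correlation matrix Phi (values are illustrative only).
Lambda = np.array([[0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.5]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

common = Lambda @ Phi @ Lambda.T                # Lambda Phi Lambda'
Theta = np.eye(4) - np.diag(np.diag(common))    # I - diag[Lambda Phi Lambda']
Sigma = common + Theta                          # implied correlation matrix

# With standardized manifest variables, Sigma has a unit diagonal:
assert np.allclose(np.diag(Sigma), 1.0)
```

Because each manifest variable's unique variance is set to one minus its communality, the implied matrix is automatically a correlation matrix.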
It has been shown (Yang-Wallentin et al., 2010) that the ML and ULS discrepancy functions can be written on the quadratic form $(r - s)'W^*(r - s)$. The vector $r$ contains elements from the sample correlation matrix $\hat\Sigma$. The vector $s$ contains elements from the population correlation matrix disregarding measurement error covariances, that is, $\Lambda\Phi\Lambda' = \Sigma - \Theta$. In ULS estimation, $r$ and $s$ contain the below-diagonal elements of the corresponding matrices. Diagonal elements are included as well in ML estimation, which can produce misleading estimates because the diagonal elements of $\Lambda\Phi\Lambda'$ are not fixed at unity. The weight matrix $W^*$ is also defined differently in ULS and ML estimation, which influences the minimization of the discrepancy function and thus the parameter estimates. $W^*$ is more complex in ML estimation. To see this, let $K^*$ be a matrix such that left-multiplying $r$ in ML estimation with $K^*$ yields $r$ in ULS estimation. Also let $D^*$ be the duplication matrix and $\otimes$ be the Kronecker product operator. Then, in ML estimation, $W^*$ is a function of $K^*$, $D^*$, and $\hat\Sigma^{-1} \otimes \hat\Sigma^{-1}$, whereas $W^*$ is simply the identity matrix in ULS estimation.

Factor Score Indeterminacy
The measurement model is underdetermined because the number of factors and measurement errors exceeds the number of manifest variables. To see how this affects the assignment of numerical values to factors, define, in line with Rigdon et al. (2019a), the block matrix $\Sigma^*$ containing the joint covariance structure of the manifest variables and the factors (Equation 3). When $\mathrm{rank}(\Sigma^*) > \mathrm{rank}(\Sigma)$, Guttman (1955) has shown that Equation 3 can be solved for $F$ to obtain $F = \Phi\Lambda'\Sigma^{-1}x + Bg$ (Equation 4). Factor scores in this case consist of a determinate part $\Phi\Lambda'\Sigma^{-1}x$ and an indeterminate part $Bg$, where $B$ is a parameter matrix and $g$ is a vector containing $k$ orthonormal variables connected to the factors. The variables in $g$ can be altered to generate infinitely many sets of factor scores consistent with the measurement model parameters, holding the determinate component constant, which gives rise to factor score indeterminacy.
Higher degrees of factor score indeterminacy allow for larger possible differences among sets of "true" factor scores consistent with Equation 4 (Guttman, 1955). Thus, in settings with highly indeterminate true factor scores, the validity of factor score estimation can be put into question. While factor score indeterminacy, which depends on, for example, overall loading levels (Rigdon et al., 2019a) and interfactor correlations (Rigdon et al., 2019b), does not directly influence the estimation of parameters within the measurement model, high degrees of this phenomenon translate into high degrees of indeterminacy in correlations between factors and variables outside the measurement model (Steiger, 1979). Therefore, factor score indeterminacy can severely affect conclusions about the relationship between factors and an external outcome variable.

The Regression Method
Let $E$ be the expectation operator. Krijnen et al. (1996) have demonstrated that the regression score estimator $\hat F_{\mathrm{reg}}$ minimizes the trace and determinant of the mean square error matrix $E[(\hat F - F)(\hat F - F)']$. $\hat F_{\mathrm{reg}}$ is the best linear predictor of $F$ and can be written (Skrondal & Laake, 2001) as $\hat F_{\mathrm{reg}} = \Phi\Lambda'\Sigma^{-1}x$. $\hat F_{\mathrm{reg}}$ maximizes the correlation between factors and the corresponding factor score estimates, that is, $\mathrm{diag}[\mathrm{Corr}(F, \hat F_{\mathrm{reg}})] = \mathrm{diag}[\Phi\Lambda'\Sigma^{-1}\Lambda\Phi]^{1/2} = \rho$. However, the covariance matrix of $\hat F_{\mathrm{reg}}$ is not equal to $\Phi$, which can be seen through an application of the Woodbury matrix identity (Duncan, 1944). Larger values of $(I + \Lambda'\Theta^{-1}\Lambda\Phi)^{-1}$, which occur when the overall communality level increases, decrease the bias in the covariance matrix of $\hat F_{\mathrm{reg}}$. This result should not be surprising because $\hat F_{\mathrm{reg}}$ equals only the determinate part of the factor scores, and the indeterminate part grows smaller when communalities increase. Skrondal and Laake (2001) showed that the regression score estimator produces asymptotically unbiased regression coefficient estimates when the outcome variable is observed. However, Devlieger et al. (2016) found that this result is valid only when using a reference-variable approach to assign factor variance scales. The simulations by Lastovicka and Thamodaran (1991), which used the unit-variance approach, support this claim. The authors found biases in estimated regression coefficients, which grew smaller when the overall variance explained in the manifest variables grew larger. Thus, $\hat F_{\mathrm{reg}}$ is expected to be biased in this article, but the bias should decrease when manifest variables have higher correlations on average with factors.
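A minimal numerical sketch of the regression score estimator $\hat F_{\mathrm{reg}} = \Phi\Lambda'\Sigma^{-1}x$ follows, using illustrative parameter values (not the article's). It shows both the validity coefficients $\rho$ and the fact that the population covariance of the scores, $\Phi\Lambda'\Sigma^{-1}\Lambda\Phi$, is shrunken relative to $\Phi$:

```python
import numpy as np

# Illustrative two-factor model (same hypothetical values as earlier sketches)
Lambda = np.array([[0.7, 0.0], [0.6, 0.0], [0.0, 0.8], [0.0, 0.5]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
common = Lambda @ Phi @ Lambda.T
Sigma = common + (np.eye(4) - np.diag(np.diag(common)))

A_reg = Phi @ Lambda.T @ np.linalg.inv(Sigma)   # weight matrix of F_reg

# Score a draw of manifest data x ~ N(0, Sigma)
rng = np.random.default_rng(0)
x = rng.multivariate_normal(np.zeros(4), Sigma, size=1000)
F_reg = x @ A_reg.T

# Validity coefficients rho = diag[Phi Lambda' Sigma^{-1} Lambda Phi]^{1/2}
rho = np.sqrt(np.diag(A_reg @ Lambda @ Phi))

# Population covariance of F_reg: shrunken toward zero relative to Phi
cov_reg = A_reg @ Sigma @ A_reg.T
```

The diagonal of `cov_reg` equals $\rho^2 < 1$, which is the covariance bias discussed above: only the determinate part of the factor scores is recovered.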

The Correlation-Preserving Method
The correlation-preserving score estimator $\hat F_{\mathrm{corr}}$ minimizes $\det\{E[(\hat F - F)(\hat F - F)']\}$ subject to the constraint $E(\hat F_{\mathrm{corr}}\hat F_{\mathrm{corr}}') = \Phi$ (Krijnen et al., 1996). Thus, $\hat F_{\mathrm{corr}}$ differs from $\hat F_{\mathrm{reg}}$ in that it is asymptotically correlation-preserving, at the cost of not maximizing its correlation with $F$. The correlation-preserving score estimator can be defined (Ten Berge et al., 1999) as $\hat F_{\mathrm{corr}} = \Phi^{1/2}(\Phi^{1/2}\Lambda'\Sigma^{-1}\Lambda\Phi^{1/2})^{-1/2}\Phi^{1/2}\Lambda'\Sigma^{-1}x$. Other correlation-preserving score estimators exist in the literature. For example, Anderson and Rubin (1956) developed a score estimator for orthogonal factors, which was generalized to an arbitrary $\Phi$ by McDonald (1981). A comparison with Jöreskog (2000) shows that the latent variable score estimator in LISREL is equivalent to the McDonald score estimator. Moreover, this score estimator coincides with $\hat F_{\mathrm{corr}}$ when $\Theta$ is nonsingular (Ten Berge et al., 1999). Thus, $\hat F_{\mathrm{corr}}$ contains the score estimator in McDonald (1981) and LISREL as a special case. The literature has mainly focused on $\hat F_{\mathrm{corr}}$ with $E(\hat F_{\mathrm{corr}}\hat F_{\mathrm{corr}}') = I$, that is, the Anderson-Rubin score method. Lastovicka and Thamodaran (1991) found that the Anderson-Rubin method is associated with less biased coefficient estimates than the regression method in FSR, including situations with correlated factors. However, it is still unknown how the correlation-preserving method and the regression method compare when $\hat F_{\mathrm{corr}}$ is not restricted to produce factor score estimates that asymptotically reflect uncorrelated factors.
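Assuming the Ten Berge et al. (1999) form $\hat F_{\mathrm{corr}} = \Phi^{1/2}(\Phi^{1/2}\Lambda'\Sigma^{-1}\Lambda\Phi^{1/2})^{-1/2}\Phi^{1/2}\Lambda'\Sigma^{-1}x$, the correlation-preserving property can be verified numerically. The sketch below (Python/NumPy, illustrative parameter values) checks that the weight matrix $A_{\mathrm{corr}}$ satisfies $A_{\mathrm{corr}}\Sigma A_{\mathrm{corr}}' = \Phi$ exactly at the population level:

```python
import numpy as np

def sqrtm_sym(M, inv=False):
    """Symmetric (inverse) matrix square root via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = 1.0 / np.sqrt(vals) if inv else np.sqrt(vals)
    return vecs @ np.diag(vals) @ vecs.T

# Illustrative two-factor model
Lambda = np.array([[0.7, 0.0], [0.6, 0.0], [0.0, 0.8], [0.0, 0.5]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
common = Lambda @ Phi @ Lambda.T
Sigma = common + (np.eye(4) - np.diag(np.diag(common)))

Phi_h = sqrtm_sym(Phi)
C = Phi_h @ Lambda.T @ np.linalg.inv(Sigma) @ Lambda @ Phi_h
A_corr = Phi_h @ sqrtm_sym(C, inv=True) @ Phi_h @ Lambda.T @ np.linalg.inv(Sigma)

# Population covariance of the scores reproduces Phi exactly
cov_corr = A_corr @ Sigma @ A_corr.T
```

Unlike the regression scores, these scores reproduce the interfactor correlation matrix, at the cost of lower factor-score validity.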

Sum Score Methods
Sum score methods have been understudied in the FSR literature, partly because they are less sophisticated and are expected to perform worse than other scoring methods. Sum scoring is nevertheless regularly encountered in practice because of its simplicity, which motivates an explicit comparison of this class of factor scoring methods with more sophisticated approaches.
The total sum method and the weighted sum method are considered in this article. The total sum score estimator is defined as $\hat F_{\mathrm{tot}} = \Lambda^{*\prime}x$, where $\Lambda^*$ equals the loading matrix $\Lambda$ after nonzero elements have been replaced with either $-1$ or $1$ depending on their original signs (Beauducel & Hilger, 2019). If all manifest variables have the same polarity, then the estimated factor score for an observation on a given factor is simply the sum of observed values on the manifest variables correlated with the factor. A drawback of $\hat F_{\mathrm{tot}}$ is that all relevant manifest variables have equal weights. The weighted sum method $\hat F_{\mathrm{wei}}$ addresses this issue by considering $\Lambda$ instead of $\Lambda^*$ (DiStefano et al., 2009). Weights then depend on the strength of the linear relationship between manifest variables and factors. $\hat F_{\mathrm{wei}}$ is defined as $\hat F_{\mathrm{wei}} = \Lambda'x$. Note that one expects the performance of $\hat F_{\mathrm{wei}}$ to approach that of $\hat F_{\mathrm{tot}}$ when communalities grow larger. In comparison with the regression method and the correlation-preserving method, $\hat F_{\mathrm{tot}}$ and $\hat F_{\mathrm{wei}}$ neither maximize correlations with $F$ nor preserve interfactor correlations.
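The two sum score estimators can be sketched in a few lines; the loading values and the manifest data below are purely illustrative:

```python
import numpy as np

Lambda = np.array([[0.7, 0.0], [0.6, 0.0], [0.0, 0.8], [0.0, 0.5]])

# Total sum: replace each nonzero loading with its sign (+1 or -1)
Lambda_star = np.sign(Lambda)

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 4))      # illustrative manifest data, n = 5

F_tot = x @ Lambda_star              # equal weights for relevant indicators
F_wei = x @ Lambda                   # loading-based weights

# With same-polarity indicators, the total sum score on the first
# factor is just x1 + x2:
assert np.allclose(F_tot[:, 0], x[:, 0] + x[:, 1])
```

The contrast with the previous two methods is visible in the weight matrices: neither $\Lambda^*$ nor $\Lambda$ involves $\Sigma^{-1}$ or $\Phi$, so model information beyond the loadings (or their signs) is discarded.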

The Regression Model
Suppose that a set of factors $F$ is related to an $n \times 1$ outcome vector $y$ by the $k \times 1$ vector of regression weights $\gamma$, where $n$ denotes the sample size. The $n \times k$ matrix of factor score estimates $\hat F$ replaces $F$ in GLFSR because $F$ is unobserved. $\hat F$ is related to $y$ by the $k \times 1$ vector of regression weights $\gamma^*$, whose elements depend on which factor scoring method is used to calculate $\hat F$. Defining the conditional outcome expectation $E(y|\hat F) \equiv \mu_y$, the GLFSR model can be written as $g(\mu_y) = \hat F\gamma^*$ (Equation 10). The link function $g$ enables the modeling of a variety of outcome distributions. Conditional on the explanatory variables, each outcome realization is assumed to be independent and to follow an exponential dispersion family distribution (Agresti, 2015). Ordinary FSR based on normally distributed outcomes is a special case of GLFSR, where the identity link $g(\mu_y) = \mu_y$ is applied (Madsen & Thyregod, 2010). Other common link functions are the logit link $g(\mu_y) = \log[\mu_y/(1 - \mu_y)]$ for logistic regression and the log link $g(\mu_y) = \log \mu_y$ for Poisson regression.
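The three link functions named above, together with their inverses (which map the linear predictor $\eta = \hat F\gamma^*$ back to the conditional mean $\mu_y$), can be written down directly; this small sketch only restates the standard definitions:

```python
import numpy as np

# Link functions g and their inverses (mu = g^{-1}(eta))
links = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: np.log(mu / (1 - mu)),
              lambda eta: 1.0 / (1.0 + np.exp(-eta))),
    "log": (lambda mu: np.log(mu), lambda eta: np.exp(eta)),
}

# Each pair is mutually inverse on its valid range
eta = 0.3
for name, (g, g_inv) in links.items():
    mu = g_inv(eta)
    assert np.isclose(g(mu), eta)
```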
Maximum likelihood estimation can be applied to estimate $\gamma^*$. Let $L(\gamma^*)$ be the likelihood function and define $l(\gamma^*) = \log L(\gamma^*)$. The maximum likelihood estimator maximizes $l(\gamma^*)$ or, equivalently, $L(\gamma^*)$. This occurs when $\hat\gamma^*$ is the solution (Agresti, 2015) to the likelihood equations $\hat F'DV^{-1}(y - \mu_y) = 0$ (Equation 11), where $D = \mathrm{diag}[\partial\mu_1/\partial\eta_1, \partial\mu_2/\partial\eta_2, \ldots, \partial\mu_n/\partial\eta_n]$, $V = \mathrm{diag}[\mathrm{Var}(y_1), \mathrm{Var}(y_2), \ldots, \mathrm{Var}(y_n)]$, and $\eta_i$ denotes the linear predictor for observation $i$. $\hat\gamma^*$ is found by iterative algorithms such as the Newton-Raphson or Fisher scoring method because Equation 11 generally has no closed-form solution. The Fisher scoring method uses the expected information matrix $\mathcal{I}(\gamma^*)$ to update the parameter estimator until convergence is reached, that is, when the difference between the current and previous update is sufficiently small (Stroup, 2013). Under regularity conditions, it has been shown (Madsen & Thyregod, 2010) that $\hat\gamma^*$ is asymptotically normally distributed around $\gamma^*$. Thus, $\hat\gamma^*$ is an asymptotically biased estimator of $\gamma$ when $\gamma^* \neq \gamma$. However, the practical importance of finite-sample biases in coefficient and standard error estimates has yet to be examined in GLFSR.
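For the Poisson/log-link case, the Fisher scoring iteration takes a particularly simple form because $D = V = \mathrm{diag}(\mu)$, so the score is $X'(y - \mu)$ and the expected information is $X'\mathrm{diag}(\mu)X$. The following is a minimal sketch (Python/NumPy, simulated data with assumed coefficients), not the article's estimation code:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = rng.standard_normal((n, 2))          # stand-in for factor score estimates
beta_true = np.array([0.5, -0.3])        # assumed illustrative coefficients
y = rng.poisson(np.exp(X @ beta_true))   # Poisson outcome, log link

beta = np.zeros(2)
for _ in range(50):
    mu = np.exp(X @ beta)                # inverse log link: mu = exp(eta)
    score = X.T @ (y - mu)               # X' D V^{-1} (y - mu), D = V = diag(mu)
    info = X.T @ (X * mu[:, None])       # expected information X' diag(mu) X
    step = np.linalg.solve(info, score)  # Fisher scoring update
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:     # converged: update sufficiently small
        break
```

With a canonical link, Fisher scoring coincides with Newton-Raphson, which is why the iteration above needs only the score and the expected information.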

Simulation Parameters
A simulation study was performed to compare the factor scoring methods in GLFSR.
The simulation parameters are presented in Table 1. For each of the 3,780 conditions considered in this article, 1,000 replications were made.
The Loading-Generating Distribution. Loading coefficients are often assumed to be equal in the literature. This assumption is unrealistic in many research settings. We adopted a new approach to defining the loading-generating distribution: in each replication, loading coefficients were drawn from a uniform distribution with endpoints depending on a condition-specific mean and standard deviation. Loading coefficients were ensured to fall within a limited section of the interval $[0.244, 0.956]$. We thus eliminated the issues of drawing unequally likely or impossible loadings, which were prevalent in some previous studies. Mean loading levels were selected from Koran (2020), ranging from relatively low ($\mu_\lambda = 0.40$) to relatively high ($\mu_\lambda = 0.80$). Both equal ($\sigma_\lambda = 0$) and unequal ($\sigma_\lambda > 0$) loading coefficients were considered.
Interfactor Correlation. A two-factor measurement model, which is the simplest multifactor measurement model, was assumed. We considered both uncorrelated ($\phi_{12} = 0$) and correlated ($\phi_{12} \neq 0$) factors.
Factor Scoring Method and Regression Model. The factor scoring methods mentioned in the Introduction were considered. The GLFSR models in the simulation study corresponded to Equation 10. The models had an observed outcome and factor score estimates, based on factors with normally distributed manifest variables, as independent variables.

Data-Generating Process
Figure 1 displays a path diagram of the simulation measurement model. Two factors, $f_1$ and $f_2$, were analyzed. The first four of the eight manifest variables ($x_1, x_2, \ldots, x_8$) were related to $f_1$ and the rest to $f_2$ by the loading coefficients ($\lambda_{11}, \lambda_{21}, \lambda_{31}, \lambda_{41}$) and ($\lambda_{52}, \lambda_{62}, \lambda_{72}, \lambda_{82}$). Factors were related by the interfactor correlation coefficient $\phi_{12}$.
For each unique combination of the simulation parameter values, data were generated in the following steps in R (R Development Core Team, 2019):
1. True factor scores were generated by drawing $n$ observations from a $N_2(0, \Phi)$ distribution. Loading coefficients were sampled from a $U(a, b)$ distribution. Manifest variable data were then generated as linear combinations of $\Lambda$, $F$, and $\delta$ in line with Equation 1, and the free measurement model parameters were estimated using unweighted least squares estimation with robust standard errors in the lavaan package (Rosseel, 2012).
2. Factor score estimates were calculated. True factor scores were used to generate outcome values, where the true regression model had coefficients $\gamma = (1, 2)'$. $\gamma$ was estimated with all combinations of regression model and factor scoring method using base R. Last, nonconvergence information as well as coefficient and standard error estimates were extracted.
The inverse link of the linear predictor was used to generate outcome values (Carsey & Harden, 2013). Let $f_{1i}$ and $f_{2i}$, $i = 1, 2, \ldots, n$, be individual factor scores on the first and second factor. A conditional variance of 10 for each outcome realization $y_i$ was arbitrarily selected such that $y_i | f_{1i}, f_{2i} \sim N(f_{1i} + 2f_{2i}, 10)$ in ordinary FSR.
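The data-generating steps above can be sketched compactly. The article's simulations used R with lavaan; the version below is a Python/NumPy analogue for the ordinary-FSR case, with illustrative uniform endpoints and interfactor correlation rather than the study's condition grid:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
Phi = np.array([[1.0, 0.4], [0.4, 1.0]])     # illustrative phi_12 = 0.4

# Step 1: true factor scores and replication-specific loadings
F = rng.multivariate_normal(np.zeros(2), Phi, size=n)
lam = rng.uniform(0.45, 0.75, size=8)        # illustrative endpoints (a, b)
Lambda = np.zeros((8, 2))
Lambda[:4, 0], Lambda[4:, 1] = lam[:4], lam[4:]

# Step 2: manifest variables x = F Lambda' + delta,
# with Theta = I - diag[Lambda Phi Lambda'] (standardized variables)
theta = 1.0 - np.diag(Lambda @ Phi @ Lambda.T)
delta = rng.standard_normal((n, 8)) * np.sqrt(theta)
x = F @ Lambda.T + delta

# Step 3: outcome from TRUE scores, gamma = (1, 2)', Var(y|f) = 10
y = F @ np.array([1.0, 2.0]) + rng.normal(0.0, np.sqrt(10), size=n)
```

In the study itself, the measurement model would then be re-estimated from `x`, scores computed with each method, and $\gamma$ estimated by regressing `y` on those scores.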

Performance Assessment
Data were evaluated using the five performance criteria listed in Table 2. There were no evident reasons for results to differ between the factors in Figure 1, so focus was restricted to the first factor. Data were trimmed by discarding cases with the upper and lower 0.50 percent of $\hat\gamma^*_1$ values in each unique condition to reduce the negative influence of outliers.
Given the recommendations by Skrondal (2000), meta-analysis models were formulated to relate the simulation parameters in Table 1 to the criteria. Separate analyses of variance (ANOVA) with first-order interactions were performed in line with Devlieger et al. (2016). In contrast to the majority of previous FSR studies, we examined practical significance in terms of standardized effect size as preferred by Boomsma (2013).
First, Type III sum of squares decompositions were evaluated in terms of effect size. Second, the most informative relationships were analyzed in relation to established guidelines. The partial omega-squared $\hat\omega^2_p$ was selected as the effect size measure because it is a less biased, population-based version of the conventional partial eta-squared (Lakens, 2013). $\hat\omega^2_p$ can be interpreted as the estimated proportion of the total variance in the dependent variable explained by a given independent variable when the influence of other independent variables has been eliminated. For a given independent variable in a fixed-effects design, $\hat\omega^2_p$ is defined as

$\hat\omega^2_p = \dfrac{df_{\mathrm{effect}}(MSS_{\mathrm{effect}} - MSS_{\mathrm{error}})}{df_{\mathrm{effect}} \cdot MSS_{\mathrm{effect}} + (N - df_{\mathrm{effect}})MSS_{\mathrm{error}}},$

where $df_{\mathrm{effect}}$ denotes the degrees of freedom, $MSS_{\mathrm{effect}}$ is the mean sum of squares for the independent variable, $N = 3{,}780$ is the number of observations, and $MSS_{\mathrm{error}}$ is the mean error sum of squares (Olejnik & Algina, 2003). Bootstrapped 95% confidence intervals were reported for $\hat\omega^2_p$. In the absence of a systematic evaluation of effect sizes in previous FSR research, confidence intervals for $\hat\omega^2_p$ that covered any value in the interval $[0.14, 1.00]$ were deemed to signify practically important effects; $\hat\omega^2_p > 0.14$ suggests a large effect according to Cohen (2013).
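The effect size formula translates directly into code. The input values below are illustrative, not results from the article's meta-analysis:

```python
def partial_omega_sq(df_effect, mss_effect, mss_error, n_obs):
    """Partial omega-squared for a fixed-effects ANOVA term
    (Olejnik & Algina, 2003)."""
    num = df_effect * (mss_effect - mss_error)
    den = df_effect * mss_effect + (n_obs - df_effect) * mss_error
    return num / den

# Illustrative values; N = 3,780 matches the meta-analysis sample size
w2 = partial_omega_sq(df_effect=2, mss_effect=50.0, mss_error=1.0, n_obs=3780)
```

Here `w2` is about 0.025, well below the 0.14 threshold for a large effect.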
Nonconvergence Rates. The indicator function $I(A_c)$ equaled 1 if estimates of $\Sigma$, $\Phi$, or $\Theta$ were not positive definite, if $|\hat\phi_{12}| > 1$, if loading estimates were missing, or if regression coefficient and standard error estimates were missing. We expected nonconvergence rates to be lower when more observations were analyzed, manifest variables had stronger correlations on average with factors, and factors were weakly correlated. In addition, factor scoring methods were expected to have approximately the same proportion of nonconvergent replications based on the findings by Devlieger et al. (2016). Nonconvergence rates exceeding 10% were considered problematic in this article.
Bias. The relative bias compares estimated regression coefficients with the true coefficient value 1. The relative standard error bias compares the mean estimated standard error with the empirical standard deviation acting as an estimate of the theoretical standard error. Regression coefficients and standard errors were expected to mostly be underestimated based on previous simulation studies, for example, Skrondal and Laake (2001) and Devlieger et al. (2016). Moreover, biases were expected to generally decrease when the mean loading level grew higher. The sum score methods were expected to mainly be associated with substantial biases because they utilize little model information. We also expected bias levels for the regression method and the correlation-preserving method to converge and approach zero for higher mean loading levels. Following Hoogland and Boomsma (1998), absolute values of coefficient and standard error biases less than 5% and 10%, respectively, were considered acceptable.
Root Mean Square Error (RMSE). The RMSE can be considered an indicator of accuracy because it depends on both estimator bias and variance. A lower RMSE indicates higher accuracy. We expected higher accuracy for higher mean loading levels.
In addition, RMSE values for the regression method and the correlation-preserving method were expected to decrease when analyzing larger samples.
Empirical Type I Error Rates ($\hat\alpha$). The indicator function $I(p_c < 0.05)$ equaled 1 if the estimated $p$ value for a two-sided $t$ test of $H_0: \gamma^*_1 = 1$ was less than 5%. Pronounced over-rejections were expected to occur, for example, when manifest variables were weakly correlated with factors. Furthermore, we expected the increased rejection power from larger samples to produce higher $\hat\alpha$ levels when biases were present. Empirical Type I error rates outside of the interval $[0.025, 0.075]$ were deemed problematic in line with the "liberal" criterion suggested by Bradley (1978).
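The empirical Type I error rate is simply the percentage of convergent replications that reject the true null; the p values below are toy inputs:

```python
import numpy as np

def type1_error_rate(p_values, alpha=0.05):
    """Percent of replications rejecting a true H0 at level alpha."""
    p = np.asarray(p_values)
    return 100.0 * np.mean(p < alpha)

p_vals = np.array([0.01, 0.20, 0.03, 0.50, 0.80])   # toy p values
rate = type1_error_rate(p_vals)
```

A well-calibrated test would keep this rate near the nominal 5%; values outside roughly 2.5% to 7.5% fail Bradley's liberal criterion.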

Results
Simulations resulted in 3,780,000 replications, of which about 3.60% were nonconvergent. A total of 3,606,180 replications were available for meta-analysis after the data trimming described in the previous section. Table 3 presents descriptive statistics for the simulated data.
All regression models were possible to estimate given a properly estimated measurement model, in line with the fact that nonconvergence statistics were equal for all regression models and factor scoring methods. Coefficients and standard errors were generally underestimated. Root mean square error statistics were similar across the regression models, indicating similar accuracy in ordinary, logistic, and Poisson FSR. In addition, mean empirical Type I error rates generally fell outside of the acceptance interval, suggesting pronounced null hypothesis over-rejections in many cases. Table 4 displays the sum of squares decompositions for the meta-analyses. The volatility of loading coefficients ($\sigma_\lambda$) had a limited impact on the performance criteria: estimated location effects for $\sigma_\lambda$ were marginal, indicating no notable difference in performance when equal or unequal loading coefficients were considered. (In Table 4, negative 95% confidence interval endpoints for $\hat\omega^2_p$ have been replaced with "-"; Non-conv. = nonconvergence, RB = relative bias, RSEB = relative standard error bias, RMSE = root mean square error, $\hat\alpha$ = empirical Type I error rate.)

Nonconvergence Rates
Only simulation parameters related to the measurement model influenced nonconvergence rates according to the meta-analysis. Nonconvergence rates averaged over loading volatility levels, regression models, and factor scoring methods are displayed in Figure 2, including a horizontal line marking the 10% acceptance threshold.
The meta-analysis in Table 4 suggested an important interaction between the mean loading level, sample size, and interfactor correlation. In particular, sample size and interfactor correlation had notable impacts only when manifest variables were weakly correlated with factors on average ($\mu_\lambda = 0.40$). Nonconvergence rates were less than 10% in all cases when manifest variables had relatively high mean correlations with factors ($\mu_\lambda = 0.60, 0.80$).
Nonconvergence rates were the most problematic when manifest variables on average weakly indicated the factors ($\mu_\lambda = 0.40$) and factors were highly correlated ($|\phi_{12}| = 0.80$). In these cases, at least 500 observations were needed to keep nonconvergence rates below 10%. Samples of at least 100 observations mostly sufficed when $\mu_\lambda = 0.40$ and $|\phi_{12}| < 0.80$.

Relative Bias
The meta-analysis in Table 4 suggested similar relative bias (RB) levels across the regression models. However, comparing logistic and Poisson to ordinary FSR, RB levels were in some cases shifted downward to an extent that influenced bias acceptability. RB values averaged over loading volatility levels are thus displayed for each regression model in Figures 3, 4, and 5, including 5% acceptance intervals. In line with our expectations, coefficients were generally underestimated. The mean loading level had an impact on biases according to the meta-analysis in Table 4, but the impact depended on the specific factor scoring method. RB values for the regression method and the correlation-preserving method converged and approached zero for higher mean loading levels. In contrast, bias decreases were marginal for the total sum method. RB values for the weighted sum method in fact increased for higher mean loading levels. Biases were almost always severe for the sum score methods, which were therefore discarded from further RB analysis.
Coefficients were often less biased with the regression method than with the correlation-preserving method. Compared with $\phi_{12} = 0$, RB levels for the correlation-preserving method shifted downward when $\phi_{12} < 0$ and upward when $\phi_{12} > 0$. Because RB values for the regression method did not follow this pattern, considerable differences between the two methods arose when factors were negatively correlated.
The correlation-preserving method was mostly associated with unacceptable RB levels, which were less severe when manifest variables had relatively high mean correlations with factors ($\mu_\lambda = 0.60, 0.80$) and factors were positively correlated. In ordinary FSR, the regression method performed acceptably in most cases when at least 500 observations were analyzed. In addition, smaller sample sizes could often provide acceptable RB levels for the regression method when there was a relatively high mean loading level ($\mu_\lambda = 0.60, 0.80$) and $\phi_{12} > 0$. In logistic FSR, no factor scoring method had a consistently acceptable bias level. In Poisson FSR, RB values for the regression method were within or close to the edges of the acceptance interval when

Relative Standard Error Bias
Figures 6, 7, and 8 display relative standard error bias (RSEB) values averaged over levels of loading variance for each regression model, including 10% acceptance intervals. The meta-analysis in Table 4 suggested that substantial RSEB differences exist across the regression models. Estimated standard errors were severely biased in all cases in Poisson FSR, with RSEB values mostly below $-60\%$. In contrast, absolute RSEB values were mainly acceptable in ordinary and logistic FSR. For these two regression models, problematic RSEB values were found primarily in situations where manifest variables were weakly correlated with factors on average ($\mu_\lambda = 0.40$) and factors were correlated. Contrary to our expectations, no substantial differences in RSEB values were found between the sum score methods and the other methods.

Root Mean Square Error
The meta-analysis in Table 4 on root mean square error (RMSE) values indicated no noteworthy differences across the regression models. RMSE values averaged over loading volatility levels and regression models are displayed in Figure 9. As expected, the regression method and the correlation-preserving method displayed higher accuracy in larger samples. In contrast, the sum score methods were, by and large, unaffected by sample size.
The regression method and the correlation-preserving method mostly outperformed the sum score methods in terms of accuracy. In addition, the regression method generally had the highest accuracy. The accuracy of the regression method and the correlation-preserving method increased when the mean loading level grew larger and factors were weakly correlated. Moreover, the negative influence of interfactor correlations on the accuracy of the regression method and the correlation-preserving method decreased notably when the mean loading level increased. Accuracy differences between the regression method and the correlation-preserving method were, by and large, negligible when manifest variables had relatively high mean correlations with factors ($\mu_\lambda = 0.60, 0.80$) and factors were positively correlated.

Empirical Type I Error Rates
Empirical Type I error rates ($\hat\alpha$) averaged over loading volatility levels are displayed in Figures 10, 11, and 12, including acceptance intervals covering values from 2.50% to 7.50%. The meta-analysis in Table 4 suggested differences in $\hat\alpha$ patterns across the regression models. Poisson FSR stood out with severely inflated Type I error rates in all cases, caused in part by severe standard error underestimation.
The sum score methods had more problematic α̂ patterns than the other methods. For instance, α̂ values for the total sum method were almost always at 100%. The sum score methods were discarded from further α̂ analysis due to their problematic performance.
The meta-analysis in Table 4 indicated a noteworthy impact of the mean loading level on empirical Type I error rates. This was reflected in α̂ levels for the regression method and the correlation-preserving method in ordinary and logistic FSR, where over-rejections were most severe when manifest variables were on average weakly correlated with factors. Moreover, in line with the relative bias results, α̂ levels were more problematic for the correlation-preserving method when factors were negatively correlated. The meta-analysis also confirmed the expected impact of sample size on empirical Type I error rates. Focusing on ordinary and logistic FSR, larger samples were associated with higher α̂ values for the regression method and the correlation-preserving method when coefficient biases were present. Increases in α̂ levels when analyzing larger samples were greater for the correlation-preserving method than for the regression method.
Figure 10. Type I error rates, ordinary factor score regression.
Figure 11. Type I error rates, logistic factor score regression.
α̂ levels in logistic FSR were mainly unacceptable. However, in ordinary FSR, the regression method mostly displayed acceptable α̂ levels when manifest variables had relatively high mean correlations with factors (μλ = 0.60, 0.80). In contrast, the correlation-preserving method had unacceptable α̂ levels in the majority of cases. As expected from the relative bias results, differences in α̂ levels between the regression method and the correlation-preserving method were negligible when the mean loading level was relatively high (μλ = 0.60, 0.80) and φ₁₂ > 0.
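The empirical Type I error rate is simply the proportion of replications that reject a true null hypothesis at the nominal level. A brief sketch with simulated p-values (illustrative, not the study's data):

```python
import numpy as np

def empirical_type_i_rate(p_values, alpha=0.05):
    """Proportion of replications rejecting a true null at level alpha."""
    p_values = np.asarray(p_values, dtype=float)
    return float(np.mean(p_values < alpha))

# Under a well-calibrated test, p-values are uniform on (0, 1) when the
# null is true, so roughly 5% of replications should reject at alpha = .05.
rng = np.random.default_rng(7)
p_null = rng.uniform(size=20_000)
rate = empirical_type_i_rate(p_null)

# The 2.50%-7.50% acceptance interval used in the article:
assert 0.025 <= rate <= 0.075
```

When standard errors are underestimated, as in Poisson FSR here, the p-values are stochastically too small and this proportion climbs well above the acceptance interval.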

Discussion
In this article, we examined the performance of a selection of factor scoring methods when estimating regression coefficients in GLFSR. Our simulations indicate that relationships between factors and an observed outcome are recovered with approximately the same accuracy in ordinary, logistic, and Poisson FSR. However, levels of coefficient and standard error bias associated with each factor scoring method differed substantially across regression models in some cases. Therefore, we conclude that ordinary FSR results are not representative of GLFSR as a whole.
Ordinary Factor Score Regression. The results indicate that the regression method has the best performance when analyzing a normally distributed outcome. Given sufficiently large samples, the regression method was associated with mostly acceptable levels of coefficient bias and empirical Type I errors, unlike the other factor scoring methods. The lesser bias of the correlation-preserving method relative to the regression method reported by Lastovicka and Thamodaran (1991) was consequently not confirmed in most cases. Lastovicka and Thamodaran (1991) used exploratory factor analysis, analyzed correlations with maximum likelihood estimation, and considered other simulation parameters than those in this article, which may in part explain the differing results. We also conclude that standard errors are, by and large, acceptably recovered in ordinary FSR. Problematic standard error estimates were found mainly in situations with low mean loading levels and highly correlated factors, which should not reflect the majority of practical settings. Thus, the conclusion by Skrondal and Laake (2001) that standard errors are in general ''way off target'' could not be supported. Differences in results may partly be due to the analysis of latent outcomes, scaling of factor variances with the reference-variable approach, and consideration of different simulation parameters in Skrondal and Laake (2001) compared with this article.
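For readers unfamiliar with the regression (Thomson) scoring method, its scores are a weighted combination of the manifest variables, ξ̂ = ΦΛᵀΣ⁻¹y with Σ = ΛΦΛᵀ + Θ. A minimal numerical sketch with hypothetical one-factor parameters (in practice Λ, Φ, and Θ come from the fitted measurement model, not fixed values as here):

```python
import numpy as np

# Hypothetical one-factor model with three standardized indicators.
Lambda = np.array([[0.8], [0.6], [0.7]])   # loadings
Phi = np.array([[1.0]])                    # factor (co)variance
Theta = np.diag([0.36, 0.64, 0.51])        # unique variances
Sigma = Lambda @ Phi @ Lambda.T + Theta    # model-implied covariance matrix

# Regression-method scoring weights: xi_hat = Phi Lambda' Sigma^{-1} y.
W = Phi @ Lambda.T @ np.linalg.inv(Sigma)

y = np.array([1.2, 0.4, 0.9])              # one respondent's centered items
score = W @ y
print(score)
```

A well-known property visible in this sketch is shrinkage: the variance of the scores, W Σ Wᵀ = ΦΛᵀΣ⁻¹ΛΦ, is strictly smaller than the factor variance Φ whenever the unique variances are positive, which is one reason the scores maximize the correlation with the true factor but do not preserve interfactor correlations.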
Logistic Factor Score Regression. We conclude that the regression method is superior in logistic FSR. The regression method generally had the least problematic coefficient biases and empirical Type I error rates. Furthermore, standard errors were mostly estimated with an acceptable level of bias. However, practitioners should be aware that estimated relationships are in most cases negatively biased to a problematic extent in logistic FSR.
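The two-step logistic FSR pipeline discussed above can be sketched as follows: obtain the measurement model, compute regression-method factor scores, then fit a logistic regression of the binary outcome on those scores. All parameter values below are hypothetical, and for brevity the measurement parameters are plugged in as known rather than estimated in a first step; the second step uses a plain Newton-Raphson (IRLS) fit rather than any particular package:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Step 1: simulate a one-factor measurement model (hypothetical values).
Lambda = np.array([0.8, 0.7, 0.6])      # loadings, factor variance fixed at 1
Theta = 1.0 - Lambda**2                 # unique variances (standardized items)
xi = rng.normal(size=n)                 # true factor scores
Y = xi[:, None] * Lambda + rng.normal(size=(n, 3)) * np.sqrt(Theta)

# Regression-method factor scores (true parameters used here for brevity).
Sigma = np.outer(Lambda, Lambda) + np.diag(Theta)
w = np.linalg.solve(Sigma, Lambda)      # scoring weights, Phi = 1
scores = Y @ w

# Step 2: logistic regression of a binary outcome on the scores.
true_beta = 0.8
p = 1.0 / (1.0 + np.exp(-true_beta * xi))
y = rng.binomial(1, p)

X = np.column_stack([np.ones(n), scores])
beta = np.zeros(2)
for _ in range(25):                     # Newton-Raphson / IRLS iterations
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    grad = X.T @ (y - mu)
    hess = X.T @ (X * (mu * (1.0 - mu))[:, None])
    beta = beta + np.linalg.solve(hess, grad)

print(beta[1])  # slope on the factor scores; attenuated toward zero
```

Consistent with the negative coefficient bias reported for logistic FSR, the fitted slope in such a sketch tends to fall somewhat below the generating value because residual uncertainty in the factor remains after scoring.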
Poisson Factor Score Regression. The simulations indicate that the regression method has the best overall performance in Poisson FSR. Given sufficiently large samples and high mean loading levels, the regression method was in many cases uniquely associated with acceptable coefficient bias levels. Nonetheless, standard errors were always severely underestimated. Therefore, significance testing should not be trusted in Poisson FSR even when coefficient biases are negligible.
In summary, the regression method is recommended for factor scoring in GLFSR. Still, the correlation-preserving method can be a viable alternative when factors are positively correlated and manifest variables have strong correlations on average with factors. In this case, one may choose to either maximize correlations between estimated and theoretical factor scores with the regression method or preserve interfactor correlations with the correlation-preserving method. However, we emphasize that the correlation-preserving method should not be used when factors are negatively correlated because of severe coefficient biases.
In almost all cases, the sum score methods produced severe coefficient biases that translated into empirical Type I error rates of 100%. Thus, despite their simplicity, this article strongly argues against the use of sum score methods in GLFSR, including the common special case with normally distributed outcomes.
Similar to Devlieger et al. (2016), the proportion of nonconvergent replications was the same for all factor scoring methods and regression models. We can therefore conclude that nonconvergence issues in GLFSR are attributable solely to estimation of the measurement model. Nonconvergence issues were eliminated if one analyzed either a large sample or a factor structure with a relatively high mean loading level.
Because nonconvergent replications occurred mostly in situations with small samples and weak manifest variables, these cases should be interpreted with additional care.
In contrast to the majority of previous research, both equal and unequal loading coefficients were considered in this article. An interesting finding was that all previously discussed results held in both homogeneous and heterogeneous loading structures; the impact of loading variability on the performance criteria was negligible. From the perspective of external validity, we advocate the consideration of unequal loadings to reflect a broader range of loading structures found in practice.
Many aspects of GLFSR are still to be explored. For example, it would be of interest to examine how the results in this article may be influenced by different extents and types of missing data, model misspecification, choosing the reference-variable approach to scaling factors, and analyzing a latent outcome variable that can be either continuous or discrete. Results may differ from our simulation study depending on which scoring method is used for scoring the latent outcome variable and how scoring methods for independent and dependent latent variables are combined, which should be explored in future research. Moreover, it would be of interest to compare GLFSR with recent developments in the structural equation modeling literature that account for manifest and latent variables from exponential dispersion family distributions. However, generalized structural equation modeling is at the moment available in only a limited number of statistical software packages, with some model and output restrictions still to be addressed, rendering such comparisons difficult to make. In addition, it would be beneficial for practitioners if approaches to potentially offset problematic coefficient and standard error biases in GLFSR were investigated.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported in this article has been supported by the Swedish Research Council (VR) under the program ''New Statistical Methods for Latent Variable Models, 2017-01175''.

ORCID iD
Fan Yang-Wallentin https://orcid.org/0000-0001-8293-7304

Notes
1. The Bartlett method has intentionally been excluded. When only independent variables are latent, Devlieger et al. (2016) have demonstrated that Bartlett factor score estimates produce asymptotically biased FSR coefficient estimates, which is not the case for regression score estimates. It should be noted that the authors derived this result using a reference-variable approach to defining factor variance scales.