A Best Linear Empirical Bayes Method for High-Dimensional Covariance Matrix Estimation

Covariance matrix estimation plays a significant role in both the theory and practice of portfolio analysis and risk management. This paper exploits the available data prior (prior knowledge of the sample data) within a factor model to enhance covariance matrix estimation. Our work has two main outcomes. First, for a general linear model with unknown prior parameters, a class of best linear empirical Bayes estimators is established through two kinds of architectures to improve estimation accuracy by utilizing the additional data prior. The theoretical results indicate two key points: the proposed estimators are equivalent to the linear minimum mean-square error estimator when a complete or sufficient partial data prior is provided; and the proposed estimators perform better than the optimal weighted least squares method, which ignores the data prior, in each situation. Second, the proposed estimators are used to calculate a high-dimensional covariance matrix through factor models. A numerical example and simulation results verify the effectiveness of our methods.


Introduction
Estimation is a problem concerning how to make the best of the information contained in data for the purpose of inferring an unknown quantity. This information includes the prior information and the observable sample data. The two most popular philosophies for estimation are: Classical, in which the parameter is viewed as an unknown constant (not random) and thus does not have a (prior, posterior, or marginal) distribution; and Bayesian, in which the parameter is regarded as a random variable with a known prior distribution.
The debate between the two schools has been ongoing for many decades. Some of the major criticisms leveled are (Robbins, 1983): The classical philosophy ignores the existence of any prior information, which makes the inference rely entirely on the sample observation data; this can represent a significant waste of information. The Bayesian philosophy forces people to select a prior distribution subjectively and/or arbitrarily, and thus the correctness and appropriateness of the inference results are questionable.
The empirical Bayes methodology is a compromise between these two approaches. It assumes that the parameter is a random variable but with an unknown distribution that can be estimated from sample data. In economics, management, and many other disciplines, it is usually difficult to obtain any prior knowledge about the parameter, so we usually adopt classical estimation theory to estimate the parameter, such as the least squares (LS) method. Although it is difficult to obtain prior knowledge of a parameter, we can generally estimate relatively reliable prior knowledge about the data from the sample observation data. Therefore, the empirical Bayes method can be used to deal with this kind of problem. Table 1 contrasts the classical, Bayesian, and empirical Bayes philosophies. This paper is motivated by the work of X. R. Li et al. (2003), which established and improved the weighted least squares (WLS) estimation, the best linear unbiased estimation (BLUE), and their generalized versions with complete, partial, or no prior knowledge of the parameter. In this work, we consider the widely used linear data model with unknown prior knowledge of the parameter and develop a class of best linear empirical Bayes (BLEB) estimation methods when the data prior is assumed to be complete or incomplete, identified by two kinds of estimation architectures in explicit forms. The proposed method is essentially a type of Bayesian method that can improve estimation performance since prior information regarding the data is considered, while this information is ignored in classical estimation theory.
Covariance matrix estimation plays a significant role in both the theory and practice of portfolio analysis and risk management (De Jong, 2018; Ismail & Pham, 2019; Ledoit & Wolf, 2022; Menchero & Ji, 2021; Wang et al., 2021; Xin & Zhao, 2022). The famous mean-variance portfolio optimization theory of Markowitz (1952) indicates that we can create an optimal portfolio if the expected return, the variance, and the covariance of every asset can be estimated accurately. Therefore, we need an effective and accurate covariance matrix estimation method (Clifford & Feng, 2018; Lan et al., 2018; Wang & Xia, 2021). The importance of covariance matrix estimation has led to the emergence of a large number of estimation methods for covariance matrices in the existing literature (Agrawal et al., 2022; Dong & Tse, 2020; Fan & Mincheva, 2011; Harris & Yilmaz, 2010; Jiang et al., 2023; Ledoit & Wolf, 2003; Stein, 1977; Sun & Xu, 2022; Vassallo et al., 2021). However, high-dimensional covariance matrix estimation is challenging in nature and has been widely studied in recent years (Moura et al., 2020; So et al., 2022; Zhu et al., 2021). Moreover, the curse of dimensionality is the main problem in high-dimensional covariance matrix estimation. For instance, in optimal portfolio allocation and portfolio risk assessment, the number of stocks, p, is usually of the same order as the sample size, n, which is in the order of hundreds or thousands. In particular, when p = 200, there are more than 20,000 unknown parameters in the covariance matrix to be inferred, while we may only have roughly n = 260 observations when using weekly data for the past five years. Therefore, in this case, it is almost impossible to forecast the covariance matrix accurately without imposing any structure (Fan, 2005).
Multi-factor models have been commonly used theoretically and empirically in economics, finance, and management. The well-known arbitrage pricing theory (APT) proposed by Ross (1976, 1977) shows that the excessive return of assets has a certain relationship with specific factors through a special linear model. In this context, multi-factor models have been widely used and studied (Aguilar & West, 2000; Alfelt et al., 2022; Bai, 2003; Chamberlain, 1983; Engle & Watson, 1981; Fan et al., 2008). Thanks to these multi-factor models, if several factors can capture the cross-sectional risks completely, the number of parameters to be estimated in the covariance matrix can be reduced significantly (De Nard et al., 2021; X. L. Li et al., 2022). For instance, taking a three-factor model as an example, there are only 4p instead of p(p + 1)/2 unknown parameters (Fan et al., 2008).
Moreover, when estimating a high-dimensional covariance matrix with a multi-factor model, the most critical aspect is the estimation of factor loadings or factor returns. When all or part of the prior information for the parameter is known, we can use the linear minimum mean-square error (LMMSE) estimation proposed by X. R. Li et al. (2003) to obtain the optimal linear unbiased estimator for the parameter. However, prior information for factor loadings or factor returns is rarely available, and thus the classical LS estimator is commonly used for estimation. Although it is difficult to obtain prior information for the parameter, prior information for asset returns can usually be summarized roughly from historical data or experience, so an estimation method that does not consider any prior information wastes a lot of the prior information contained in the sample data.
Generally speaking, the more abundant the prior information is, the higher the estimation accuracy will be. Therefore, in order to make full use of the available information, we first use the BLEB estimator proposed in this paper to estimate the factor loadings or factor returns, and then calculate the high-dimensional covariance matrix through the factor models.

Linear Estimation With a Linear Data Model
Linear estimation is extremely popular mainly due to its simplicity, and there have been many theoretical achievements in linear estimation. Invented by Gauss in 1795, the LS approach is the oldest and one of the simplest methods in the classical estimation philosophy. The WLS method treats the parameter as an unknown constant and makes inferences relying only on the sample observation data. Another famous linear estimation method, developed and perfected by X. R. Li et al. (2003), is the LMMSE estimator. LMMSE is a Bayesian method that views the parameter as a random variable and performs the estimation by combining the prior information for the parameter and the current sample observation data. These two well-known estimators are simple but powerful. Next, we briefly introduce the linear data model, the WLS estimator, and the LMMSE estimator, which will be used and compared in the next section.

Linear Data Model
Consider the linear data model
$$y_i = X_i b + e_i, \qquad (1)$$
where the vector $y_i$ is the sample data, $X_i$ is a matrix that is not a function of the parameter vector $b$, and $e_i$ is the error; or, more compactly,
$$Y = Xb + e. \qquad (2)$$
The error $e$ has mean $\bar{e}$ and its covariance matrix is given as $R = \mathrm{cov}(e)$. In general, $R$ is a nonsingular diagonal matrix and $X$ is a full column rank matrix.
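For concreteness, the following is a small synthetic instance of data model (2) that the code sketches in later sections can be applied to; the dimensions, values, and variable names are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

d, k = 6, 2                                  # data and parameter dimensions (illustrative)
X = rng.normal(size=(d, k))                  # full-column-rank design matrix (almost surely)
R = np.diag(rng.uniform(0.5, 1.5, size=d))   # nonsingular diagonal error covariance
e_bar = np.zeros(d)                          # error mean, taken as zero in this toy example

b_true = np.array([1.0, -0.5])               # parameter used only to generate the data
e = rng.multivariate_normal(e_bar, R)        # one error realization
Y = X @ b_true + e                           # one realization of data model (2): Y = X b + e
```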

Weighted Least-Squares Estimator (Classical Philosophy)
The linear WLS estimator of an unknown but nonrandom vector $b$ using the sample data $Y$ is the estimator that minimizes the quadratic fitting error
$$J = (Y - Xb)' W (Y - Xb),$$
where the weighting matrix $W > O$ is symmetric.
Minimizing $J$ gives the linear WLS estimator
$$\hat{b}_{WLS} = (X'WX)^{-1} X'W(Y - \bar{e}),$$
with the estimation error covariance matrix
$$P_{WLS} = (X'WX)^{-1} X'WRWX (X'WX)^{-1}.$$
$P_{WLS}$ is minimized by choosing the optimal weighting matrix $W = R^{-1}$, and the resulting optimal WLS estimator (OWLS) is
$$\hat{b}_{OWLS} = (X'R^{-1}X)^{-1} X'R^{-1}(Y - \bar{e}), \qquad P_{OWLS} = (X'R^{-1}X)^{-1}. \qquad (3)$$
It can be seen that the linear WLS estimator is always unbiased, and the OWLS estimator in fact minimizes the error covariance matrix among all linear unbiased estimators using the linear data model (2) for a non-random parameter. In particular, when we choose the weighting matrix $W = I$, the WLS estimator reduces to the ordinary LS estimator (4).
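As an illustration, here is a minimal NumPy sketch of the WLS and OWLS computations under data model (2). The centering of Y by the error mean ē (so that the estimator stays unbiased when ē is nonzero) and all function and variable names are our assumptions, not the paper's notation.

```python
import numpy as np

def wls_estimate(Y, X, W, R, e_bar=None):
    """Linear WLS estimate of b for the model Y = X b + e.

    Y: (d,) data, X: (d, k) design, W: (d, d) symmetric weighting matrix,
    R: (d, d) error covariance, e_bar: (d,) error mean (zero if omitted).
    """
    if e_bar is None:
        e_bar = np.zeros_like(Y)
    XtW = X.T @ W
    A = np.linalg.inv(XtW @ X)           # (X' W X)^{-1}
    b_hat = A @ XtW @ (Y - e_bar)        # WLS estimate; centering by e_bar is our assumption
    P = A @ XtW @ R @ W @ X @ A          # error covariance of the WLS estimate
    return b_hat, P

def owls_estimate(Y, X, R, e_bar=None):
    """Optimal WLS: choose the weighting matrix as W = R^{-1}, giving P = (X' R^{-1} X)^{-1}."""
    return wls_estimate(Y, X, np.linalg.inv(R), R, e_bar)
```

Applied to the toy model above, `owls_estimate(Y, X, R, e_bar)` returns the OWLS estimate of `b_true` and its error covariance.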
Linear Minimum Mean-Square Error Estimator (LMMSE; Bayesian Philosophy)

The LMMSE estimator is a linear Bayesian estimator of a random vector $b$ with additional prior information $\{\bar{b}, C_b, C_{be}\}$. Under the linear data model (2), we have
$$\bar{Y} = X\bar{b} + \bar{e}, \qquad C_Y = XC_bX' + XC_{be} + C_{eb}X' + R, \qquad C_{bY} = C_bX' + C_{be}.$$
The error $e$ and the parameter $b$ are usually uncorrelated (i.e., $C_{be} = O$) in many practical problems; thus
$$C_Y = XC_bX' + R, \qquad C_{bY} = C_bX'.$$
The LMMSE estimator is the one that is linear (actually affine) in the data $Y$ and minimizes the following mean-square error (MSE) matrix:
$$P = E\big[(b - \hat{b})(b - \hat{b})'\big].$$

Minimizing $P$ gives the LMMSE estimator
$$\hat{b}_{LMMSE} = \bar{b} + C_{bY} C_Y^{-1} (Y - \bar{Y}), \qquad P_{LMMSE} = C_b - C_{bY} C_Y^{-1} C_{Yb}. \qquad (5)$$
The above results remain valid if $C_Y^{-1}$ is replaced with the Moore-Penrose inverse $C_Y^{+}$ when $C_Y$ is singular.
The above LMMSE estimator is unbiased and is the best linear estimator for b with known prior information (and thus a BLUE).
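A minimal sketch of the LMMSE computation, assuming the standard affine form of formula (5) discussed above; the function and argument names are ours.

```python
import numpy as np

def lmmse_estimate(Y, b_bar, C_b, C_bY, Y_bar, C_Y):
    """LMMSE estimate of a random b from data Y, given the joint first two moments.

    Uses the Moore-Penrose pseudo-inverse of C_Y so the same code covers a singular C_Y.
    """
    C_Y_pinv = np.linalg.pinv(C_Y)
    b_hat = b_bar + C_bY @ C_Y_pinv @ (Y - Y_bar)   # affine-in-Y estimate
    P = C_b - C_bY @ C_Y_pinv @ C_bY.T              # MSE matrix of the estimate
    return b_hat, P
```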
It can be seen that the OWLS estimator $\hat{b}_{OWLS}$ in Section 2.2 is also called a BLUE, under the assumption that $b$ is a non-random constant vector without the concept of prior information. When we view the parameter $b$ as a random vector with known prior information, the BLUE of the linear data model (2) is the LMMSE estimator $\hat{b}_{LMMSE}$.
As previously stated, in many research fields, such as the multi-factor model, we are seriously short of direct prior knowledge of the parameter, so LMMSE estimation with complete prior information cannot be used directly. Considering that the prior of the sample data is important for the final estimation accuracy, we study the specific form of the linear empirical Bayes estimator under the general linear data model (2) in Section 3.

BLEB Estimators With Complete Data Prior
We develop two architectures to deal with this problem from different perspectives.
Express the Prior for the Parameter Explicitly. For convenience, we say that a BLEB estimator has complete data prior if both the prior mean and the covariance of the sample data $Y$ (as well as its correlation with the error $e$) are known, because the only prior knowledge of the sample data used by a BLEB estimator is its first two moments. Using the data model (2), the BLEB estimator with complete data prior can always be inferred from the first two moments of $e$ and $Y$. In other words, we can express the unknown $\{\bar{b}, C_b\}$ in terms of the known complete data prior $\{\bar{Y}, C_Y, C_{eY}\}$. The following theorem presents the BLEB estimator in the case of complete data prior for data model (2).
Theorem 1 (BLEB estimator with complete data prior). Using data model (2), the BLEB estimator with complete data prior is given by formula (7), where the superscript $+$ stands for the Moore-Penrose inverse and $X$ is a full column rank matrix. Usually, $b$ and $e$ are uncorrelated, and in this case the expressions simplify accordingly. Proof: See the Appendix.

Remarks:
Without using the prior for parameter b, the BLEB estimator with complete data prior is equivalent to the LMMSE estimator in (5) and thus a BLUE.It outperforms the OWLS estimator in formula (3) which does not use any prior information.This will be described in detail later in Theorem 5.
In practice, $b$ and $e$ are usually assumed to be uncorrelated. However, the exact data prior $\{\bar{Y}, C_Y, C_{eY}\}$ is usually unknown, but it can be estimated from past sample data, as in the general empirical Bayes methodology. For instance, given a set of (past) data $\{S_1, \ldots, S_T\}$ and (current) sample data $Y$, where $S_i$ and $Y$ have the same dimension $d$, the practical BLEB estimator is obtained by replacing $\{\bar{Y}, C_Y\}$ in formula (7) with their sample mean and sample covariance computed from $\{S_1, \ldots, S_T\}$; this converges to the theoretical result in formula (7) under very general conditions by the law of large numbers.
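The following sketch illustrates the practical BLEB estimator with complete data prior described above: the data prior is estimated from past samples and plugged into the LMMSE form, using the implied parameter prior b̄ = X⁺(Ȳ − ē) and C_bY = X⁺(C_Y − R) under the uncorrelated-(b, e) assumption. This reading of formula (7), the unbiased covariance divisor, and all names are assumptions on our part.

```python
import numpy as np

def estimate_data_prior(S):
    """Estimate the data prior {Y_bar, C_Y} from past samples S (a (T, d) array)."""
    Y_bar_hat = S.mean(axis=0)
    C_Y_hat = np.cov(S, rowvar=False)    # sample covariance with the unbiased (T - 1) divisor
    return Y_bar_hat, C_Y_hat

def practical_bleb_complete(Y, X, R, e_bar, S):
    """Practical BLEB estimate of b with a complete data prior estimated from past data S."""
    Y_bar_hat, C_Y_hat = estimate_data_prior(S)
    X_pinv = np.linalg.pinv(X)
    b_bar = X_pinv @ (Y_bar_hat - e_bar)             # implied prior mean of b
    C_bY = X_pinv @ (C_Y_hat - R)                    # implied cross-covariance of b and Y
    C_Y_pinv = np.linalg.pinv(C_Y_hat)
    return b_bar + C_bY @ C_Y_pinv @ (Y - Y_bar_hat)
```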
Treat the Prior Mean of the Sample Data as Data. For the linear data model (2), the problem of obtaining the BLEB estimator with complete data prior can always be converted into a problem of obtaining a BLUE estimator without any prior, as given in Lemma 1 below.
Lemma 1: Given the complete data prior $\{\bar{Y}, C_Y, C_{eY}\}$, the problem of obtaining a BLEB estimator with complete data prior for the linear data model (2) can always be converted into obtaining a BLUE estimator without any known prior. This can be achieved by treating the prior mean of the sample data as extra data using the following augmented linear data model (assuming that $b$ and $e$ are uncorrelated):
$$Y_A = \begin{bmatrix} Y \\ \bar{Y} \end{bmatrix} = \begin{bmatrix} X \\ X \end{bmatrix} b + e_A = X_A b + e_A, \qquad (9)$$
with $e_A = \begin{bmatrix} e \\ \bar{Y} - Xb \end{bmatrix}$. The following theorem presents a BLUE estimator without any known prior using the data model (9).
Theorem 2 (BLEB estimator with complete data prior by treating the prior mean of the sample data as data). Using data model (9), the BLEB estimator obtained by treating the prior mean $\bar{Y}$ of the sample data $Y$ as data is given by formula (10). This estimator is unique almost surely and coincides with the BLEB estimator (7). It can be seen that the error covariance matrix $R_A$ in the data model (9) is always singular, while $X_A$ is of full column rank.
Proof: See the Appendix.

Remarks:
This theorem shows that, for optimal linear estimation, the data prior can always be completely embedded into the linear data model (9) by viewing the prior mean as observation data. The BLEB estimator (10) is algebraically equivalent to the BLEB estimator (7), since they both use the same information, just in different ways; the equivalence will be discussed and proved in Theorem 5. The above equivalence reveals that the prior information can also be viewed as a kind of observation data, and thus the BLEB estimator (10) can be thought of as a unification of Bayesian and classical linear estimators. Note that the form of the BLEB estimator (10) is similar to that of the OWLS estimator (3). However, the parameter $b$ is random here, which differs from the OWLS estimator, and the optimization objective here is the MSE matrix rather than the fitting error in WLS, although the OWLS also optimizes the MSE with a non-random $b$. Details can be found in the proof of Theorem 2. In practice, the estimated $R_A$ may be nonsingular, in which case the BLEB estimator (10) reduces to the OWLS form (3) applied to the augmented data model (9).
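The sketch below illustrates this second architecture: the (estimated) prior mean of the data is stacked under Y as extra observations of Xb, and the OWLS form is applied to the augmented model, as the last remark suggests happens in practice when the estimated R_A is nonsingular. The block structure of R_A (with second block C_Y − R under the uncorrelated-(b, e) assumption) and all names are our assumptions.

```python
import numpy as np

def bleb_prior_mean_as_data(Y, X, R, e_bar, Y_bar_hat, C_Y_hat):
    """BLEB estimate via the augmented model [Y; Y_bar] = [X; X] b + e_A (Lemma 1 / Theorem 2).

    The extra noise Y_bar - X b has mean e_bar and covariance C_Y - R when b and e are
    uncorrelated.  The estimated augmented covariance R_A is assumed invertible here;
    otherwise Moore-Penrose inverses would be needed, as in the theorem.
    """
    d = Y.shape[0]
    Y_A = np.concatenate([Y, Y_bar_hat])
    X_A = np.vstack([X, X])
    e_bar_A = np.concatenate([e_bar, e_bar])
    R_A = np.block([[R, np.zeros((d, d))],
                    [np.zeros((d, d)), C_Y_hat - R]])
    W = np.linalg.inv(R_A)                 # may require regularizing C_Y_hat - R in practice
    P = np.linalg.inv(X_A.T @ W @ X_A)     # MSE matrix in the nonsingular-R_A case
    b_hat = P @ X_A.T @ W @ (Y_A - e_bar_A)
    return b_hat, P
```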

BLEB Estimators With Partial Data Prior
Similar to the previous section, we consider two architectures to deal with this problem from different perspectives.
Express the Prior for the Parameter Explicitly. In reality, the corresponding moments $\{\bar{Y}, C_Y, C_{eY}\}$ sometimes do not fully exist; that is, some but not all components of the data prior $\bar{Y}$, $C_Y$, and $C_{eY}$ are available. For instance, there can be some newly listed assets when we estimate the mean and covariance of asset returns from historical sample observation data. In this case, it is almost impossible to estimate a relatively accurate prior for these newly listed assets from a few sample data, so we generally assume that there is no data prior for these newly listed assets. Let $Z_1 = Q_1 Y$ denote the part of the sample data whose prior is available, where $Q_1$ is the corresponding selection matrix. Then the partial data prior can be denoted as $\{\bar{Z}_1, L, C_{eZ_1}\}$, where $\bar{Z}_1$ and $L$ are the prior mean and covariance of $Z_1$. Using data model (2), the BLEB estimator with partial data prior can sometimes be inferred from the first two moments of $e$ and $Z_1$; expressing the unknown $\{\bar{b}, C_b\}$ in terms of this partial data prior gives the BLEB estimator with sufficient partial data prior. In this paper, sufficient partial data prior means that $Q_1 X$ is of full column rank, and insufficient partial data prior means that $Q_1 X$ is not of full column rank.
Theorem 3 (BLEB estimator with sufficient partial data prior). Given $Q_1 X$ of full column rank and uncorrelated $b$ and $e$, using data model (2), the BLEB estimator with sufficient partial data prior is given by formula (11), where the superscript $+$ stands for the Moore-Penrose inverse.

Remarks:
Without using the prior for the parameter $b$ explicitly, the BLEB estimator with sufficient partial data prior (i.e., $Q_1 X$ is of full column rank) is equivalent to the LMMSE estimator in (5) and thus a BLUE. It outperforms the OWLS estimator in (3), which does not use any prior information; this will be described in detail later in Theorem 5. Note that $Q_1 X$ being of full column rank is a necessary condition in this theorem for expressing the prior for the parameter explicitly and uniquely.
Treat the Prior Mean of the Sample Data as Data. Similar to Theorem 2, for the linear model (2), the problem of obtaining a BLEB estimator with partial data prior can also be converted into a problem of obtaining a BLUE estimator without any prior, as presented in Lemma 2 below.
Lemma 2: Given the partial data prior $\{\bar{Z}_1, L\}$, the problem of obtaining a BLEB estimator with partial data prior for the linear data model (2) can always be converted into obtaining a BLUE estimator without known prior by treating the partial prior mean of the sample data as extra data using the following augmented linear data model (assuming that $b$ and $e$ are uncorrelated):
$$Y_A = \begin{bmatrix} Y \\ \bar{Z}_1 \end{bmatrix} = \begin{bmatrix} X \\ Q_1 X \end{bmatrix} b + e_A = X_A b + e_A, \qquad (12)$$
with $e_A = \begin{bmatrix} e \\ \bar{Z}_1 - Q_1 X b \end{bmatrix}$. Similarly, once $\bar{Z}_1$ is treated as sample observation data, there is no remaining prior information about $Y$ at all. Further, the form of the data model (12) is the same as that of the data model (9). The BLEB estimator with partial data prior is then given in the following Theorem 4.
Theorem 4 (BLEB estimator with partial data prior by treating the prior mean of the sample data as data). Given the partial data prior $\{\bar{Z}_1, L\}$ and using data model (12), the BLEB estimator with partial data prior has the same form as the BLEB estimator (10) in Theorem 2, except that $Y_A$, $X_A$, $\bar{e}_A$, and $R_A$ are now those of the augmented data model (12); the other matrices are defined in the same way as in the BLEB estimator (10). Note that the error covariance $R_A$ here is singular, and the properties of the estimator depend on whether $Q_1 X$ in the data model (12) is of full column rank or not.
Proof: See the Appendix.

Remarks:
The BLEB estimator proposed in Theorem 3 is also a BLUE under the corresponding assumption. The BLEB estimator in Theorem 3 is essentially equivalent to the BLEB estimators with complete data prior in Theorems 1 and 2 when a sufficient partial data prior is given. This makes sense because the prior knowledge of the data $Y$ or $Z_1$ is actually redundant in this situation: the complete prior for the parameter $b$ can be fully expressed by the data prior in $Y$ or $Z_1$. In other words, the sufficient partial data prior $\{\bar{Z}_1, L, C_{eZ_1}\}$ is the same as the complete data prior $\{\bar{Y}, C_Y, C_{eY}\}$ with respect to the complete prior for $b$, since $Q_1 X$ is of full column rank. This will be stated and proved in detail in Theorem 5. A more serious situation arises when an insufficient partial data prior is given (i.e., $Q_1 X$ is not of full column rank). In this case, the loss of partial data prior information leads to a worse BLEB estimator, which has a larger MSE matrix than the BLEB estimator using a sufficient partial or complete data prior. However, the BLEB estimator with insufficient partial data prior still has a smaller MSE matrix than the OWLS estimator, since more information has been used; this will also be stated and proved in detail in Theorem 5. In practice, the estimated data prior is not equal to its theoretical value, and thus the final estimated results with partial data prior differ from those using complete data prior even if a sufficient partial data prior is given. Generally speaking, no matter whether $Q_1 X$ is of full column rank or not, the practical BLEB estimator using the complete estimated data prior $\{\hat{\bar{Y}}, \hat{C}_Y, \hat{C}_{eY}\}$ will perform better than that using a partial estimated data prior.
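Analogously, here is a sketch of the partial-data-prior architecture of Lemma 2 and Theorem 4: only the prior mean of the selected components Z₁ = Q₁Y is stacked under Y. The covariance of the extra noise is taken as L − Q₁RQ₁′ and its mean as Q₁ē (uncorrelated b and e), and the estimated augmented covariance is assumed invertible; these modeling details and all names are our assumptions.

```python
import numpy as np

def bleb_partial_prior_mean_as_data(Y, X, R, e_bar, Q1, Z1_bar_hat, L_hat):
    """BLEB estimate via the augmented model [Y; Z1_bar] = [X; Q1 X] b + e_A (Lemma 2 / Theorem 4).

    Q1 is the selection matrix picking the components of Y whose prior {Z1_bar, L} is available.
    """
    d = Y.shape[0]
    m = Q1.shape[0]
    Y_A = np.concatenate([Y, Z1_bar_hat])
    X_A = np.vstack([X, Q1 @ X])
    e_bar_A = np.concatenate([e_bar, Q1 @ e_bar])
    R_A = np.block([[R, np.zeros((d, m))],
                    [np.zeros((m, d)), L_hat - Q1 @ R @ Q1.T]])
    W = np.linalg.inv(R_A)                 # assumes the estimated R_A is invertible
    P = np.linalg.inv(X_A.T @ W @ X_A)     # MSE matrix in the nonsingular-R_A case
    b_hat = P @ X_A.T @ W @ (Y_A - e_bar_A)
    return b_hat, P
```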

Equivalence and Relationships
The remarks on the previous theorems have mentioned the equivalence of the BLEB estimators (7), (10), (11), and (13) when $Q_1 X$ is a full column rank matrix. The remarks have also stated some important relationships among these different linear estimators under different assumptions. The following Theorem 5 presents the concrete quantitative relationships among the classical, Bayesian, and empirical Bayes linear estimators that have been mentioned or proposed in this article.
Theorem 5 (relationships). Assume that $\bar{b}$, $\bar{Y}$, $\bar{e}$, $C_b$, $C_Y$, $R$, $C_{be}$, and $C_{bY}$ all exist, while the different BLUE estimators can only use the corresponding known prior information, since they have their own prior assumptions, as previously stated. Then the LS estimator, the OWLS estimator, the LMMSE estimator, and the BLEB estimators proposed in Theorems 1, 2, 3, and 4 exist and have the following relationships. When $Q_1 X$ is of full column rank, $\hat{b}_{BLEB\text{-}C}$, $\hat{b}_{BLEB\text{-}CT}$, $\hat{b}_{BLEB\text{-}S}$, $\hat{b}_{BLEB\text{-}ST}$, and $\hat{b}_{LMMSE}$ coincide, and their MSE matrices satisfy $P_{BLEB\text{-}C} = P_{BLEB\text{-}CT} = P_{BLEB\text{-}S} = P_{BLEB\text{-}ST} = P_{LMMSE}$ beat $P_{OWLS}$ beat $P_{LS}$. When $Q_1 X$ is not of full column rank, $P_{LMMSE}$ beat $P_{BLEB\text{-}IST}$ beat $P_{OWLS}$ beat $P_{LS}$. Here $\hat{b}_{BLEB\text{-}C}$ stands for the BLEB estimator with complete data prior in formula (7), $\hat{b}_{BLEB\text{-}CT}$ for the BLEB estimator with complete data prior by treating the prior mean of the sample data as data in formula (10), $\hat{b}_{BLEB\text{-}S}$ for the BLEB estimator with sufficient partial data prior in formula (11), $\hat{b}_{BLEB\text{-}ST}$ for the BLEB estimator with sufficient partial data prior by treating the prior mean of the sample data as data in formula (13), $\hat{b}_{BLEB\text{-}IST}$ for the BLEB estimator with insufficient partial data prior by treating the prior mean of the sample data as data in formula (13), $\hat{b}_{LMMSE}$ for the LMMSE estimator in (5), $\hat{b}_{OWLS}$ for the OWLS estimator in formula (3), and $\hat{b}_{LS}$ for the LS estimator in formula (4). $P_{BLEB\text{-}C}$, $P_{BLEB\text{-}CT}$, $P_{BLEB\text{-}S}$, $P_{BLEB\text{-}ST}$, $P_{BLEB\text{-}IST}$, $P_{LMMSE}$, $P_{OWLS}$, and $P_{LS}$ represent the MSE matrices of the corresponding estimators with the same subscripts. In addition, ''='' indicates that the estimation accuracy of the former is the same as that of the latter, while ''beat'' indicates that the estimation accuracy of the former is higher than that of the latter.
Proof: See the Appendix.

Remarks:
The LS estimator $\hat{b}_{LS}$ and the OWLS estimator $\hat{b}_{OWLS}$ mentioned above are both classical linear estimation methods that consider the parameter $b$ as an unknown constant vector for which no prior information exists. The LMMSE estimator $\hat{b}_{LMMSE}$ is a Bayesian method that views the parameter as a random vector with known complete prior about $b$. The other five BLEB estimators are all empirical Bayes estimation methods that also treat the parameter as a random vector but with unknown prior information. As stated before, all the estimators presented in Theorem 5 are BLUEs under their corresponding assumptions, except for the LS estimator $\hat{b}_{LS}$; the LS estimator becomes a BLUE when the error covariance matrix is $R = \sigma^2 I$. Theorem 5 shows that the more prior information is used, the more accurate the estimation result will be. This makes good sense: we should try to make the most of the prior information even if it can only be estimated from sample observation data.
In order to make the use of Theorems 1 to 5 clearer, the flow chart of the BLEB methods under different assumptions on the data prior is shown in Figure 1. In addition, Table 2 summarizes the BLEB estimators and the other BLUE estimators for ease of use.

A Numerical Example
Consider a concrete instance of the linear data model (2), together with prior specifications for the parameter and for the data. Eight estimators are compared on this example.

A. The LS estimator assumes that the parameter $b$ is an unknown constant vector and therefore ignores the prior information for the parameter given above. Using formula (4), we have $\hat{b}_{LS} = [1, 1.1]'$, with its MSE matrix $P_{LS}$ having leading diagonal entry approximately 3.33 and off-diagonal entries approximately $-1.67$.

B. The OWLS estimator makes the same assumption as the LS estimator. Using formula (3), we have $\hat{b}_{OWLS} = [1, 1.1]'$ and $P_{OWLS} \approx [3.00, -1.50; -1.50, 0.92]$.

C. The LMMSE estimator assumes that the complete prior information about $b$ exists and is known as given above; the estimate and its MSE matrix follow from formula (5).

D. The BLEB estimator with complete data prior assumes that the prior of the parameter $b$ is unknown but that the data prior is completely known as given above. Using formula (7), we have $\hat{b}_{BLEB\text{-}C} \approx [1.04, 1.07]'$ and $P_{BLEB\text{-}C} \approx [0.70, -0.30; -0.30, 0.26]$.

E. The BLEB estimator with complete data prior by treating the prior mean of the sample data as data makes the same assumption as in D. Using formula (10), we have $\hat{b}_{BLEB\text{-}CT} \approx [1.04, 1.07]'$ and $P_{BLEB\text{-}CT} \approx [0.70, -0.30; -0.30, 0.26]$.

F. The BLEB estimator with sufficient partial data prior ($Q_1 X$ of full column rank) assumes that only the partial data prior given above is known. Using formula (11), we have $\hat{b}_{BLEB\text{-}S} \approx [1.04, 1.07]'$ and $P_{BLEB\text{-}S} \approx [0.70, -0.30; -0.30, 0.26]$.

G. The BLEB estimator with sufficient partial data prior by treating the prior mean of the sample data as data ($Q_1 X$ of full column rank) makes the same assumption as in F and uses formula (13).

H. The BLEB estimator with insufficient partial data prior ($Q_1 X$ not of full column rank) uses formula (13) with the remaining partial data prior.
It is easy to check that the estimates and MSEs of $b$ in cases A to H fully satisfy Theorem 5. The estimates and corresponding MSEs in cases D to G are the same as those in case C, which shows that the BLEB estimators $\hat{b}_{BLEB\text{-}C}$, $\hat{b}_{BLEB\text{-}CT}$, $\hat{b}_{BLEB\text{-}S}$, and $\hat{b}_{BLEB\text{-}ST}$ are equivalent to the LMMSE estimator $\hat{b}_{LMMSE}$. The MSE in case H is smaller than that in case B, which shows that the BLEB estimator $\hat{b}_{BLEB\text{-}IST}$ performs better than the classical OWLS estimator $\hat{b}_{OWLS}$. The MSE in case H is larger than those in cases E to G, which shows that the BLEB estimator with insufficient data prior $\hat{b}_{BLEB\text{-}IST}$ performs worse than the BLEB estimators with complete or sufficient data prior.

Covariance Matrix Estimation Using the BLEB Estimators
In the case of a large number of assets, it is extremely difficult to forecast the covariance matrix directly and accurately, and a rough estimate of the high-dimensional covariance matrix is seriously disadvantageous to the subsequent optimal allocation of portfolios. Fan et al. (2008) proposed using multi-factor models to transform the estimation of the high-dimensional asset return covariance matrix into the estimation of the low-dimensional factor covariance matrix (the number of factors is generally much smaller than the number of assets). The following is a detailed description of how to use the BLEB estimators proposed in this paper to forecast the high-dimensional asset return covariance matrix on the basis of the multi-factor model.
The multi-factor model states that the excess returns of assets over the risk-free interest rate satisfy
$$y_{ij} = b_{1ij} f_{1ij} + \cdots + b_{Kij} f_{Kij} + e_{ij}, \qquad i = 1, \ldots, p, \; j = 1, \ldots, n, \qquad (16)$$
where $p$ and $n$ represent the number of assets and the number of sample return observations available for each asset. $f_{1ij}, \ldots, f_{Kij}$ denote the factor returns of the $K$ factors in the $j$th sample of asset $i$, $b_{1ij}, \ldots, b_{Kij}$ denote the corresponding factor loadings, and $e_{ij}$ is an error term that is unrelated to both the factor loadings and the factor returns. Multi-factor models assume a certain relationship among specific factors. These factors can be macroeconomic (unexpected inflation, interest rate changes), fundamental (profit growth, return on net assets, market share), or market-related (beta, industry ownership). There are two common structured models, depending on the type of factors used in the model.

Structured Model 1: Estimating Factor Loadings Given Factor Returns
When using a factor model to forecast the covariance matrix of high-dimensional returns on assets, we first need to estimate the factor loadings or factor returns. When the selected factors are macro factors, time series data should be used, that is, multiple sample observations ($n > 1$). The Fama-French three-factor model is a typical example of this structured model. In this case, the factor returns $f_{1ij}, \ldots, f_{Kij}$ are observable (i.e., known) and are the same for different assets $i$, while the factor loadings $b_{1ij}, \ldots, b_{Kij}$ are unknown quantities to be estimated and are the same for different samples $j$. Then, by using the multi-factor model (16), we can obtain the specific linear data model (17), where $\mu$ is the mean of the $p$-dimensional asset return vector, $\Sigma$ is the covariance matrix of the $p$-dimensional asset return vector, and $\bar{e}$, $R$ are the mean and covariance matrix of the independent and identically distributed error terms $e_1, \ldots, e_n$. In the linear data model (17), which uses multiple independent samples, the estimated quantity $B$ is a vector consisting of factor loadings, and $F$ is a matrix consisting of observable factor returns. Generally, it is difficult to know the prior information for the factor loadings, so the OWLS method (3), which ignores the data prior, is simply used to estimate $B$ in most of the existing literature. In this paper, from the perspective of making full use of information, the estimators $\{\hat{\mu}_0, \hat{\Sigma}_0\}$ of the mean and covariance matrix of asset returns, obtained from historical sample data or existing experience, are substituted for the first two moments $\{\mu, \Sigma\}$ of the asset return vector. In addition, the mean and covariance matrices of the errors are unknown in practice and need to be estimated from the residuals; we denote the estimate of $\{\bar{e}, R\}$ by $\{\hat{\bar{e}}, \hat{R}\}$. Now, treating the estimated prior above as the known prior for the sample data in model (17) and using the proposed BLEB estimators, we can obtain a more accurate estimate $\hat{B}_{BLEB}$ of the factor loadings vector $B$, which is better than the LS estimator commonly used in the existing literature.
After obtaining the estimator $\hat{B}_{BLEB} = [\hat{b}_1', \ldots, \hat{b}_p']'$ of the factor loadings vector, we can use formula (18) to obtain the updated estimator of the covariance matrix $\Sigma$ of the high-dimensional asset returns, where $\hat{B}_m = [\hat{b}_1, \ldots, \hat{b}_p]'$ is the matrix representation of the estimated factor loadings $\hat{B}_{BLEB}$ and $\hat{C}_f$ is the estimator of the observable factor return covariance matrix.
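A sketch of Structured Model 1's covariance reconstruction: given observable factor returns, estimate the loadings, then rebuild the p × p covariance. For brevity the loadings are estimated here by ordinary least squares; in the paper's method this step would be replaced by the BLEB estimator. The form Σ̂ = B̂ Ĉ_f B̂′ + diag(residual variances), following the usual factor-model convention of Fan et al. (2008), and all names are our assumptions for formula (18).

```python
import numpy as np

def factor_covariance_given_returns(returns, factors):
    """Estimate a p x p asset-return covariance from a factor model with observable factor returns.

    returns: (n, p) excess asset returns; factors: (n, K) observable factor returns.
    """
    coef, _, _, _ = np.linalg.lstsq(factors, returns, rcond=None)   # (K, p) loadings by OLS
    B = coef.T                                                      # (p, K) estimated loadings
    resid = returns - factors @ coef                                # (n, p) residuals
    C_f = np.cov(factors, rowvar=False)                             # (K, K) factor-return covariance
    D = np.diag(resid.var(axis=0, ddof=factors.shape[1]))           # diagonal residual variances
    return B @ C_f @ B.T + D                                        # Sigma_hat = B C_f B' + D
```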
It can be seen that when some elements of the prior $\{\hat{\mu}_0, \hat{\Sigma}_0\}$ are not available, for example for assets that have only just been listed, we should extract the sub-vector of the original asset vector with known prior and use the BLEB estimators with partial data prior proposed in this paper.

Structured Model 2: Estimating Factor Returns Given Factor Loadings
When the selected factors are fundamental factors, cross-sectional data analysis should be used. For each cross-section in the sample, there is only one observation for each asset. On the $j$th cross-section, the factor loadings $b_{1ij}, \ldots, b_{Kij}$ are directly observable (known), while the factor returns $f_{1ij}, \ldots, f_{Kij}$ are the quantities to be estimated and are the same for different assets $i$. The Barra risk model is a typical example of this structured model (Briner et al., 2009). In this case, by using the multi-factor model (16), we obtain the specific linear data model
$$Y_j = B_j f_j + e_j, \qquad (19)$$
where $E(e_j) = \bar{e}_j$ and $\mathrm{cov}(e_j) = R_j$. In addition, under the assumption of independent and identically distributed samples, the first two moments of the sample data $Y_j$ are $E(Y_j) = \mu$ and $\mathrm{cov}(Y_j) = \Sigma$, where $\mu$ is the mean of the $p$-dimensional asset return vector and $\Sigma$ is the covariance matrix of the $p$-dimensional asset return vector.
In the linear data model (19), the quantity $f_j$ to be estimated is a vector consisting of the factor returns on cross-section $j$, and $B_j$ is a matrix consisting of the observable factor loadings on cross-section $j$. Generally, it is difficult to know the prior information for the factor returns, so the OWLS method (3), which ignores the prior, is simply used to estimate $f_j$ in most of the existing literature. In this paper, from the perspective of making full use of information, the estimators $\{\hat{\mu}_0, \hat{\Sigma}_0\}$ of the mean and covariance matrix obtained from historical sample data or existing experience are substituted for the first two moments $\{\mu, \Sigma\}$ of the asset return vector. In addition, the mean and covariance matrices of the errors are unknown in practice and need to be estimated from the residuals; we denote the estimate of $\{\bar{e}_j, R_j\}$ by $\{\hat{\bar{e}}, \hat{R}\}$. Now, treating the estimated prior above as the known prior for the sample data in model (19) and using the proposed BLEB estimators, we can obtain a more accurate estimate $\hat{f}_j^{BLEB}$ of the factor returns vector $f_j$, which is better than the LS estimator commonly used in the existing literature.
By using the above estimation method on the $n$ cross-sections, we obtain an estimator sequence $\{\hat{f}_1^{BLEB}, \ldots, \hat{f}_n^{BLEB}\}$ of the factor returns vector. Using this sequence, we obtain the sample estimator $\hat{C}_f$ of the factor returns covariance matrix. Then, the updated estimator of the covariance matrix of the high-dimensional asset returns can be obtained using formula (20), where $B$ is a matrix consisting of the observable loadings on a cross-section of interest outside the sample.
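A parallel sketch for Structured Model 2: factor returns are estimated cross-section by cross-section from the observable loadings (here by cross-sectional OLS; the paper's method would use the BLEB estimator for this step), their sample covariance Ĉ_f is formed, and the asset covariance is rebuilt. The residual-variance diagonal term and all names are our assumptions for formula (20).

```python
import numpy as np

def factor_covariance_given_loadings(returns, loadings_by_period):
    """Structured Model 2 sketch: returns is (n, p); loadings_by_period is (n, p, K),
    the observable loadings B_j on each cross-section j."""
    n, p = returns.shape
    K = loadings_by_period.shape[2]
    f_hat = np.empty((n, K))
    resid = np.empty((n, p))
    for j in range(n):
        B_j = loadings_by_period[j]                        # (p, K) observable loadings
        f_hat[j], _, _, _ = np.linalg.lstsq(B_j, returns[j], rcond=None)
        resid[j] = returns[j] - B_j @ f_hat[j]
    C_f = np.cov(f_hat, rowvar=False)                      # (K, K) factor-return covariance
    B = loadings_by_period[-1]                             # loadings on the cross-section of interest
    return B @ C_f @ B.T + np.diag(resid.var(axis=0))      # Sigma_hat = B C_f B' + diag(resid var)
```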
Similarly, when some elements of the prior $\{\hat{\mu}_0, \hat{\Sigma}_0\}$ are not available, for example for assets that have only just been listed, we should extract the sub-vector of the original asset vector with known prior and use the BLEB estimators with partial data prior proposed in this paper.
In order to make our method clearer, Figure 2 shows the flow chart of the BLEB-estimator-based high-dimensional covariance matrix estimation method.

Simulation Results
In this section, we use a simulation study to illustrate our theoretical results and to verify the finite-sample performance of the proposed BLEB estimators. Since our primary concern is to verify the practical improvement in estimation accuracy obtained by using the BLEB estimators within factor models, we compare the performance of the proposed BLEB-based covariance matrix estimators only with that of the LS-based covariance matrix estimator using factor models. To compare a covariance matrix estimator $\hat{\Sigma}$ with the truth $\Sigma$, we examine the estimation error between $\hat{\Sigma}$ and $\Sigma$ using the root mean-square error (RMSE) criterion, where $\|\cdot\|$ stands for the Frobenius norm.
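A one-function sketch of the estimation-error criterion; the exact normalization in the paper's RMSE formula is not recoverable here, so the division by the dimension p below is purely an illustrative assumption.

```python
import numpy as np

def rmse(Sigma_hat, Sigma):
    """Frobenius-norm error between an estimated and a true covariance matrix,
    normalized by the dimension p (the normalization is an illustrative choice)."""
    p = Sigma.shape[0]
    return np.linalg.norm(Sigma_hat - Sigma, ord='fro') / p
```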
For simplicity, we fix $K = 3$ in our simulation and consider the three-factor model
$$y_{ij} = b_{1i} f_{1j} + b_{2i} f_{2j} + b_{3i} f_{3j} + e_{ij}. \qquad (21)$$
The Fama-French three-factor model (Fama & French, 1992, 1993) is a practical example of model (21) and is a kind of Structured Model 1 in the sense of Section 4.1. In the Fama-French three-factor model, $y_{ij}$ is the excess return of the $i$th stock or portfolio. The first factor $f_{1j}$ is the excess return of the proxy of the market portfolio, and the other two factors $f_{2j}$ and $f_{3j}$ are created using six value-weighted portfolios based on book-to-market ratio and size. We take the parameters used in the study of Fan et al. (2008) as our simulation parameters to make the simulation more realistic. The sample means $\mu_F$, $\mu_B$ and sample covariance matrices $\mathrm{cov}_F$, $\mathrm{cov}_B$ are obtained from a fit of the Fama-French three-factor model using the three-year daily data for 30 industry portfolios from 1 May 2002 to 29 August 2005, as given in Fan et al. (2008). In our simulation, we compare five covariance matrix estimators, $\hat{\Sigma}^{T}_{BLEB}$, $\hat{\Sigma}^{C}_{BLEB}$, $\hat{\Sigma}^{P1/2}_{BLEB}$, $\hat{\Sigma}^{P1/3}_{BLEB}$, and $\hat{\Sigma}_{LS}$; the definitions of these five estimators are given in Table 3.
We then take the following steps for each simulation. Generate $n$ random samples $F_j = [f_{1j}, f_{2j}, f_{3j}]'$ from the trivariate normal distribution $N(\mu_F, \mathrm{cov}_F)$ as the factor sample data to be used for estimation. Generate $p$ factor loadings vectors $B_i = [b_{1i}, b_{2i}, b_{3i}]'$ as random samples from the trivariate normal distribution $N(\mu_B, \mathrm{cov}_B)$. Generate $p$ standard deviations $\sigma_1, \ldots, \sigma_p$ from a gamma distribution $G(\alpha, \beta)$ with $\alpha = 3.3586$, $\beta = 0.1876$ (Fan et al., 2008). Generate $n$ random samples $E_j = [e_{1j}, \ldots, e_{pj}]'$ from the $p$-variate normal distribution $N(0, \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2))$. From model (21), we then obtain the random samples $y_{ij}$ with $i = 1, \ldots, p$ and $j = 1, \ldots, n$. Calculate the true mean and covariance matrix of the asset returns as $\mu = B\mu_F$ and $\Sigma = B\,\mathrm{cov}_F\,B' + \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. Calculate the five covariance matrix estimators $\hat{\Sigma}^{T}_{BLEB}$, $\hat{\Sigma}^{C}_{BLEB}$, $\hat{\Sigma}^{P1/2}_{BLEB}$, $\hat{\Sigma}^{P1/3}_{BLEB}$, and $\hat{\Sigma}_{LS}$. Calculate the estimation error between each of these estimators and the true covariance matrix using the RMSE criterion.
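A sketch of one replication of the data-generating steps above under model (21). The gamma parameterization (shape a, scale b) and the use of μ_F, cov_F, μ_B, cov_B as the factor and loading moments are assumptions where the extracted text is ambiguous; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_returns(n, p, mu_F, cov_F, mu_B, cov_B, a=3.3586, b=0.1876):
    """Generate one simulated data set from the three-factor model (21)."""
    F = rng.multivariate_normal(mu_F, cov_F, size=n)        # (n, 3) factor returns F_j
    B = rng.multivariate_normal(mu_B, cov_B, size=p)        # (p, 3) factor loadings B_i
    sigma = rng.gamma(shape=a, scale=b, size=p)             # error standard deviations
    E = rng.normal(0.0, sigma, size=(n, p))                 # idiosyncratic errors e_ij
    Y = F @ B.T + E                                         # (n, p) returns y_ij = B_i' F_j + e_ij
    mu_true = B @ mu_F                                      # true mean of asset returns
    Sigma_true = B @ cov_F @ B.T + np.diag(sigma ** 2)      # true covariance of asset returns
    return Y, F, B, mu_true, Sigma_true
```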
Table 4 reports the estimation performance of the five covariance matrix estimators when $n = 100$ and $M = 1000$, where $M$ denotes the number of historical samples used to estimate the data prior, and the number of assets $p$ is set to 100, 300, and 500. The reported average estimation errors and associated standard errors are based on 1,000 simulations. The pairwise differences in the estimation performance of the five estimators are also reported, along with the corresponding t-statistics.
Figures 3 to 5 present the average estimation error of the five covariance matrix estimators when the number of assets $p$ is set to 100, 300, and 500, respectively. In each figure, we let $M$ grow from low to high to represent different accuracies of the estimated data prior.

The five estimators compared (Table 3) are: $\hat{\Sigma}^{T}_{BLEB}$, the BLEB-based covariance matrix estimator using the true data prior; $\hat{\Sigma}^{C}_{BLEB}$, the BLEB-based covariance matrix estimator using the estimated complete data prior; $\hat{\Sigma}^{P1/2}_{BLEB}$, the BLEB-based covariance matrix estimator with estimated half data prior (the prior of only half of the assets is assumed known); $\hat{\Sigma}^{P1/3}_{BLEB}$, the BLEB-based covariance matrix estimator with estimated one-third data prior (the prior of only one third of the assets is assumed known); and $\hat{\Sigma}_{LS}$, the LS-based covariance matrix estimator that does not use any data prior.

From Table 4 and Figures 3 to 5, we can draw the following conclusions. The BLEB-based covariance matrix estimators outperform the LS-based estimator when $n = 100$, $M = 1000$ at each $p$ of 100, 300, and 500. This result shows that the more information is used, the more accurate the estimator will be.

The estimator $\hat{\Sigma}^{T}_{BLEB}$ performs the best since the true prior of the returns data has been used for estimation, although this is not achievable in practice. The estimator $\hat{\Sigma}_{LS}$ performs the worst because none of the prior information is considered. These results are consistent with Theorem 5. The estimator $\hat{\Sigma}^{C}_{BLEB}$ performs worse than $\hat{\Sigma}^{T}_{BLEB}$ since the estimated complete prior of the returns data is inexact. The estimator $\hat{\Sigma}^{C}_{BLEB}$ performs better than the estimators $\hat{\Sigma}^{P1/2}_{BLEB}$ and $\hat{\Sigma}^{P1/3}_{BLEB}$ since it uses more prior information about the returns data, although this is not consistent with the conclusion in Theorem 5 that the BLEB estimator with complete data prior is equivalent to the BLEB estimator with sufficient partial data prior (full column rank case); the reason is that the estimated data prior used in the simulation differs from the theoretical value of the data prior. However, the differences among $\hat{\Sigma}^{C}_{BLEB}$, $\hat{\Sigma}^{P1/2}_{BLEB}$, and $\hat{\Sigma}^{P1/3}_{BLEB}$ decrease as $M$ increases, and they all converge to $\hat{\Sigma}^{T}_{BLEB}$ when $M$ is large enough, because the estimated data prior gradually approaches its theoretical value as $M$ increases. Clearly, the simulation results tell us that we should use estimated prior information as much as possible in practice.

The differences between the estimators based on estimated data priors and $\hat{\Sigma}^{T}_{BLEB}$ also decrease as $M$ increases. This makes good sense because the more accurate the estimated data prior is, the higher the estimation accuracy will be. This encourages us to obtain as accurate a data prior as possible in practice.

The estimators $\hat{\Sigma}^{C}_{BLEB}$, $\hat{\Sigma}^{P1/2}_{BLEB}$, and $\hat{\Sigma}^{P1/3}_{BLEB}$ may perform worse than the estimator $\hat{\Sigma}_{LS}$ when $M = 100$ (see Figures 4 and 5). This result indicates that the BLEB-based covariance matrix estimators may lose efficacy when an extremely poor data prior is used for estimation.

Conclusions
In this paper, a class of BLEB estimation methods under the linear data model has been developed to improve estimation accuracy in the case of an unknown prior for the parameter. The proposed BLEB estimators perform better than the OWLS estimator, since more data information is used to infer the parameter, and they are equivalent to the LMMSE estimator when a complete or sufficient partial data prior is provided. Only when an insufficient partial data prior is known is the MSE of the corresponding BLEB estimator larger than that of the LMMSE estimator, and it is still smaller than that of the OWLS method. A simple numerical example has been presented to verify the correctness of our method.
In addition, the estimation accuracy of a high-dimensional covariance matrix obtained using a factor model depends on the estimation accuracy of the factor exposures or factor returns. Therefore, we used the proposed BLEB estimator, which fully considers the prior information of the data, to estimate the high-dimensional covariance matrix; that is, we proposed a BLEB-based covariance matrix estimation method. Moreover, according to the different observable variables in the factor model, we gave the specific implementation of the BLEB method in the two cases of observable factor returns and observable factor exposures. Finally, the simulation results showed that the proposed BLEB-based method yields a significant improvement in estimation accuracy compared with the traditional factor model method.
The work in this paper still has some limitations and needs to be studied further in future work. First, the BLEB-based high-dimensional covariance matrix estimation method is proposed in view of the shortcomings of the traditional factor model method. Therefore, this paper compares and analyzes in detail the differences in estimation accuracy between the two factor-model-based methods, but we have not yet compared the performance differences between the BLEB method and other types of high-dimensional covariance matrix estimators; we will study this problem in future research. Second, in practical applications, the BLEB method needs to first determine the prior mean and covariance of the asset returns. This paper has not yet discussed the possible impact of different prior estimators on the estimation results for the high-dimensional covariance matrix. In future studies, we will consider designing different priors for asset returns and analyze the impact of different priors on the actual estimation results.
Appendix

Proof of Theorem 1

Then, the BLUE estimator with known prior of the parameter is the LMMSE estimator (5). When $b$ and $e$ are uncorrelated, we have $C_{be} = O$, and the expressions simplify accordingly.

Proof of Lemma 1
The prior mean of the sample data can be treated as extra data in a linear model using the identity
$$\bar{Y} = Xb + (\bar{Y} - Xb), \qquad (22)$$
where the error $\bar{Y} - Xb$ acts as observation noise. Note that this extra model is valid in every case, without any assumption.
Combining the extra model (22) with the data model (2), we obtain the augmented linear data model (9). The mean $\bar{e}_A$ and covariance $R_A$ of the augmented noise $e_A$ can then be calculated from (9).
When $b$ and $e$ are assumed uncorrelated, we have $C_{be} = O$ and $C_{eY} = R$, and then the above covariance $R_A$ can be simplified accordingly.

Proof of Theorem 2

From Proposition 1 of X. R. Li et al. (2003), a BLUE estimator should have the form $\hat{b} = K(Y_A - \bar{e}_A)$, where $K$ satisfies $KX_A = I$ and minimizes the MSE matrix. The general solution of the above quadratic minimization problem is shown in the Appendix of X. R. Li et al. (2003), where $N = I - X_A X_A^{+}$ is a projector onto the orthogonal complement of the column space of $X_A$ and $M$ is any matrix satisfying the stated condition. To convert the above general solution (23) into an equivalent but more familiar form, we define the solution $L$ as in equation (24); it is equivalent to the solution in the form of equation (23) since $N$ is an orthogonal projection matrix. In addition, following Theorem 1 of Khatri (1990) with $L_1 = X_A$, $L_2 = N$, and $T = R_A$, the solution $L$ can be transformed into the form given in Theorem 2, which is unique almost surely, and its error covariance matrix is given uniquely.

Proof of Theorem 3

The BLEB estimator with partial data prior (11) and its error covariance have the same form as the LMMSE estimator, except that the corresponding moments are calculated as follows. From the data model (2), and under the assumption that $Q_1 X$ has full column rank, the required moments are obtained; considering that $b$ and $e$ are uncorrelated, we have $C_{eZ_1} = RQ_1'$, which completes the calculation.
From the data model (2), we have Then, under the assumption that QX has full column rank, we get Consider that b and e are uncorrelated, we have C eZ 1 = RQ 0 1 , and then Hence, we have

Proof of Lemma 2
Similar to the proof of Lemma 1, a partial prior mean of the sample data can also be viewed as extra data in a linear model using the identity
$$\bar{Z}_1 = Q_1 X b + (\bar{Z}_1 - Q_1 X b), \qquad (25)$$
where the error $\bar{Z}_1 - Q_1 X b$ acts as observation noise. It should be emphasized that this extra model is valid in every case, without any assumption.
Combining the extra model (25) with the data model (2), we obtain the augmented linear data model (12). The mean $\bar{e}_A$ and covariance $R_A$ of the augmented noise $e_A$ can then be calculated from (12). When $b$ and $e$ are assumed uncorrelated, we have $C_{be} = O$ and $C_{eY} = R$, and the above covariance $R_A$ simplifies accordingly.

Proof of Theorem 4

The proof procedure of Theorem 4 is exactly the same as that of Theorem 2, except with different $Y_A$, $X_A$, $\bar{e}_A$, and $R_A$.

Proof of Theorem 5
From the data model (2) and equation (5), we obtain relation (26), where the last equality follows from the matrix inversion lemma. The proof of Theorem 1 shows that the BLEB estimator $\hat{b}_{BLEB\text{-}C}$ is equivalent to the LMMSE estimator $\hat{b}_{LMMSE}$, and equation (27) follows from the proof of Theorem 1. By Theorem 2 and equation (27), the corresponding equality for $\hat{b}_{BLEB\text{-}CT}$ follows. By Theorem 3, with the assumption that $Q_1 X$ is of full column rank, and equations (26) and (27), the equality for $\hat{b}_{BLEB\text{-}S}$ follows. By Theorem 4, with the assumption that $Q_1 X$ is of full column rank, the equality for $\hat{b}_{BLEB\text{-}ST}$ follows. By Theorem 4, with the assumption that $Q_1 X$ is not of full column rank, we obtain relation (31); the inequality in (31) follows from Theorem 1 of Khatri (1990) by letting $L_1 = X'Q_1'$, $L_2 = I - L_1 L_1^{+}$, and $T = C_b^{-1}$. From equations (26) to (31), we obtain the ordering in (32). The last inequality in (32) follows directly from the property that the OWLS estimator has the smallest MSE among all WLS estimators, and thus the proof of this inequality is omitted here.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figure 1 .
Figure 1. Flow chart of the BLEB methods with different assumptions of data prior.

Figure 2 .
Figure 2. Flow chart of the BLEB-estimator-based high-dimensional covariance matrix estimation method.

Figure 3 .
Figure 3. Average estimation error with different M at p = 100.

Figure 4.
Figure 4. Average estimation error with different M at p = 300.

Figure 5.
Figure 5. Average estimation error with different M at p = 500.

Table 1 .
Comparison of Different Estimation Philosophies.

Table 2 .
The BLEB Estimators With Complete, Sufficient, and Insufficient Data Prior.

Table 3 .
The Covariance Matrix Estimators Compared in Simulations.

Table 4 .
Estimation Performance and Their Differences.
Note. *** indicates that the corresponding results are statistically significant at 99% confidence level.