Adaptive probability hypothesis density filter for multi-target tracking with unknown measurement noise statistics

Under the Gaussian noise assumption, the probability hypothesis density (PHD) filter represents a promising tool for tracking a group of moving targets with a time-varying number. However, inaccurate prior statistics of the random noise will degrade the performance of the PHD filter in many practical applications. This paper presents an adaptive Gaussian mixture PHD (AGM-PHD) filter for the multi-target tracking (MTT) problem in the scenario where both the mean and covariance of measurement noise sequences are unknown. The conventional PHD filters are extended to jointly estimate both the multi-target state and the aforementioned measurement noise statistics. In particular, the Normal-inverse-Wishart and Gaussian distributions are first integrated to represent the joint posterior intensity by transforming the measurement model into a new formulation. Then, the updating rule for the hyperparameters of the model is derived in closed form based on variational Bayesian (VB) approximation and Bayesian conjugate prior heuristics. Finally, the dynamic system state and the noise statistics are updated sequentially in an iterative manner. Simulations results with both constant velocity and constant turn model demonstrate that the AGM-PHD filter achieves comparable performance as the ideal PHD filter with true measurement noise statistics.


Introduction
Benefit from the increasing advances in sensing techniques and computer science, multi-target tracking has found its potential applications in various disciplines, such as autonomous robotics, intelligence surveillance, remote sensing and even biomedical research. [1][2][3] The goal of MTT is to estimate the number of targets together with their individual states from a given sequence of measurements, which is contaminated by a number of uncertain sources. The problem of tracking a group of targets was traditionally resolved by many data association-based methods, mainly including nearest neighbor (NN), 4 the joint integrated probabilistic data association (JIPDA), 5 multiple hypothesis tracking (MHT), 6 and multiple-target particle filter. 7 However, due to its combinatorial nature, theses tracking algorithms could be computationally intensive when the target number grows.
The random finite sets (RFS) theory was first introduced into the MTT by Mahler,8 where the explicit data association was avoided by representing the target states and noisy measurements as two individual RFS.
Consequently, the MTT problem was solved within the Bayesian filtering framework utilizing the RFS formulation. In addition, the classical PHD filter 8 and its several typical extensions, such as multi-Bernoulli filters 9,10 and cardinalized PHD, 11,12 were proposed to approximate the state posterior by propagating the first-order statistical moment or the intensity of the RFS. However, the recursion of various PHD filters is still intractable in general as it involves the calculation of multiple set integrals. Two closed-form implementations of the PHD filter were proposed, including Gaussian mixture PHD (GM-PHD) filter 13 and particle PHD filter 14 which are based on Gaussian mixture (GM) strategy and sequential Monte Carlo (SMC), respectively. Generally, the GM-PHD filter is more efficient and reliable than particle PHD filter as it obviates the expensive clustering process in the particlebased approach. Besides, the standard Kalman filter (KF) and its nonlinear variations could be easily adopted to estimate the individual mixture components under the Gaussian model assumption.
The statistics of the measurement noise statistics, especially the noise deviation and covariance, may not always be a given premise in some practical applications. In the cases that the actual measurement noise statistics deviate from the assumed ones, those standard Gaussian filters will suffer degraded performance or even divergence. In order to realize the state estimation problem involving unknown noise statistics, several adaptive filters were introduced within the framework of Bayesian estimation. For instance, to improve the estimation accuracy for vehicle navigation with noise uncertainty, the fuzzy interacting multiple model (IMM) is employed in sensor fusion to determine the bounds of the process noise covariance. 15 Zhao et al. 16 presented a general filtering scheme based on the IMM algorithm for nonlinear systems with unknown and random nuisance parameters. Furthermore, the linear minimum mean square error (LMMSE) filter 17 was embedded into the above IMM based filtering scheme to handle ballistic target tracking with unknown ballistic coefficient efficiently. By introducing an equivalent covariance matrix into the estimator, the robustness of the standard KF is enhanced against the uncertainty in the process noise. 18 As an example for MTT, the H infinite optimality criterion is integrated with the GM-PHD filter to handle the noise parameter uncertainties. 19 However, the filter could yield a conservative result since it minimizes the possible worst-case estimation errors by treating the noise statistics as bounded quantities. In addition to the above methods, the variational Bayesian (VB) approximation is introduced into Kalman filtering to reclusively estimate the dynamic state and time-varying variance of the measurement noise for linear Gaussian models. 19 For the nonlinear system model along with time-varying measurement noise covariance, a robust unscented KF using VB approximation is proposed to improve its adaptivity. 20 The VB based method has advantages over other aforementioned methods that it deals with posterior inference analytically at a low computational cost, since the posterior is assumed in the same functional form as the prior based on Bayesian conjugacy. 21 As a result, some adaptive multi-target tracking filters based on the VB approximation has been further addressed, and an inverse-Wishart or a product of inverse-Gamma distribution was defined as a conjugate prior for the measurement noise covariance. [22][23][24][25] However, the mean of the Gaussian measurement noise was assumed to be known as a zero vector. To deal with the cases that the mean of the noises are also unavailable, Ö zkan et al. 26 proposed a marginalized particle filter (PF) to update the noise statistics by employing finite dimensional sufficient statistics of Normal-inverse-Wishart distributions for each particle.
When referring to the MTT scenario with unknown mean and covariance of the noise, Li et al. 27 introduced a measurement preprocessing procedure prior to the multi-sensor PHD updating step to cope with the unknown statistical properties of the sensors. The measurement noise bias was handled by a Monte Carlo sampling-based debiasing approach with the multisensor network. 28 Here, a new adaptive PHD filter is proposed to recursively estimate both the unknown mean and covariance of the measurement noise in the process of target tracking. In the proposed filter, the multi-target state is represented by a Gaussian distribution, while the statistics of the measurement noise is modeled with a Normal-inverse-Wishart distribution. The measurement equation is reformed with the residual vector, and the rule for updating the hyperparameters of the measurement noise statistics is then derived using two different methods: the VB approximation and the Bayesian conjugate prior heuristics. On this basis, the hyperparameters of the model are determined sequentially and iteratively at each measurement updating step. The effectiveness of the proposed filter is evaluated by two multi-target tracking simulations where both the mean and covariance of the measurement noise are unknown.
The organization of the paper is assigned as follows. In section 2, the model formulation of target tracking with unknown measurement noise statistics is introduced. In section 3, the updating rules for the hyperparameters of the measurement noise statistics are derived, then the details of the proposed filter are presented. Section 4 illustrates the effectiveness of the proposed adaptive PHD filter with simulations. The main conclusions are given in the last section.

Problem formulation
Consider the following discrete-time state-space model for each target where k denotes the time index. x k 2 R n x and z k 2 R n z are the target state and the measurement, respectively. f() is the state transition function and h() is the measurement function. The process noise v k is additive with the Gaussian distribution N (0, Q k ). The measurement noise w k is also additive with the Gaussian distribution N (r k , R k ), but with non-zero mean r k and covariance R k . Besides, the noises are assumed mutually independently and identically distributed.
Since the exact information about the measurement noise statistics is unavailable, not only the multi-target state but also the measurement noise statistics have to be estimated simultaneously. For this purpose, the measurement residual e k is used to represent the nonzero mean Gaussian measurement noise as For multivariate Gaussian distribution of the random variable e k with unknown mean and covariance, the Normal-inverse-Wishart distribution defines as its conjugate prior. The measurement model can thus be transformed into a hierarchical Bayesian model as (4), where the density of the mean r k conditioned on the covariance R k is a Gaussian distribution, while the covariance R k is inverse-Wishart distributed, that is, where N (r k ; m k , R k k k ) denotes a Gaussian distribution of r k , and the hyperparameters m k and k k are the mean and scale factor, respectively. IW n k (R k ; L k ) denotes an inverse-Wishart distribution of R k , with the scale matrix L k and degree of freedom n k . The probability densities of the above Gaussian distribution and inverse-Wishart distribution are as follows: To track multiple targets with a time-varying number, RFSs are used to represent both the set of the target states and the measurements. The multi-target state RFS in the state space X and the multi-target measurement RFS in the observation space Z are defined as where n k and m k denote the number of targets and measurements, respectively. According to the finite set statistics (FISST) theory, the estimation of multi-target state can be approximated by the PHD filter which is sub-optimal but computation tractable. Since the knowledge of the measurement noise statistics is unknown, the standard PHD filter has to be extended by augmenting the target state with the measurement noise statistics. Assume that at time k À 1, the joint posterior intensity of the multi-target state and the measurement noise statistics is denoted by y kÀ1 (x kÀ1 , u kÀ1 jZ 1:kÀ1 ), where u kÀ1 = ½r kÀ1 , R kÀ1 denotes the measurement noise statistics. The predicted and updated intensities at time k can be derived by the two-stage Bayesian recursion: where g k ( Á ) is the intensity of the target birth RFS. h k ( Á ) denotes the intensity of Poisson clutter RFS. p s, k and p d, k are the target survival probability and detection probability, respectively. Z 1:k = fZ 1 , Á Á Á , Z k g is the sequence of given measurements up to time k. Besides, suppose that the evolution model of the measurement noise statistics p(u k ju kÀ1 ) is independent of the given state transition model p(x k jx kÀ1 ), that is, It is noted that the multi-target state is coupled with the unknown noise statistics in the measurement likelihood p(z k jx k , u k ), which results in the problem that the integrals involved in the standard PHD recursions cannot be analytically handled.

Joint estimation of state and measurement noise statistics
According to the definition of the transformed measurement model, the posterior of the measurement mean and covariance is reformed as where the quantities m kÀ1 , k kÀ1 , n kÀ1 and L kÀ1 are the corresponding hyperparameters, and e 1:kÀ1 = fe 1 , e 2 , Á Á Á , e kÀ1 g denotes the sequence of the measurement residuals up to the current time.
Assume that the predicted density of the measurement noise statistic is in the same form as the posterior, i.e., Here, the VB approximation 29 is employed to compute the estimate of the posterior density: where Q r (r k ) and Q R (R k ) are the approximated densities corresponding to the unknown measurement noise mean r k and covariance R k , respectively. The key of the VB approximation is to determine the posterior density by calculating the minimum of Kullback-Leibler divergence (KLD), which is defined as The solution to this optimization problem is obtained by minimizing KLD with respect to the Q r (r k ) and Q R (R k ) sequentially, while retaining the other part unchanged. Finally, the hyperparameters at time step k can be approximated from their corresponding values at time step kÀ1 given by Proof: For the proof see appendix A.
In what follows, the measurement residuals are treated as samples to derive the updating rule for the hyperparameters of the measurement noise statistics. The Normal-inverse-Wishart posterior can be calculated from its Bayesian conjugate prior with samples of measurement data as follows 30 : where n is the number of the data set fz i g n i = 1 , z k is the mean of the samples, and S k is the covariance matrix of the samples. The sample mean and covariance are defined as The above general recursive equations are specialized for the update rule of the hyperparameters of the measurement noise heuristically. Considering that the measurements are processed sequentially in the update step of the filter, the sample mean is equal to the current measurement z k and the covariance is a zero matrix. The measurement z k is then replaced by the measurement residual e k with a simple transformation given in equation (3). Consequently, the updated values of the hyperparameters are computed as Note that the above heuristic updating rules are essentially the same as the equations (17) and (19). When the hyperparameters at time step k is obtained, the predicted density of the measurement residual e k becomes a multivariate t-distribution as with center at m k , precision L k (k k + 1) k k (n k Àn z + 1) , and degree of freedom n k À n z + 1. The posterior density of the measurement noise statistics can be denoted by a single Normal-inverse-Wishart distribution as follows 30 : where Z is a constant defined as To make full use of the Bayesian conjugate prior, both the joint prior and predicted densities of x k and u k are assumed to be a product of Normal-inverse-Wishart and Gaussian distribution. For the given transmit and measurement models, the Bayesian recursions are then employed to obtain the joint posterior density. In the prediction step, to ensure that the predicted density and the prior density are in the same formation, the predicted values of the hyperparameters are computed with a simple evolution model: where r 2 (0, 1 is a given forgetting factor. Meanwhile, the mean and covariance of the target state can be predicted by applying the prediction equations of Gaussian filters.
In the measurement update step, it is intractable to directly derive an exact value of the joint posterior density because the target state is coupled with the measurement noise statistics. Therefore, an iterative method is used to estimate the target state and measurement noise statistics sequentially. Specifically, the measurement noise statistics are first estimated with the updated hyperparameters, which are computed according to the updating rule presented above: The updated mean and covariance of the target state are estimated by standard update equations of Gaussian filters with the known measurement noise statistics. By substituting the updated mean of the target state into the measurement model, one can obtain the new measurement residual, which is used to update the hyperparameters of the measurement noise statistics again. These iterative steps are performed multiple times until the result is converged or the preset maximum iteration limit is reached.

Adaptive GM-PHD recursion
In this section, the standard GM-PHD recursion is extended in a sense that not only the target state is estimated, but also the measurement noise statistics are adapted at the same time. Assume that all targets follow the same state-space model defined in section 2, thus the state transition and likelihood functions are both Gaussian distributed, i.e., where j is the target index. To ensure that the adaptive filter have closed-form solutions, the spontaneous birth are characterized by the intensities comprised of a product of Normal-inverse-Wishart and Gaussian distribution mixtures: where J g, k is the total number of mixtures with weight w j g, k , mean m j g, k and covariance P j g, k . The hyperparameters for each mixture of the birthed target are denoted by m j g, k , k j g, k , n j g, k , and L j g, k . By combining (7), (8), and (25) together, the proposed adaptive PHD recursions are detailed as follows.
Time Prediction: Assume that the joint prior intensity of the multi-target state and measurement noise statistics at the time k À 1 is also represented by a product of Normal-inverse-Wishart and Gaussian mixtures: The joint predicted intensity is computed by summing over the two intensities with regard to existing targets and spontaneous birthed targets, that is, where y s, kjkÀ1 (x k , u k ) denotes the predicted intensity from existing targets which is computed by The predicted mean and covariance matrix of the target states is estimated by And the predicted hyperparameters of the Normalinverse-Wishart distribution are computed in accordance with the equation (21), that is, Measurement update: Given that the joint predicted intensity at time k is represented by a product of Normal-inverse-Wishart and Gaussian mixtures Then the joint posterior intensity is computed by where the detected target intensity for each measurement z k is given by The fixed point iteration method is employed to compute the above model parameters since it is of asymptotic optimal convergence properties. 31 For each new measurement z k , an iterative PHD measurement update is performed with the initial values of the parameters given as where the superscript (i) = 1, 2, Á Á Á , N max is the iteration index with the preset maximum number of iterations N max . The (i + 1)-th iteration process is depicted as follows. And the auxiliary variables, that is, the innovation vector b z j k , the innovation covariance matrix T j k and the cross-covariance matrix C j k , for each mixture are computed by The estimated measurement noise statistics are updated with estimates of the hyperparameters from the previous iteration as b r j, (i + 1) The estimated multi-target state is updated by employing standard Gaussian filters with known measurement noise statistics as The estimated hyperparameters are updated by where the measurement residual is computed by Suppose that the algorithm converges after N iterations, then the parameters are determined by where the updated weights of the posterior mixture components are calculated by Note that if nonlinear functions are used as dynamic and measurement models, then the general Gaussian integrals in equations (29) and (35) can be approximated by utilizing commonly used variants of nonlinear Gaussian filters, such as the extended KF, the unscented KF, and the cubature KF.

Algorithm summarization
Additional steps including parameter initialization, mixture reduction, and state extraction have to be implemented when realizing the entire tracking procedure. The number of the mixture components after each PHD recursion will increase exponentially as time progresses. Consequently, similarly as in Granstro¨m and Orguner, 32 mixture reduction process is performed to ensure the tractable computational cost. This is done by first removing the components that have weights below a truncation threshold and then merging the components with reasonable Mahalanobis distance from the largest component. The multi-target state is finally extracted from the reduced mixture components that are corresponding to the peaks of the posterior cardinality. In conclusion, the pseudo-code of the proposed filter is summarized in Algorithm 1.

Simulation configuration
In this simulation, the performance of the proposed filter is evaluated in a 2-dimensional scenario, where the number of targets is time-varying. As shown in Figure 1, the targets are observed in a surveillance region with the scale 400m3300m. The target state is denoted by a vector x k = ½p x, k , _ p x, k , p y, k , _ p y, k , which is composed of target position ½p x, k , p y, k and velocity ½ _ p x, k , _ p y, k . The true trajectories of the targets are generated by a nearly constant velocity model with the matrices where the scanning period D = 1s, and the standard deviation of the process noise s v = 1 m/s. Target positions are observed by the following linear Gaussian model with unknown measurement noise statistics: where the ground truth of the measurement noise statistics are set as r true = ½9, 9 and R true = diag(½1, 1). Note that the true value of the noise mean is beyond three standard deviations to ensure that the impact of the non-zero noise mean is not negligible. As summarized in Table 1, each target has a different survival time and initial position. There are a maximum of four moving targets during the time period [20,80]. The spontaneous birth intensity is given by where the weight w g, k = 0:03, the means of Gaussian components m 1 g, k = ½À140, 2:5, 120, À2:5, m 2 g, k = ½À180, 4, 80, À2, m 3 g, k = ½À180, 2:5, À80, 2, and m 4 g, k = ½À140, 3, À120, 1:5, the identical covariance P g, k = diag(½25, 10,25,10). The initial hyperparameters of the Normal-inverse-Wishart components are set as m 0 = ½4, 4, k 0 = 5, n 0 = 5, and L 0 = diag(½8, 8).

Algorithm 1. Pseudo-code of the proposed PHD filter
Step 1: Parameter initialization Set initial hyperparameters of the measurement noise statistics: p(u 0 ) = N IW(u 0 ; m 0 , k 0 , n 0 , L 0 ): Set initial joint PHD intensity: Step 2: PHD recursion for k = 1 : K do Compute the predicted PHD intensity via (26)- (30), and mixtures are represented by Set the initial conditions for the iterations via (34); repeat Compute the estimated measurement noise statistics (r j k , R j k ) via (36). Compute the updated multi-target states via (37). until converged Compute the updated mixture weights via (41), and mixtures are represented by

end for
Step 3: Mixture reduction Prune and merge the mixtures for computational feasibility.
Step 4: State extraction Extract states corresponding to the peaks of the posterior cardinality.  Besides, other parameters used in the simulations are listed in Table 2.

General tracking performance
In Figure 2, the resulting target positions estimated by the AGM-PHD filter in one typical trial are presented. It can be easily seen that almost all the positions of the targets are estimated accurately in both x and y coordinates. The weighted average of the estimated measurement noise statistics from each extracted mixture is illustrated in Figures 3 and 4. The simulation results suggest that both the estimated means and standard deviations of the measurement noise converge to their true values after a few time steps. The reasons for the similar convergence speed of the means and standard deviations are that they are coupled together through one single Normal-inverse-Wishart distribution.

Comparisons with benchmark filters
To demonstrate the advantages of the AGM-PHD filter, three different variations of GM-PHD filters are also implemented and evaluated in the simulations. The highlights of these filters are described as follows.
GMTS-PHD: the standard GM-PHD filter with the ground truth of the measurement noise statistics, that is, the means and standard deviations of the measurement noise are chosen as 9.0 and 1.0, respectively. GMTM-PHD: the standard GM-PHD filter with the ground truth of the measurement noise mean only, while the standard deviations of the measurement noise are chosen as 2.0 to match with the initial standard deviation of AGM-PHD. GM-VBPHD: the adaptive PHD filter proposed in Wu et al. 22 where only the covariance of the measurement noise is assumed unknown, while the mean of the measurement noise is chosen as    8.0 to allow a reasonable distance from the ground truth.
In all the simulations, the estimated accuracy of the target position is evaluated quantitatively by the optimal sub-pattern Assignment (OSPA). 33 Similar to the root mean square error (RMSE) metric used to represent the differences between two vectors, the OSPA metric can be used to measure the differences between two random finite sets. In each simulation run, two hundred independent Monte Carlo trials are conducted. For the OSPA metric, the cut off factor c = 100, and the order is set as p = 2. The comparison of the averaged OSPA distance for all the four filters is presented in Figure 5. The result proves that the GMTS-PHD filter obtains the least estimation error since there is no bias in the measurement noise statistics in this case. It can also be seen from Figure 5 that the tracking performance of the AGM-PHD filter is the most comparable with that of the GMTS-PHD filter. This is due to the fact that the AGM-PHD filter can adapt to unknown measurement noise statistics with a few initial time steps. Besides, the GMTM-PHD filter achieves smaller errors than the GM-VBPHD. It implies that in the measurement update step, the mean of the measurement noise has more important impacts on state estimation than the covariance when its value exceeds three times the standard deviation. The comparison of average cardinality statistics calculated by the different filters are shown in Figure 6, where the quite similar result as the OSPA comparison can be seen. The average number of the targets determined by both the ideal GMTS-PHD filter and the AGM-PHD filter are very close to the ground truth. However, the GMTS-PHD filter and GM-VBPHD fail to be in accordance with the true number because only the mean or the covariance of the measurement noise is considered.
To compare the computational costs of these filters, the average running time of Monte Carlo trail of different filters are listed in Table 3. It is noted that the running time required for each filter are measured relative to that of the GMTS-PHD filter. As can be seen from the Table, the GMTM-PHD filter demands almost the same running time as the GMTS-PHD filter. The two VB based filters take more than 25% longer running time. This is mainly because that an additional iteration step has to be carried out to estimate the unknown model hyperparameters during the measurement update stage. The computational cost of the AGM-PHD filter is slightly higher than the GM-VBPHD filter since it involves more number of hyperparameters that need to be estimated in the AGM-PHD filter.
The impacts of different average clutter rate on the tracking performance are also evaluated. All the filters are carried out under six different levels of clutter rate whose value is taken from the interval [5,30] with step 5.0. For each level of clutter rate, the average OSPA metric with regard to different filters are plotted in Figure 7. It can be observed that the estimation accuracy concerning all the filters degrades as the level of clutter rate increases. However, the differences of the OSPA distance between the AGM-PHD filter and the GMTS-PHD filter remain below 3.0 at all clutter levels. Meanwhile, the increment of the OSPA distance of the GMTM-PHD filter and GM-VBPHD filter are larger than 12.0, which is nearly 1.5 times of the other two filters. This might due to the reason that more spurious

Evaluation with a constant turn model
To demonstrate the performance of the AGM-PHD filter for tracking targets with variable parameters of motion model, the targets are assumed to move within the same surveillance region following a nearly constant turn (CT) model 34 with known but varying turn rate. The tracking setup of the simulation keeps the same except the dynamic motion model, which is given as follows: where the transition matrix is given as a function of the turn rate: the turn rate v k = v c, k + u k D, D = 1s and u k denotes angle rate noise with Gaussian distribution N (u k ; 0, (p=180) 2 ), v c, k follows a piecewise function v c, k = 0, for 14k440, 60 \ k4100 Àp=10, for 40 \ k450 p=10, for 50 \ k460 8 < : In Figure 8, the ground truth trajectory of each target together with its start and end positions are presented. All the targets travel along nearly straight lines at the time intervals [1 s, 40 s] and [60 s, 100 s], while making two converse turns at time 40 s and 50 s. During the turning periods, tracking the targets correctly becomes more difficult as the targets' trajectories change significantly. Same as in the first simulation, 200 independent Monte Carlo trials are conducted for all four PHD filters. Consequently, the result in Figure 9 shows that more erroneous estimates appear around the turning points. Nevertheless, the proposed adaptive PHD filter still exhibit acceptable tracking performance as a whole. This implies that the AGM-PHD filter gains certain robustness against sudden changes in the control noise of non-linear dynamic models. The performance comparison between the filters is presented in Figures 10 and 11. The OSPA distance rises during the periods when the targets make the wide turns. It is also observed in the results, the AGM-PHD filter achieves smaller OSPA distance and more accurate cardinality estimates than the other two counterparts.

Conclusion
In conclusion, the classical PHD filter was devised to improve the adaptivity for MTT applications where the   measurement noise statistics are unavailable. During the process of the target state estimation, the measurement noise statistics are also estimated and refined iteratively according to the derived updating rule for the hyperparameters of Normal-inverse-Wishart distribution. The updating rule can be obtained based on the VB approximation or the Bayesian conjugate prior heuristics. Simulation results show the proposed adaptive PHD filter achieves comparable tracking performance with the ideal benchmark filter under different cluttering settings. Moreover, the proposed method has a certain degree of robustness even when the motion trajectories of the targets change abruptly.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Key R&D Program of China under grants 2017YFC0306203 and 2019YFB1310305, the Oceanic Interdisciplinary Program of Shanghai Jiao Tong University under grants SL2020ZD205 and SL2020MS033, and Zhejiang Cultivating Talent Project under grant SKX201901.

ORCID iD
Weijun Xu https://orcid.org/0000-0002-4617-0350 Similarly, the integral part in the lower formula of (42) is computed by ð logp(e k , r k , R k je 1:kÀ1 )Q r (r k )dr k = À 1 2 (n kjkÀ1 + n z + 2) log R k j j À 1 2 tr(L kjkÀ1 R À1 k ) À 1 2 h(e k À r k ) T R À1 k (e k À r k )i r À 1 2 hk kjkÀ1 (r k À m kjkÀ1 ) T R À1 k (r k À m kjkÀ1 )i r + C 3 = À 1 2 (n kjkÀ1 + n z + 2) log R k j j À 1 2 tr(L kjkÀ1 R À1 k ) where hÁi r = Ð ( Á )Q r (r k )dr k denotes the expected value with respect to Q r (r k ), C 3 , and C 4 are terms independent of R k . Besides, it should be noted that the following equations hold when r k is a Gaussian distribution: h(e k À r k ) T R À1 k (e k À r k )i r = tr h(e k À r k )(e k À r k ) T i r R À1 k À Á = tr (e k À r k )(e k À r k ) T R À1 k À Á + 1 k kjkÀ1 + 1 ð47Þ hk kjkÀ1 (r k À m kjkÀ1 ) T R À1 k (r k À m kjkÀ1 )i r = tr hk kjkÀ1 (r k À m kjkÀ1 )(r k À m kjkÀ1 ) T i r R À1 k = tr k kjkÀ1 (r k À m kjkÀ1 )(r k À m kjkÀ1 ) T R À1 k + k kjkÀ1 k kjkÀ1 + 1 By matching terms in equation (46) with the parameters of the inverse Wishart distribution IW n (R k ; L k ), the parameters of the approximated posterior density Q R (R k ) are given by n k = n kjkÀ1 + 1 Thus far, the updating rule of the hyperparameters is achieved by equations (45) and (47).