Laplace ℓ1 robust Student’s T-filter for attitude estimation of satellites

A Laplace ℓ1 robust student’s T-filter is presented for satellites to estimate their attitude state despite severe measurement noise and modeling error. Although the student’s T-filter (STF) has the capability of handling measurement noise, it cannot address the unknown modeling error. It is further sensitive to the degree of freedom (DOF). Hence, the measurement covariance is updated by using the maximum correntropy criterion to accommodate the covariance of unknown modeling error in robust filtering design. Moreover, the Laplace distribution is derived to reduce the influence of the DOF parameter by forming a surrogate function of an optimization problem. Then, the majorization minimization approach is formulated to solve such an optimization problem and present the proposed filtering algorithm in the STF framework. Numerical simulation of applying the proposed attitude estimation scheme is performed by comparing it with the third/fifth-order cubature Kalman filters.


Introduction
Nonlinear filters play a vital role in signal processing, 1 target tracking, 2-4 and navigation. [5][6][7] In the Bayesian estimation framework, filters are designed by solving the posterior probability density function (PDF). However, there does not exist an optimal solution to the posterior PDF for nonlinear systems, since the closed PDF does not exist. 8 The posterior PDF is thus assumed to satisfy the Gaussian distribution. Accordingly, many approximation methods are applied to design the suboptimal nonlinear filters such as the classical Kalman filter (KF) and its extensions. [9][10][11][12][13][14] For those filters, both the system states and the measurement predictions are assumed to have Gaussian characteristics. Considering space missions such as the orbit tracking or the attitude determination, the noise and the unknown perturbations may be induced. The measurement outliers may also exist. 15 The noise of the system and its measurement may have a heavier tail than the distribution with the Gaussian characteristics. 16 Moreover, unknown perturbations bring satellites with modeling error. This error will further increase the nonlinearity and the uncertainty of the noise. This complicates the state estimation further and deteriorates the estimation performance of KF.
The past decade has witnessed significant development of the filter theory in dealing with the heavy-tailed noise. Many investigations are available and they can be categorized into two types. The first is the regression-based robust filters. 17 The Huber theory 18,19 is applied to design a robust KF. The maximum correntropy criterion 20,21 is applied to maximize the prediction error and the residual correntropy with the non-Gaussian noise handled. However, their performance is sensitive to the estimation gains. This drawback drives the researchers to design alternative solutions to the heavy-tailed noise problem.
Some achievements focused on the PDF distributions describing the heavy-tailed noise. For example, the particle filter (PF) adopts a stochastic approximation method to address the non-Gaussian noise. For this filter, random sampling points are adopted to approximate the PDF via the sequence of the Monte Carlo sampling technique. 22,23 However, the PF has a computational burden because the considered particles' number will increase exponentially along with the state's dimension. The mixture Gaussian KF was derived by assuming that the sum of the Gaussians can approximate any density with arbitrary accuracy. 24,25 However, it is not easy to determine the number of filters and the corresponding computation burden is large.
An efficient approach to solve the above issue is the application of the Student's T distribution to approximate the posterior PDF. This scheme is named as the Student's T-filter (STF), in which the predicted and the posterior PDF are supposed to satisfy the Student's T distribution. 26 Therefore, the STF has significant advantages when dealing with the heavy-tailed noise. Many investigations focusing on this method have been presented in recent years. [27][28][29] To deal with the unknown modeling error, the maximum correntropy criterion is considered here to update the STF's measurement covariance. This can compensate for the covariance of the unknown modeling error to suppress it for the robust estimation design. Besides, the estimation performance is also dependent on the degree-of-freedom (DOF) of the Student's T distribution. The selection of DOF for STF is not easy. However, the Laplace distribution can solve this drawback. It can be implemented by optimizing a hybrid ' 2 /' 1 norm issue. Then, the surrogate function in a quadratic format is derived, 30 which is employed to get the approximation of cost function. Then, the majorization minimization (MM) technique is used to obtain the closed-form expression of state estimation by an iterative algorithm in the STF framework. This approach is called as the Laplace ' 1 Robust STF (L1-RSTF).
The rest of the work is organized as follows. In section ''Problem formulation,'' the theory of the Student's T-filter is introduced. In section ''Development of the Laplace ' 1 robust Student's T-filter,'' the proposed L1-RSTF is presented based on the STF by using the maximum correntropy criterion and the Laplace distribution. Simulation results of the attitude estimation for small satellites are shown in section ''Numerical simulation.'' Some concluding remarks are given in section ''Conclusion.''

Problem formulation
In this paper, the mathematical notations are standard defined as follows. The set of m3n matrices is represented by < m3n . The set of all non-negative integers is denoted as N 0 . The Euclidean norm of vectors or matrices is denoted by jj Á jj. ( Á ) T denotes the transpose of matrix or vector. For any vector a = ½ a 1 a 2 Á Á Á a n T 2 < n , jjajj 1 = is the normal distribution with mean m 0 and covariance S x .

Mathematical model of a class of nonlinear systems
The following discrete-time nonlinear system is considered in this study: where k 2 N 0 is the discrete-time index. x k 2 < n is the system state at the discrete-time t k , y k 2 < m is the system's measurement at t k , w k 2 < n is the process noise at t k , and v k 2 < m is measurement noise at t k . f(x k ) and h(x k ) are known, and they are the process and the measurement functions, respectively. Assuming that those noises w k and v k satisfy the heavy-tailed distribution, then the PDFs of those noises are supposed to be the following Student's-t distribution: where St(Á; m, S, y) is the Student's T PDF with mean m, scale matrix S, and DOF y. Moreover, the specific expression of the Student's T PDF with any variable z could be given by For y . 2, the mean and the variance of z can be calculated by E(z) = m and var(z) = E((z À m)(z À m) T ) = y yÀ2 S, respectively. It is known from the above definition that Q k and y 1 are the scale matrix and the process noise's DOF, respectively. R k and y 2 are the scale matrix and the measurement noise's DOF, respectively. Then, the initial value of x 0 can be assumed as the distribution of the Student's T characteristics with meanx 0j0 , scale matrix P 0j0 , and DOF y 3 as where x 0 , w k , and v k are mutually uncorrelated.

Student's T filtering theory
When applying the Bayesian theory framework to develop the Student's T-filter, the posterior PDF p(x k jY k ) is calculated via the Bayesian relation which represents the measurement update for the estimation.
Based on (6), it is ready to get the predicted PDF p(x k jY kÀ1 ) by the Chapman-Kolmogorov equation for the time update of the estimation, which is specified as The PDFs p(x k jx kÀ1 ) and p(y k jx k ) are got from the process function and the measurement function in (1). The PDFs of the correlated noises are updated as To obtain a solution to the posterior PDF (7), the jointly predicted PDF p(x k , y k jY kÀ1 ) based on the state x k and the measurement y k is assumed as the Student'st distributed 26 : wherex kjkÀ1 is the mean vector, and P kjkÀ1 is the scale matrix of the predicted PDF p(x k jY kÀ1 ).ŷ kjkÀ1 and P yy kjkÀ1 are the mean and the scale matrix of the PDF p(y k jY kÀ1 ), respectively. P xy kjkÀ1 is the cross-scale matrices of the system state, and P yx kjkÀ1 is the crossscale matrices of the measurement.
To this end, the posterior PDF p(x k jY k ) in (7) can be calculated by the Student's T distribution. In accordance, the generalized algorithm of STF can be summarized as follows.
(3) State update: The generalized STF could be obtained by calculating (11)- (15) with the Student's T weighted integral, and then by updating the estimated state and the scale matrix (16)- (19). However, the DOF parameters of the STF are difficult to select and they will have an important effect on the estimation accuracy. This issue will be addressed in the next section.

Maximum correntropy criterion design
In this section, the maximum correntropy criterion is adopted to update the measurement covariance of the unknown modeling error for the robust estimation scheme design. Therefore, the maximum correntropy criterion is introduced first. For any random variables G 1 and G 2 , their correntropy is denoted as 31 where k( Á ) denotes the kernel function and it is positive-definite. The function f G 1 , G 2 (g 1 , g 2 ) denotes the density of the joint probability.
In practical engineering, the joint probability cannot be obtained directly. It is usually represented by a series of data samples f(g 1i , g 2i ), i = 1, 2, Á Á Á Ng with finite numbers. Then, one can write the correntropy as According to Wang et al., 32 the following Gaussian kernel function can be employed for (19).
where s is the correntropy's kernel width. It is found that the kernel k G (e i ) will have its maximum value when e i = 0, which can be used as the cost function of the maximum correntropy criterion.
Considering that the investigation focuses on updating the measurement covariance to suppress the unknown modeling error, the new cost function is proposed by combining (23) and the STF framework with the definition of e k = ( y 2 y 2 À2 R k ) À 1 2 (y k À H k x k ) aŝ x kjk = arg min with jjx k Àx kjkÀ1 jj 2 ( y 3 y 3 À2 P kjkÀ1 ) À1 = (x k Àx kjkÀ1 ) T ( y 3 y 3 À2 P kjkÀ1 ) À1 (x k Àx kjkÀ1 ).x kjk is the system state estimations after the measurement update.x kjkÀ1 is system state estimations before measurement update. P kjkÀ1 is the likelihood of the scalar matrix of the PDF p(x k jY kÀ1 ). H k is the Jacobian matrix of the nonlinear function h( Á ) for the following theoretical derivation.
Differentiating (24) with respect to x k yields Inserting (22) into (23), one has where c k, i = 1 ffiffiffiffi 2p p (25) can be rewritten as The maximum posterior estimation of x k is then obtained by solving the following cost function, which is derived and transformed from (27).
x kjk = arg min where The modified cost function can be treated as the integration of the STF and the maximum correntropy criterion with the updated covariance. This can update the covariance according to the influence of unknown modeling error for robustness design.
The Laplace ' 1 distribution design Based on the above result, the Laplace ' 1 distribution is introduced to formulate the proposed L1-RSTF. Firstly, the normalized residual vector is denoted as Then, the new cost function (28) can be simplified by the Laplace ' 1 distribution aŝ Because the differentiation of the ' 1 norm function in (29) does not exist at j k = 0, it fails to apply the traditional gradient techniques to solve (29) directly. To solve this above difficulty, an iterative surrogate function is to be presented for re-describing the cost function (29). The majorization minimization (MM) method will be designed further to solve this issue by calculating the minimum value of the surrogate function. Consider a positive function g( Á ) and a positive scalar a, it follows that Moreover, 2a holds only when g(x) = a 2 . Then, one has Substituting (31) into (29), it leaves the surrogate function as Now, the original optimization problem has been changed into solving the surrogate function (32) via the MM technique. It is described aŝ Aiming of applying the equivalent condition (31), the solution of (34) could be approximated by where e i is a very small non-zero constant (e.g. 10 26 ;10 28 ) to avoid N (c) i = 0 with c representing the iterations' number. Generally, a starting weighting matrix N (0) could be selected first to obtainx (1) kjk through (33). Then, the estimationx (1) kjk could be used to update and obtain a new weighting matrix N (1) . In this work, the identity matrix I is treated as the initial weighting matrix for simplification.
Note that the surrogate function (32) has some similarities as the cost function of STF for a given N (c) i . Hence, the function (32) is equivalent to the following form It obtains the upper bound of this cost function in the STF framework. Once getting the estimation statex (c) kjk , one can update and iterate a new weighting matrix N (c + 1) for obtainingx (c + 1) kjk .

The Laplace ' 1 robust Student's T-filter
Based on section ''The Laplace ' 1 distribution design,'' we are ready to present the L1-RSTF in the following algorithm. The proposed L1-RSTF has a similar structure to the STF in the measurement updating. The major difference is in the state updating. Therefore, the new process of state updating for proposed L1-RSTF is described as follows.
(1) State updating for L1-RSTF: The initial value of the iteration and the scalar matrix for the measurement covariance is given by (2) Repeating: P yy kjkÀ1 = P yy kjkÀ1 + y 2 (y 3 À2) Until the desired convergence performance index is satisfied.
Up to now, the final estimation for the states and their scalar matrix of the covariance can be determined asx It can be known that the proposed L1-RSTF is similar to the standard STF via the Laplace ' 1 theory. However, it is difficult to choose the convergence criterion. Hence, one can determine the number of iterations according to the precision requirements.

Numerical simulation
In this section, the application of the L1-RSTF developed in section ''Development of the Laplace ' 1 robust Student's T-filter'' to achieve attitude estimation for small satellites will be done.

Mathematical model of a rigid satellite
The unit quaternion q = ½ q 1 q 24 T = ½ q 1 q 2 q 3 q 4 T , q 24 = ½ q 2 q 3 q 4 T , is adopted to represent the attitude with respect to the inertial frame. Then, the kinematics of a rigid satellite is given by where v = ½ v 1 v 2 v 3 T is the angular velocity in the body frame with respect to the inertial frame. F(q) is given by For small satellites, their attitude dynamic model is established as where t 2 < 3 is the total torque and J 2 < 333 is the inertia of the satellite. Based on (50), the attitude dynamics without any gyro measurement could be written as: where w denotes the heavy-tailed noise and Moreover, considering cost limitation, a three-axis magnetometer and a sun sensor are fixed to measure the states of the attitude control system. For these sensors, the measurement model could be given by where C b i 2 < 333 represents the transition matrix from the inertial frame to the body-fixed frame, and it describes the attitude information. S 1 2 < 3 and B 2 2 < 3 are the sun vector and the magnetic field, respectively. Dn 1 2 < 3 and Dn 2 2 < 3 are the heavytailed measurement noise of the sensors, respectively.
To this end, the satellite's attitude control system described by (51) and (52) can be written into the form of (1) after discretization. Consequently, the proposed L1-RSTF can be applied to estimate the attitude state withx = ½q TvT T denoting the estimation of the state x.

Simulation results
To validate the performance of the proposed L1-RSTF, simulations are conducted by comparing it with the CKF. In the simulation, the initial states of the satellites are q(0) = 1 0 0 0 ½ T and v(0) = ½ 0 0 0 T deg/s. The states' initial estimation is set to beq(0) = ffiffi where Q represent the covariance of the process noise with p1 = 0:9. R is the measurement noise's covariance with p2 = 0:9. The measurement accuracy of the magnetometer and the sun sensor is 50 nT and 0:2 degrees, respectively. Their covariances are represented by R B and R S , respectively. Based on Cao et al., 33 the 13thorder Gaussian sphere harmonic function of IGRF is considered to simulate the real y B . The 7th-order of IGRF is applied to simulate B 2 in the satellite system. The measurement error between the 13th-order and 7th-order of the IGRF is about 208 nT. This error is also considered. In the simulation, the following two scenarios are considered.
Remark: Letx = ½ v T errorq T T = x Àx denote the state estimation error, whereq = q Àq and v error = ½ W xerror W yerror W zerror T = v Àv. In satellite engineering, the unit quaternion q can be mathematically transformed into the Euler attitude angles to explicitly describe the satellite's attitude, that is, where f t ( Á ) is the transformation function. 34 f, u, and c are the roll, the pitch, the yaw attitude Euler angles with respect to q, respectively. Hence, the attitude estimation errorq can be transformed into the attitude Euler angles estimation error denoted as H error = ½ f error u error c error T .
(1) The first scenario: This scenario is to validate the developed filter's effectiveness in handling the heavytailed noise. Figures 1 and 2 show the resulted estimation errors. Because the CKF is widely used in practice, the 3rd-CKF and the 5th-CKF are also employed to compare with the proposed L1-RSTF. It is found in Figures 1 and 2 that the convergence rate of L1-RSTF is much faster and more stable than the CKFs without the obvious fluctuations. Although the advantage of L1-RSTF is not very obvious in the angular velocity estimation, this method still has a certain precision superiority. Moreover, there are a few estimated mutations for the L1-RSTF, which is mainly due to the mutation of heavy-tailed distribution with a small probability. However, due to the theoretical advantages of proposed L1-RSTF, the estimation error divergence can be rapidly suppressed and return to the normal estimate trajectory. Table 1 lists the RMSE results of the estimation error between the 3rd-CKF, the 5th-CKF, and the L1-RSTF. It is listed that the developed L1-RSTF has higher estimation precision than the 3rd-CKF and the 5th-CKF, which is especially true in attitude estimation. In the aspect of angular velocity, the theoretical   advantage of L1-RSTF is not obvious, but it still has the highest accuracy. Regarding the computational burden, the 3rd-CKF and the 5th-CKF need 12 and 73 sampling points to calculate and propagate the state estimation, respectively. However, the proposed L1-RSTF only needs 12 sampling points and the number of iterations is five. Therefore, the L1-RSTF also has obvious advantages in computational complexity. Therefore, the modeling errors and the heavy-tailed noise (57) are both taken into consideration in this case and introduced into the simulation system, which can be used to test the robust ability of presented L1-RSTF to the modeling error and the noise. Figures 3 and 4 show that the estimation accuracy of the second scenario is inferior to the accuracy in the first scenario. This coincides with the preceding theoretical result. Especially for the CKFs, the estimated convergence curve deviates seriously from the zero-value due to the existence of model error in Figure 3. However, the L1-RSTF still fluctuates slightly around zero and maintains a high precision compared with the CKFs. The angular velocity estimate accuracy of L1-RSTF also has better convergence properties than the CKFs, as we can see in Figure 4.
Moreover, the comparison results listed in Table 2 show that the estimation accuracy obtained from the 3rd-CKF and the 5th-CKF is severely decreased due to the modeling error. The reason is that the nonlinearity and the uncertainty of the satellite system are increased by modeling error and noise. Only the proposed L1-RSTF maintains acceptable high precision. This validates the superiority of L1-RSTF.
The above simulation results verify that the presented scheme is robust to the heavy-tailed noise and the modeling error with low computation required. For the CKFs, only the angular velocity estimation can get acceptable performance. It is concluded that the proposed scheme is less dependent on the model of the uncertainty and the noise. This lets the designed filter be superior to the traditional Kalman filter because the latter relies heavily on the known noise level. Therefore, it can be anticipated that this method has important application value and will be more widely used in the future.

Conclusion
This work presented a Laplace ' 1 robust Studen's T-filter to solve the attitude estimation problem of small satellites subject to heavy-tailed noise and modeling errors. The maximum correntropy criterion and the Laplace distribution were used to model the measurement process. The majorization minimization technique was applied to solve the maximum posterior estimation via an iterative algorithm. Numerical simulation with the application of the proposed L1-RSTF was conducted by comparing it with the 3rd-CKF and the 5th-CKF. The effectiveness of the proposed approach in both the accuracy of system estimation and the system robustness with less calculation burden was validated. As one of future work, using the estimated states provided by the proposed filtering approach, the attitude controller design should be carried out to ensure highaccuracy attitude control. This could be done by invoking the robust methods reported in. [35][36][37][38]

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.