Variational Inference of Kalman Filter and Its Application in Wireless Sensor Networks

An improved Kalman filter algorithm by using variational inference (VIKF) is proposed. With variational method, the joint posterior distribution of the states is approximately decomposed into several relatively independent posterior distributions. To avoid the difficulty of high-dimensional integrals, these independent posterior distributions are solved by using Kullback-Leibler divergence. The variational inference of Kalman filter includes two steps, the predict step and the update step, and an iterative process is included in the update step to get the optimized solutions of the posterior distribution. To verify the effectiveness of the proposed algorithm, VIKF is applied to the state estimation of discrete linear state space and the tracking problems in wireless sensor networks. Simulation results show that the variational approximation is effective and reliable for the linear state space, especially for the case with time-varying non-Gaussian noise.


Introduction
The Kalman filter (KF), also known as linear quadratic estimation (LQE), is one of the most used linear state estimation methods and has numerous applications [1], which include guidance, navigation, and control of vehicles [2][3][4][5].Kalman filter is also widely applied in time series analysis in fields such as signal processing and econometrics [6,7].Other nonlinear methods, such as Extension-Kalman filter (EKF) [2] and unscented Kalman filter (UKF) [8], also use the Kalman filter to fix problems, combined with the linearization of nonlinear functions or the approximation of the probability density distribution, respectively.Practical implementation of the Kalman Filter is often difficult due to the inability of getting a good estimate of the noise covariance matrices.Extensive researches have been carried out in this field to estimate these covariances from data, such as the autocovariance leastsquares (ALS) [9,10], the modified Bryson-Frazier smoother [11], and the minimum-variance smoother [12,13].
Variational inference for the Kalman filter in linear state space model was also considered in several literatures [14][15][16].However, the authors in [14] just used the limited memory BFGS method to decrease the computational burden, both factorization and Kullback-Leibler (KL) divergence [15] (the expectation of joint distribution with holding some variable distribution as constant) were not considered.Works in [16] indeed applied the variational approximation, and got the posterior distributions of the state variables and the measurement noise.However, in the update step of the Variational Bayesian approximation Adaptive Kalman Filter (VBAKF), the initial values of states or variances were used in all iterative steps; thus, the iterations would quickly reach "steady states." Sometimes, the maximum number of the iterations is only 2 in the algorithm.That is to say that there are only two effective iterative loops performed in the update step of the Kalman filter, therefore, the performance of this algorithm is not good in such cases as the non-Gaussian and high-dimensional linear state space models.In this paper, we rederive the parameter expressions for the Kalman filter by using variational inference method (we call it VIKF in this paper), which can really form an iterative process for the update step of the Kalman filter.
Location and tracking are the keys in variety WSN applications [17][18][19], where the Kalman filter and its extension algorithms are widely used.In this paper, we will also consider the application of VIKF for WSN tracking problems, especially with time-varying and non-Gaussian noises.

International Journal of Distributed Sensor Networks
This paper is organized as follows.We firstly introduce the linear state space model, then the derivation of Kalman filter by using variational inference (VIKF) follows, and the iterative process of VIKF is presented.In simulation section, the effectiveness of our proposed method will be demonstrated by examples.Finally, we will compare VIKF with standard KF and VBAKF in the cases of high-dimensional discrete state spaces, and the tracking in WSN with non-Gaussian noise.

The Linear State Space Model
Linear state space models are a widely used class of models for control, economics, and so on.The most general statespace representation of a discrete-time linear system with  samples,  state variables, and  outputs is written in the following form: where X  is an  × 1 state vector of the system at time ,  = 1, 2, . . ., .M  is the state transition matrix at time .   is the corresponding prediction error vector of X  ,    ∼ (0, Q  ).    is assumed to be independent and identically distributed (i.d.d), so Q  is a diagonal matrix.Y  is the observation vector, and we assume that there are  observations, so N  is an  ×  linear observation operator matrix.   is an  × 1 observation error vector and assumed to be i.d.d; therefore,    ∼ (0, Γ  ), Γ  = diag( 2 ,1 , . . .,  2 , ).Time is indexed by .In general, we assume that the initial state has a Gaussian prior distribution X 0 ∼ ( 0 , P 0 ), and M  , Q  , N  ,  0 , and P 0 are all known.In tracking problems, X  includes the information of positions and speeds of the moving target, X  = [  ,   ,   ,   ]  , where (  ,   ) is the two-dimensional coordinate positions and (  ,   ) is the two-dimensional speeds.

Variational Inference for the Linear Space Model
Variational inference techniques have been extensively studied to solve problems in various fields [20][21][22][23].There are two steps in the optimization for the standard Kalman filter [24].The first step is to predict the state X  and Γ  from the last state X −1 and the observations Y 1:−1 .The second step is to update X  and Γ  using information from new observations Y  .

Predict
Step.This step calculates X  and Γ  from Y 1:−1 .
Here we assume that (X  , Γ  | Y 1:−1 ) can be decomposed as According to (1), we know that (X  | Y 1:−1 ) is a Gaussian distribution: with where is the posterior variance of observation errors.We can assume that (Γ  | Y 1:−1 ) equals (Γ −1 | Y 1:−1 ) temporarily in the prediction step, and it will be updated in the update step.We also assume that this posterior distribution is an inverse-Gamma distribution (also called Wald distribution).The inverse-Gamma distribution is often used in Bayesian inference, where the distribution arises as the marginal posterior distribution for the unknown variance of a normal distribution.
(Γ  | Y 1:−1 ) can be decomposed as where   , ,   , are the shape and scale parameters of inverse Gamma distribution IG(⋅), and set

Update
Step.This step updates X  and Γ  based on the new observation Y  .According to the Bayes' rule, We assume that (X  , Γ  | Y 1: ) can be factored as The variational method gets the approximation of each component by minimizing the Kullback-Leibler (KL) divergence between the estimated value and the true joint posterior.Each component can be obtained by performing the expectation operation to the joint distribution with respect to all other unknown parameters.So we have ln (10) ln (11) According to the known priori, we calculate (10): Input observation matrix Y 1: ; set initial hyper parameters  0 , P 0 ,  0, ,  where From the predict step, we know that (X  | Y 1:−1 ) is also a Gaussian distribution (   , P   ) (see ( 4) and ( 5)).The product of two Gaussian functions is still a Gaussian, so Based on (4) and ( 13), we get the analytical expression of   and P  , and because variational method usually requires an iterative process to achieve convergence, so we expressed them as where the superscript () means the th iteration, Γ  = diag( ,1 / ,1 , . . .,  , / , ),  , ,  , ,  = 1, . . .,  will be computed lately.At the beginning of the iteration, we use the results of the predict step, set  (0)  =    , P (0)  = P   .Similar results also appeared in [16] and the algorithm there was called VBAKF.In the iteration process, VBAKF computes  (+1)  and P (+1)  with  (0)   and P (0)  , not  ()  and P ()  ; therefore, VBAKF cannot form sufficient iteration during the update process.The maximum number of effective iterations of VBAKF is small, sometimes even only 2, so the performance cannot achieve the optimal solution of the variational method.

Similarly, we calculate ln 𝑞
Based on the known conditions, we know that  Γ  (Γ  ) is still a Wald distribution, and can be expressed as In the iterations, parameters are calculated as follows: The initial values can be obtained from the predict step, so  (0) , =   ,  (0) , =   , .We can see that (15), ( 16), (19), and (20) form an iterative process.
Variational method is an approximate calculation, not the best computation.Through simulations, we find that the performance of the algorithm is better if we set  (+1) , = 1/2 +  (0) , .Therefore,  , does not participate in the iterative process in our algorithm.
During the iteration, the iterative process should be executed with a sufficient number of times to reach a steady state.A simple stop condition for this is to test the value of the difference of inferred parameters between two successive iterations; that is,  = ‖ (+1)  −  ()   ‖

2
. If the difference is small enough (say ), the iteration has reached a stable state.The selection for  can influence the accuracy of the algorithm.The smaller  is, the higher the accuracy is, at the cost of more iterations and more computational time needed.

The Algorithm of VIKF. VIKF includes two steps
; the first step is to predict the present state with the initial information or the results of previous states, and the second step is the update step, which includes an iterative process to update the state with new observations.The process of the algorithm is as shown in Algorithm 1.

Simulations
The performance of proposed algorithm (VIKF) was simulated and compared with that of the standard Kalman filter (KF) and the VBAKF.We first applied the three algorithms to high-dimensional discrete state space with the noise of Gaussian or Student's -distributions.Residual mean square errors (RMSE) between the true state and the inferred state were calculated in all simulations and were used as the main evaluation criteria.We also applied these three algorithms to the tracking problems in the wireless sensor networks.In these simulations, time-varying parameters are considered for the noise of Student's -distribution.4.1.High-Dimensional Discrete State Space Simulation.We considered a 40-dimensional discrete linear state space; the dimension of observation vector was 80, and there are 30 samples.State transition matrix and observation operator matrix were all sparse matrixes.In each simulation case, 20 replicates were conducted and the results were averaged to get the performance of RMSE.In the case of Gaussian noise, we assumed that the process noise and the observation noise were all subject to (0, 1).In the simulations, the freedom of Student's -distribution was 2 for the process noise, while the freedom was selected randomly from set (1,2,3,4) for the observation noise.The noise intensities were 1 for all the cases in simulations.
The RMSE performances of VIKF, VBAKF, and KF are shown in Figure 1, Figure 2 (for Student's -distribution), and Figure 3 (for Gaussian distribution).The results shown in Figure 1(a) show the RMSE of each state, which are obtained by averaging the results of the 20 replicates.Figure 1(b) shows the accumulated RMSE of all steps for each replicate.A typical simulation result is also shown in Figure 2. In these figures, we restrict the display range of the -axis to show the results clearly, so some points with high RMSE cannot be seen.We can see that the performance of VIKF is clearly higher than that of VBAKF and KF in the case of noise with Student's -distribution.KF and VBAKF all have a higher probability of occurring extreme RMSE in certain circumstances, while the VIKF algorithm has a relatively lower probability.The reason that VIKF can decrease the probability of occurring large RMSE is that VIKF can perform sufficient iterations.
In the case of Gaussian noise, the performances of these three algorithms are close, as shown in Figure 3.

Tracking in WSN.
The state space for the tracking problem in WSN was fixed as in Section 4, included twodimensional coordinates and two-dimensional velocities.As the simulations in Section 4.1, we considered two kinds of noise: Gaussian distribution and Student's -distribution.We assume that there are 15 sensor nodes and 50 sampling points.Noise was set as that in Section 4.1.
As Figures 1 and 3, Figure 4 shows the RMSE performances with Student's -distribution noise, where Figure 4(a) shows the RMSE of each state obtained by averaging the results of 20 replicates and Figure 4(b) shows the accumulated RMSE of all steps for each replicate.Figure 5 shows the corresponding results with Gaussian noise.Figure 6 shows the tracking performance using the three algorithms with noise of Student's -distribution.
With the noise of Student's -distribution, the performance of VIKF in tracking is higher than that of VBAKF and KF.VIKF's probability of high RMSE is also lower than KF and VBAKF.In Gaussian cases, it seems that the performance of KF is better than that of VBIKF and VIKF.

Conclusions
This paper used variational approximation method to solve the problems arisen from high-dimensional discrete state space and the tracking problems in WSN with linear state model.We proposed a modified variational filtering algorithm, in which an iterative process was formed to get the optimization solutions.
The performance of variational methods is slightly lower than that of KF in a Gaussian environment.But in non-Gaussian and time-varying noise environment, the average performances of VIKF and VBAKF are better than that of KF.Since VIKF can carry out more sufficient iterations than the VBAKF in the update step, the performance of VIKF is better than that of VBAKF.VIKF algorithm also has a lower probability of high RMSE than KF and VBAKF in non-Gaussian environments.

Figure 1 :
Figure 1: Performances of VIKF, VBAKF, and KF for the case of Student's -distribution.(a) The RMSE of each state obtained by averaging the results of 20 replicates.(b) The accumulated RMSE of all 30 steps for each replicate.

Figure 2 :
Figure 2: A typical simulation result with Student's -distribution in high-dimensional linear state space.

Figure 3 :
Figure 3: Performances of VIKF, VBAKF, and KF for the case of Gaussian distribution.(a) The RMSE of each state obtained by averaging the results of 20 replicates.(b) The accumulated RMSE of all 30 steps for each replicate.

Figure 4 :Figure 5 :
Figure 4: Performances of VIKF, VBAKF, and KF for the case of Student's -distribution.(a) The RMSE of each state obtained by averaging the results of 20 replicates.(b) The accumulated RMSE of all 50 steps for each replicate.