A tracking error–based fully probabilistic control for stochastic discrete-time systems with multiplicative noise

This article proposes the use of the Kullback–Leibler divergence to characterise the uncertainty of the tracking error for general stochastic systems, without restricting them to particular distributions. The general solution to the fully probabilistic design of the tracking error control problem is first stated. Further development then focuses on the derivation of a randomised controller for a class of linear stochastic Gaussian systems that are affected by multiplicative noise. The derived control solution takes the multiplicative noise of the controlled system into consideration in the derivation of the randomised controller. The proposed fully probabilistic design of the tracking error of the system dynamics is a more natural approach than the conventional fully probabilistic design method because it directly characterises the main objective of system control. The efficiency of the proposed method is then demonstrated on a flexible beam example, where the vibration of the beam is shown to be effectively suppressed.


Introduction
In control systems, the tracking error between the system output and a predefined desired output is the most commonly used optimisation signal for tuning the parameters of the system controller (Gaudio et al., 2019; Gerasimov et al., 2019; Humaidi and Hameed, 2019; Wu and Du, 2019; Zhou et al., 2020; Zhou et al., 2017). When combined with adaptive control (Chen and Jiao, 2010; Narendra and Annaswamy, 2005; Tao, 2003), the approach has proven particularly useful for controlling systems that are affected by model uncertainty and random noise, that operate under changing environments, and that have unforeseen variations in their overall structure. Although these methods are adaptive and are therefore expected to deal with the underlying system uncertainty, many of them optimise the controller parameters by minimising the mean square tracking error. The minimisation of the mean square tracking error, also known as the tracking error variance, is however based on the assumption of certainty equivalence; therefore, it does not generally yield good performance. Thus, for more general stochastic systems and for systems with functional and model uncertainty, the variance of the tracking error cannot be used alone to represent the performance of the closed-loop system (Herzallah, 2007; Herzallah and Lowe, 2003; Herzallah and Lowe, 2004; Herzallah and Lowe, 2006; Yue and Wang, 2003; Zhang et al., 2016). As a result, the Kullback-Leibler divergence (Cliff et al., 2018; Kullback, 1959; Yu and Mehta, 2009) has recently been proposed in the control literature to characterise the uncertainty of stochastic system dynamics. This is because the Kullback-Leibler divergence measures the discrepancy between the distributions of the stochastic system and their desired distributions, rather than characterising them only by their means or variances.
An efficient control approach known as fully probabilistic design (FPD), which uses the Kullback-Leibler divergence as a performance measure for designing randomised controllers, was proposed in Karny (1996) and Herzallah and Karny (2011). In this approach, the Kullback-Leibler divergence measures the discrepancy between the joint probability density function (pdf) of the closed-loop description of the system dynamics and an ideal joint pdf. The main advantage of the FPD control approach is that it provides a closed-form solution for a general description of stochastic systems, without restricting them to particular distributions. However, although a closed-form solution can be obtained, it cannot in general be evaluated analytically because of the multivariate integration involved in the optimisation process. Besides, in its original form the FPD control method designs a randomised controller that shapes the pdf of the system dynamics. The characterisation of this pdf can nonetheless be difficult for many real-world systems that operate under high levels of uncertainty and stochasticity. Furthermore, in many real engineering systems the control objective is to make the system output follow a predefined desired value, which emphasises the importance of the tracking error rather than the actual system output.
As such, this study follows an alternative approach in which the Kullback-Leibler divergence is defined as the distance between the joint pdf of the tracking error and the randomised controller of the controlled system and an ideal joint pdf. The randomised controller is therefore designed here to reshape the pdf of the tracking error of the controlled system rather than the pdf of its dynamics. Compared with the existing results on the topic and the conventional FPD approach, this alternative approach has several advantages that have not been reported in the literature. First, the characterisation of the pdf of the tracking error of the controlled system is normally easier than that of the pdf of its dynamics. This is because, when the stochastic dynamics of the controlled system are estimated accurately, the resulting tracking error will be small and can most likely be characterised by a Gaussian pdf. This in turn simplifies the optimisation of the sought randomised controller. Second, the ideal distribution of the tracking error can be naturally specified by a zero-mean distribution. In particular, a Gaussian distribution with zero mean and a prespecified covariance matrix that determines the allowed fluctuations of the tracking error around zero would be ideal. Furthermore, the FPD method in its original form considers only additive noise on the system dynamics. Our alternative solution considers stochastic systems with multiplicative noise, which represents conditions under which many real-world systems operate. An additional contribution of the study is therefore the consideration of the multiplicative noise of the stochastic system in the derivation of the randomised optimal control law. Moreover, the proposed probabilistic minimisation of the tracking error will be shown to be particularly useful for solving the vibration control problem associated with mechanical systems.
The vibration control problem is particularly challenging and is relevant to many real-world control problems, including robotic manipulators, aerospace structures, and biomechanical systems (Flores and Barbieri, 2006;Pappalardo et al., 2016;Simone et al., 2018;Sohn et al., 2009;Song and Gu, 2007).
To reemphasise, this alternative tracking error solution and the extension of the FPD to stochastic systems with multiplicative noise have not been discussed previously in the literature. Their theoretical development and numerical demonstration are presented for the first time in this article.

Problem statement
In the original formulation of the FPD, the aim is to derive a randomised controller that shapes the joint probability density function of the stochastic system dynamics and the controller. This joint probability density function represents the complete description of the closed-loop behaviour of the controlled system. However, in some control applications, the system is required to track a predefined desired trajectory. For these applications, it is more convenient to design the controller such that it reshapes the pdf of the tracking error, as opposed to the original formulation of reshaping the pdf of the system dynamics. For the system to be able to track the desired signal, the controller should be designed such that the pdf of the tracking error is centred around zero with small variations. A narrow distribution of the tracking error centred around zero implies that the system has tracked the desired trajectory and, at the same time, that the uncertainty in the tracked trajectory is small. To be more specific, assume that the stochastic system can be described at each time instant $k$ by the following conditional pdf

$$s(x_k \mid u_{k-1}, x_{k-1}), \tag{1}$$

where $x_k \in \mathbb{R}^n$ is the system state and $u_k \in \mathbb{R}^m$ is the system input. Defining the reference state that the system is required to track as $x_r \in \mathbb{R}^n$, the system tracking error is given by

$$e_k = x_k - x_r. \tag{2}$$

Because the system considered in this study is stochastic and subject to random forces and functional uncertainties, only the probability density function of the state values defined in equation (1) can be specified.
On the other hand, because the objective of this study is to design a randomised controller that shapes the pdf of the tracking error so that the system state tracks a desired set point, the pdf of the tracking error would need to be assumed known, which may be an unrealistic assumption for many real-world control problems. However, the density function of the tracking error can be obtained from the density function of the system dynamics using probability theory, as given in equation (3). In general, $s(x_k \mid \cdot)$ is not known in reality and thus needs to be estimated online using the observed data of the controlled system. The estimation process of this pdf is explained in Section 3.2.
Once the pdf of the tracking error is estimated, the randomised controller can be derived by redefining the Kullback-Leibler divergence, equation (4), in terms of the joint pdf of the closed-loop data sequence $(\ldots, e_H, u_0, \ldots, u_{H-1})$, where $H$ is the control horizon. Following the same approach as the original FPD, the minimisation of the Kullback-Leibler divergence defined in equation (4) can be achieved by recursively solving the backward recurrence equation that is given in the following proposition.
Proposition 1. The optimal randomised controller $c(u_{k-1} \mid e_{k-1})$ can be obtained by recursively solving the following recurrence equation (Herzallah and Karny, 2011)

$$-\ln(\gamma(e_{k-1})) = \min \ldots \tag{5}$$

Proof. The derivation of the above result can be found in Herzallah and Karny (2011).
The optimal randomised controller that minimises the recurrence equation specified in equation (5) can then be shown to be given as specified in the following proposition.
Proposition 2. The pdf of the optimal randomised controller that minimises cost-to-go function (5) is given by equations (6) and (7).

Proof. This proposition can be proven by adapting the proof of Proposition 2 in Karny and Guy (2006).

Note that the solution of the optimal randomised controller as specified in this proposition is not restricted by the pdf that characterises the error or the controller. It provides the general solution for the randomised controller without constraints on the required pdfs. However, an analytic evaluation of this randomised controller is not possible except for the special case of linear and Gaussian pdfs. Therefore, to make the proposed tracking error-based FPD easier to understand and to obtain an analytical solution, the next section derives the solution to the probabilistic tracking control problem for a class of linear stochastic systems with multiplicative noise.
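For orientation, the standard form of the FPD solution from the cited FPD literature, written for the tracking-error pdfs, can be sketched as follows. This is a hedged reconstruction rather than the article's own equations (6) and (7): the split of the exponent into $\beta_1$ and $\beta_2$, the sign convention for $\beta_2$, and the terminal condition $\gamma(e_H) = 1$ are assumptions based on the usual FPD derivation.

```latex
% Sketch of the standard FPD solution in tracking-error form (assumed notation).
% s, c: estimated error pdf and randomised controller; s^I, c^I: their ideals.
\begin{align*}
c(u_{k-1} \mid e_{k-1}) &=
  \frac{c^{I}(u_{k-1} \mid e_{k-1})\,
        \exp\!\left[-\beta_1(u_{k-1}, e_{k-1}) - \beta_2(u_{k-1}, e_{k-1})\right]}
       {\gamma(e_{k-1})}, \\
\gamma(e_{k-1}) &=
  \int c^{I}(u_{k-1} \mid e_{k-1})\,
       \exp\!\left[-\beta_1(u_{k-1}, e_{k-1}) - \beta_2(u_{k-1}, e_{k-1})\right]
       \, du_{k-1}, \\
\beta_1(u_{k-1}, e_{k-1}) &=
  \int s(e_k \mid u_{k-1}, e_{k-1})
       \ln \frac{s(e_k \mid u_{k-1}, e_{k-1})}{s^{I}(e_k \mid u_{k-1}, e_{k-1})}
       \, de_k, \\
\beta_2(u_{k-1}, e_{k-1}) &=
  -\int s(e_k \mid u_{k-1}, e_{k-1}) \ln \gamma(e_k) \, de_k,
  \qquad \gamma(e_H) = 1 .
\end{align*}
```

Read this way, $\beta_1$ is the one-step Kullback-Leibler divergence between the estimated and ideal error pdfs, while $\beta_2$ carries the expected cost-to-go from the next step of the backward recursion.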

Solution of the probabilistic tracking control for linear stochastic systems
The theory developed in the previous section is applied here to derive the analytic solution of the probabilistic tracking control problem for linear stochastic systems with multiplicative noise. Stochastic systems with multiplicative noise arise naturally in networked control systems, where previous works have used the multiplicative noise to model packet loss (Wei et al., 2013) and time delay (Zhang et al., 2015) that occur during packet transmission in communication networks. This differs from parameter uncertainty (Lee et al., 2001; Liu et al., 2010; Xie et al., 1992), where the uncertainty is usually grouped with the parameters of the state and can be considered stochastic or deterministic. The development of a robust control solution for these systems is a long-standing and still unsolved problem.

Model description
Consider a stochastic linear discrete-time system with multiplicative Gaussian noise described by

$$x_k = \tilde{A} x_{k-1} + \tilde{B} u_{k-1} + \tilde{D} x_{k-1} v_{k-1}, \tag{8}$$

where $x_k \in \mathbb{R}^n$ is the system state and $u_k \in \mathbb{R}^m$ is the system input as defined before, $\tilde{A}$, $\tilde{B}$, and $\tilde{D}$ are system matrices with appropriate dimensions, and $v_k \in \mathbb{R}$ is an independent Gaussian noise with zero mean and covariance $Q$.
It should be noted that in real-world situations the parameters of stochastic model (8) are not known in general and thus need to be estimated. Moreover, because the current value of the system state is affected by noise, it cannot be completely specified by the previous control and previous state values. Therefore, the probabilistic description of stochastic model (8) needs to be estimated online, using observed data from the stochastic system dynamics, to describe the probabilistic evolution of the system state. The online estimation of the stochastic system parameters, and consequently of the system state distribution, is discussed next.

Estimation of the probabilistic description of the system tracking error
As discussed in the previous section, because of the stochastic nature of the system dynamics, only the probabilistic description of the system state can be specified. This can be obtained by estimating the parameters of the stochastic state equation (8). Given our prior knowledge that the system dynamics are linear and driven by multiplicative noise, the required model of system (8) can be assumed to have the following form

$$x_k = A x_{k-1} + B u_{k-1} + D x_{k-1} v_{k-1}, \tag{9}$$

where $A$, $B$, and $D$ are the estimates of the matrices $\tilde{A}$, $\tilde{B}$, and $\tilde{D}$, respectively. These parameters can be estimated online by updating their values at each time instant $k$, when a new measurement of the state becomes available. In particular, rewrite equation (9) as follows

$$x_k = q \chi_{k-1}, \tag{10}$$

where $q = [A \;\; B \;\; D]$ and $\chi_{k-1} = [x_{k-1} \;\; u_{k-1} \;\; x_{k-1} v_{k-1}]^T$. Here, $\chi_{k-1}$ has dimension $(2n + m) \times 1$ and $q$ has dimension $n \times (2n + m)$, where $n$ and $m$ are the dimensions of the state vector and control input, respectively, as stated earlier. Then, given a new observation of the system state $x_k$, the parameter matrix $q$ can be estimated. Because $\chi_{k-1}$ is not a square matrix, the estimate is obtained by first multiplying both sides of equation (10) by $\chi_{k-1}^T$ and then solving for $q$

$$q = x_k \chi_{k-1}^{\dagger}, \tag{11}$$

where $\chi_{k-1}^{\dagger}$ is a $1 \times (2n + m)$ matrix known as the pseudo inverse of $\chi_{k-1}$ and is given by

$$\chi_{k-1}^{\dagger} = (\chi_{k-1}^T \chi_{k-1})^{-1} \chi_{k-1}^T. \tag{12}$$

Remark 1. As can be seen from equation (12), the pseudo-inverse matrix has the property that $\chi_{k-1}^{\dagger} \chi_{k-1} = I$, where $I$ is the identity matrix. However, note that $\chi_{k-1} \chi_{k-1}^{\dagger} \neq I$ in general. If the matrix $\chi_{k-1} \chi_{k-1}^T$ is singular, then equation (11) does not have a unique solution. In this case, if the pseudo inverse is defined through the limit given in equation (13), then the limit can be shown to always exist, and the limiting value guarantees the optimal solution of equation (11).
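The single-observation pseudo-inverse update described above can be sketched numerically as follows. This is a minimal illustration, not the article's implementation: the function name `estimate_parameters` and the numerical values are hypothetical, and the noise sample $v_{k-1}$ is assumed to be available when the regressor is formed (as it would be in a simulation). With one observation, the result is the minimum-norm matrix that reproduces the latest state, not necessarily the true system matrices.

```python
import numpy as np

def estimate_parameters(x_next, x_prev, u_prev, v_prev):
    """One-sample estimate q = x_k * pinv(chi_{k-1}), split into (A, B, D)."""
    n, m = x_prev.size, u_prev.size
    # Regressor chi_{k-1} = [x; u; x*v], a (2n + m) x 1 column vector.
    chi = np.concatenate([x_prev, u_prev, x_prev * v_prev]).reshape(-1, 1)
    # Moore-Penrose pseudo inverse, shape 1 x (2n + m).
    chi_pinv = np.linalg.pinv(chi)
    # Parameter matrix q = [A B D], shape n x (2n + m).
    q = x_next.reshape(-1, 1) @ chi_pinv
    A, B, D = q[:, :n], q[:, n:n + m], q[:, n + m:]
    return q, chi, chi_pinv, (A, B, D)

# Hypothetical data: n = 2 states, m = 1 input, scalar noise sample (assumed known).
x_prev = np.array([1.0, 2.0])
u_prev = np.array([0.5])
v_prev = 0.3
x_next = np.array([1.2, 2.0])

q, chi, chi_pinv, (A, B, D) = estimate_parameters(x_next, x_prev, u_prev, v_prev)
```

In this sketch `chi_pinv @ chi` equals 1 while `chi @ chi_pinv` is only a rank-one projection, which mirrors the distinction drawn in Remark 1.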
Following the estimation of these parameters, the conditional distribution of the system state can be shown to be Gaussian, described by

$$s(x_k \mid u_{k-1}, x_{k-1}) \sim \mathcal{N}\left(A x_{k-1} + B u_{k-1},\; D x_{k-1} Q x_{k-1}^T D^T\right), \tag{14}$$

where $A x_{k-1} + B u_{k-1}$ is the mean of the state calculated using the estimated parameters $A$ and $B$, and $D x_{k-1} Q x_{k-1}^T D^T$ is the covariance of the state calculated using the estimated parameter $D$.
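The stated mean and covariance can be checked by simulating one step of the multiplicative-noise model many times. The sketch below is only illustrative: the matrices, the noise variance `Q`, and the function names are hypothetical values chosen for the example, and the noise is taken to be scalar, as in the model description.

```python
import numpy as np

def conditional_moments(A, B, D, Q, x_prev, u_prev):
    """Mean A x + B u and covariance D x Q x^T D^T of the next state."""
    mean = A @ x_prev + B @ u_prev
    Dx = D @ x_prev
    cov = Q * np.outer(Dx, Dx)  # scalar noise, so Q is a variance
    return mean, cov

def sample_next_states(A, B, D, Q, x_prev, u_prev, n_samples, rng):
    """Draw x_k = A x + B u + D x v with v ~ N(0, Q), one row per sample."""
    v = rng.normal(0.0, np.sqrt(Q), size=n_samples)
    mean = A @ x_prev + B @ u_prev
    return mean + np.outer(v, D @ x_prev)

# Hypothetical two-state example (all values are assumptions for illustration).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1, 0.0], [0.0, 0.2]])
Q = 0.5
x_prev = np.array([1.0, 2.0])
u_prev = np.array([0.5])

mean, cov = conditional_moments(A, B, D, Q, x_prev, u_prev)
samples = sample_next_states(A, B, D, Q, x_prev, u_prev, 200_000,
                             np.random.default_rng(0))
emp_mean = samples.mean(axis=0)
emp_cov = np.cov(samples, rowvar=False)
```

Because the noise is scalar, the covariance is a rank-one matrix for $n > 1$; the empirical moments from the samples should agree with `mean` and `cov` up to sampling error.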
For the objective of deriving a randomised controller that will achieve a narrow tracking error distribution centred around zero, thus guaranteeing an accurate tracking of the system state to the desired value, the tracking error distribution needs to be specified. This can be obtained from the definition of the tracking error given in equation (2).
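The dynamical description of the tracking error can then be obtained by substituting equation (9) into (2), which yields

$$e_k = A e_{k-1} + B u_{k-1} + F x_r + D x_{k-1} v_{k-1}, \tag{15}$$

where we have introduced the definition $F = A - I$. From equations (3), (14), and (15), the distribution of the tracking error is Gaussian with mean $\mu_k$ and covariance $\Sigma_k$, specified as follows

$$s(e_k \mid u_{k-1}, e_{k-1}) \sim \mathcal{N}(\mu_k, \Sigma_k), \tag{16}$$

where

$$\mu_k = A e_{k-1} + B u_{k-1} + F x_r, \tag{17}$$

$$\Sigma_k = D x_{k-1} Q x_{k-1}^T D^T. \tag{18}$$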
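To make the tracking-error distribution concrete, the sketch below computes its mean and covariance from the estimated model and then evaluates the closed-form Kullback-Leibler divergence between that Gaussian and a zero-mean ideal Gaussian. It rests on several assumptions: the error mean is taken as $A e_{k-1} + B u_{k-1} + (A - I) x_r$ and the covariance as $D x_{k-1} Q x_{k-1}^T D^T$ with $x_{k-1} = e_{k-1} + x_r$, the function names and numbers are hypothetical, and a scalar system is used because the exact Gaussian divergence requires full-rank covariances.

```python
import numpy as np

def tracking_error_moments(A, B, D, Q, e_prev, u_prev, x_ref):
    """Mean and covariance of e_k given e_{k-1} and u_{k-1} (assumed form)."""
    F = A - np.eye(A.shape[0])
    x_prev = e_prev + x_ref            # recover the state from the error
    mu = A @ e_prev + B @ u_prev + F @ x_ref
    Dx = D @ x_prev
    sigma = Q * np.outer(Dx, Dx)
    return mu, sigma

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for full-rank covariances."""
    n = mu0.size
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - n + logdet1 - logdet0)

# Hypothetical scalar example (n = 1 so that the covariance is full rank).
A = np.array([[0.9]])
B = np.array([[1.0]])
D = np.array([[0.2]])
Q = 0.5
x_ref = np.array([1.0])
e_prev = np.array([0.5])
u_prev = np.array([0.1])

mu, sigma = tracking_error_moments(A, B, D, Q, e_prev, u_prev, x_ref)
ideal_mu, ideal_cov = np.zeros(1), np.array([[0.01]])  # zero-mean ideal pdf
kl = gaussian_kl(mu, sigma, ideal_mu, ideal_cov)
```

A control input that moves `mu` towards zero and keeps `sigma` close to the ideal covariance reduces this divergence, which is the quantity the randomised controller is designed to shape.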

Randomised control solution
In this section, the generalised fully probabilistic control solution of the tracking problem for the stochastic linear system with multiplicative noise defined in equation (8) is derived. As discussed in earlier sections, the pdf of the system tracking error is assumed to be unknown and is therefore estimated online as explained in Section 3.2. The purpose of the controller designed here is to make the pdf of the tracking error, $s(e_k \mid u_{k-1}, e_{k-1})$, follow a predefined ideal pdf, $s^I(e_k \mid u_{k-1}, e_{k-1})$, and bring the tracking error to zero. Thus, the ideal distribution of the system tracking error described by equation (16) is specified as

$$s^I(e_k \mid u_{k-1}, e_{k-1}) \sim \mathcal{N}(0, \Sigma_2), \tag{19}$$

where $\Sigma_2$ specifies the allowed fluctuations of the tracking error around its zero mean value. In addition, the ideal distribution of the sought randomised controller, $c(u_{k-1} \mid e_{k-1})$, is taken to be Gaussian with the following form

$$c^I(u_{k-1} \mid e_{k-1}) \sim \mathcal{N}(\mu_u, \Gamma), \tag{20}$$

where $\Gamma$ is the covariance matrix of the ideal distribution of the control input and $\mu_u$ is the mean of the ideal distribution of the control input. To achieve the objective that the optimised randomised controller brings the tracking error between the system state and its desired value to zero, the mean value of the ideal distribution of the controller, $\mu_u$, is calculated from equation (15), as given in equation (21). Given the pdf of the tracking error defined in equation (16) and the ideal pdfs of the tracking error and controller defined in equations (19) and (20), respectively, the performance index for the class of linear stochastic systems defined in equation (9) can then be shown to be given by the following theorem.
Theorem 1. Using the pdf of the tracking error dynamics specified by equation (16), the ideal distribution of the tracking error dynamics given by equation (19), and the ideal distribution of the controller given by equation (20) in equations (6) and (7) gives the quadratic performance index stated in equation (22), whose parameters are defined in equations (23)-(25).

Proof. The claimed quadratic form of the optimal performance function specified in equation (22) can be verified by backward induction. The proof starts by evaluating $\gamma$ in equation (7), repeated as equation (27). This requires the evaluation of $\beta_1$ and $\beta_2$. Starting with $\beta_1$,

$$\beta_1(u_{k-1}, e_{k-1}) = \int s(e_k \mid u_{k-1}, e_{k-1}) \ln \frac{s(e_k \mid u_{k-1}, e_{k-1})}{s^I(e_k \mid u_{k-1}, e_{k-1})} \, de_k. \tag{28}$$

To solve (28), the rule from Golub and Meurant (2009) stated in equation (29) is used, where $A_1$ is a positive definite matrix. Because $|\Sigma_k||\Sigma_2|^{-1}$ is positive, the $\ln(|\Sigma_k||\Sigma_2|^{-1})$ term in equation (28) can be rewritten as in equation (30).

Assumption 1. Because the objective of the sought randomised optimal controller is to make the distribution of the tracking error of the system dynamics as close as possible to the specified ideal distribution, it is expected that at steady state the covariance of the tracking error dynamics will become close to the covariance of the specified ideal distribution. This means that

$$\|\Sigma_k \Sigma_2^{-1} - I\| < 1. \tag{31}$$

Remark 2. Note that the covariance of the noise, $Q$, affecting the system will not be too large in real-world systems. This in turn means that $\Sigma_k = D x_{k-1} Q x_{k-1}^T D^T$ will not be too large either. Therefore, the above assumption is reasonable. This will be demonstrated numerically in the numerical results section, Section 4.
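The role of Assumption 1 in the log-determinant step can be sketched as follows. This is a plausible reading rather than a reproduction of the article's equations (29), (30), and (32): it assumes that the rule taken from Golub and Meurant (2009) is the identity $\ln|A_1| = \mathrm{tr}(\ln A_1)$ and that the approximation keeps only the first term of the matrix logarithm series, which converges when the norm condition of Assumption 1 holds.

```latex
% Hedged sketch of the log-determinant approximation (assumed form).
\begin{align*}
\ln\!\left(|\Sigma_k|\,|\Sigma_2|^{-1}\right)
  &= \ln\left|\Sigma_k \Sigma_2^{-1}\right|
   = \operatorname{tr}\!\left(\ln\!\left(\Sigma_k \Sigma_2^{-1}\right)\right), \\
\ln\!\left(\Sigma_k \Sigma_2^{-1}\right)
  &= \sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j}
     \left(\Sigma_k \Sigma_2^{-1} - I\right)^{j},
     \qquad \left\|\Sigma_k \Sigma_2^{-1} - I\right\| < 1, \\
\ln\!\left(|\Sigma_k|\,|\Sigma_2|^{-1}\right)
  &\approx \operatorname{tr}\!\left(\Sigma_k \Sigma_2^{-1} - I\right)
   = \operatorname{tr}\!\left(\Sigma_k \Sigma_2^{-1}\right) - n .
\end{align*}
```

Under this reading, the constant $n$ is the dimension of $e_k$, and the approximation replaces a term that is nonlinear in $\Sigma_k$ with one that is linear in it, which keeps the cost-to-go quadratic in the state and control.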
Based on Assumption 1 and Lemma 2.6 from Hall (2015), equation (30) can be approximated as in equation (32), where $n$ is the dimension of $e_k$. Using equation (32) in (28) and expanding the terms of equation (28) gives equation (33). The last integral term in equation (33) can be evaluated as in equation (34). Substituting equation (34) back into (33) yields equation (35). Similarly, $\beta_2(u_{k-1}, e_{k-1})$ can be evaluated as in equation (36), where equation (37) has been used with $M_2 = D^T S_k Q D$. Thereupon, substituting equations (35) and (36) in (27) and collecting the terms that multiply the control input, $u_{k-1}$, yields equation (38). The integral in equation (38) can be calculated by completing the square with respect to $u_{k-1}$. Consequently, $\gamma(e_{k-1})$ can be shown to be given by equation (39). Note that according to Theorem 1, $-\ln(\gamma(e_{k-1})) = 0.5\,(e_{k-1}^T S_{k-1} e_{k-1} + P_{k-1} e_{k-1} + w_{k-1})$. Thus, equating the quadratic, linear, and constant terms in equation (39) with $S_{k-1}$, $P_{k-1}$, and $w_{k-1}$, respectively, yields the definitions stated in equations (23)-(25). This completes the proof. □

Following the above verification of the quadratic performance index, the next step is to evaluate the parameters of the optimal controller distribution that will make the pdf of the tracking error follow the given ideal pdf. Based on equations (6) and (39), the randomised optimal controller that minimises the Kullback-Leibler divergence objective function is given by the following theorem.

The objective of the sought randomised controller is then specified to be quenching the vibration in the beam and stabilising the angle $\theta$ between the hub's frame and a global (stationary) reference frame at the value of 1. Therefore, the reference value that the system state is required to track is taken to be $x_r = [1,0,0,0,0,0]^T$. In addition, the system state is assumed to start from the following initial state values: $x_0 = [22,0.3,1,0.4,0.5,2]^T$.
As discussed in Section 3, the parameters of the flexible beam system equation specified in equation (48) are assumed to be unknown and are therefore estimated online at each time step. The mean and covariance of the conditional distribution of the beam system dynamics are then specified using the estimated parameters, as discussed in Section 3. These estimates are used in equations (23) and (24) to evaluate the Riccati matrix $S_k$ as well as $P_k$, which are both used in equation (41) to calculate the mean of the control input to be forwarded to the beam. In the simulation experiment, the covariance, $\Sigma_2$, of the ideal distribution of the tracking error is taken to be $0.01 \times I_{n \times n}$, and the covariance, $\Gamma$, of the ideal distribution of the control input is taken to be 1. The simulation results are shown in Figures 1 and 2. Figure 1 shows the states of the flexible beam with their corresponding reference signals. As can be seen from this figure, all the flexible beam states accurately track their corresponding reference states. This is confirmed by the magnified plots in Figure 1, which show the steady-state values of the beam states. The tracking errors are presented in Figure 2(a), from which it can also be seen that all the state tracking errors go to zero. These figures also show, however, large deviations of the beam states from their reference values and large tracking errors in the transient period. This is expected, as the parameters of the beam equation, which are estimated online, will not have converged to their true values in this transient period. Once the parameters converge, the beam states show good tracking of their corresponding reference values. The control input calculated from equation (41) is shown in Figure 2(b). As can be seen from this figure, the control input is stable, thus yielding the required results. Finally, the feedback gain calculated from equation (42) is shown in Figure 3. This figure shows that all the feedback gains have converged and reached steady-state values. To reemphasise, the numerical results demonstrate the efficacy of the proposed probabilistic tracking control method and show that the mean of the tracking error can be driven to zero.

Conclusion
This article presented a new framework for the design of randomised controllers for complex stochastic and uncertain systems, based on the minimisation of the Kullback-Leibler divergence of the tracking error of the controlled system. The proposed framework considers the design of randomised controllers that take the multiplicative noise affecting the dynamics of the controlled stochastic system into consideration in the optimisation process. The theoretical development of this framework was demonstrated on linear Gaussian stochastic systems that are affected by multiplicative noise. The theoretical findings were then validated on the vibration control of a flexible beam.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was funded by a Leverhulme Trust Research Project Grant number RPG-2017-337.