Adaptive optimal control approach to robust tracking of uncertain linear systems based on policy iteration

In this study, an optimal adaptive control approach is established to solve the robust output tracking problem for a class of continuous-time uncertain linear systems, based on policy iteration (PI) in the actor-critic framework. First, by augmenting the state with integrals of the tracking error, the robust tracking problem is transformed into a robust control problem for an augmented uncertain linear system. It is proven that a robust control law for the augmented system enables the output of the considered system to track a polynomial time signal asymptotically. Second, an optimal control problem for the corresponding auxiliary nominal system is formulated and, based on the Bellman optimality principle, PI algorithms are proposed to compute tracking controllers online for both matched and mismatched uncertain systems. Finally, two numerical experiments are provided to verify the effectiveness of the proposed approach and the theoretical results.


Introduction
Control systems in many practical problems are subject to uncertainties owing to data errors or disturbances. 1 Therefore, the robust control of uncertain systems has attracted significant attention. This study introduces an adaptive optimal control method that designs a robust tracking controller based on the policy iteration (PI) of the actor-critic algorithm. Adaptive optimal control automatically adjusts the controller parameters to optimize the performance of the control system as the system parameters or the environment change, and it is widely used in control practice. 2 Since the 1980s, the robust control of practical uncertain systems has been a research hotspot. Algebraic Riccati equations (AREs) were utilized to propose robust control designs for some uncertain systems. 3,4 The robust control of time-varying uncertain linear systems was studied, 5 with the controller parameters obtained using a one-dimensional search method; this method is effective when the system satisfies the matching condition. Using a noniterative method, a robust controller was obtained for a matched system, 6 although the need for precompensators when the nominal system is unstable was a disadvantage of this design method. In addition, other effective robust control design methods were proposed for general matched uncertain systems. 7,8 However, only a few results are available for mismatched systems with uncertainty in the input matrix.
The robust tracking problem has been investigated in many studies. 9-12,32,34 Schmitendorf et al. proposed a robust tracking control approach for time-invariant uncertain systems with unknown constant disturbances. 9 Benson and Schmitendorf considered a robust tracking problem by using an observer-based augmented system. 10 Utilizing an improved linear quadratic optimal control method, Shieh et al. designed a robust tracking law for uncertain linear systems. 11 The robust tracking problem for an uncertain system was also converted into an optimal control problem; 12 for linear systems, this is a linear regulator problem, which is solved using an ARE.
Furthermore, PI in reinforcement learning (RL) has been utilized extensively to solve control and tracking problems for deterministic systems. For feedback control, see Refs. 13-15. Several studies have also used PI algorithms to solve tracking problems. An online RL method was established to obtain a linear tracking controller for partially unknown linear systems with unstable tracking signals. 16 By constructing an augmented system comprising the state variable and the tracking signal, the tracking problem was converted into an optimal control problem, so the optimal tracking cost is also quadratic. An adaptive dynamic programming (ADP) algorithm was proposed to obtain the tracking controller for completely unknown linear systems, 17 where tracking signals were generated for stabilizing the systems. Moreover, RL and ADP techniques were employed to achieve optimal output regulation for linear systems. 18 In addition, the application of RL to the robust control of uncertain systems has been proposed by many researchers. 19-26 Robust tracking problems arise widely in practical applications. However, to our knowledge, only a few studies have addressed the PI-based robust tracking of uncertain linear systems. Most existing design methods for robust tracking problems assume an accurately known nominal system, which is often impractical. Therefore, it is necessary to use the PI algorithm to solve the robust tracking problems of uncertain linear systems with an unknown nominal system matrix. In this study, a PI method is developed to solve the robust tracking control of an uncertain linear system with a polynomial tracking signal. The robust tracking problem is converted into a robust control problem for an augmented linear system. Additionally, online PI algorithms are developed to solve the considered tracking design problem for matched and mismatched systems.
Our main contributions are as follows. First, we consider the tracking problem of general uncertain linear systems in which both the state matrix and the input matrix are uncertain. In the existing literature, 12 a robust tracking control design was proposed for uncertain linear systems only when the uncertainty entered the system matrix. We extend these results to the case where the input matrix is also uncertain. Second, online PI algorithms are presented to solve tracking problems for both matched and mismatched uncertain systems. Because it is practically difficult to obtain the nominal system matrix accurately, using the PI algorithm is advantageous. As a result, we extend the PI algorithm to calculate the robust tracking control law for general uncertain linear systems.
The rest of this paper is arranged as follows. In Section 2, we formulate the robust tracking problem and present some basic results; solving the robust tracking problem is converted into calculating a robust control law for an augmented uncertain system. In Sections 3 and 4, the robust tracking problems for matched and mismatched linear systems, respectively, are solved by transforming them into robust control problems of an augmented system. Online PI algorithms are developed for the augmented uncertain system based on the optimal control of an auxiliary linear system. To support the proposed theoretical framework, we provide numerical experiments with two examples in Section 5. In Section 6, the study is concluded, and the scope for future research is discussed.

Robust tracking control framework
Consider a continuous-time linear system with uncertainty:

$$\dot{x} = A(s)x + B(l)u, \qquad y = Cx, \tag{1}$$

where $x \in \mathbb{R}^n$ is the state vector, $u \in \mathbb{R}^m$ is the input variable, $y \in \mathbb{R}^1$ is the system output, $s \in S$ and $l \in L$ are the uncertain parameter vectors, and $S$ and $L$ are the sets of uncertain parameters. $A(s)$ is the $n \times n$ uncertain state matrix, $B(l)$ is the $n \times m$ uncertain input matrix, and $C$ is a $1 \times n$ constant output matrix. The objective of the control design is to establish a control input, $u = Kx$, such that the system output $y$ asymptotically tracks the desired reference signal, $y_r$, for all $s \in S$ and $l \in L$. The reference signal is assumed to be a polynomial time signal, $y_r = a_0 + a_1 t + \cdots + a_{d-1} t^{d-1}$, where $a_0, a_1, \ldots, a_{d-1}$ are constants and $d$ is a nonnegative integer. In particular, the control design goal is to establish a control input, $u = Kx$, such that the closed-loop uncertain system (1) is asymptotically stable and the output, $y = Cx$, asymptotically tracks the signal $y_r$ for all $s \in S$ and $l \in L$.
Some definitions and basic assumptions 1,27 are elaborated as follows.
Definition 2. System (1) satisfies the matched condition in the input matrix if, for every $l \in L$, there is a matrix $u(l)$ such that condition (3) holds, where $u(l) \in \mathbb{R}^{m \times m}$ and $u(l) \geq 0$.
Definition 3. When system (1) satisfies conditions (2) and (3) for all $s \in S$ and $l \in L$, it is called a matched uncertain linear system.
Assumption 2. For the linear system (1) with matching uncertainties satisfying conditions (2) and (3), there is a known matrix, $M \geq 0$, such that $u^T(s)u(s) \leq M$ for every $s \in S$.
If system (1) does not satisfy matching conditions (2) and (3), the pseudo-inverse, $B(l_0)^+$, of the nominal input matrix $B(l_0)$ is introduced to decompose the uncertain system matrix and the input matrix into matched and mismatched parts, as given in (4) and (5), where $B(l_0)^+ = [B^T(l_0)B(l_0)]^{-1}B^T(l_0)$.
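The decomposition described above can be illustrated numerically. The following is a minimal sketch, assuming the usual projector-based split in which $B(l_0)B(l_0)^+$ selects the part of the uncertainty lying in the range of the nominal input matrix and $I - B(l_0)B(l_0)^+$ selects the remainder. The function names and the numerical matrices are hypothetical and are not taken from the paper's examples.

```python
import numpy as np

def left_pseudo_inverse(B0):
    """Return (B0^T B0)^{-1} B0^T; requires B0 to have full column rank."""
    return np.linalg.solve(B0.T @ B0, B0.T)

def split_uncertainty(delta, B0):
    """Split delta = A(s) - A(s0) into a matched and a mismatched part."""
    projector = B0 @ left_pseudo_inverse(B0)   # B0 B0^+, projects onto range(B0)
    matched = projector @ delta
    mismatched = (np.eye(B0.shape[0]) - projector) @ delta
    return matched, mismatched

# Hypothetical data (assumption): nominal input matrix and a state-matrix perturbation.
B0 = np.array([[0.0], [1.0]])
delta_A = np.array([[0.5, 0.0], [1.0, 2.0]])

B0_plus = left_pseudo_inverse(B0)
matched, mismatched = split_uncertainty(delta_A, B0)

print("B0^+ B0 =", B0_plus @ B0)   # identity of size m
print("B0 B0^+ =", B0 @ B0_plus)   # a projector, not the identity in general
print("matched part:\n", matched)
print("mismatched part:\n", mismatched)
```

In this sketch the second row of `delta_A` can be cancelled through the input channel, whereas the first row cannot, which is why it ends up in the mismatched part.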
If system (1) is a mismatched uncertain system, let the uncertainty of the system be subject to the following assumptions.
Assumption 5. A positive semidefinite matrix, $G$, exists, such that

The above assumptions are common in the study of uncertain systems. 1,26,27
To solve the robust tracking problem, we introduce the error variable $e = y - y_r$. Based on the integrals of the error variable, new state variables are defined, and together with the original state they form the augmented system state, $X$. Using the augmented system state, an uncertain augmented linear system (8) is constructed, in which $O_{i \times j} \in \mathbb{R}^{i \times j}$ represents a zero matrix. Equation (8) can be regarded as a linear nonhomogeneous uncertain system with a nonhomogeneous term, $Ny_r$. The considered robust tracking problem can be converted to the following robust stabilization problem: for every $s \in S$ and $l \in L$, find a control input, $u = KX$, such that the augmented homogeneous uncertain linear system (9) is asymptotically stable.
Proof. Assume that system (1) satisfies the system matrix matched condition (2) and the input matrix matched condition (3). Consequently, $A(s) - A(s_0) = B(l_0)u(s)$. Therefore, the corresponding relation holds for the augmented matrices $T(s)$ and $T(s_0)$, which implies that the augmented system (9) satisfies the system matrix matched condition. Moreover, the same argument applied to the input matrix implies that the augmented system (9) satisfies the input matrix matched condition. Hence, the proof is complete.
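Lemma 2. For all $s \in S$ and $l \in L$, suppose that the state feedback controller $u = KX = \begin{bmatrix} K_x & K_1 \end{bmatrix}\begin{bmatrix} x \\ q \end{bmatrix}$ can stabilize the system $\dot{X} = T(s)X + B(l)u$. Then $u = Kx$ will stabilize system (1), and $y \to y_r$ as $t \to \infty$.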
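Proof. Because (9) is an augmented system corresponding to system (1), $u = Kx$ will stabilize system (1). Next, differentiate both sides of the augmented system (8) $d$ times. Because $y_r$ is a polynomial of degree $d - 1$, its $d$th derivative vanishes, so the $d$th derivative of $X$ obeys the homogeneous closed-loop dynamics. Considering that $u = KX$ can stabilize system (9), the matrix $T(s) + B(l)K$ is Hurwitz stable, and the $d$th derivative of $X$ therefore converges to zero. Since the augmented state contains the integrals of the error, it follows that $e = y - y_r$ converges to zero. Therefore, we have $y \to y_r$ as $t \to \infty$. Hence, the proof is complete.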
Lemma 2 shows that the robust tracking problem of system (1) can be transformed into a robust stabilization problem of the augmented homogeneous system (9).
Remark 1. Tan et al. 12 considered the robust tracking of an uncertain linear system without input uncertainty. In this study, we consider a system with uncertainties entering the input matrix, which extends the existing results.
Here, the system is divided into matched and mismatched cases, and the robust tracking problem of the system is discussed.
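The augmentation idea can be sketched in code. The block below is a minimal illustration under stated assumptions: the integral states are taken as a chain $\dot{q}_1 = Cx - y_r$, $\dot{q}_{k+1} = q_k$, and the augmented state is ordered as $X = [x^T, q_1, \ldots, q_d]^T$. This ordering, the function name `augment`, the double-integrator plant, and the gain `K` (chosen by hand to place the closed-loop poles at $-1, -2, -3$) are all hypothetical and may differ from the layout used in (8). For a constant reference ($d = 1$), the steady state of the closed loop is computed and its output is compared with the reference.

```python
import numpy as np

def augment(A, B, C, d):
    """Build (T, B_aug, N) so that X' = T X + B_aug u + N y_r with d error integrators."""
    n, m = B.shape
    T = np.zeros((n + d, n + d))
    T[:n, :n] = A
    T[n, :n] = np.ravel(C)           # q_1' = C x - y_r
    for k in range(1, d):
        T[n + k, n + k - 1] = 1.0    # q_{k+1}' = q_k
    B_aug = np.vstack([B, np.zeros((d, m))])
    N = np.zeros((n + d, 1))
    N[n, 0] = -1.0                   # the reference enters the first integrator
    return T, B_aug, N

# Hypothetical plant (assumption): a double integrator with position output.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

T, B_aug, N = augment(A, B, C, d=1)

# Hand-picked stabilizing gain: the closed-loop characteristic polynomial is
# s^3 + 6 s^2 + 11 s + 6 = (s + 1)(s + 2)(s + 3).
K = np.array([[-11.0, -6.0, -6.0]])
H = T + B_aug @ K

# Steady state for the constant reference y_r = 3: 0 = H X + N y_r.
y_r = 3.0
X_ss = np.linalg.solve(H, -N * y_r)
y_ss = (C @ X_ss[:2]).item()
print("closed-loop poles:", np.linalg.eigvals(H))
print("steady-state output:", y_ss)
```

The integrator row forces $Cx = y_r$ at any equilibrium, so the steady-state output equals the reference for every stabilizing gain, not only for the one chosen here.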

Matched uncertain linear systems
In this section, PI algorithms are developed to calculate the robust tracking control law for linear systems with matched uncertainties. The problem is transformed into stabilizing an augmented uncertain system, which contains the original system states and tracking signal. By solving an optimal control problem for the augmented nominal system with a predefined performance index, the PI algorithms yield the robust tracking feedback control.

Robust Stabilization of an augmented linear system with uncertainty
Here, we discuss system (1) when it satisfies the matched conditions. Furthermore, robust stabilization for the augmented uncertain systems is transformed into calculating an optimal control with nominal system and predefined performance index. The optimal control problem is solved using the PI method, and the required robust tracking control law is obtained.
For the nominal linear system (11), we construct an optimal control problem: find a controller, $u = KX$, that minimizes the performance index (12), where $Q = M + I \geq 0$, $M$ is the supremum of the uncertainty $F^T(s)F(s)$, and $I$ is an identity matrix of appropriate dimensions. The bound follows from Assumption 2 through a derivation in which $O$ denotes a zero matrix of appropriate dimensions.
According to optimal control theory, 26 $u = -B^T(l_0)PX$ is the solution of the optimal control problem (11) and (12), where the positive definite matrix $P$ satisfies the following ARE:

$$PT(s_0) + T^T(s_0)P + Q - PB(l_0)B^T(l_0)P = 0. \tag{13}$$

Theorem 1. If system (1) is a matched uncertain system satisfying conditions (2) and (3), then the solution, $u = KX$, of the optimal control problem (11) and (12) can stabilize the augmented uncertain system (9). That is, for every $s \in S$ and $l \in L$, the closed-loop system (14) is asymptotically stable.
Proof. It follows from Lemma 1 that the augmented system (9) satisfies the matching conditions (15) and (16). Choose the Lyapunov function $V(X) = X^TPX$ and take its time derivative along the closed-loop system (14). Applying the matched conditions (15) and (16), the optimal control gain $K = -B^T(l_0)P$, and the Riccati equation (13), we immediately conclude that $dV/dt < 0$, because $F^T(s)F(s) \leq M$ and $u(l) \geq 0$. Therefore, for all $s \in S$ and $l \in L$, the closed-loop system (14) is asymptotically stable. Hence, the proof is complete.
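The robustness claim can be checked numerically on a small example. The sketch below makes several assumptions that are not taken from the paper: the augmented matrices `T_of(s)` and `B_of(l)` are hypothetical, the matched state uncertainty is $T(s) - T(s_0) = B(l_0)[s \;\; s \;\; 0]$ with $s \in [0, 3]$, the input uncertainty scales the nominal input matrix by a factor $l \in [1, 3]$, and the ARE is taken in the standard form $PT(s_0) + T^T(s_0)P + Q - PB(l_0)B^T(l_0)P = 0$ with $Q = M + I$. The helper `solve_care` uses the stable invariant subspace of the Hamiltonian matrix, which is a standard technique and not necessarily the solver used by the authors.

```python
import numpy as np

def solve_care(T0, B0, Q):
    """Stabilizing solution of P T0 + T0^T P + Q - P B0 B0^T P = 0."""
    n = T0.shape[0]
    ham = np.block([[T0, -B0 @ B0.T], [-Q, -T0.T]])
    eigvals, eigvecs = np.linalg.eig(ham)
    stable = eigvecs[:, eigvals.real < 0]      # basis of the stable subspace
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)

def T_of(s):
    """Hypothetical augmented state matrix with matched uncertainty in row 2."""
    return np.array([[0.0, 1.0, 0.0], [s, s, 0.0], [1.0, 0.0, 0.0]])

def B_of(l):
    """Hypothetical augmented input matrix, B(l) = l * B(l0) with l0 = 1."""
    return np.array([[0.0], [l], [0.0]])

s0, l0, s_max = 0.0, 1.0, 3.0
# Upper bound M on F^T(s) F(s) for F(s) = [s, s, 0] and 0 <= s <= s_max.
M = s_max ** 2 * np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
Q = M + np.eye(3)

T0, B0 = T_of(s0), B_of(l0)
P = solve_care(T0, B0, Q)
K = -B0.T @ P
care_residual = P @ T0 + T0.T @ P + Q - P @ B0 @ B0.T @ P

# Largest real part of the closed-loop eigenvalues over a grid of (s, l).
worst_real_part = max(
    np.linalg.eigvals(T_of(s) + B_of(l) @ K).real.max()
    for s in np.linspace(0.0, 3.0, 13)
    for l in np.linspace(1.0, 3.0, 9)
)
print("gain K:", K)
print("worst closed-loop real part:", worst_real_part)
```

A grid check of this kind only samples the uncertainty set; the Lyapunov argument in the proof is what covers every admissible $s$ and $l$.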

PI algorithm and its convergence
This subsection has two parts. First, based on the performance function (12), an offline policy iteration algorithm is proposed to solve the robust tracking problem. Then, an integral reinforcement formula is developed based on the Bellman equation. An online real-time PI algorithm is developed to solve the robust tracking problem with unknown nominal system matrix.
For any initial time, $t$, rewrite the performance function (12) as (17). Differentiating (17) along the trajectories of system (11) yields (18), where $H = T(s_0) + B(l_0)K$. Using (18) and $K = -B^T(l_0)P$, an offline PI algorithm for the robust tracking problem is obtained.

Algorithm 1. Offline PI algorithm for robust tracking of matched uncertain linear systems
1. Initialization: Select an initial stabilizing control gain, $K_0$.
2. Policy evaluation: For the given control gain $K_i$, solve equation (19) for $P_i$, where $H_i = T(s_0) + B(l_0)K_i$.
3. Policy improvement: Compute $K_{i+1}$ using (20).

By alternately iterating (19) and (20), Algorithm 1 can be used to calculate the robust tracking law of uncertain systems, which is an extension of Kleinman's algorithm. 28 The convergence proof of Algorithm 1 is identical to that of Kleinman's algorithm.
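The steps of Algorithm 1 can be sketched as a Kleinman-type iteration. This is a minimal sketch under assumptions: the policy evaluation step is taken to be the Lyapunov equation $H_i^TP_i + P_iH_i + Q + K_i^TK_i = 0$ and the policy improvement step is taken to be $K_{i+1} = -B^T(l_0)P_i$, which is the standard form for a performance index with integrand $X^TQX + u^Tu$. The function names, the nominal matrices `T0` and `B0`, the weight `Q`, and the initial gain `K0` are hypothetical.

```python
import numpy as np

def solve_lyapunov(H, W):
    """Solve H^T P + P H + W = 0 for symmetric P using Kronecker products."""
    n = H.shape[0]
    eye = np.eye(n)
    lhs = np.kron(H.T, eye) + np.kron(eye, H.T)
    P = np.linalg.solve(lhs, -W.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)

def offline_pi(T0, B0, Q, K0, iterations=20):
    """Alternate policy evaluation and policy improvement (model-based)."""
    K = K0
    for _ in range(iterations):
        H = T0 + B0 @ K                        # closed-loop matrix H_i
        P = solve_lyapunov(H, Q + K.T @ K)     # policy evaluation
        K = -B0.T @ P                          # policy improvement
    return P, K

# Hypothetical augmented nominal system (assumption): double integrator plus one
# error integrator, with an initial gain placing the poles at -1, -2, -3.
T0 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B0 = np.array([[0.0], [1.0], [0.0]])
Q = np.eye(3)
K0 = np.array([[-11.0, -6.0, -6.0]])

P, K = offline_pi(T0, B0, Q, K0)
residual = np.linalg.norm(P @ T0 + T0.T @ P + Q - P @ B0 @ B0.T @ P)
print("converged gain:", K)
print("ARE residual:", residual)
```

The residual of the ARE is used here as the convergence check; the iteration needs the nominal matrix `T0` at every step, which is the limitation the online algorithm removes.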
In Algorithm 1, we can solve ARE (13) by iteratively computing (19) and (20). However, the implementation of the algorithm needs to know the information of the nominal system, and the process of calculating the controller can only be realized offline. Here, an online PI algorithm is developed to solve ARE (13) with an unknown nominal system matrix, T(s 0 ).
For any initial time, $t$, the optimal cost in (12) can be rewritten as (21). In equation (21), only the matrix $P$ is unknown, so we can calculate the elements of $P$ from the system trajectory data. Consequently, an online PI algorithm for solving the robust tracking problem of uncertain linear systems is obtained.
Algorithm 2. Online PI algorithm for robust tracking of matched uncertain linear systems
1. Initialization: Select an initial stabilizing control law, $K_0$.
2. Policy evaluation: Compute $P_i$ from (22).
3. Policy improvement: Compute $K_{i+1}$ from (23).

Remark 2. Algorithm 2 is an online PI algorithm based on RL. Using the online trajectory data of system (11), the matrix $P_i$ can be solved from (22) by the least-squares method. By iterating (22) and (23), the robust control gain, $K$, for the augmented uncertain linear system (9) is obtained. Moreover, according to Lemma 2, the robust tracking law of the system is expressed as $u = Kx$ by decomposing the robust control law $KX$ into its parts acting on $x$ and $q$. As an advantage, Algorithm 2 does not need the nominal system matrix and can effectively avoid the curse of dimensionality. In Ref. 13, online PI was used to compute the linear quadratic regulator with an unknown system matrix. We extend this algorithm to solve the robust tracking problem for an uncertain linear system.
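The data-driven policy evaluation in Algorithm 2 can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the integral reinforcement relation is taken as $X^T(t)P_iX(t) - X^T(t+\Delta t)P_iX(t+\Delta t) = \int_t^{t+\Delta t} X^T(Q + K_i^TK_i)X\,d\tau$, the improvement step as $K_{i+1} = -B^T(l_0)P_i$, and the data are collected from several short trajectories with random initial states. The function `rollout` plays the role of the physical plant: it is the only place where the nominal matrix `T0` appears, and the learner never reads it. All names and numerical values are hypothetical.

```python
import numpy as np

def quad_basis(X):
    """Monomials x_i x_j (i <= j), scaled so that quad_basis(X) @ theta = X^T P X."""
    n = X.size
    return np.array([(1.0 if i == j else 2.0) * X[i] * X[j]
                     for i in range(n) for j in range(i, n)])

def unpack(theta, n):
    """Rebuild the symmetric matrix P from its upper-triangular entries."""
    P = np.zeros((n, n))
    P[np.triu_indices(n)] = theta
    return P + np.triu(P, 1).T

def rollout(T0, B0, K, W, X0, window, dt=1e-3):
    """Stand-in for measured data: RK4 simulation of X' = (T0 + B0 K) X.
    Returns X(t + window) and the integral of X^T W X over the window."""
    H = T0 + B0 @ K
    def rhs(z):
        X = z[:-1]
        return np.append(H @ X, X @ W @ X)
    z = np.append(X0, 0.0)
    for _ in range(int(round(window / dt))):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z[:-1], z[-1]

def online_pi(T0, B0, Q, K0, iterations=12, samples=12, window=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = B0.shape[0]
    K = K0
    for _ in range(iterations):
        W = Q + K.T @ K
        rows, costs = [], []
        for _ in range(samples):
            X_start = rng.standard_normal(n)
            X_end, cost = rollout(T0, B0, K, W, X_start, window)
            rows.append(quad_basis(X_start) - quad_basis(X_end))
            costs.append(cost)
        # Policy evaluation: least-squares fit of P_i from trajectory data only.
        theta = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)[0]
        P = unpack(theta, n)
        K = -B0.T @ P          # policy improvement uses B(l0) but not T(s0)
    return P, K

# Hypothetical augmented nominal system and initial stabilizing gain (assumptions).
T0 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B0 = np.array([[0.0], [1.0], [0.0]])
Q = np.eye(3)
K0 = np.array([[-11.0, -6.0, -6.0]])

P, K = online_pi(T0, B0, Q, K0)
residual = np.linalg.norm(P @ T0 + T0.T @ P + Q - P @ B0 @ B0.T @ P)
print("learned gain:", K)
print("ARE residual (computed with the true T0, for checking only):", residual)
```

The residual is evaluated with the true `T0` purely to verify the result; in an online setting that check would not be available, and convergence would instead be monitored through the change in $P_i$ between iterations.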
The convergence of Algorithm 2 is proven as follows.
Theorem 2. Assume that $T(s_0) + B(l_0)K_i$ is stable. Then solving for $P_i$ using equation (22) is equivalent to solving equation (24).
Proof. Dividing both sides of (22) by $\Delta t$ and taking the limit as $\Delta t \to 0$ yields (24). Thus, (22) implies (24). Conversely, consider the asymptotically stable system $\dot{X} = (T(s_0) + B(l_0)K_i)X$. Selecting the Lyapunov function $V_i(X) = X^TP_iX$, taking its time derivative along the system trajectories, and integrating from $t$ to $t + \Delta t$ leads to (22). Hence, the proof is complete.
Therefore, the integral reinforcement relation (22) is equivalent to equation (19) in Algorithm 1, which is an extension of Kleinman's algorithm. This indicates the convergence of Algorithm 2.
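The two directions of this equivalence can be written out explicitly. The following is a sketch under the assumption that the policy evaluation equation has the standard Lyapunov form $H_i^TP_i + P_iH_i + Q + K_i^TK_i = 0$ with $H_i = T(s_0) + B(l_0)K_i$, which corresponds to a performance index with integrand $X^TQX + u^Tu$; the exact forms of (19), (22), and (24) in the paper may differ in notation.

```latex
\begin{aligned}
\frac{d}{dt}\bigl(X^T P_i X\bigr)
  &= X^T\bigl(H_i^T P_i + P_i H_i\bigr)X
   = -X^T\bigl(Q + K_i^T K_i\bigr)X, \\
X^T(t)\,P_i\,X(t) - X^T(t+\Delta t)\,P_i\,X(t+\Delta t)
  &= \int_t^{t+\Delta t} X^T(\tau)\bigl(Q + K_i^T K_i\bigr)X(\tau)\,d\tau .
\end{aligned}
```

The first line is the differential form evaluated along $\dot{X} = H_iX$, and integrating it over $[t, t + \Delta t]$ gives the second line. Dividing the second line by $\Delta t$ and letting $\Delta t \to 0$ recovers the first, which must then hold for every $X$ and hence yields the Lyapunov equation.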

Mismatched uncertain linear system
The robust tracking problem of a mismatched uncertain linear system (1) is discussed in this section, that is, we consider the design of a robust tracking control law when system (1) does not satisfy the matched conditions (2) and (3). The robust tracking problem is transformed into designing a robust control law for an augmented uncertain linear system. By iteratively calculating the optimal control law for an extended nominal system with a properly defined performance index, the online PI method is used to obtain the robust tracking control law.
Obviously, the augmented system (9) is a mismatched uncertain linear system when the uncertain system (1) does not satisfy conditions (2) and (3). In addition, the uncertainty of the augmented uncertain linear system can be decomposed into a matched part and a mismatched part according to (4) and (5), and the resulting parts satisfy relations (25) and (26).

Proof. Denote $\tilde{A} = A(s) - A(s_0)$. From Assumption 3, we can bound $[T(s) - T(s_0)]^T[B(l_0)^+]^TB(l_0)^+[T(s) - T(s_0)]$. Therefore, (25) holds. A similar argument shows that (26) holds, thus completing the proof.

We construct the following optimal control problem. For the extended nominal linear system (27), find an augmented state feedback law, $u = KX$, $v = LX$, that minimizes the performance index (28), where $b \geq 0$ is a design parameter. We denote $\bar{B} = \begin{bmatrix} B(l_0) & I - B(l_0)B(l_0)^+ \end{bmatrix}$, with which (27) and (28) can be rewritten in compact form. Computing the optimal control law yields (29), where $P$ is a positive definite matrix solving the following ARE:

$$PT(s_0) + T^T(s_0)P + F + H + b^2I - P\bar{B}\bar{B}^TP = 0. \tag{30}$$

The following theorem illustrates that the optimal control law (29) can stabilize the mismatched uncertain linear system (9).

Theorem 3. Assume that $u = KX$ and $v = LX$ are the solutions to the optimal control problem (27) and (28). If the parameter $b$ can be chosen such that the conditions in (31) hold, then the optimal control law, $u = KX$, with $K = -B^T(l_0)P$ can stabilize the mismatched augmented uncertain system (9). That is, for every $s \in S$ and $l \in L$, the mismatched closed-loop system (32) is asymptotically stable.
Proof. Choose the Lyapunov function $V(X) = X^TPX$ and take its time derivative along the closed-loop system (32). Decomposing the uncertainty of the augmented system (9) into its matched part, which involves $B(l_0)^+(T(s) - T(s_0))X$, and its mismatched part, we use the inequality

$$-2X^TL^T(T(s) - T(s_0))X \leq X^T(T(s) - T(s_0))^T(T(s) - T(s_0))X + X^TL^TLX.$$

According to the conditions in (31), $dV/dt < 0$. Therefore, the mismatched uncertain linear system (9) is asymptotically stable for every $s \in S$ and $l \in L$. 31 That is, $u = -B^T(l_0)PX$ is a robust controller for the augmented mismatched uncertain system (9). Hence, the proof is complete.
Remark 3. The pseudo-inverse in the form $B(l_0)^+ = [B^T(l_0)B(l_0)]^{-1}B^T(l_0)$ exists if the columns of $B(l_0)$ are linearly independent. 29 In practical applications, the input matrix $B(l_0)$ usually has full column rank, so this condition is generally satisfied. Furthermore, the pseudo-inverse satisfies $B(l_0)^+B(l_0) = I$; however, $B(l_0)B(l_0)^+ = I$ does not hold in general.
Here, we propose a robust tracking PI algorithm based on the RL for mismatched linear systems.
For any initial time, $t$, the optimal performance function (28) can be written in integral form, where $Q = F + H + b^2I$. Similar to Algorithm 2, the following online PI algorithm can be utilized to calculate the robust tracking law of a mismatched uncertain linear system.

Algorithm 3. Online PI algorithm for robust tracking of mismatched uncertain linear systems
1. Initialization: Select an initial stabilizing control gain, $K_0$.
2. Policy evaluation: Compute $P_i$ from (35).
3. Policy improvement: Compute the updated control gain.

The convergence proof of the algorithm is similar to that for the matched system and is omitted here.
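For the mismatched case, a model-based counterpart of the iteration may help to see how the extended input matrix and the weight $Q = F + H + b^2I$ enter. The sketch below is built on assumptions: the extended input matrix is taken as $\bar{B} = [B(l_0) \;\; I - B(l_0)B(l_0)^+]$, the gains $K$ and $L$ are stacked into one matrix `G` and updated together as $G_{i+1} = -\bar{B}^TP_i$, and the ARE is taken in the form $PT(s_0) + T^T(s_0)P + Q - P\bar{B}\bar{B}^TP = 0$. The matrices `F` and `H_weight`, the value of `b`, and the nominal system are hypothetical placeholders, and the check of the conditions in (31) is not included.

```python
import numpy as np

def solve_lyapunov(H, W):
    """Solve H^T P + P H + W = 0 for symmetric P using Kronecker products."""
    n = H.shape[0]
    eye = np.eye(n)
    lhs = np.kron(H.T, eye) + np.kron(eye, H.T)
    P = np.linalg.solve(lhs, -W.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)

# Hypothetical augmented nominal system (assumption).
T0 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B0 = np.array([[0.0], [1.0], [0.0]])
n, m = B0.shape

# Extended input matrix built from the pseudo-inverse of B(l0).
B0_plus = np.linalg.solve(B0.T @ B0, B0.T)
B_bar = np.hstack([B0, np.eye(n) - B0 @ B0_plus])

# Hypothetical uncertainty bounds and design parameter (assumptions).
F = 0.5 * np.eye(n)
H_weight = 0.25 * np.eye(n)
b = 1.0
Q = F + H_weight + b ** 2 * np.eye(n)

# Stacked gain G = [K; L]; the initial choice gives T0 - 2 * B_bar B_bar^T, which is stable here.
G = -2.0 * B_bar.T
for _ in range(20):
    H_cl = T0 + B_bar @ G
    P = solve_lyapunov(H_cl, Q + G.T @ G)   # policy evaluation
    G = -B_bar.T @ P                        # policy improvement

K, L = G[:m, :], G[m:, :]
residual = np.linalg.norm(P @ T0 + T0.T @ P + Q - P @ B_bar @ B_bar.T @ P)
print("tracking gain K:", K)
print("ARE residual:", residual)
```

Only `K` would be applied to the real system; `L` belongs to the fictitious input of the extended nominal system and appears in the stability conditions rather than in the controller.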

Remark 4. Robust tracking design is a challenging problem, 30,33 particularly when the signal to be tracked is unstable. We consider the tracking problem of a class of polynomial signals that may be unstable. Algorithm 3 is an online robust control method that is applicable to mismatched uncertain systems with an unknown nominal state matrix. Using the least-squares method, the matrix $P_i$ can be solved from (35) using the online trajectory data of system (27). We use the PI algorithm in RL to solve this problem, which is a novel approach. From Theorem 3, it is clear that if the parameter $b$ is selected to be sufficiently large, the conditions in (31) are easily satisfied in many practical applications.

Numerical experiments
The results of two numerical experiments are presented to examine the viability of the theoretical framework. The proposed PI algorithms are applied to solve the robust tracking of matched and mismatched uncertain systems. Example 1. A matched uncertain linear system (37) is considered, where $s \in [0, 3]$ and $l \in [1, 3]$ are uncertain parameters. The reference signal is assumed to be $y_r = t + 5$. The control objective is to design a state feedback law, $u = Kx$, that makes the system output $y = Cx$ asymptotically track the reference signal for every $s \in [0, 3]$ and $l \in [1, 3]$.
We denote $A(s) = \begin{bmatrix} 0 & 2 \\ 2 + s & s \end{bmatrix}$. By applying Algorithm 2, a robust tracking control gain is obtained for system (37). We chose the initial control gain as $K_0 = [-14.5 \;\; -8 \;\; -20 \;\; -6]$. The initial condition of the augmented nominal system was selected as $X_0 = [0.2 \;\; 0 \;\; 0.15 \;\; 0]^T$. Using MATLAB, after four iterations, the positive definite matrix $P$ and the control gain converge to the optimal solutions. For comparison, we used MATLAB to solve the ARE directly, which corresponds to the result of Algorithm 1. The comparison shows that the online PI algorithm attains the same result as the existing offline methods. However, the online PI method does not need the nominal system matrix, which is the advantage of Algorithm 2. This clearly shows that the PI algorithm is effective in solving the robust tracking problem. Figure 1 shows the evolution of the tracking control signal. The convergence process of the elements of the $P$ matrix is shown in Figure 2, which shows that the optimal cost is obtained at time $t = 4$ s after four updates of the controller parameters. The evolution of the system output tracking the reference signal for different values of the parameters $s$ and $l$ is presented in Figure 3, which shows that robust tracking is achieved for the uncertain system (37).
Example 2. A mismatched uncertain system (43) is considered, where $s \in [-2, 2]$ and $l \in [1, 3]$ are uncertain parameters. The reference signal is assumed to be $y_r = 3$. The control objective is to design a state feedback law, $u = Kx$, that makes the system output $y = Cx$ asymptotically track the reference signal for every $s \in [-2, 2]$ and $l \in [1, 3]$.
With the system matrices of (43), it is clear that the system is a mismatched system. The initial condition of the augmented nominal system is selected as $x_0 = [3, -2, -8]^T$. Using MATLAB, Algorithm 3 is applied to obtain the matrix $P$ and the control gain. For comparison, the ARE is also solved directly using MATLAB. The results clearly show that the PI algorithm is effective in solving the robust tracking problem for the mismatched uncertain system. Figure 4 shows the evolution of the tracking control signal. The convergence process of the elements of the $P$ matrix is shown in Figure 5. The evolution of the output and the reference trajectory for different values of the parameters $s$ and $l$ is presented in Figure 6. As a result, robust output tracking is achieved for the mismatched uncertain system (43).

Conclusion
In this study, RL-based online PI algorithms were proposed to calculate the robust tracking control law for uncertain linear systems. The approach relies on online policy iteration and does not require the nominal system matrix. The robust tracking problem was transformed into an optimal control problem with a predefined cost function. Based on the corresponding augmented linear system, offline and online PI algorithms were established to obtain a robust tracking controller. Numerical experiments were presented to demonstrate the effectiveness of the theoretical results. The proposed method may be extended to solve tracking problems of uncertain discrete-time systems, which may be the subject of our future research.