Self-learning control of model uncertain active suspension systems with observer–critic structure

This paper presents a self-learning control algorithm for model-uncertain suspension systems based on the single network adaptive critic (SNAC) approach. First, a differential neural network (DNN) observer, together with its weight updating law, is established to observe the uncertain dynamics. Then, the nominal optimal value function is approximated by a critic NN whose weight is updated by a novel learning law driven by the filtered parameter error. The online self-learning control policy is derived by approximately solving the Hamilton–Jacobi–Bellman (HJB) equation via the SNAC technique. A Lyapunov analysis ensures the convergence of the entire closed-loop system composed of the DNN observer and the self-learning control policy. A computer simulation of a quarter-car suspension system is established to verify the effectiveness of the proposed approach. Simulation results illustrate that the designed method ensures good performance in terms of road holding and ride quality. In addition, its model-independence and online self-learning characteristics make it possible to design a high-performance vehicle active suspension controller.


Introduction
A high-performance suspension system is the key to a smoother and safer car. 1 Correspondingly, the suspension system is expected to have more intelligence to adapt to different road inputs. It should be pointed out that the suspension system is affected by unknown road input, resulting in unavoidable vibrations that degrade ride comfort. Therefore, when designing the controller of an active suspension system, the state variables related to the vertical dynamics of the vehicle should be available. 2 However, implementing a suspension control system based on traditional observation theory is challenging because the unknown road input is difficult to measure or observe while the vehicle is driving. Moreover, parameter uncertainty, such as in the sprung mass, further complicates the design of the suspension observer and controller. Enhancing active vibration control performance, as shown in refs. 3 and 4, is the main target of suspension control.
To deal with model uncertainty, robust approaches, 5 LMI-based approaches, 6 sliding mode approaches, 7,8,43 and globally bounded Jacobian approaches 9 have been used to develop observers. In ref. 10, a robust decoupling observer considering disturbance is achieved, but the rank condition on the output and input distribution matrices is difficult to meet. To avoid such rank constraints on the system matrices, Sedighi et al. 11 presented a new development of unknown input observer design, which guarantees the stability of the error closed-loop system and provides designers with a greater degree of freedom in the design space. In ref. 12, a controller and fault identification method based on the separation principle is proposed to realize simultaneous decoupled identification and control. In ref. 13, an adaptive extended state observer with lumped uncertainties and unmeasured states is proposed to estimate the unknown coefficients. However, most of the above observer designs rely on a priori knowledge of the model, which poses a strict restriction in real applications.
Neural networks, with their good nonlinear approximation capability, are well suited to designing observers for model-uncertain systems. The structure of a neural observer mainly includes two parts: a neural network to identify the unknown nonlinearity and a traditional Luenberger-like observer to estimate the state. In refs. 14 and 15, under strong strict positive real (SPR) conditions imposed on the output error equation, a structure using two independent single-layer neural networks is proposed to estimate the state of affine and non-affine SISO nonlinear systems. A nonlinear observer for MIMO systems using a single-layer neural network was developed in refs. 16 and 17, where the SPR restriction has been weakened to a certain extent. Furthermore, a neural network observer with a modified backpropagation algorithm was proposed in ref. 18, where the SPR condition and other strong restrictions are removed. In particular, the differential neural network (DNN), 19 incorporating a feedback design, provides a more efficient way to solve the state estimation problem of model-uncertain systems. In ref. 20, a DNN observer with a sliding mode updating rule was reported, and the relevant observability conditions were removed. A new passivity analysis of DNNs proposed by Xiao et al. 21 illustrates that the boundedness of the external input ensures the boundedness of the input-output signals for each block of the closed-loop error dynamics, which avoids the requirement of a persistency of excitation (PE) condition.
Recent studies have shown that designing a self-learning controller based on approximate dynamic programming (ADP) can avoid the requirement of an accurate model while achieving optimal control. 22,23 As is well known, the curse of dimensionality and the offline learning characteristics of dynamic programming make existing optimal control strategies based on it inefficient. ADP is inspired by biological systems and can greatly improve the computational efficiency of solving optimal control problems without introducing significant approximation errors. Since Werbos 24 introduced the widely used actor-critic (AC) scheme for self-learning control design, various modifications of ADP have been developed, such as action-dependent heuristic dynamic programming (ADHDP), 25 heuristic dynamic programming (HDP), 26 dual heuristic programming (DHP), 27 and Q-learning. 28 The natural recursive structure of ADP is well suited to discrete-time systems, and the related results cannot be directly transferred to the continuous-time setting. Ref. 29 proposes a self-learning control method for continuous-time nonlinear systems, where an identifier is used to identify the unknown nonlinear dynamics, and the online self-learning law is then obtained by approximately solving the HJB equation through a critic network. However, it is limited to nonlinear affine systems. As claimed in refs. 30 and 31, the self-learning control method with the single network adaptive critic (SNAC) scheme has proved to be an effective way to solve the HJB equation online and obtain the optimal control action.
It should be pointed out that most of the above-mentioned self-learning control research is based on the premise that the state is fully known, which is a strong constraint, because in most cases the system state is not completely measurable, as in the suspension system studied in this paper. The main contributions of this paper can be summarized as follows: (1) For the first time, the self-learning control of a model-uncertain suspension system using only the input and output of the system is investigated. Unlike the commonly used LQR controller design, which depends on the system model, this paper introduces an observer-based self-learning controller design without prior knowledge of the system dynamics, which makes the proposed method more suitable for practical application. (2) The investigated algorithm is implemented within an observer-critic framework. First, a DNN observer with a well-designed online weight update law identifies the unknown system dynamics from the known system input/output information. Then, the self-learning control policy is derived with the help of the SNAC method. By introducing two novel tuning laws for the DNN observer and the SNAC, improved adaptability and robustness are realized compared with general ADP-based self-learning methods. (3) In addition, the entire learning process of the proposed method is updated online, and the stability of the entire closed-loop system is guaranteed by a properly designed composite Lyapunov method. The model-free and self-learning characteristics of the designed method make it possible to develop a high-performance active suspension controller that can avoid the influence of uncertainty. Simulation results on a quarter-car suspension model subject to unknown road input are presented to demonstrate the effectiveness of the designed self-learning control method.
The remainder of this article is organized as follows. Section 2 presents the problem formulation. Section 3 illustrates the DNN observer design process. The observer-based self-learning control method for a quarter-car active suspension system is developed in Section 4. Simulation results on a quarter-car suspension system with time-varying road input are given in Section 5. Finally, conclusions are drawn in Section 6.

Problem formulation
As is well known, the quarter-car suspension model shown in Figure 1 is commonly used in the design of active suspension control systems. 32 The state-space equation of a quarter-car active suspension system with time-varying parameters can be expressed as

$$\dot{x} = (A + \Delta A)x + (B + \Delta B)u + L z_r \tag{1}$$

where the suspension state vector is $x = [x_1, x_2, x_3, x_4]^T$; the suspension deflection is $x_1 = z_s - z_u$; the sprung mass velocity is $x_2 = \dot{z}_s$; the tire deflection is $x_3 = z_u - z_r$; the unsprung mass velocity is $x_4 = \dot{z}_u$; $k_s$ is the suspension stiffness; $b_s$ is the suspension damping coefficient; the active control input is $u = f_a$; $m_s$ is the sprung mass, equal to a quarter of the entire vehicle mass; $m_u$ denotes the unsprung mass; $k_u$ denotes the vertical stiffness of the tire; $z_s$ is the vertical displacement of the sprung mass; $z_u$ denotes the vertical displacement of the unsprung mass; $z_r$ denotes the vertical displacement of the unknown road; and $\Delta A$ and $\Delta B$ represent the uncertainties caused by the uncertain parameter $m_s$ and the unknown input $z_r$.

The controller design for an active suspension system should meet the following three requirements:
1) The first main requirement is high-quality ride comfort. The aim is to weaken the vibration force transmitted from the unsprung mass to the sprung mass through an advanced control approach, which is achieved by minimizing the sprung mass acceleration in the face of parameter uncertainties caused by the sprung mass and time-varying road input.
2) In addition, the wheel should maintain firm and uninterrupted contact with the road surface, and the normal tire load variation related to the vertical deflection of the tire should be small to ensure good road holding performance.
3) Finally, the suspension travel should not exceed the maximum suspension deflection $z_{\max}$, that is, $|z_s - z_u| \le z_{\max}$.

The goal of active suspension design is to find a control policy that not only keeps the system state stable but also minimizes the required performance index

$$J = \int_0^\infty \big( x^T Q x + u^T R u \big)\, dt \tag{2}$$

where the weighting matrices $Q$ and $R$, selected by the designer, penalize the state and the control input, respectively.
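To make the formulation concrete, the nominal matrices of equation (1) can be assembled directly from the state definitions above. The sketch below uses illustrative parameter values (not the paper's Table 1 values) and the standard quarter-car force balance; the disturbance channel `L` acting on the road velocity is an assumption of this sketch, since the paper leaves its exact form implicit.

```python
import numpy as np

# Illustrative quarter-car parameters (assumed, not the paper's Table 1 values)
ms, mu = 320.0, 40.0        # sprung / unsprung mass (kg)
ks, ku = 18000.0, 200000.0  # suspension / tire stiffness (N/m)
bs = 1000.0                 # suspension damping (N*s/m)

# Nominal matrices for x = [zs - zu, zs_dot, zu - zr, zu_dot]^T
A = np.array([
    [0.0,      1.0,     0.0,      -1.0],
    [-ks/ms,  -bs/ms,   0.0,       bs/ms],
    [0.0,      0.0,     0.0,       1.0],
    [ks/mu,    bs/mu,  -ku/mu,    -bs/mu],
])
B = np.array([[0.0], [1.0/ms], [0.0], [-1.0/mu]])  # active force channel
L = np.array([[0.0], [0.0], [-1.0], [0.0]])        # road disturbance channel

def f(x, u, zr_dot):
    """Nominal dynamics x_dot = A x + B u + L zr_dot (no Delta-A, Delta-B)."""
    return A @ x + B * u + L * zr_dot
```

A pure suspension deflection, for example, produces the expected restoring accelerations on both masses through the second and fourth rows of `A`.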
Definition 1. As claimed by Lewis, 33 if $u$ is continuous on $V$, $u(0) = 0$, $u$ stabilizes (1) on $V$, and $V(x_0)$ is finite for all $x_0 \in V$, then the control input $u$ is said to be admissible with respect to (1) on the compact set $V \subseteq \mathbb{R}^n$, denoted by $u \in \psi(V)$.
With the help of optimal control theory, the Hamiltonian of (1) can be defined as

$$H(x, u, V_x) = x^T Q x + u^T R u + V_x^T\big[(A+\Delta A)x + (B+\Delta B)u + L z_r\big] \tag{3}$$

where $V_x \triangleq \partial V(x)/\partial x$ denotes the partial derivative of the cost function $V(x)$ with respect to $x$.
The nominal optimal value function $V^*(x)$ is expressed as

$$V^*(x) = \min_{u \in \psi(V)} \int_t^\infty \big( x^T Q x + u^T R u \big)\, d\tau \tag{4}$$

Meanwhile, equation (4) should satisfy the following HJB equation

$$0 = \min_{u} H(x, u, V_x^*) \tag{5}$$

where $V_x^* = \partial V^*(x)/\partial x$. Assuming that the minimum of equation (5) uniquely exists, the self-learning control policy $u^*$ can be obtained by solving $\partial H(x, u, V_x^*)/\partial u = 0$, which yields

$$u^* = -\frac{1}{2} R^{-1} (B+\Delta B)^T V_x^* \tag{6}$$

Substituting equation (6) into equation (5), the HJB equation can be further expressed as

$$0 = x^T Q x + (V_x^*)^T\big[(A+\Delta A)x + L z_r\big] - \frac{1}{4} (V_x^*)^T (B+\Delta B) R^{-1} (B+\Delta B)^T V_x^* \tag{7}$$

It can be seen that the nominal value function $V^*(x)$ must be known in advance to obtain the self-learning control policy $u^*$ in equation (6). However, the analytical solution of the partial differential equation (7) cannot be obtained due to the unknown state, model uncertainty, and unknown road input of the suspension system. This paper solves this problem by establishing an observer-critic framework via the single network adaptive critic method in the following two steps: 1) based on the input/output information of the suspension system, an adaptive DNN observer with a weight updating law is established to estimate the unknown state; 2) based on the observations, a self-learning controller with the observer-critic structure is developed via the single network adaptive critic approach.
Remark 1. The common LQR controller designs [34][35][36] are often based on formula (2), where all parameters are assumed available in advance. The corresponding feedback control law is obtained by solving the Riccati equation offline. The disadvantage of this method is that the control gain cannot be updated online according to the system uncertainties caused by $m_s$ and the unknown road displacement $z_r$, which inevitably leads to unsatisfactory control performance. Therefore, active suspension design calls for a self-learning intelligent control method that can adapt to various driving conditions.
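For contrast, the offline LQR baseline of Remark 1 can be sketched in a few lines of numpy. `solve_care` is a name invented here; it solves the algebraic Riccati equation by the standard Hamiltonian eigenvector method for the nominal $(A, B)$ only, so the resulting gain is fixed in advance, which is exactly the limitation the remark points out.

```python
import numpy as np

def solve_care(A, B, Q, R):
    """Solve A'P + PA - P B R^-1 B' P + Q = 0 via the Hamiltonian
    eigenvector method (numpy only, nominal model only)."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T],
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stable = V[:, w.real < 0]          # eigenvectors of stable eigenvalues
    X, Y = stable[:n, :], stable[n:, :]
    return (Y @ np.linalg.inv(X)).real

def lqr_gain(A, B, Q, R):
    """Fixed offline LQR gain K = R^-1 B' P; cannot adapt to Delta-A/Delta-B."""
    P = solve_care(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P), P
```

On a double integrator this recovers the textbook solution, and the closed-loop matrix $A - BK$ is Hurwitz.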

The differential neural network observer
Consider the following DNN to identify the nonlinear system (1):

$$\dot{x} = A_{nn} x + W \sigma(x) + B u + \xi \tag{8}$$

where $A_{nn}$ is a known Hurwitz matrix, $W \in \mathbb{R}^{4\times 4}$ represents the unknown nominal weight matrix, $\sigma(\cdot)$ denotes the nonlinear activation function, commonly chosen as the sigmoidal function $\sigma(\cdot) = \alpha/(1+e^{-\beta x}) - \gamma$ with properly selected parameters $\alpha$, $\beta$, $\gamma$, and $\xi$ is the modeling error.

Assumption 1. The nominal weight $W$ and the approximation error vector $\xi$ are bounded by

$$\|W\| \le \bar{W}, \quad \|\xi\| \le \bar{\xi} \tag{9}$$

Fact 1 (Garces 37). As is well known, the nonlinear activation function $\sigma(\cdot)$ satisfies the generalized local Lipschitz condition

$$\|\sigma(x) - \sigma(\hat{x})\| \le L_\sigma \|x - \hat{x}\| \tag{10}$$

Consider the following neuro-observer

$$\dot{\hat{x}} = A_{nn} \hat{x} + \hat{W} \sigma(\hat{x}) + B u + K(y - C\hat{x}) \tag{11}$$

where $\hat{x}$ denotes the observed state, $y = Cx$ is the measurable output, and the observation gain $K$ is obtained by solving the following stability equation

$$(A_{nn} - KC)^T P + P(A_{nn} - KC) + \alpha P + Q = 0 \tag{12}$$

where $Q$, $P$ denote positive definite matrices and the parameter $\alpha$ will be explained in the subsequent proof.
The state observation error is defined as

$$\tilde{x} = x - \hat{x} \tag{13}$$

From (8) and (11), the observation error dynamics are derived as

$$\dot{\tilde{x}} = (A_{nn} - KC)\tilde{x} + \tilde{W}\sigma(\hat{x}) + W\big(\sigma(x) - \sigma(\hat{x})\big) + \xi \tag{14}$$

where $\tilde{W} = W - \hat{W}$. Once the architecture of the DNN observer is determined, the next step is to design appropriate update rules for online learning. There are generally two design ideas: 1) based on commonly used learning rules, such as the backpropagation algorithm, design the online learning law of the DNN observer, and then construct a suitable candidate Lyapunov function to prove the convergence of the system; 11,15 2) to ensure the stability of the closed-loop system, design the online learning law by defining a quadratic function of the weight error and the observation error, and prove that the time derivative of the Lyapunov function is negative. 38,39 The main drawback of previous work in the first direction is the approximate treatment of the dynamic backpropagation problem by gradient approximation, which inevitably leads to parameter drift. Therefore, we adopt the second approach to develop the updating law for the proposed DNN observer.

Theorem 1. If the uncertain suspension model (1) is identified by the DNN observer (11) with the following updating law

$$\dot{\hat{W}} = L^{-1}\big(P\tilde{x}\sigma(\hat{x})^T - \lambda\hat{W}\big) \tag{15}$$

then the state estimation error and the DNN weight error are UUB, that is, $\tilde{x} \in L_\infty$, $\tilde{W} \in L_\infty$, where $P$ is the solution of equation (12), $L$ is a positive definite matrix, and $\lambda$ is a positive constant.
Proof. The weight updating law (15) is derived by selecting the following quadratic function of the observation error and the weight error

$$L_I = \frac{1}{2}\tilde{x}^T P \tilde{x} + \frac{1}{2}\,\mathrm{tr}\big(\tilde{W}^T L \tilde{W}\big) \tag{16}$$

Taking the time derivative of $L_I$ along the error dynamics (14) yields

$$\dot{L}_I = \frac{1}{2}\tilde{x}^T\big[(A_{nn}-KC)^T P + P(A_{nn}-KC)\big]\tilde{x} + \tilde{x}^T P\big[\tilde{W}\sigma(\hat{x}) + W(\sigma(x)-\sigma(\hat{x})) + \xi\big] + \mathrm{tr}\big(\tilde{W}^T L \dot{\tilde{W}}\big) \tag{17}$$

Since the DNN weight is updated by (15) and $\dot{\tilde{W}} = -\dot{\hat{W}}$, the cross term $\tilde{x}^T P\tilde{W}\sigma(\hat{x})$ is canceled, and, based on Assumption 1, Fact 1, and equation (16), $\dot{L}_I$ becomes

$$\dot{L}_I \le \frac{1}{2}\tilde{x}^T\big[(A_{nn}-KC)^T P + P(A_{nn}-KC) + \alpha P\big]\tilde{x} + \bar{\xi}\|P\|\,\|\tilde{x}\| + \lambda\,\mathrm{tr}\big(\tilde{W}^T\hat{W}\big) \tag{18}$$

Define $\alpha$ so that it dominates the Lipschitz term $\bar{W}L_\sigma$, and select a proper $Q$ to satisfy the condition in (12); then, using $\mathrm{tr}(\tilde{W}^T\hat{W}) \le \frac{1}{2}\bar{W}^2 - \frac{1}{2}\|\tilde{W}\|_F^2$, we obtain

$$\dot{L}_I \le -\frac{1}{2}\lambda_{\min}(Q)\|\tilde{x}\|^2 + \bar{\xi}\|P\|\,\|\tilde{x}\| - \frac{\lambda}{2}\|\tilde{W}\|_F^2 + \frac{\lambda}{2}\bar{W}^2 \tag{19}$$

The negative definiteness of $\dot{L}_I$ is therefore guaranteed if

$$\|\tilde{x}\| > \frac{\bar{\xi}\|P\| + \sqrt{\bar{\xi}^2\|P\|^2 + \lambda\,\lambda_{\min}(Q)\bar{W}^2}}{\lambda_{\min}(Q)} \tag{20}$$

Thus $\dot{L}_I < 0$ outside this bound, which demonstrates the UUB of $\tilde{x}$ and $\tilde{W}$ by Lyapunov theory. This completes the proof of Theorem 1.
Remark 2. Since $L$ and $\lambda$ in (15) can be selected arbitrarily, the updating gain of the DNN observer is not limited by the value of $P$ obtained from the Riccati equation (12). Hence, one can select $Q$ to make the state observation error bound in equation (20) as small as possible.
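A minimal sketch of one integration step of the observer (11) with the weight law (15) may look as follows. The forward-Euler discretization, the output matrix `C`, and the output-error form of the weight law are assumptions of this sketch, not the paper's verbatim equations.

```python
import numpy as np

def sigma(x, alpha=1.0, beta=1.0, gamma=0.5):
    """Sigmoidal activation sigma(x) = alpha/(1 + exp(-beta*x)) - gamma."""
    return alpha / (1.0 + np.exp(-beta * x)) - gamma

def observer_step(xhat, What, y, u, Ann, B, C, K, P, L_gain, lam, dt):
    """One forward-Euler step of a DNN observer of the form
        xhat_dot = Ann xhat + What sigma(xhat) + B u + K (y - C xhat)
    with a sigma-modification weight law in the spirit of eq. (15):
        What_dot = L^-1 (P e_y sigma(xhat)^T - lam What),  e_y = C^T (y - C xhat).
    """
    e = y - C @ xhat
    s = sigma(xhat)
    xhat_dot = Ann @ xhat + What @ s + B * u + K @ e
    What_dot = np.linalg.solve(L_gain, P @ (C.T @ e) @ s.T - lam * What)
    return xhat + dt * xhat_dot, What + dt * What_dot
```

With a Hurwitz `Ann` and a stabilizing `K`, repeated steps drive the state estimate toward the measured output while the weight estimate stays bounded, mirroring the UUB claim of Theorem 1.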

Observer-based self-learning controller design
This section is mainly devoted to the design of a self-learning control policy with the aid of the above-mentioned DNN observer using the single network adaptive critic approach.
The following single critic neural network is used to approximate the optimal value function

$$V^*(\hat{x}) = W_c^T \psi(\hat{x}) + \xi_c \tag{21}$$

where $W_c \in \mathbb{R}^I$ is the nominal weight vector, the activation function $\psi(\hat{x}) \in \mathbb{R}^I$ is usually selected as a sigmoid nonlinear function, $I$ is the number of neurons, and $\xi_c$ denotes the approximation error. The derivative of the optimal value function (21) with respect to $\hat{x}$ is

$$V_{\hat{x}}^* = \nabla\psi(\hat{x})^T W_c + \nabla\xi_c \tag{22}$$

where $\nabla\psi(\hat{x}) = \partial\psi(\hat{x})/\partial\hat{x}$ and $\nabla\xi_c = \partial\xi_c/\partial\hat{x}$ represent the partial derivatives of $\psi(\hat{x})$ and $\xi_c$ with respect to $\hat{x}$, respectively.
By substituting (22) into (6), the self-learning control policy is derived as

$$u^* = -\frac{1}{2} R^{-1} (B+\Delta B)^T \big(\nabla\psi(\hat{x})^T W_c + \nabla\xi_c\big) \tag{23}$$

The approximate self-learning control is then represented as

$$\hat{u} = -\frac{1}{2} R^{-1} (B+\Delta B)^T \nabla\psi(\hat{x})^T \hat{W}_c \tag{24}$$

Next, substituting equation (22) into equation (5), the HJB equation can be further written as

$$\hat{x}^T Q \hat{x} + \hat{u}^T R \hat{u} + W_c^T \nabla\psi(\hat{x})\big[(A+\Delta A)\hat{x} + (B+\Delta B)\hat{u} + L z_r\big] \approx 0 \tag{25}$$

The expression $(A+\Delta A)\hat{x} + (B+\Delta B)\hat{u} + L z_r$ in (25) is replaced by the state derivative $\dot{\hat{x}}$ of the DNN observer proposed in equation (11), so we have

$$\hat{x}^T Q \hat{x} + \hat{u}^T R \hat{u} + W_c^T \nabla\psi(\hat{x})\dot{\hat{x}} \approx 0 \tag{26}$$

Remark 3. Our previous study 40 shows that, compared with traditional observer designs driven by the observation error, an adaptive learning law that includes parameter error information can make the closed-loop system converge faster. Inspired by this result, the following analysis develops a novel learning law for the critic neural network driven by the parameter error, rather than the existing least squares 33 or gradient methods. 41

Define the auxiliary variables $X = \nabla\psi(\hat{x})\dot{\hat{x}}$ and $Y = \hat{x}^T Q \hat{x} + \hat{u}^T R \hat{u}$, so that (26) reads $W_c^T X + Y \approx 0$, and let $X_f$, $Y_f$ be their filtered versions

$$k\dot{X}_f + X_f = X, \quad k\dot{Y}_f + Y_f = Y, \quad X_f(0) = 0, \; Y_f(0) = 0 \tag{27}$$

Further, two auxiliary regression variables $G$ and $H$ are designed as

$$\dot{G} = -\ell G + X_f X_f^T, \quad \dot{H} = -\ell H + X_f Y_f, \quad G(0) = 0, \; H(0) = 0 \tag{28}$$

By solving the differential equations in (28), the following expressions are obtained

$$G(t) = \int_0^t e^{-\ell(t-s)} X_f(s) X_f^T(s)\, ds, \quad H(t) = \int_0^t e^{-\ell(t-s)} X_f(s) Y_f(s)\, ds \tag{29}$$

Finally, the learning rule of $\hat{W}_c$ is designed as

$$\dot{\hat{W}}_c = -\mu J \tag{30}$$

where $\mu$ represents the adaptive updating gain and the filtered vector $J$ is defined as $J = G(t)\hat{W}_c + H(t)$.
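The filtered-regressor mechanism can be sketched in a few lines: along trajectories the HJB residual is a linear regression in the critic weight, the filters accumulate excitation, and the weight is driven by the filtered parameter error $J$. For brevity this sketch feeds the raw regressors straight into the $G$, $H$ filters, skipping the pre-filter; `ell` and `mu` are illustrative tuning gains, not the paper's values.

```python
import numpy as np

def critic_filter_step(Wc, G, H, X, Y, mu=5.0, ell=1.0, dt=1e-3):
    """One Euler step of the filtered-regressor critic update.
    X : regressor vector (grad-psi(xhat) @ xhat_dot); Y : scalar cost term,
    so that along trajectories Wc' X + Y ~ 0.
    G_dot = -ell G + X X^T and H_dot = -ell H + X Y accumulate the regression,
    and Wc_dot = -mu (G Wc + H) drives the filtered parameter error to zero."""
    G = G + dt * (-ell * G + np.outer(X, X))
    H = H + dt * (-ell * H + X * Y)
    Wc = Wc + dt * (-mu * (G @ Wc + H))
    return Wc, G, H
```

With persistently exciting `X`, the matrix `G` becomes positive definite and `Wc` converges to the regression solution, mirroring the PE assumption used in Theorem 2.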
Theorem 2. Consider the control mechanism composed of the DNN observer (11) with the update law (15) and the self-learning control (24) with the learning law (30) applied to system (1). Then all the signals $\tilde{x}$, $\tilde{W}$, $\tilde{W}_c$ involved in the entire learning process, consisting of the DNN observer, the SNAC, and the self-learning control policy, are UUB.
Proof. Consider the following composite Lyapunov function

$$L = L_I + L_c \tag{31}$$

where $L_I$ is defined in equation (16) and $L_c$ is defined as

$$L_c = \frac{1}{2}\tilde{W}_c^T \mu^{-1} \tilde{W}_c \tag{32}$$

with $\tilde{W}_c = W_c - \hat{W}_c$. From (29) and (26), one has $H(t) = -G(t)W_c + \nu$, where $\nu$ is a bounded residual induced by the NN approximation error, so that

$$J = G\hat{W}_c + H = -G\tilde{W}_c + \nu \tag{33}$$

As claimed by Na, 42 if $X$ satisfies the persistent excitation (PE) condition, then the matrix $G(t)$ defined in (29) is positive definite, that is, $\lambda_{\min}(G) > \sigma > 0$. Therefore, according to (30) and (33),

$$\dot{L}_c = \tilde{W}_c^T \mu^{-1} \dot{\tilde{W}}_c = -\tilde{W}_c^T G \tilde{W}_c + \tilde{W}_c^T \nu \tag{34}$$

Based on the well-known Cauchy inequality, $ab \le a^2\delta/2 + b^2/2\delta$ with $\delta > 0$, (34) can be rewritten as

$$\dot{L}_c \le -\Big(\sigma - \frac{\delta}{2}\Big)\|\tilde{W}_c\|^2 + \frac{\|\nu\|^2}{2\delta} \tag{35}$$

Combining (16) and (35), the derivative of the composite function satisfies $\dot{L} = \dot{L}_I + \dot{L}_c$; under the conditions established in Theorem 1 together with $\sigma > \delta/2$, $\dot{L} < 0$ holds outside a bounded set. Hence, the UUB of the observation error $\tilde{x}$, the DNN weight error $\tilde{W}$, and the critic NN weight error $\tilde{W}_c$ in the whole closed-loop system is guaranteed. This completes the proof of Theorem 2.
Based on the above analysis, the mechanism of the DNN observer-based self-learning control approach for model uncertain suspension systems is shown in Figure 2.

Remark 4.
A self-learning control method has been proposed for uncertain suspension systems. The significant advantage of the proposed method is that it gets rid of the limitations of the model, which is achieved by employing a DNN observer to identify the unknown dynamics and system state, where the weight updating law is obtained from a properly designed Lyapunov function instead of the commonly used backpropagation algorithm. 43 Moreover, all the signals involved in the entire learning process are UUB.
Based on the above analysis, the flowchart of the proposed control algorithm is depicted in Figure 3 and can be summarized as follows.
Step 1. Select proper initial values of the activation function $\sigma(\cdot)$, the observation gain $K$ in equation (11), and the updating gains $L$, $\lambda$ in equation (15) for the observer. $\sigma(\cdot)$ is usually selected as the sigmoidal function $\sigma(\cdot) = a/(1 + e^{-bx}) - c$, where $a$, $b$, $c$ are design constants. $W$ and $W_c$ are tuned online according to equations (15) and (30); hence, there is no need to select their initial values. Meanwhile, select a proper activation function $\psi(\cdot)$ in equation (21) and the updating gain $\mu$ in equation (30) for the critic SNAC. $\psi(\cdot)$ is usually selected as a smooth function consisting of different combinations of the selected suspension states.
Step 2. The input/output data of the suspension system are then used to train the neural networks, including the DNN observer in equation (11) and the critic NN in equation (21).
Step 3. Finally, the self-learning control law expressed in equation (24) is obtained based on the first two steps.
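The three steps above can be arranged into a single sampling loop. The sketch below is only a skeleton: every callable is a user-supplied stand-in for the corresponding equation, and the names are invented here for illustration.

```python
def self_learning_loop(plant_step, measure, observer_step, critic_step,
                       control_from_critic, xhat0, W0, Wc0, steps, dt):
    """Skeleton of the Step 1-3 procedure: per sample, apply the SNAC control
    (eq. (24)), read the plant input/output, update the DNN observer
    (eqs. (11), (15)), then update the critic weights (eq. (30))."""
    xhat, W, Wc = xhat0, W0, Wc0
    log = []
    for _ in range(steps):
        u = control_from_critic(Wc, xhat)           # eq. (24)
        y = measure(plant_step(u, dt))              # plant I/O only
        xhat, W = observer_step(xhat, W, y, u, dt)  # eqs. (11), (15)
        Wc = critic_step(Wc, xhat, u, dt)           # eq. (30)
        log.append((y, u))
    return xhat, W, Wc, log
```

A toy scalar plant with trivial observer and frozen critic already exercises the control-observe-learn ordering of the loop.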

Simulation and analysis
In this section, the quarter-car active suspension model presented in Section 2 is numerically simulated to verify the effectiveness of the DNN observer designed in Section 3 and the self-learning controller developed in Section 4. Table 1 shows the relevant suspension parameters used in the simulation.
To verify the robustness of the proposed method, the following random road input model is adopted

$$\dot{z}_r(t) = -2\pi f_0 z_r(t) + 2\pi\sqrt{G(n_0) v_s}\,\omega(t) \tag{39}$$

where $f_0$ denotes the cut-off frequency, $G(n_0)$ represents the pavement power spectral density at the reference spatial frequency $n_0$, $v_s$ represents the driving speed, and $\omega(t)$ is a uniformly distributed white noise with zero mean and unit intensity.
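Equation (39) can be simulated by Euler-Maruyama integration; since $\omega(t)$ has unit intensity, the discrete noise sample is scaled by $1/\sqrt{dt}$. The parameter defaults below are illustrative, not the paper's values.

```python
import numpy as np

def road_profile(T, dt, f0=0.1, Gn0=5e-6, vs=20/3.6, seed=0):
    """Simulate zr_dot = -2*pi*f0*zr + 2*pi*sqrt(G(n0)*vs)*w(t) on [0, T]."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    zr = np.zeros(n)
    gain = 2.0 * np.pi * np.sqrt(Gn0 * vs)
    for k in range(1, n):
        w = rng.standard_normal() / np.sqrt(dt)  # unit-intensity white noise
        zr[k] = zr[k - 1] + dt * (-2.0 * np.pi * f0 * zr[k - 1] + gain * w)
    return zr
```

The result is a stationary filtered-white-noise profile whose bandwidth is set by `f0` and whose amplitude grows with road roughness `Gn0` and speed `vs`.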
The following smooth function is used to approximate the optimal value function of the self-learning controller

$$\hat{V}(\hat{x}) = \hat{W}_c^T \psi(\hat{x}) \tag{40}$$

where the weight vector of the SNAC is represented by $\hat{W}_c = [W_{c1}, W_{c2}, \ldots, W_{c10}]^T$. The random uncertain road input (39), which accounts for pavement roughness, is used to compare the effectiveness of the proposed control method with the LQR control method. 44 Moreover, a time-varying longitudinal velocity is also considered in the simulations, with two representative cases.

Case A. Simulation results with the road input (39) and velocity 20 km/h are illustrated in Figures 4 to 7. The results indicate that the proposed self-learning control approach is more effective than the LQR-controlled suspension: ride quality is improved observably with respect to suspension deflection, sprung mass acceleration, tire deflection, and unsprung mass velocity, which means the suspension system can be brought back to the stable state as quickly as possible when encountering unknown road input. The road holding quality of the suspension is thus greatly improved.
Case B. Simulation results with the road input (39) and velocity v = 60 km/h are presented in Figures 8 to 11. One can easily find that the proposed control method still performs better, with smaller fluctuations, at different longitudinal speeds compared with the commonly used LQR control method. These facts further verify the strong robustness and self-adaptive properties of the proposed control method. There are two main reasons for the performance improvements observed in the simulation results above. First, the control law of the LQR method is based on an accurate suspension model, so control accuracy cannot be guaranteed under time-varying parameters; the proposed control method, in contrast, is not limited to a mathematical model but is based on the input/output data of the suspension system. Second, the feedback control law of the proposed method can be updated online with time-varying parameters such as the longitudinal speed and random road input, whereas for the LQR method and other model-based methods the feedback control law is fixed in advance. Therefore, the self-adaptive property of the proposed method provides a more effective solution for active suspension control. To quantify the control performance of the proposed method, the root mean square (RMS) of the state errors is adopted as a performance index for comparison.
$$\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2} \tag{41}$$

where $n$ is the number of simulation steps and $e_i$ is the difference between the state variables with control and without control at the $i$th step.
One can also notice from Table 2 that the RMS values of all state variables for the proposed method are smaller than those of the commonly used LQR method, which further demonstrates the improved performance of the proposed method.
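The RMS comparison of Table 2 reduces to a one-liner over the logged state-error sequences:

```python
import numpy as np

def rms(e):
    """Root mean square sqrt((1/n) * sum e_i^2) of a state-error sequence."""
    e = np.asarray(e, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))
```

Applying `rms` to each logged state channel, with and without control, reproduces the kind of per-state comparison reported in Table 2.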
The above simulation results using the proposed DNN observer and self-learning controller indicate that all the evaluation indexes of the suspension system are improved compared with the commonly used LQR control method. The proposed observer-based self-learning mechanism can realize online self-renewal under unknown road displacement without a complete suspension model. It is therefore concluded that the self-learning and model-independent characteristics of the developed control approach open a new avenue for the design of model-uncertain suspension control systems and can markedly improve road holding and ride quality. The aim of pursuing a high-performance suspension system is thus achieved.

Conclusions
This paper presents a new way to realize simultaneous online state observation and control for model-uncertain suspension systems. The main innovations are as follows. First, the proposed self-learning control method does not require complete information about the suspension system, which makes it easier to apply in practical systems. Second, the self-learning control policy can be updated online under unknown road input, which greatly enhances suspension performance in terms of road holding and ride quality. Finally, simulation results performed on a quarter-car suspension system with unknown road input are presented; the suspension responses with respect to suspension deflection, sprung mass acceleration, tire deflection, and unsprung mass velocity validate the effectiveness of the proposed approach. Future work will focus on developing a more practical intelligent controller for automotive suspension systems considering actual state constraints and actuator saturation limits.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China