Adaptive dynamic programming-based controller with admittance adaptation for robot–environment interaction

The problem of optimal tracking control for robot–environment interaction is studied in this article. The environment is regarded as a linear system, and an admittance control law based on an iterative linear quadratic regulator method is derived to guarantee compliant behaviour. Meanwhile, an adaptive dynamic programming-based controller is proposed. Within the adaptive dynamic programming framework, the critic network is implemented with a radial basis function neural network to approximate the optimal cost, and the neural network weight updating law incorporates an additional stabilizing term that eliminates the requirement for an initial admissible control. The stability of the system is proved by the Lyapunov theorem. Simulation results demonstrate the effectiveness of the proposed control scheme.


Introduction
Robot applications are becoming increasingly widespread in areas such as rehabilitation therapy, assembly automation and surgery. [1][2][3][4] Robots can either work independently to accomplish tasks or cooperate with human partners on certain tasks. In practical applications, the robot inevitably interacts with its external environment. [5][6][7] Consequently, interaction control between the robot and the environment has attracted considerable attention in recent years and is considered to be of great importance.
In existing research, two main approaches are applied to achieve compliant behaviour of the robot: hybrid position/force control and impedance control. 8,9 The first approach requires decomposition into position and force subspaces, task planning and control-law switching during execution. Because it does not consider the dynamic coupling between the environment and the robot, the accuracy of hybrid position/force control cannot be guaranteed. 10 In contrast, the second approach aims to adjust the mechanical impedance to a target one, which guarantees that the robot is compliant with the interaction force imposed by the external environment. Impedance control ensures the safety of both the robot and the environment, and it has proved to be more feasible and more robust. According to the causality of the controller, there are two implementation methods: one is named impedance control and the other admittance control. In an impedance control system, the interaction force is computed from the desired motion trajectory and the impedance model, while in an admittance control system, the reference trajectory is obtained from the measured environmental external force and the desired admittance model. Therefore, in this article, admittance control is adopted to solve the problem of robot–environment interaction control.
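To make the admittance principle described above concrete (a reference trajectory is generated from the measured force through a desired admittance model), the following minimal sketch simulates a hypothetical one-degree-of-freedom admittance filter. The parameters M_d, C_d, K_d and the function name are illustrative assumptions, not values from this article.

```python
import numpy as np

def admittance_step(x_r, dx_r, x_d, F, dt, M_d=1.0, C_d=10.0, K_d=50.0):
    """One Euler step of a 1-DOF admittance model
    M_d*ddx_r + C_d*dx_r + K_d*(x_r - x_d) = F,
    which produces the reference trajectory x_r from the measured force F."""
    ddx_r = (F - C_d * dx_r - K_d * (x_r - x_d)) / M_d
    dx_r = dx_r + dt * ddx_r
    x_r = x_r + dt * dx_r
    return x_r, dx_r

# With zero interaction force the reference settles on the desired position x_d;
# a non-zero F would deflect it compliantly.
x_r, dx_r = 0.0, 0.0
for _ in range(5000):  # 5 s at dt = 1 ms
    x_r, dx_r = admittance_step(x_r, dx_r, x_d=0.1, F=0.0, dt=0.001)
```

Larger stiffness K_d makes the behaviour closer to pure position control, while smaller K_d yields softer, more force-compliant motion.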
In an admittance control system, the force and the admittance model are the two important parts. When robot–environment interaction exists, the force can be detected and measured by sensors installed on the end-effector of the robot arm. However, deriving optimal parameters of the admittance model is non-trivial. On the one hand, it is usually difficult to derive the desired admittance model because of the complexity of the environmental dynamics; on the other hand, a fixed admittance model cannot satisfy all cases. Taking human–robot cooperation as an example, variable admittance control is necessary to ensure efficient performance. 11 To solve these problems, iterative learning has been studied in the area of robot intelligent control, where it is used to obtain admittance parameters adapted to an unknown environment. The aim of this approach is to introduce human learning skills into the robot and improve control performance by repeating a task. Cohen and Flash 12 proposed an impedance learning control scheme using an associative search network to complete a wall-following task. A neural network (NN) has also been introduced into impedance control to regulate its parameters. 13 However, the iterative learning method requires the robot to operate repeatedly, which is inconvenient in practice and infeasible in many situations. Love and Book, 14 Uemura and Kawamura, 15 Gribovskaya et al., 16 Stanisic and Fernández, 17 Landi et al. 18 and Yao et al. 19 have proposed adaptation approaches to address the problems stated above.
Robotic motion control is a challenging task because it is difficult to obtain an accurate model: the robot is a non-linear and highly coupled system. Proportional-integral-derivative (PID) control, NN control, adaptive control and other control methods have been applied to robot systems. [20][21][22][23][24][25][26][27] As a classical method, PID control has been employed in robot systems and can track a given reference trajectory well. 28 PID control has advantages such as a simple structure and good robustness, but selecting suitable PID parameters is not easy when the controlled plant is complex. In addition, when dynamic uncertainties exist in the system, PID control cannot satisfy the performance requirements on overshoot, rise time, settling time and so on. An NN shares fundamental characteristics of the human brain and can mimic human information processing, so it is widely used in the control field for unknown system identification. NN control can model uncertain dynamics online to improve system performance. 29 An admittance adaptation method and an NN-based controller have been applied to the robot system. 30 Tracking control is a significant research issue in the domain of robot intelligent control. For a controlled system, stability is only the minimum requirement. Optimal control also needs to be considered; that is, an optimal tracking controller is required that ensures the stability of the robot while minimizing a cost function. Werbos 31 proposed the adaptive dynamic programming (ADP) strategy, which is considered an effective approach to the optimal control problem. 32 The key of the ADP method is to find a solution of the Hamilton-Jacobi-Bellman (HJB) equation.
However, because the HJB equation is a partial differential equation, its analytical solution is very difficult, or even impossible, to obtain when the controlled system is non-linear. To solve this problem, policy iteration is considered an effective method for finding an approximate solution, but it requires an initially admissible (stabilizing) control. 33 In practice, this initial admissible control is usually difficult to obtain. NNs have therefore been introduced to derive an approximate solution of the HJB equation: the approximate solution is obtained by an NN-based method, and the requirement of initial stability is eliminated by incorporating an additional term. 34,35 Yang et al. 30 paid attention to robot-environment interaction control but did not consider the optimization problem. For a robot, however, performing path-tracking optimization and minimizing the cost function is very important. Based on the above discussion, the optimal tracking control problem for robot-environment interaction is studied in this article, and admittance control and the ADP approach are adopted to improve the system performance. The contributions of this article are listed below:

1. The environment with unknown dynamics is modelled as a linear system. An admittance adaptation method with an iterative linear-quadratic regulator (LQR) is obtained to achieve compliant behaviour.

2. The ADP approach is introduced into the robot system to solve the optimal tracking problem. A critic network with radial basis functions (RBFs) is developed to approximate the minimum cost function. In addition, to eliminate the requirement for an initial admissible control, a stabilizing term is incorporated into the NN weight updating law.
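The dependence of policy iteration on an initially stabilizing control, mentioned above, is easy to see in the linear-quadratic special case, where policy iteration reduces to Kleinman's algorithm: each iteration evaluates the current gain through a Lyapunov equation and then improves it. The sketch below uses a hypothetical unstable second-order plant (all numerical values are illustrative); if the initial gain K were not stabilizing, the Lyapunov solve would no longer return a meaningful cost matrix, which is exactly the restriction the stabilizing-term approach of this article seeks to remove.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical LQR problem with an unstable open loop (eigenvalues at +1, +1).
A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[0.0, 4.0]])   # stabilizing initial (admissible) gain: eigs of A-BK are -1, -1
for _ in range(20):
    Ak = A - B @ K
    # Policy evaluation: solve (A-BK)^T P + P (A-BK) = -(Q + K^T R K)
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B^T P
    K = np.linalg.solve(R, B.T @ P)

# Reference solution of the algebraic Riccati equation for comparison.
P_star = solve_continuous_are(A, B, Q, R)
```

Kleinman's iteration converges quadratically, so after a handful of steps P matches the Riccati solution P_star to machine precision.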
The rest of this article is arranged as follows. Firstly, the robot and environment systems and control objectives are described. Next, the control scheme including admittance adaptation and optimal control using ADP is developed. Then, simulation studies are given. Finally, the conclusion is drawn.

Robot dynamics
The n-link robot manipulator dynamics is given in the following Lagrangian form

$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = \tau \tag{1}$$

where $q = [q_1, \ldots, q_n]^T \in \mathbb{R}^n$, $\dot{q} \in \mathbb{R}^n$ and $\ddot{q} = [\ddot{q}_1, \ldots, \ddot{q}_n]^T \in \mathbb{R}^n$ represent the robot position, velocity and acceleration vectors in joint space, respectively. $\tau \in \mathbb{R}^n$ is the joint torque, while $M(q) \in \mathbb{R}^{n\times n}$, $C(q,\dot{q}) \in \mathbb{R}^{n\times n}$ and $G(q) \in \mathbb{R}^n$ are known matrices denoting the inertia matrix, the Coriolis/centrifugal matrix and the gravity vector, respectively. For convenience, $M$, $C$ and $G$ denote the known matrices $M(q)$, $C(q,\dot{q})$ and $G(q)$ in the following sections.
Define the reference trajectory as $q_r \in \mathbb{R}^n$; the tracking error $q_e \in \mathbb{R}^n$ is

$$q_e = q - q_r \tag{2}$$

Then, the first and second time derivatives of $q_e$ are

$$\dot{q}_e = \dot{q} - \dot{q}_r, \qquad \ddot{q}_e = \ddot{q} - \ddot{q}_r \tag{3}$$

We define the sliding motion surface $x$ as

$$x = \dot{q}_e + L q_e \tag{4}$$

where $L \in \mathbb{R}^{n\times n}$ is a constant positive definite matrix. According to equations (2) to (4), we can get

$$\dot{q} = x + \dot{q}_r - Lq_e, \qquad \ddot{q} = \dot{x} + \ddot{q}_r - L\dot{q}_e \tag{5}$$

Substituting equation (5) into equation (1), the error dynamics is obtained as

$$M\dot{x} = \tau - Cx - M(\ddot{q}_r - L\dot{q}_e) - C(\dot{q}_r - Lq_e) - G \tag{6}$$

Then, the following system is obtained

$$\dot{x} = f(x) + g(x)\tau \tag{7}$$

The non-linear functions $f : \mathbb{R}^n \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}^{n\times n}$ in equation (7) are specified by

$$f(x) = -M^{-1}\left[Cx + M(\ddot{q}_r - L\dot{q}_e) + C(\dot{q}_r - Lq_e) + G\right], \qquad g(x) = M^{-1} \tag{8}$$
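The change of variables above can be checked numerically. The following sketch uses a hypothetical 1-link arm (the inertia, damping and gravity terms are assumptions chosen for illustration) and verifies that the sliding-variable dynamics in the form f(x) + g(x)τ reproduces exactly what the original Lagrangian model predicts.

```python
import numpy as np

# Hypothetical 1-link arm: M = m*l^2, C = 0.1 (viscous), G = m*g0*l*sin(q).
m, l, g0, Lam = 1.0, 0.5, 9.81, 2.0  # Lam plays the role of the matrix L

def M(q): return m * l**2
def C(q, dq): return 0.1
def G(q): return m * g0 * l * np.sin(q)

def f(x, q, dq, q_r, dq_r, ddq_r):
    """Drift term of the sliding-variable dynamics dx/dt = f + g*tau,
    with x = dq_e + Lam*q_e."""
    q_e, dq_e = q - q_r, dq - dq_r
    return -(C(q, dq) * x + M(q) * (ddq_r - Lam * dq_e)
             + C(q, dq) * (dq_r - Lam * q_e) + G(q)) / M(q)

def g(q): return 1.0 / M(q)

# Consistency check against the Lagrangian model at an arbitrary state:
q, dq, tau = 0.3, -0.2, 1.5
q_r, dq_r, ddq_r = 0.1, 0.0, 0.0
x = (dq - dq_r) + Lam * (q - q_r)
ddq = (tau - C(q, dq) * dq - G(q)) / M(q)      # from M*ddq + C*dq + G = tau
dx_direct = (ddq - ddq_r) + Lam * (dq - dq_r)  # dx = ddq_e + Lam*dq_e
dx_model = f(x, q, dq, q_r, dq_r, ddq_r) + g(q) * tau
```

The two values agree to machine precision, confirming the substitution step from equation (5) into equation (1).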

Environment dynamics
It is assumed that the environmental interaction force obeys the dynamics

$$C_E\dot{x} + G_E x = -F \tag{9}$$

where $C_E$ and $G_E$ represent the unknown damping and stiffness of the environment, respectively. $F$ denotes the interaction force and can be detected and measured by a force sensor. $x$ is the end-effector position in Cartesian space, and the corresponding desired trajectory $x_d$ is generated by

$$\dot{x}_d = Ux_d \tag{10}$$

where $U \in \mathbb{R}^{m\times m}$ is a known matrix. Subsequently, we define $\eta = [x^T, x_d^T]^T$. Thus, combining equation (9) with equation (10), the dynamics of the unknown environment and the desired trajectory are generated by

$$\dot{\eta} = A\eta + BF, \qquad A = \begin{bmatrix} -C_E^{-1}G_E & 0 \\ 0 & U \end{bmatrix}, \quad B = \begin{bmatrix} -C_E^{-1} \\ 0 \end{bmatrix} \tag{11}$$

If we take equation (11) as a linear system with $F$ as its control input and $\eta$ as its state, this equation relates $x$ with $x_d$ via the optimal feedback control law $F = -K_e\eta$, whose aim is to minimize the cost function

$$\Gamma = \int_t^{\infty}\left(x_e^T Q_{E1}x_e + F^T R_E F\right)\mathrm{d}\tau \tag{12}$$

This cost function indicates that our motivation for modifying the desired trajectory $x_d$ is to balance the contact force $F$ against the tracking error $x_e := x - x_d$, and this balance can be tuned via the user-defined $Q_{E1}$ and $R_E$.
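When the environment parameters are known, the gain that minimizes this cost can be computed directly from the Riccati equation, which serves as an offline benchmark for the data-driven method introduced in the next section. The sketch below assumes a hypothetical scalar environment (the values of C_E, G_E and U are made up for illustration) and solves the resulting LQR problem with SciPy.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical scalar environment C_E*dx = -G_E*x - F and desired-trajectory
# dynamics dx_d = U*x_d, stacked as eta = [x, x_d] with F as the control input.
C_E, G_E, U = 2.0, 8.0, -1.0
A = np.array([[-G_E / C_E, 0.0],
              [0.0,        U  ]])
B = np.array([[-1.0 / C_E],
              [0.0       ]])

# Penalize the tracking error x_e = x - x_d (weight Q_E1) and the force (R_E).
Q_E1, R_E = 10.0, 1.0
E = np.array([[1.0, -1.0]])        # x_e = E @ eta
Q = Q_E1 * E.T @ E
R = np.array([[R_E]])

P = solve_continuous_are(A, B, Q, R)
K_e = np.linalg.solve(R, B.T @ P)  # optimal feedback: F = -K_e @ eta
```

Raising Q_E1 relative to R_E trades contact force against tracking error, which is exactly the balance discussed above.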
In this section, the robot and environment dynamics are modelled. Next, we will design a control strategy to achieve compliant behaviour and optimal tracking control when the robot interacts with the environment.

Control scheme
In this section, a control scheme consisting of three parts is designed, as shown in Figure 1: an optimal trajectory modifier using admittance control, a closed-loop inverse kinematics (CLIK) solver and a trajectory tracking controller based on the ADP technique.

Trajectory modification using admittance control
The solution to equation (12) is analogous to an LQR problem. The cost can be rewritten as

$$\Gamma = \int_t^{\infty}\left(\eta^T Q_E\eta + F^T R_E F\right)\mathrm{d}\tau, \qquad Q_E = \begin{bmatrix} Q_{E1} & -Q_{E1} \\ -Q_{E1} & Q_{E1} \end{bmatrix} \tag{13}$$

whose system counterpart is consistent with equation (11), and the optimal gain is $K_e = R_E^{-1}B^T P$, with $P$ the solution of the algebraic Riccati equation (ARE)

$$A^T P + PA + Q_E - PBR_E^{-1}B^T P = 0 \tag{14}$$

In this subsection, an algorithm proposed by Jiang and Jiang 36 is adopted to solve the ARE in equation (14) with unknown environment parameters. Some notation is outlined here: $n$, $m$ and $d$ are the lengths of $\eta$ and $F$ and the number of samples, respectively. The sampled signals, together with the historical ones, comprise the data matrices in equation (15); $\otimes$ stands for the Kronecker product, and $p_{ij}$ and $\eta_i$ denote the entries of $P$ and $\eta$, respectively. When the number of sampled data is large enough and the rank condition in equation (16) is satisfied, the algorithm solves $K_e$ by iteratively evaluating equation (17) until $\hat{p}^{(k)}$ converges to an acceptable tolerance $\varepsilon$, that is, $\|\hat{p}^{(k)} - \hat{p}^{(k-1)}\| < \varepsilon$, where $\|\cdot\|$ denotes the 2-norm, the superscript $(k)$ denotes the iteration index, $\mathrm{vec}(\cdot)$ denotes column vectorization and $I_n \in \mathbb{R}^{n\times n}$ is an identity matrix.

Once the optimal feedback gain $K_e$ is obtained, we can use it to modify $x_d$. Partitioning $K_e = [K_{e1}\;\; K_{e2}]$ into compatible sub-matrices, the optimal force satisfies

$$F = -K_e\eta = -K_{e1}x - K_{e2}x_d \tag{18}$$

Finally, the modified trajectory $x_r$ to be tracked, which is equivalent to the $x$ in equation (18), is calculated as

$$x_r = -K_{e1}^{-1}\left(F + K_{e2}x_d\right) \tag{19}$$

Inverse kinematics using CLIK

The CLIK algorithm is employed to resolve the Cartesian reference trajectory $x_r$ into its joint-space counterpart $q_r$. 37 Let the solution error be $e := k(q_r) - x_r$, where $k(\cdot)$ denotes the forward kinematics, and impose on it the convergent dynamics

$$\dot{e} = -K_f e \tag{20}$$

where $K_f$ is a positive definite user-defined matrix that decides the convergence rate of $e$.

Expanding equation (20) and combining it with $\dot{x} = J_{co}\dot{q}$, where $J_{co} = \partial k(q)/\partial q$, the following equation holds

$$\dot{q}_r = J_{co}^{\dagger}\left(\dot{x}_r - K_f e\right) \tag{21}$$

integration of which yields the CLIK solution

$$q_r(t) = q_r(0) + \int_0^t J_{co}^{\dagger}\left(\dot{x}_r - K_f e\right)\mathrm{d}s \tag{22}$$

where $q_r(0) = k^{-1}(x_r(0))$, $J_{co}^{\dagger} = J_{co}^T\left(J_{co}J_{co}^T + \sigma I_n\right)^{-1}$, and $\sigma \in \mathbb{R}$ is introduced to avoid the singularity problem; it should be assigned small enough to preserve the solution accuracy.
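A minimal numerical sketch of this CLIK scheme is given below for an illustrative planar 2-link arm; the link lengths, gain K_f and damping σ are assumptions, not values from the article. It integrates the joint-rate law with the damped least-squares pseudoinverse until the forward kinematics matches the Cartesian reference.

```python
import numpy as np

def dls_pinv(J, sigma=1e-4):
    """Damped least-squares pseudoinverse J^T (J J^T + sigma*I)^{-1},
    which stays bounded near singular configurations."""
    m = J.shape[0]
    return J.T @ np.linalg.solve(J @ J.T + sigma * np.eye(m), np.eye(m))

def fk(q, l1=1.0, l2=0.8):
    """Forward kinematics k(q) of the illustrative planar 2-link arm."""
    return np.array([l1*np.cos(q[0]) + l2*np.cos(q[0]+q[1]),
                     l1*np.sin(q[0]) + l2*np.sin(q[0]+q[1])])

def jac(q, l1=1.0, l2=0.8):
    """Analytic Jacobian J = dk/dq."""
    s1, s12 = np.sin(q[0]), np.sin(q[0]+q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0]+q[1])
    return np.array([[-l1*s1 - l2*s12, -l2*s12],
                     [ l1*c1 + l2*c12,  l2*c12]])

def clik_step(q, x_r, dx_r, dt, K_f=20.0):
    """One Euler step of dq_r = J_dagger * (dx_r - K_f * (k(q) - x_r))."""
    e = fk(q) - x_r                   # solution error e = k(q_r) - x_r
    dq = dls_pinv(jac(q)) @ (dx_r - K_f * e)
    return q + dt * dq

# Drive the solver toward a fixed, reachable Cartesian target:
q = np.array([0.3, 0.5])
x_r, dx_r = np.array([1.2, 0.6]), np.zeros(2)
for _ in range(2000):                 # 10 s at dt = 5 ms
    q = clik_step(q, x_r, dx_r, dt=0.005)
```

The closed-loop term -K_f e makes the integration self-correcting, so discretization drift does not accumulate as it would with an open-loop Jacobian inverse.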

Optimal control using ADP
As mentioned in the Introduction, it is very important to optimize trajectory tracking while minimizing the cost for robots. On the basis of optimal control theory, the optimal control of system (7) can be derived by solving the HJB equation within the ADP framework. Consequently, in this subsection, our target is to find such an optimal control $\tau^*$. Assume that the functions $f(x)$ and $g(x)$ are Lipschitz continuous on $\mathbb{R}^{2n}$ and that system (7) is controllable; then the optimal control $\tau^*$ should minimize the cost function

$$J(x(t)) = \int_t^{\infty}\left[F(x(s)) + U\big(x(s),\tau(x(s))\big)\right]\mathrm{d}s \tag{23}$$

where $F(x(t)) = x(t)^T Qx(t)$, $U(x(t),\tau(x(t))) = \tau(x(t))^T R\tau(x(t))$, and $Q \in \mathbb{R}^{n\times n}$ and $R \in \mathbb{R}^{n\times n}$ are symmetric positive definite matrices. For robot system (7), the optimal control $\tau^*$ should not only guarantee system stability but also render the cost function finite; that is, the control law should belong to the admissible control set $\Psi(\Omega)$, defined as the set of control laws that stabilize system (7) on $\Omega$ and yield a finite cost. Additionally, for any admissible control law $\tau \in \Psi(\Omega)$, if $J(x)$ given in equation (23) is continuously differentiable, the following non-linear Lyapunov equation, which is an infinitesimal version of equation (23), holds with $J(0) = 0$

$$0 = F(x) + U(x,\tau(x)) + (\nabla J(x))^T\left(f(x) + g(x)\tau(x)\right) \tag{24}$$

where $J(x)$ is short for $J(x(t))$ for convenience and the notation $\nabla(\cdot) := \partial(\cdot)/\partial x$ denotes the partial derivative with respect to $x$. Then, the Hamiltonian function and the optimal cost function of robot system (7) are

$$H(x,\tau(x),\nabla J(x)) = F(x) + U(x,\tau(x)) + (\nabla J(x))^T\left(f(x) + g(x)\tau(x)\right) \tag{25}$$

$$J^*(x) = \min_{\tau\in\Psi(\Omega)} \int_t^{\infty}\left[F(x(s)) + U\big(x(s),\tau(x(s))\big)\right]\mathrm{d}s \tag{26}$$

and $J^*(x)$ satisfies the HJB equation

$$\min_{\tau\in\Psi(\Omega)} H(x,\tau(x),\nabla J^*(x)) = 0 \tag{27}$$

Suppose that the minimum in formula (27) exists and is unique; then, from $\partial H(x,\tau(x),\nabla J^*(x))/\partial\tau = 0$, the following optimal control $\tau^*(x)$ can be derived

$$\tau^*(x) = -\tfrac{1}{2}R^{-1}g^T(x)\nabla J^*(x) \tag{28}$$

Substituting the optimal control law (28) into equation (24) yields another form of the HJB equation with respect to $\nabla J^*(x)$

$$H(x,\tau^*(x),\nabla J^*(x)) = 0 \tag{29}$$

Inspired by Liu et al., 34 if the optimal cost function $J^*(x)$ is assumed to be continuously differentiable, it can be represented by an RBFNN as

$$J^*(x) = w^T S(x) + \varepsilon(x) \tag{30}$$

where $w \in \mathbb{R}^l$ represents the ideal constant weight, $S : \mathbb{R}^{2n} \to \mathbb{R}^l$ denotes the activation function, $l$ denotes the number of nodes in the hidden layer and $\varepsilon(x)$ denotes the unknown approximation error of the NN. Then, the derivative of equation (30) with respect to $x$ is

$$\nabla J^*(x) = (\nabla S(x))^T w + \nabla\varepsilon(x) \tag{31}$$

From equations (28) and (31), the optimal control $\tau^*$ can be written as

$$\tau^*(x) = -\tfrac{1}{2}R^{-1}g^T(x)\left[(\nabla S(x))^T w + \nabla\varepsilon(x)\right] \tag{32}$$

Then, substituting equations (31) and (32) into equation (29) gives

$$F(x) + U(x,\tau^*(x)) + w^T\nabla S(x)\left(f(x) + g(x)\tau^*(x)\right) = e_c \tag{33}$$

where

$$e_c = -(\nabla\varepsilon(x))^T\left(f(x) + g(x)\tau^*(x)\right) \tag{34}$$

In fact, the ideal weight $w$ and $J^*(x)$ in equation (30) are unknown; the estimated weight and cost function, denoted $\hat{w}$ and $\hat{J}(x)$, respectively, are obtained from the constructed critic NN. The approximate optimal cost $\hat{J}(x)$ is given by

$$\hat{J}(x) = \hat{w}^T S(x) \tag{36}$$

Then, the derivative of equation (36) is

$$\nabla\hat{J}(x) = (\nabla S(x))^T\hat{w} \tag{37}$$

Based on equations (28) and (37), the approximate optimal control is obtained as

$$\hat{\tau}(x) = -\tfrac{1}{2}R^{-1}g^T(x)(\nabla S(x))^T\hat{w} \tag{38}$$

Similarly, applying equations (25), (37) and (38), the approximate Hamiltonian is

$$\hat{H}(x,\hat{\tau}(x),\nabla\hat{J}(x)) = F(x) + U(x,\hat{\tau}(x)) + \hat{w}^T\nabla S(x)\left(f(x) + g(x)\hat{\tau}(x)\right) \tag{39}$$

Since $H(x,\tau^*(x),\nabla J^*(x)) = 0$, the Hamiltonian approximation error is defined as

$$e_H = \hat{H}(x,\hat{\tau}(x),\nabla\hat{J}(x)) \tag{40}$$

Defining the weight estimation error

$$\tilde{w} = w - \hat{w} \tag{41}$$

and combining equations (33), (39) and (41), $e_H$ in equation (40) can be described as

$$e_H = -\tilde{w}^T\theta + \tfrac{1}{4}\tilde{w}^T\nabla S(x)g(x)R^{-1}g^T(x)(\nabla S(x))^T\tilde{w} + e_c \tag{42}$$

To train the RBFNN, an appropriate weight updating law for $\hat{w}$ should be designed that both minimizes the objective function $E = \tfrac{1}{2}e_H^2$ and ensures that the estimated weight $\hat{w}$ converges to the ideal weight $w$. To eliminate the requirement for an initial admissible control law, $\hat{w}$ is tuned by the standard gradient-descent algorithm augmented with an additional stabilizing term

$$\dot{\hat{w}} = -\alpha_H\frac{\theta}{\left(1+\theta^T\theta\right)^2}e_H + \frac{\alpha_c}{2}\nabla S(x)g(x)R^{-1}g^T(x)\nabla J_s(x) \tag{43}$$

where $\alpha_H$ and $\alpha_c$ are the basic learning rate of the standard gradient-descent algorithm and the learning rate of the stabilizing term, respectively, and $\theta$ is defined as

$$\theta = \nabla S(x)\left(f(x) + g(x)\hat{\tau}(x)\right) \tag{44}$$

Here $J_s(x)$ is selected as a continuously differentiable Lyapunov function candidate, and it is assumed that a positive definite matrix $N$ exists such that the following equation is satisfied

$$(\nabla J_s(x))^T\left(f(x) + g(x)\tau^*(x)\right) = -(\nabla J_s(x))^T N\nabla J_s(x) \tag{45}$$

It should be noted that $J_s(x)$ is a polynomial in the state variable and can be chosen appropriately, for example $J_s(x) = \tfrac{1}{2}x^T x$.
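The critic mechanics above can be sketched on a toy problem. The code below is an illustrative simplification, not the article's controller: it uses a hypothetical scalar plant instead of the robot, samples states for excitation, and applies only the normalized gradient-descent part of the weight update (the stabilizing term is omitted for brevity). All numerical choices (centres, width, learning rate) are assumptions.

```python
import numpy as np

np.random.seed(0)

# Illustrative scalar plant dx = f(x) + g(x)*u with f = -x, g = 1,
# and cost integrand Qc*x^2 + Rc*u^2.
Qc, Rc = 1.0, 1.0

centers = np.linspace(-2.0, 2.0, 9)   # 9 RBF hidden nodes
beta = 1.0                            # RBF width

def S(x):
    """Gaussian RBF activations."""
    return np.exp(-beta * (x - centers)**2)

def dS(x):
    """Gradient of each activation with respect to x."""
    return -2.0 * beta * (x - centers) * S(x)

def control(x, w):
    """Approximate optimal control: -(1/2) R^{-1} g^T (dS)^T w."""
    return -0.5 / Rc * (dS(x) @ w)

w = np.zeros(9)                       # zero initial weights: no admissible
a_H = 0.05                            # initial control is assumed
for _ in range(20000):
    x = np.random.uniform(-2.0, 2.0)  # sampled states provide excitation
    u = control(x, w)
    xdot = -x + u
    sigma = dS(x) * xdot              # theta-like regressor
    e_H = Qc * x**2 + Rc * u**2 + w @ sigma   # Bellman (HJB) residual
    w -= a_H * e_H * sigma / (1.0 + sigma @ sigma)**2  # normalized gradient step
```

Because this toy plant is linear-quadratic, the critic should shape an approximately convex cost, giving a control that pushes the state toward the origin; the normalization by (1 + θᵀθ)² keeps the update bounded, as in equation-style gradient critics.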

Stability analysis
In this subsection, we analyse the stability of the system and give a detailed proof that the NN weight estimation error $\tilde{w}$ and the state $x$ are convergent.

Theorem 1. Consider the robot system (7) with the approximate optimal control (38) and the NN weight updating law (43); then the NN weight estimation error $\tilde{w}$ and the state $x$ are convergent.
Proof. See the Appendix.

Simulation results
In this subsection, two cases are compared to demonstrate the validity of the proposed scheme. Note that the environment dynamics in the simulation is not fully consistent with that in equation (9), since the rest position x_0 is unknown. Therefore, two different K_e values are considered and examined. Case 1 uses the feedback gain K_e^pro = [-0.5367, 0.2284] acquired from the proposed scheme, which differs from Case 2, the ideal feedback gain K_e^opt = [-0.4142, 0.6604] obtained by offline calculation with the exact values of G_E and C_E (the unknown x_0 is ignored in this case). For a fair comparison, in Case 2 the trajectory is modified at the same time as in Case 1.
Simulation results are shown in Figures 3 to 6. Figure 3 shows the modification process of the user-defined trajectory along the x-axis for both cases. It is not until around 4.1 s that the rank condition in equation (16) is satisfied, after which the trajectory starts being modified. During the transient process, the modified trajectory of Case 2 exhibits a slight oscillation, which subsequently triggers larger tracking errors compared with Case 1. The steady-state position/force pairs of the Case 1 and Case 2 trajectories at 10.28 s are 0.13 m/-0.07 N and 0.14 m/-0.06 N, respectively, which is in line with the time series of the cost function in equation (12) for both cases, as shown in Figure 4. From the figure, we can see that after the modification of the trajectory, the cost function of Case 1 is smaller than that of Case 2, which implies that in this simulation setting, where the unknown x_0 cannot be neglected, the feedback gain obtained from the proposed scheme is more appropriate. Note that, because of the unknown x_0, the environment dynamics in equation (9) used for designing the trajectory modifier differs from that in equation (47) used for simulation. Under this situation, neither the K_e of Case 1 nor that of Case 2 is actually optimal. However, the proposed method still works, treating the dynamics in equation (47) as a linear system with an appropriate feedback gain, which demonstrates the effectiveness of the proposed admittance control method. Figures 5 to 7 are plotted to analyse the performance of the ADP-based controller. Figure 6 shows the control torques and the sliding surface variables for Case 1 and Case 2. On the whole, the proposed scheme tracks both modified trajectories well, given that only nine neurons are used in the RBFNN, and the control torques remain within the physical limits. In addition, weight convergence can be observed in Figure 7.
Note that, because of the introduced additional term involving ∇J_s, the initial admissible policy requirement is relaxed. Thus, in the simulation, the initial weights are chosen to be zero without endangering control stability. It can be observed from Figure 6 that although the initial errors are large, they finally converge to zero after some oscillations. Table 2 shows the feedback gain K_e calculated online using the proposed admittance control for different choices of Q_E1 in equation (12). The corresponding reference trajectories are shown in Figure 5, where the dashed lines denote the reference trajectories after modification and the solid lines stand for the actual trajectories of the robot end-effector under the proposed ADP controller. Obviously, as Q_E1 is selected larger, the reference trajectories tend to get closer to the user-desired trajectory, which is consistent with the fact that more cost is assigned to the tracking error x_e. Furthermore, although the reference trajectory varies, the proposed ADP controller is still able to track the input signals with the same set of parameters. These results also verify the adaptability of the proposed scheme.

Conclusion
The optimal control of robots interacting with an unknown environment was studied in this article. An ADP-based controller with admittance adaptation was proposed. The unknown environment was regarded as a linear system, and compliant behaviour was guaranteed by the admittance adaptation control. In addition, an NN was introduced into the ADP controller to ensure trajectory tracking of the robot with minimal cost. The stability of the robot system was proved, and simulation studies demonstrated the effectiveness of the proposed control scheme.
Because of the complexity of robot systems, dynamic uncertainties and input constraints such as saturation and dead zones are very common; they not only degrade system performance but may also lead to instability. 24,39,40 Therefore, within the ADP framework, the optimal control problem with dynamic uncertainties and input constraints will be considered in our future work.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by Engineering and Physical Sciences Research Council (EPSRC) under grant EP/S001913 and Shenzhen Science and Technology Plan Project [JSGG20180507183020876].