Distributed convex optimization via proportional-integral-differential algorithm

This paper studies the distributed convex optimization problem, where the global utility function is the sum of the local cost functions associated with the individual agents. Using only local information, a novel continuous-time distributed algorithm based on the proportional-integral-differential (PID) control strategy is proposed. Under the assumptions that the global utility function is strictly convex and the local utility functions have locally Lipschitz gradients, the exponential convergence of the proposed algorithm is established over an undirected and connected graph among the agents. Finally, numerical simulations are presented to illustrate the effectiveness of the theoretical results.


Introduction
Recent years have witnessed an increasing interest in distributed optimization and its wide applications in various fields, including the energy internet, intelligent manufacturing, and machine learning. [1][2][3][4] The aim of the distributed optimization problem is to minimize a global objective function, given as the sum of individual objective functions, by computing and exchanging information among neighboring agents. The research on distributed optimization can be traced back to the pioneering works, 5,6 and since then many valuable algorithms for this topic have been proposed, which can mainly be classified into discrete-time algorithms and continuous-time algorithms.
The discrete-time algorithm has attracted much attention since the distributed gradient descent (DGD) algorithm was proposed by Nedić and Ozdaglar. 7 However, due to the diminishing step-size, the convergence rate of the DGD algorithm is slow. Shi et al. 8 then proposed an exact first-order algorithm (EXTRA) with a fixed step-size, which converges precisely to the optimal solution. In order to eliminate the requirement of a doubly stochastic weight matrix, the push-sum algorithm under directed graphs was proposed by Nedić and Olshevsky. 9 Subsequently, combinations of the push-sum protocol and distributed subgradient algorithms were proposed for the unconstrained distributed optimization problem over directed time-varying topologies. [10][11][12] By introducing projection operators, various distributed projection algorithms for the constrained distributed optimization problem were proposed in Lou et al., 13 Lin et al., 14 Liu et al., 15,16 Wang et al., 17 and Liang et al. 18 To deal with equality and inequality constraints, Chang et al. 19 proposed the consensus-based distributed primal-dual perturbation (PDP) algorithm, which was applied to smooth inequality constraints and nonsmooth constraints. For non-smooth optimization problems, a smooth double proximal primal-dual algorithm was presented in Wei et al. 20 Moreover, without convexity assumptions on the objectives, a Chebyshev-proxy and consensus-based algorithm (CPCA) with low computational costs was given in He et al. 21
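As a concrete reference point for the discrete-time class above, the following is a minimal sketch of the DGD iteration: each agent averages its neighbors' iterates through a doubly stochastic weight matrix and takes a diminishing-step gradient step on its local cost. The quadratic local costs, the path graph, and the weight matrix are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# DGD sketch: x_{k+1} = W x_k - (1/k) * grad, with W doubly stochastic.
# Local costs f_i(x) = 0.5 * (x - c_i)^2 are illustrative assumptions;
# the global minimizer of sum_i f_i is then the mean of the c_i.

c = np.array([1.0, 2.0, 3.0, 6.0])           # local cost parameters (assumed)
W = np.array([[0.5, 0.5, 0.0, 0.0],          # doubly stochastic weights for
              [0.5, 0.0, 0.5, 0.0],          # the path graph 1-2-3-4
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])

x = np.zeros(4)                               # agents' iterates
for k in range(1, 5001):
    grad = x - c                              # local gradients of f_i
    x = W @ x - (1.0 / k) * grad              # consensus step + gradient step

print(x)  # all agents approach the global minimizer mean(c) = 3.0
```

The slow drift toward consensus under the 1/k step-size illustrates the convergence-rate limitation of DGD that motivated fixed-step methods such as EXTRA.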
As a counterpart, much attention has also been paid to continuous-time algorithms. Several distributed optimization algorithms with single-integrator dynamics were proposed in Lin et al., 22 which considered state-dependent gradient gains and nonuniform gradient gains. For second-order multi-agent networks, a double-integrator distributed optimization algorithm under undirected graphs was presented in Zhang and Hong. 23 Based on the Karush-Kuhn-Tucker condition and the saddle-point property, a new distributed algorithm for constrained optimization was developed in Yi et al. 24 Within an impulsive communication framework, an average quasi-consensus algorithm for distributed constrained optimization was designed in He et al. 25 In order to reduce communication costs and energy consumption, the event-triggered control idea has been introduced into the design of distributed optimization algorithms. In Chen and Ren, 26 an event-triggered zero-gradient-sum algorithm was established. In Li et al., 27 a distributed optimization algorithm with event-triggered communication via input feedforward passivity was presented, and an event-triggered distributed fixed-time optimization algorithm over directed networks was developed in Yu et al. 28 Owing to advances in control techniques, various algorithms based on the proportional-integral (PI) control strategy for distributed optimization have attracted increasing attention recently. Wang and Elia 29 proposed a distributed optimization algorithm based on the PI control strategy, where each agent uses an auxiliary state to correct the error caused by the differing local gradients. Based on the PI control strategy, a continuous-time coordination algorithm with discrete-time communication, which can reduce communication costs and energy consumption, was investigated in Kia et al. 30 Combining projection output feedback with the PI protocol, distributed convex optimization subject to general constraints was studied in Yang et al. 31
It is well known that the proportional-integral-derivative (PID) control strategy is a classic control approach with a simple structure, clear physical meaning of its parameters, and strong robustness. Thus, it has been applied in various fields, such as active disturbance rejection control (ADRC) 32 and formation control. 33 However, to the best of our knowledge, the PID control strategy has not yet been discussed for the distributed optimization problem.
Based on the above discussion, a new continuous-time algorithm with the PID control strategy for distributed optimization under an undirected communication graph is studied in this paper. The main contributions of this paper are as follows. (I) A PID algorithm is presented for distributed optimization for the first time, where each agent uses only local information; that is, the proposed algorithm is fully distributed. (II) Using matrix transformations and inequality techniques, the exponential convergence of the proposed algorithm, together with an estimate of its convergence rate, is established under the assumptions that the global utility function is strictly convex and the local utility functions have locally Lipschitz gradients. (III) The restriction on the initial values is removed, which makes the result less conservative than those in Zhang and Hong 23 and Kia et al. 30 The remainder of this paper is organized as follows. Section 2 is devoted to mathematical preliminaries on the distributed optimization problem. The continuous-time distributed PID algorithm and its convergence analysis are presented in Section 3. Numerical examples are given to illustrate the performance of the proposed algorithm in Section 4. Finally, conclusions are drawn in Section 5.
Notations: Let R^n be the n-dimensional real vector space. ‖·‖ denotes the Euclidean norm of a vector or the induced norm of a matrix. ⊗ is the matrix Kronecker product. I_n is the identity matrix in R^{n×n}. 0_N and 1_N represent the column vectors of N zeros and N ones, respectively. det(·) is the determinant of a matrix.

Graph theory
An undirected edge e_ij in graph G is denoted by the unordered pair of nodes (v_i, v_j), which means that nodes v_i and v_j can exchange information with each other; accordingly, the adjacency matrix A is symmetric. A path between nodes v_i and v_j in graph G is a sequence of edges (v_i, v_{k_1}), (v_{k_1}, v_{k_2}), …, (v_{k_m}, v_j) with distinct intermediate nodes. An undirected graph G is connected if there exists a path between any pair of distinct nodes v_i and v_j (i, j = 1, 2, …, N). The Laplacian matrix L_N = D − A, where D is the diagonal degree matrix, has a simple zero eigenvalue with all other eigenvalues positive if and only if the graph G is connected, and the eigenvalues of L_N can be ordered as 0 = λ_1 < λ_2 ≤ … ≤ λ_N.
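The Laplacian spectral property above is easy to check numerically. The following sketch builds L_N = D − A for a small undirected graph (a 4-node cycle, chosen here purely for illustration) and verifies the eigenvalue ordering 0 = λ_1 < λ_2 ≤ … ≤ λ_N for a connected graph.

```python
import numpy as np

# Laplacian L_N = D - A of an undirected graph; for a connected graph it has
# a simple zero eigenvalue and all remaining eigenvalues positive.
# The 4-node cycle used here is an illustrative assumption.

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)    # symmetric adjacency matrix
D = np.diag(A.sum(axis=1))                   # diagonal degree matrix
L = D - A                                    # graph Laplacian

eigs = np.sort(np.linalg.eigvalsh(L))        # real eigenvalues (L symmetric)
print(eigs)  # [0. 2. 2. 4.] for the 4-cycle: simple zero, rest positive
```

Since L is symmetric, `eigvalsh` returns real eigenvalues; disconnecting the graph would produce a repeated zero eigenvalue.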

Problem formulation
Consider a network of N agents, where each agent has a local private differentiable cost function f_i(x) : R^n → R. The communication topology among the agents is described by an undirected graph G. The following convex optimization problem will be studied:

min_{x ∈ R^n} f(x) = Σ_{i=1}^N f_i(x). (1)
The global utility function is assumed to be strictly convex, that is,

(x − y)^T (∇f(x) − ∇f(y)) > 0, ∀x, y ∈ R^n, x ≠ y, (2)

and, for each i ∈ V, the gradient of the local utility function f_i is locally Lipschitz. In this paper, we focus on designing a distributed proportional-integral-differential (PID) algorithm such that each agent obtains the global minimizer x* of the optimization problem using only its own and its neighbors' information.
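For a strictly convex, differentiable f, the global minimizer is characterized by the first-order condition Σ_i ∇f_i(x*) = 0. The sketch below checks this with illustrative quadratic local costs f_i(x) = 0.5·a_i·(x − c_i)², for which x* has the closed form of a weighted average; the values of a_i and c_i are assumptions.

```python
import numpy as np

# First-order optimality for problem (1): sum_i grad f_i(x*) = 0.
# With the assumed local costs f_i(x) = 0.5 * a_i * (x - c_i)^2 (a_i > 0),
# the condition sum_i a_i (x* - c_i) = 0 gives
#   x* = (sum_i a_i c_i) / (sum_i a_i).

a = np.array([1.0, 2.0, 0.5, 4.0])           # local curvatures (assumed)
c = np.array([0.0, 1.0, -2.0, 3.0])          # local minimizers (assumed)

x_star = (a * c).sum() / a.sum()             # weighted-average minimizer
grad_sum = (a * (x_star - c)).sum()          # sum of local gradients at x*

print(x_star, grad_sum)  # the gradient sum vanishes at the global minimizer
```

Note that no single agent's minimizer c_i equals x* in general, which is exactly why coordination among agents is required.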

Distributed PID algorithm
In this section, we propose a distributed PID algorithm and establish its convergence. For each agent i ∈ {1, …, N}, the distributed PID algorithm for problem (1) is described as follows.
where x_i, v_i, y_i ∈ R^n represent the position, velocity, and auxiliary state of the ith agent, respectively, and x_i is agent i's estimate of the solution to problem (1). The positive constants a, b, c > 0 are the proportional, differential, and integral parameters, respectively, and k > 0 is a constant step-size.
Then, algorithm (6) can be rewritten in the following compact form, where L = L_N ⊗ I_n.
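The Kronecker product L = L_N ⊗ I_n simply applies the graph Laplacian to each of the n coordinates of the stacked state, so consensus states lie in its null space. A minimal sketch (the 3-node path graph and n = 2 are illustrative assumptions):

```python
import numpy as np

# L = L_N (x) I_n acts coordinate-wise on the stacked state
# x = [x_1; x_2; x_3] with x_i in R^n.

L_N = np.array([[ 1, -1,  0],
                [-1,  2, -1],
                [ 0, -1,  1]], dtype=float)  # Laplacian of a 3-node path
n = 2
L = np.kron(L_N, np.eye(n))                  # 6x6 stacked Laplacian

# A consensus state (x_1 = x_2 = x_3) lies in the null space of L.
x_consensus = np.tile([1.5, -0.5], 3)
print(L @ x_consensus)  # zero vector
```

This is why the analysis can work coordinate-by-coordinate: the proof below sets n = 1 without loss of generality.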
Lemma 1. Suppose the communication topology G is undirected and connected. Then the equilibrium point of system (7) is the optimal solution of problem (1).
Proof. Since the communication graph is undirected and connected, we have 1_N^T L_N = 0. Assume that (x*, v*, y*) is an equilibrium point of system (7). Since G is connected, there is an x* ∈ R^n at which all agents' states agree. Next, it will be shown that x* is the optimal solution of problem (1). Multiplying both sides of the second equation in (8) by (1_N ⊗ I_n)^T, one can derive that Σ_{i=1}^N ∇f_i(x*) = 0, that is, ∇f(x*) = 0, which, by the strict convexity of f, implies that x* is the optimal solution of problem (1).
The proof is completed.
Theorem 1. Assume that the communication topology G is connected, the global utility function f(x) is strictly convex, and each local utility function is continuously differentiable with an l_i-Lipschitz gradient on R^n. Then, under the PID algorithm (6) with a, c > 0, b > ack, and k > 0 bounded, x_i(t) converges exponentially to the global minimizer x* for each i ∈ V from any initial values.
Proof. Without loss of generality, let n = 1 for simplicity. The case n > 1 can be proved in the same way by the properties of the Kronecker product, only with more involved computation.
Then, (7) can be rewritten in the following matrix form, where X = [x̃^T, ṽ^T, ỹ^T]^T. Denote E = [−1_{N−1}  I_{N−1}] and M_1 = diag{E, E, E}. Let Y(t) = M_1 X(t); in view of (9), we obtain (10), where the reduced Laplacian L̃_{N−1} appears. Since the communication topology G is connected, all the eigenvalues of L̃_{N−1}, that is, λ_2, λ_3, …, λ_N, are positive. Moreover, there is an invertible matrix S such that S^{−1} L̃_{N−1} S = J. Thus, all the eigenvalues of J are positive.
Let M_2 = diag{S^{−1}, S^{−1}, S^{−1}} and Z(t) = M_2 Y(t). Then, (11) follows from (10). Assume μ is an eigenvalue of M, and suppose first that μ = 0, that is, 0 is one of the eigenvalues of M.
According to the Laplace theorem for partitioned matrices, by (12), we would then have det(J) = 0, which contradicts the fact that all the eigenvalues of J are positive. Thus, recalling (12), it follows that there is at least one i ∈ {2, …, N} such that

μ³ + bμ²λ_i + aμλ_i + a²ckλ_i² = 0. (13)

Write μ = μ_1 + jμ_2 with real part μ_1 and imaginary part μ_2. Case 1: μ_2 = 0. Then, by the Routh-Hurwitz criterion, μ_1 < 0.
Case 2: μ_2 ≠ 0. It follows from (14) that, since b > ack, the Routh-Hurwitz criterion again gives μ_1 < 0. Thus, all the eigenvalues of M lie in the open left half plane, and hence there exist constants m > 0 and α > 0 such that ‖e^{Mt}‖ ≤ m e^{−αt}. Noticing (11), by the variation-of-parameters formula, we obtain (17) and (18). According to (4), (19) can be obtained, where l is the maximum of l_i (i = 1, …, N). By (17)-(19), it follows that (20) holds. Now, for any m̄ > 1 and γ ∈ (0, α), it will be proved that (21) holds. If (21) did not hold, then by continuity there would exist a t* > t_0 such that ‖Z(t*)‖ = h(t*) and ‖Z(t)‖ < h(t) for t ∈ [t_0, t*). Then, by (20), we obtain (22). For k ≤ (α − γ)/(lq), we have (23). The contradiction in (23) shows that (21) is valid for any m̄ > 1. Letting m̄ → 1, one concludes that Z(t) converges exponentially to 0, that is, x_i(t) → x* as t → ∞ for all i ∈ {1, …, N}, and the convergence rate is estimated by γ.
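The role of the condition b > ack in the stability argument can be seen directly from (13). For a cubic s³ + ps² + qs + r, the Routh-Hurwitz criterion requires p, q, r > 0 and pq > r; with p = bλ_i, q = aλ_i, r = a²ckλ_i², the last inequality reduces to abλ_i² > a²ckλ_i², that is, b > ack. The sketch below checks this numerically for assumed parameter values and Laplacian eigenvalues.

```python
import numpy as np

# Numerical check of (13): mu^3 + b*lam*mu^2 + a*lam*mu + a^2*c*k*lam^2 = 0.
# Routh-Hurwitz for s^3 + p s^2 + q s + r: stable iff p, q, r > 0 and pq > r;
# here pq > r reduces to b > a*c*k, the condition of Theorem 1.
# Parameter values and eigenvalues are illustrative assumptions.

a, b, c, k = 1.0, 2.0, 0.5, 1.0              # satisfies b > a*c*k = 0.5
lams = [0.8, 2.0, 3.5]                       # positive Laplacian eigenvalues

for lam in lams:
    coeffs = [1.0, b * lam, a * lam, a**2 * c * k * lam**2]
    roots = np.roots(coeffs)
    assert all(r.real < 0 for r in roots)    # all roots in open left half plane
    print(lam, roots.real.max())
```

Taking, say, b < ack instead makes pq < r and produces roots with positive real part, so the bound on the step-size gain is not an artifact of the proof technique in this cubic.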
The proof is completed.

Numerical simulations
In the following, the proposed distributed PID algorithm is illustrated by numerical simulations. We assume that there are six agents over the undirected and connected communication graph depicted in Figure 1.
Case 1: All local cost functions are convex. In order to show the advantage of the PID algorithm, a numerical simulation based on the PI algorithm, that is, with b = 0 in (6), is shown in Figure 5, which shows that the state x_i of each agent i (i = 1, …, 6) cannot converge to the global optimal solution. Case 2: Some of the local cost functions are nonconvex.
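Since equation (6) and the figures are not reproduced here, the following sketch simulates only the standard PI baseline in the spirit of Wang and Elia, where an auxiliary state y corrects the disagreement caused by differing local gradients: ẋ = −∇F(x) − Lx − Ly, ẏ = Lx. This is not the paper's PID algorithm; the quadratic costs, the 4-node cycle graph, and the forward-Euler discretization are illustrative assumptions.

```python
import numpy as np

# PI consensus-gradient flow (Wang-Elia style baseline, NOT the paper's (6)):
#   x' = -grad F(x) - L x - L y,   y' = L x.
# At equilibrium, L x = 0 forces consensus, and summing the x-equation over
# agents gives sum_i grad f_i(x*) = 0, i.e. global optimality.

L = np.array([[ 2, -1,  0, -1],              # Laplacian of a 4-node cycle
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [-1,  0, -1,  2]], dtype=float)
c = np.array([1.0, -1.0, 4.0, 2.0])          # f_i(x) = 0.5*(x - c_i)^2 (assumed)

x = np.zeros(4)
y = np.zeros(4)
dt = 0.01                                    # forward-Euler step (assumed)
for _ in range(50_000):
    grad = x - c                             # local gradients
    x_dot = -grad - L @ x - L @ y
    y_dot = L @ x
    x, y = x + dt * x_dot, y + dt * y_dot

print(x)  # all agents near the global minimizer mean(c) = 1.5
```

For these strongly convex quadratic costs the PI flow already converges; the paper's Figure 5 comparison concerns the behavior of the b = 0 case under its own simulation setup, which is not reconstructed here.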

Conclusions
In this paper, a continuous-time PID algorithm has been proposed for solving the distributed optimization problem, where the global utility function is strictly convex and the local utility functions have locally Lipschitz gradients. Using matrix transformations and inequality techniques, the exponential convergence of the proposed algorithm has been established under an undirected and connected graph, and an estimate of the convergence rate has also been given. It should be pointed out that this paper only considers an undirected communication graph and ideal communication. Directed communication graphs and event-triggered PID control for distributed convex optimization will be discussed in future studies.