Measurement and control of system resilience recovery by path planning based on improved genetic algorithm

When applied to path planning for system resilience recovery, the basic genetic algorithm suffers from excessive randomness of the initial population, slow convergence, inefficient evolution operators, and poor population diversity. To address these problems, this paper uses the quotient model to measure resilience, uses overall task importance to measure system performance, and proposes a genetic algorithm with an improved initial population and improved evolutionary operations. The improved genetic algorithm (IHGA) introduces a new greedy model that considers node task importance, travel time, and maintenance time, and uses it to generate a high-quality part of the initial population. A new operator, the intra-group head-to-head mutation operator (IHMO), is also designed to make evolution more directed and less subject to ineffective randomness. Simulation results in three cases show that IHGA overcomes these defects and recovers system resilience more effectively than the basic genetic algorithm (BGA) and the multi-chromosome genetic algorithm (MCGA). In particular, it achieves a better optimal solution, faster convergence, and better stability, especially under harsh conditions such as shorter repair time and larger, unbalanced demands for spare parts. This indicates that IHGA has practical value for the measurement and control of system resilience recovery.


Introduction
The term resilience originated in early social ecology [1][2][3]. In general semantics, it is defined as the ability of an entity to return to its normal state after being subjected to an event that has changed its state [4]. Similar to the concepts of reliability, survivability, and tolerance, resilience is described as a performance indicator of the ability of a system to respond to changes and reduce risks.
In recent years, with the increase of natural disasters and terrorist attacks, resilience has become an important concept in many fields, such as infrastructure systems [5-7], engineering [4], and system safety [8,9]. In engineering systems, resilience mainly involves three elements: the ability to predict disturbances or failures, the ability to maintain performance when disturbances or failures occur, and the ability to recover performance after disturbances or failures. Among them, prediction and performance retention should be considered carefully in the design of a resilient system, whereas the ability to recover from disturbances or failures is the focus of disaster reconstruction.
System resilience is an extension of the traditional concepts of system reliability and risk, and has become one of the characteristics that must be considered in system evaluation. Improving system resilience helps reduce the risk of system damage in an uncertain environment, making the system more reliable, safe, and stable. Therefore, it is of great practical significance to study the resilience recovery of a system under attack.
According to the literature [10], there are three recovery mechanisms when system nodes fail after an attack: the node backup strategy, the node replacement strategy, and the node maintenance strategy. This paper studies the node maintenance strategy for recovering system resilience. In general, not all failed nodes can be repaired at the same time, so the failed nodes must be restored in a certain maintenance sequence within limited time and cost.
In this paper, the maintenance strategy for recovering system resilience is abstracted into a path planning problem. According to their characteristics, path planning algorithms can be divided into two categories: exact algorithms and heuristic algorithms. The exact algorithms mainly include the Benders decomposition algorithm, Lagrangian relaxation, and the branch and bound method. Heuristic algorithms are represented by simulated annealing, the genetic algorithm, the tabu search algorithm, the ant colony algorithm, and the particle swarm algorithm. In recent years, more and more scholars have chosen heuristic algorithms to solve the path planning problem of recovery. Jing et al. [11] proposed an efficient and low-cost NUWS resilience recovery approach in which an improved genetic algorithm determines and optimizes the position of the relay weapon units. The authors combined an elite individual retention strategy with a roulette selection strategy, and proposed adaptive crossover and mutation operators to prevent premature convergence. However, when solving the multi-chromosome coding problem, the individual convergence rate of this method is slow and the quality of the solution is not good enough. Li et al. [12] proposed a two-layer programming model to optimize the traffic network recovery strategy in ERP, considering both the maximization of system resilience and the path selection behavior of users after a disaster. The same paper [12] also proposed a novel algorithm that optimizes path planning by integrating the Frank-Wolfe algorithm into a genetic algorithm. However, the basic genetic algorithm (BGA) has the shortcomings of a redundant search space, inefficient evolution operators, and poor population diversity. The genetic algorithm proposed by Zhang et al. [13] for optimizing the moving path of a mobile water sink has the same problems as BGA.
Zhang and Wei [14] designed a parallel coding method for bridge inspection and restoration after a disaster, and developed a hybrid genetic algorithm that combines traditional genetic algorithms with specially designed heuristic algorithms, improving computational efficiency. Their paper [14] provided a new idea for the priority decision-making problem of emergency inspection and recovery after a disaster, but the hybrid genetic algorithm may not obtain the global optimal solution. Wang et al. [15] used the MVODM strategy to generate the initial solution of the genetic algorithm, and introduced a two-way/three-crossing greedy operator to improve the quality of the solution. However, the population diversity of this method is poor in the middle and later stages. Ye et al. [16] proposed a multi-chromosome genetic algorithm (MCGA) with a complex mutation operator and a multi-chromosome coding method, which avoids search space redundancy and speeds up convergence, but may not reach a better feasible solution.
Based on the analysis above, this paper adopts the genetic algorithm to study path planning for system resilience recovery [17,18]. The path planning of resilience recovery is a discrete combinatorial optimization problem, and the characteristics of its solutions fit the design principles of the genetic algorithm well. In addition, the genetic algorithm has strong versatility and good parallelism, so many studies have used it to solve similar problems. Some scholars also use the particle swarm algorithm or the wolf pack algorithm for similar problems. However, the particle swarm algorithm cannot effectively solve discrete combinatorial optimization problems, and the wolf pack algorithm searches in too scattered a manner, with weak global search ability, which makes the optimal solution difficult to find. Therefore, neither matches the principle of this problem as well as the genetic algorithm does.
The genetic algorithm is a random search algorithm that simulates natural selection and the mechanism of biological evolution. With good global search capability and parallelism, it is widely used in path planning, but it also has shortcomings such as excessive randomness of the initial population, slow convergence, low quality of converged individuals, inefficient evolution operators, poor population diversity, and no control over the direction of evolution. At the same time, many studies treat all nodes as equally important, which does not match practice. In reality, the nodes that are more important should be repaired first to recover system resilience.
To deal with the problems above, this paper proposes an improved genetic algorithm (IHGA), based on an improved initial population and an intra-group head-to-head mutation operator, for path planning of system resilience recovery. The major innovations and contributions of this paper are as follows: (1) A planning method is designed to search for the optimal path that maximizes system resilience under time constraints, spare parts constraints, and multi-personnel maintenance. (2) A method is proposed to improve the quality of the initial population by generating it with a new greedy model and a random method together. The greedy model considers node task importance, travel time, and maintenance time to generate a high-quality part of the initial population. The mixed initial population, consisting of a greedy part and a random part, improves the initial solution, which leads to a better final solution and faster convergence. (3) IHGA introduces the intra-group head-to-head mutation operator (IHMO), which considers the effect of both greedy and random individuals and steers evolution in a better and more determinate direction, improving convergence speed and the final solution.
This paper is organized as follows. Section 1 is the introduction. Section 2 introduces the quotient model for measuring resilience and the three cases of system resilience recovery. In Section 3, a new greedy model and the IHMO operator in IHGA are developed to better solve the control and recovery optimization of system resilience. In Section 4, the performance of IHGA is compared with that of BGA and MCGA in three different cases through simulation. Finally, Section 5 concludes the paper.

Resilience measurement
This article focuses on network resilience after a recovery strategy has been applied. Therefore, it evaluates resilience from the perspective of maximum recovery capacity. The quotient resilience model is a classic time-independent resilience model [19] that is easy to understand and is cited in most of the literature on resilience. Because it is difficult to establish a real time-dependent performance function of system resilience, whereas the resilience at a given moment is easy to obtain, this paper adopts the quotient resilience model to measure system resilience, as shown in Figure 1. Here u(t) denotes the performance of the system at time t.
According to the model, system resilience is defined as the ratio of the recovered performance level to the lost performance level:
$$R_t = \frac{\mathrm{Recovery}(t)}{\mathrm{Loss}(t_d)} \qquad (1)$$
where R_t is the system resilience at time t, Recovery(t) is the system performance recovered at time t, and Loss(t_d) is the loss value of system performance.
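The quotient in equation (1) can be illustrated with a minimal sketch. It assumes that the loss is the drop from the normal performance level to the damaged level and that the recovery is the rise from the damaged level to the current level. The function name `quotient_resilience` and the numbers are hypothetical and are not taken from the paper's data.

```python
def quotient_resilience(u_normal, u_damaged, u_recovered):
    """Recovered performance divided by lost performance (quotient model).

    Assumption: Loss = u_normal - u_damaged and
    Recovery = u_recovered - u_damaged; the paper defines only the ratio.
    """
    loss = u_normal - u_damaged
    if loss <= 0:
        raise ValueError("no performance loss, so resilience is undefined")
    recovery = u_recovered - u_damaged
    return recovery / loss


# Illustrative values only: performance drops from 100 to 40, then recovers to 85.
print(quotient_resilience(100.0, 40.0, 85.0))  # 45 / 60 = 0.75
```

A value of 1 would mean that all lost performance has been recovered, and 0 that nothing has been recovered.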

Optimization model
In this section, a resilience recovery model is established for three cases. Case 1 maximizes the recovery of system resilience within a limited time after the system is damaged. Case 2 is more severe than Case 1 and has more constraints, such as a shorter repair time and demands for different types of spare parts. Case 3 adds larger and unbalanced demands for spare parts at each damaged node. As shown in equation (1) above, system resilience is related to system performance. This paper uses the overall task importance of the system as the measure of system performance, and the mathematical model of recovery path planning for system resilience is as follows.
Case 1. The mathematical model in Case 1 can be described as follows. After the system is damaged, N damaged nodes in need of repair are unevenly distributed, and each node has a different task importance. S maintenance personnel start from different nodes and repair damaged nodes within a limited time (t_l). Each maintainer must recover at least x nodes, and no node is visited more than once. The goal is to achieve the highest overall task importance of the system within the limited time, as shown in equations (2) to (4).
Here z_i represents the task importance of node i, k represents the index of the maintenance personnel, i represents the index of the node, m represents the start node of maintenance personnel k, and r represents the end node of maintenance personnel k. h_ik represents the time required for the k-th maintainer to repair the i-th node. It can be seen from equation (2) that the solution to system resilience recovery is the maintenance path that maximizes f.
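The Case 1 objective described above can be sketched as a fitness function. Equations (2) to (4) are not reproduced in this text, so the sketch follows the prose only. It assumes Euclidean travel between nodes, a constant travel speed, that the start node of each path is itself repaired, and that an infeasible individual is scored 0. All names (`path_time`, `total_importance`, `coords`, `min_nodes`) and all data values are hypothetical.

```python
import math


def path_time(path, coords, repair_time, speed):
    """Travel time along the path plus repair time at every visited node."""
    total = 0.0
    for idx, node in enumerate(path):
        if idx > 0:
            total += math.dist(coords[path[idx - 1]], coords[node]) / speed
        total += repair_time[node]
    return total


def total_importance(individual, coords, importance, repair_time,
                     speed, t_limit, min_nodes):
    """Sum of task importance over repaired nodes, or 0.0 if infeasible.

    `individual` holds one path (list of node ids) per maintainer.
    Scoring infeasible individuals as 0.0 is an assumption of this sketch.
    """
    seen = set()
    for path in individual:
        if len(path) < min_nodes:
            return 0.0  # each maintainer must recover at least x nodes
        if path_time(path, coords, repair_time, speed) > t_limit:
            return 0.0  # time limit t_l exceeded
        for node in path:
            if node in seen:
                return 0.0  # a node may be visited only once
            seen.add(node)
    return float(sum(importance[n] for n in seen))


# Illustrative data, not taken from Appendix B.
coords = {0: (0, 0), 1: (3, 4), 2: (6, 8), 3: (0, 10)}
importance = {0: 5, 1: 3, 2: 4, 3: 2}
repair_time = {0: 10, 1: 10, 2: 10, 3: 10}
plan = [[0, 1], [3, 2]]
print(total_importance(plan, coords, importance, repair_time, 1.0, 30.0, 2))
```

In a genetic algorithm this value would serve as the fitness of an individual; a penalty term could replace the hard 0.0 for infeasible plans.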
Case 2. Case 2 simulates a system that has suffered an attack, with N unevenly distributed damaged nodes in need of repair. The maintenance of each node requires several spare parts of types A, B, and C. The S maintenance personnel carry limited spare parts (G1, G2, G3) and start from different locations to carry out repairs within a limited time (t_l). In Case 2, the demand of each node for spare parts A, B, and C lies in the range (1, 3). As before, each maintainer must recover at least x nodes, and no node is visited more than once during the repair process. Assume that the task importance matrix of the damaged nodes is Z = (z_i)_{n×1}, where z_i represents the task importance of node i, the matrix of spare parts required for maintenance is C = (c_ij)_{n×3}, and the matrix of time required for maintenance is H = (h_i)_{n×1}. The system task importance after maintenance is given by equation (2) above. The mathematical model in Case 2 is described by formulas (5) to (9).
Case 3. Because node maintenance depends on the relationship between the demand for and the supply of spare parts, system resilience recovery needs to be discussed under both balanced and unbalanced demand for node spare parts. This is the difference between Case 2 and Case 3. In Case 3, the demand for node spare parts is larger than in the other two cases and is unbalanced: the demand range for spare parts A and B increases to (1, 5), and that for spare part C to (3, 7). Case 3 simulates a more heavily damaged system with a larger and unbalanced need for spare parts.
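The spare parts constraint added in Case 2 can be sketched as a feasibility check on one maintainer's path. Formulas (5) to (9) are not reproduced in this text, so the check below is an assumption based on the prose: the total demand along a path for each of the three part types must not exceed the stock carried. The function name `parts_feasible` and the data are hypothetical, and the choice of `<=` rather than a strict bound is also an assumption.

```python
def parts_feasible(path, demand, stock):
    """True if the parts needed along `path` fit within the carried `stock`.

    `demand[node]` is a (A, B, C) tuple, one row of the matrix C = (c_ij);
    `stock` is the carried amount (G1, G2, G3).
    """
    used = [0, 0, 0]
    for node in path:
        for j in range(3):
            used[j] += demand[node][j]
    return all(u <= s for u, s in zip(used, stock))


# Illustrative demands in the Case 2 range (1, 3); not taken from Appendix B.
demand = {0: (1, 2, 3), 1: (3, 3, 1), 2: (2, 1, 2)}
print(parts_feasible([0, 1], demand, (5, 5, 5)))     # uses (4, 5, 4)
print(parts_feasible([0, 1, 2], demand, (5, 5, 5)))  # uses (6, 6, 6)
```

Such a check would be combined with the time limit when the fitness of an individual is evaluated.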

Introduction of MCGA
The difference between MCGA and BGA lies in the evolution operator. MCGA proposes a complex mutation operator for the multi-chromosome encoding method, namely the complex mutation operator tree. The operator is a combination of the path mutation operator, the spanning mutation operator, and the sliding mutation operator. The path mutation operator performs mutation only within a single chromosome, that is, it exchanges any two gene fragments within a single chromosome. The spanning mutation operator performs mutation between different chromosomes, that is, it exchanges two gene sequences randomly selected from any two chromosomes. The sliding mutation operator is a kind of adjacent mutation in which all chromosomes participate: the last gene of each chromosome is moved to the first position of the previous chromosome. In principle, MCGA adds a variety of random mutations, but it neither controls the direction of mutation nor optimizes the initial population. Therefore, this article makes further improvements to the genetic algorithm.
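The three mutation operators described above can be sketched as follows. This is a simplified illustration and not the implementation from Ref. 16: single genes are exchanged instead of multi-gene fragments, and the sliding operator is assumed to wrap around cyclically, so that the last gene of the first chromosome moves to the front of the last chromosome. All function names are hypothetical.

```python
import random


def path_mutation(individual, rng):
    """Swap two genes inside one randomly chosen chromosome."""
    new = [chrom[:] for chrom in individual]
    candidates = [chrom for chrom in new if len(chrom) >= 2]
    if candidates:
        chrom = rng.choice(candidates)
        i, j = rng.sample(range(len(chrom)), 2)
        chrom[i], chrom[j] = chrom[j], chrom[i]
    return new


def spanning_mutation(individual, rng):
    """Swap one gene between two different, randomly chosen chromosomes."""
    new = [chrom[:] for chrom in individual]
    non_empty = [k for k, chrom in enumerate(new) if chrom]
    if len(non_empty) >= 2:
        a, b = rng.sample(non_empty, 2)
        i = rng.randrange(len(new[a]))
        j = rng.randrange(len(new[b]))
        new[a][i], new[b][j] = new[b][j], new[a][i]
    return new


def sliding_mutation(individual):
    """Move the last gene of each chromosome to the front of the previous one."""
    new = [chrom[:] for chrom in individual]
    tails = [chrom.pop() if chrom else None for chrom in new]
    n = len(new)
    for k, gene in enumerate(tails):
        if gene is not None:
            new[(k - 1) % n].insert(0, gene)  # cyclic wrap is an assumption
    return new


rng = random.Random(0)
ind = [[1, 2, 3], [4, 5], [6, 7, 8]]
print(path_mutation(ind, rng))
print(spanning_mutation(ind, rng))
print(sliding_mutation(ind))
```

All three operators keep the set of visited nodes unchanged, so an individual without duplicates stays free of duplicates.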

The design of IHGA
Recovery path planning solves a specific problem in which multiple maintenance personnel repair damaged nodes simultaneously. In Case 1, only the time constraint is taken into account. In Case 2 and Case 3, both the time constraint and the spare parts constraint are considered, with balanced and unbalanced demand for spare parts respectively. In view of the shortcomings of BGA, such as slow convergence, premature convergence, and poor solutions, this paper proposes IHGA to improve path planning for system resilience recovery. IHGA is shown in Figure 2. Firstly, this paper adopts multi-chromosome integer coding [20,21]. An individual composed of multiple chromosomes of different lengths represents one solution of path planning. Secondly, the initial population is generated by a combination of the random method and the greedy algorithm. The details of initial population generation through the greedy algorithm are given in Section 3.2. Thirdly, the fitness value of each individual is calculated according to the fitness function, and the optimal fitness value and path solution are recorded. During population iteration, the historical optimal fitness value is continuously updated, which ensures that the population always evolves in a better direction.
To increase population diversity and the search space of solutions, this paper adopts a parallel multi-population elite selection strategy [22]. Elite individuals are selected from the initial population for crossover and mutation. Taking into account the characteristics of the multi-chromosome encoding method and of the initial population generation, and based on the greedy idea, this paper proposes the intra-group head-to-head mutation operator (IHMO), which eliminates inferior individuals, speeds up convergence, and improves the solution. Details of the evolutionary operator are given in Section 3.3. After the evolution operators are executed, the offspring population is generated. Finally, when the stopping criterion (the maximum number of iterations) is met, IHGA outputs the optimal fitness value and the corresponding path solution.

Initial population
In this paper, the initial population is optimized by combining the greedy idea with randomness. With the initial population size in IHGA denoted by M, the first M/2 individuals are generated randomly, and the remaining individuals are generated by the greedy algorithm. The algorithm flow chart is shown in Figure 3, and the steps are as follows: 1. Among the N damaged nodes, the first node of each maintenance team is randomly generated. 2. Set the greedy degree model F(X,Y), where F(X,Y) represents the greedy degree of node X to node Y, degree(X) represents the task importance of node X, d(X,Y) represents the distance between node X and node Y, v represents the walking speed of maintenance personnel, h(X) represents the time required for node X to recover, and A is the adjustment coefficient used to generate a reasonable greedy degree.
3. Select the next node to repair in each group according to the greedy algorithm; the next node is the one with the highest current greedy degree. Repeating this step yields the maintenance path of each group, and hence the path of every individual. The procedure is repeated M/2 times, and the M/2 individuals generated are added to the initial population. Figure 4 shows how a path is generated based on the greedy model.
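The greedy steps above can be sketched in code. The greedy degree formula itself is not reproduced in this text, so the form used below, F(X,Y) = A * degree(Y) / (d(X,Y)/v + h(Y)), is only an assumed combination of the listed quantities (importance gained per unit of travel plus repair time) and should not be read as the paper's formula. The rule that a group stops as soon as its best candidate no longer fits the time limit is also an assumption, as are all function names and data values.

```python
import math
import random


def greedy_degree(x, y, coords, importance, repair_time, speed, a=1.0):
    """ASSUMED greedy degree: importance of y per unit of time to reach and fix it."""
    travel = math.dist(coords[x], coords[y]) / speed
    return a * importance[y] / (travel + repair_time[y])


def greedy_individual(starts, coords, importance, repair_time, speed, t_limit):
    """Build one path per maintainer by always taking the highest greedy degree."""
    paths = [[s] for s in starts]
    elapsed = [float(repair_time[s]) for s in starts]
    unvisited = set(coords) - set(starts)
    active = set(range(len(starts)))
    while unvisited and active:
        for k in sorted(active):
            if not unvisited:
                break
            x = paths[k][-1]
            y = max(sorted(unvisited),
                    key=lambda n: greedy_degree(x, n, coords, importance,
                                                repair_time, speed))
            cost = math.dist(coords[x], coords[y]) / speed + repair_time[y]
            if elapsed[k] + cost > t_limit:
                active.discard(k)  # assumed stopping rule for this group
                continue
            paths[k].append(y)
            elapsed[k] += cost
            unvisited.remove(y)
    return paths


def greedy_half(m, n_groups, coords, importance, repair_time, speed, t_limit, rng):
    """Generate the M/2 greedy individuals from random start nodes (step 1)."""
    nodes = sorted(coords)
    return [greedy_individual(rng.sample(nodes, n_groups), coords, importance,
                              repair_time, speed, t_limit)
            for _ in range(m // 2)]


# Illustrative data, not taken from Appendix B.
coords = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (10, 0)}
importance = {0: 1, 1: 1, 2: 5, 3: 1}
repair_time = {0: 1, 1: 1, 2: 1, 3: 1}
print(greedy_individual([0], coords, importance, repair_time, 1.0, 100.0))
```

Because the start nodes are drawn at random, the greedy half still contains different individuals, while each path follows the high-importance, low-cost nodes first.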

Intra-group head-to-head mutation operator
The head node of each group's repair path in the parent elite individuals is compared with the node at the corresponding position in the offspring individuals, which are generated by the evolutionary operator proposed in Ref. [16]. If they match, the position of the next node to be repaired after the head node in that group's path is taken as the mutation point, and the nodes at this position in the parent and offspring individuals are exchanged. Finally, the operator repairs any duplicate nodes created by the exchange and generates the new offspring population. These steps constitute the intra-group head-to-head mutation operator, whose schematic diagram is shown in Figure 5. Because the solution obtained by the greedy algorithm has a positive effect on individuals, the proposed IHMO retains part of the favorable characteristics of the improved initial population. When an individual from the initial population has the same head node as an evolved individual, the second node tends to have a positive effect on the solution. Exchanging only the second node while keeping the other nodes unchanged therefore steadily improves the population while preserving randomness.
This paper combines IHMO with the evolutionary operator proposed in [16] to improve solution quality. IHMO eliminates some inferior individuals. When used together with the initial population improved by the greedy model, it speeds up convergence, strengthens the certainty of population evolution, and weakens ineffective randomness. As a result, the population evolves in a better direction.
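The head-to-head exchange can be sketched for one parent and one offspring. The text does not specify how duplicates are repaired after the exchange, so this sketch assumes a simple rule: wherever the incoming node already appears in the offspring, it is replaced by the node it displaced. Only the new offspring is returned. The function name `ihmo` and the example individuals are hypothetical.

```python
def ihmo(parent, child):
    """Copy the parent's second node into the child for every group whose
    head node matches, then repair duplicates (assumed repair rule)."""
    new = [path[:] for path in child]
    for parent_path, child_path in zip(parent, new):
        if len(parent_path) < 2 or len(child_path) < 2:
            continue
        if parent_path[0] != child_path[0]:
            continue  # head nodes differ, so this group is left unchanged
        incoming, displaced = parent_path[1], child_path[1]
        if incoming == displaced:
            continue
        # Assumed duplicate repair: the old occurrence of the incoming node
        # takes over the displaced node, so no node appears twice.
        for path in new:
            for i, node in enumerate(path):
                if node == incoming:
                    path[i] = displaced
        child_path[1] = incoming
    return new


parent = [[1, 2, 3], [4, 5, 6]]
child = [[1, 5, 3], [6, 2, 4]]
print(ihmo(parent, child))  # group 0 heads match (1 == 1); group 1 heads differ
```

In this example the first group inherits node 2 from the parent, and the previous occurrence of node 2 in the second group is replaced by the displaced node 5.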

Simulation and analysis
After the system is damaged, the node location information, node task importance, node repair time, and number of spare parts required for node repair are as shown in Appendix B. The performance of BGA, MCGA, and the proposed IHGA is compared and analyzed in three cases in terms of the number of nodes recovered, fitness value, system resilience, convergence, and stability. The parameters of the three algorithms are shown in Table 1. The mutation probability of 0.38 is an empirical value chosen to increase the diversity of the offspring population. Case 1 simulates a situation in which system resilience must be recovered as much as possible within a limited time after damage, and is mainly used to verify the feasibility and effectiveness of IHGA. Case 2 and Case 3 simulate resilience recovery in two situations: one with an uneven distribution of damaged nodes after the disaster and balanced demand for node spare parts, the other with unbalanced demand for node spare parts. They are mainly used to verify the adaptability of the algorithm under more realistic and harsher conditions.

Case 1
The information on damaged nodes in Case 1 is shown in Table B1 in Appendix B. In the experiment, the number of maintenance personnel is set to 5, the algorithm groups the paths randomly, and each group contains at least three nodes. The maintenance personnel repair in parallel, and the time constraint of each group is T_k < 2000 s. Figure 6 shows the locations of the damaged nodes. The simulation results of the three algorithms are shown in Figure 7, which gives the recovery paths of BGA, MCGA, and IHGA, respectively.
The data analysis is shown in Table 2. IHGA is better than BGA and MCGA both in the number of generations needed to reach the optimal solution and in the quality of the optimal solution. In Table 2, BGA recovers 34 nodes in the simulation of the system recovery path in Case 1, MCGA recovers 36 nodes, and IHGA recovers the largest number, 37 nodes. Similarly, IHGA achieves the highest system task importance in the smallest number of generations. This indicates that the IHGA proposed in this paper improves on both BGA and MCGA.
Figure 8 compares the evolution of the optimal task importance for the three algorithms. The initial optimal task importance of IHGA is higher than that of BGA and MCGA, indicating that the proposed method of improving the initial population effectively improves the initial solution. In addition, during population iteration IHGA converges fastest and reaches the optimal task importance well before BGA and MCGA.
Equation (1) is used to measure system resilience, with system performance measured by the system task importance. The system resilience before and after recovery for the three algorithms is shown in Table 3. There, resilience improvement 1 is the improvement of MCGA and IHGA relative to BGA, and resilience improvement 2 is the improvement of IHGA relative to MCGA.
By comparison, the system resilience obtained with IHGA is the highest in Case 1. Compared with BGA, MCGA improves resilience by 13.3% and IHGA by 16%. Compared with MCGA, IHGA improves system resilience by 2.3% in Case 1.

Case 2
In Case 2, the damaged nodes are unevenly located, as shown in Figure 6. The number of maintenance personnel is set to 5, the algorithm groups the paths randomly, and each group repairs at least three nodes. Besides the time and spare parts constraints, this paper discusses balanced and unbalanced demand for node spare parts separately. The spare parts data for the two cases are shown in Tables B1 and B2 in Appendix B. Maintenance personnel repair in parallel, the time constraint of each group is reduced to T_k < 1800 s, and the spare parts constraints are A < 14, B < 14, and C < 14. In Case 2 the demand for node spare parts is balanced, with the demand for spare parts A, B, and C in the range (1, 3).
The simulation results of the three algorithms are shown in Figure 9, which gives the recovery paths of BGA, MCGA, and IHGA, respectively.
The data analysis is shown in Table 4. In it, BGA recovered 28 nodes for the simulation of system recovery path in Case 2, MCGA recovered 31 nodes, and IHGA recovered the largest number of nodes, reaching 33. Similarly, IHGA has the highest system task importance.
Figure 10 compares the evolution of the optimal task importance for the three algorithms. The initial optimal task importance of IHGA is higher than that of BGA and MCGA, indicating that the method of improving the initial population proposed in this paper effectively improves the initial solution. Similarly, IHGA reaches the optimal task importance before BGA and MCGA during population iteration, which shows that the IHMO proposed in this paper further improves the optimal solution.
Equation (1) is used to measure system resilience, with system performance measured by the system task importance. The system resilience before and after recovery using the three algorithms is shown in Table 5. By comparison, the system resilience obtained with IHGA is the highest in Case 2. Compared with BGA, MCGA improves resilience by 20.3% and IHGA by 23.4%. Compared with MCGA, IHGA improves system resilience by 2.6% in Case 2.

Case 3
In Case 3, the node location information is the same as in Cases 1 and 2, but the demand for node spare parts is unbalanced. The demand for spare parts A and B is in the range (1, 5), and that for spare part C is in the range (3, 7). The simulation results of the three algorithms are shown in Figure 11, which gives the recovery paths of BGA, MCGA, and IHGA, respectively.
The data analysis is shown in Table 6. BGA recovered 13 nodes in the simulation of the system recovery path in Case 3, MCGA recovered 16 nodes, and IHGA recovered the largest number, 17 nodes. Similarly, IHGA has the highest system task importance. Its convergence generation is better than that of MCGA, which means that IHGA converges well and avoids the premature convergence seen in BGA.
Figure 12 compares the evolution of the optimal task importance for the three algorithms. The initial optimal task importance of IHGA is higher than that of BGA and MCGA in Case 3, indicating that the method of improving the initial population proposed in this paper effectively improves the initial solution. Similarly, IHGA reaches the optimal task importance well before BGA and MCGA during population iteration. This demonstrates that the IHMO proposed in this paper further improves the optimal solution.
Equation (1) is used to calculate system resilience, with system performance measured by the system task importance. The system resilience before and after recovery using the three algorithms is shown in Table 7.
Table 4 row values for Case 2 (column headers not recoverable from the extracted text): BGA 50, 28, 170, 109, 142; MCGA 50, 31, 170, 131, 57; IHGA 50, 33, 170, 135, 90.
Figure 10. The evolution curve of three algorithms in Case 2.
Figure 13. The improvement of three algorithms in three cases.
By comparison, the system resilience obtained with IHGA is the highest in wartime (Case 3). Compared with BGA, MCGA improves resilience by 17.1% and IHGA by 28.6%. Compared with MCGA, IHGA improves system resilience by 9.8% in Case 3. This shows that IHGA finds a better solution than MCGA, and that system resilience is significantly improved under actual wartime conditions, such as a reduced time limit, an uneven distribution of damaged nodes, and an increasing and unbalanced demand for spare parts. The significant differences in Case 3 reflect the clear superiority of IHGA under harsh conditions.

Analysis of improvement
The improvement in system resilience achieved by the three algorithms in the three cases is listed in Table 8 and shown in Figure 13. It can be seen from Table 8 that, in all three cases, MCGA and IHGA are both better than BGA, and IHGA is better than MCGA. At the same time, the improvement increases with the complexity of the case, as shown in Figure 13.
In Figure 13, IHGA shows a clear improvement over BGA and MCGA, and the improvement grows as the system becomes more complex and more constrained. This shows that IHGA performs better in system resilience recovery and is better suited to practical application.
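The percentages reported in the three cases can be cross-checked with a small calculation. The sketch assumes that each "improvement" is a relative increase in resilience, so that the gain of IHGA over MCGA follows from their gains over BGA as (1 + b) / (1 + a) - 1. The function name `relative_gain` is hypothetical; the input percentages are the ones quoted in the text above.

```python
def relative_gain(a_pct, b_pct):
    """Gain of B over A, given the gains (in percent) of A and B over a
    common baseline. Assumes all gains are relative increases."""
    return ((1 + b_pct / 100) / (1 + a_pct / 100) - 1) * 100


# (MCGA vs BGA, IHGA vs BGA, reported IHGA vs MCGA), all in percent
reported = {
    "Case 1": (13.3, 16.0, 2.3),
    "Case 2": (20.3, 23.4, 2.6),
    "Case 3": (17.1, 28.6, 9.8),
}
for case, (mcga, ihga, stated) in reported.items():
    print(case, round(relative_gain(mcga, ihga), 2), "reported:", stated)
```

Under this assumption the computed gains agree with the reported values to within about 0.1 percentage points, which is consistent with rounding of the underlying resilience values.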

Analysis of stability
For each of the three cases above, this paper records the results of 15 runs of each of the three algorithms. The detailed data are given in Appendix A. The fitness curves over the 15 runs in each case are shown in Figure 14, and the convergence iterations are shown in Figure 15.
The detailed data for Figures 14 and 15 are given in Table 9. Comparing the algorithms within each case (horizontally), IHGA is better than BGA and MCGA in both average fitness and average number of convergence iterations. The variance of the fitness value of IHGA is smaller than that of BGA and close to that of MCGA. This shows that IHGA obtains a better optimal solution with good stability.
Comparing across cases (vertically), the average fitness and the average number of convergence iterations of IHGA are significantly better than those of MCGA in Case 3. At the same time, the variance of its fitness is better than that of BGA and MCGA. This indicates that the stability, convergence, and optimal solution of the proposed IHGA are better than those of BGA and MCGA, especially under the harsh conditions of shorter repair time and larger, unbalanced demands for spare parts.
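The stability statistics discussed above (average fitness, fitness variance, and average convergence iteration over repeated runs) can be computed as in the following sketch. The run values are made-up placeholders and are not the data of Appendix A or Table 9. The use of population variance is an assumption, since the text does not state which variance is reported, and the function name `summarise_runs` is hypothetical.

```python
from statistics import mean, pvariance


def summarise_runs(fitness_values, convergence_iterations):
    """Summary statistics over repeated runs of one algorithm in one case."""
    return {
        "mean_fitness": mean(fitness_values),
        "fitness_variance": pvariance(fitness_values),  # population variance assumed
        "mean_convergence_iteration": mean(convergence_iterations),
    }


# Placeholder values for three runs (the paper uses 15 runs per algorithm).
summary = summarise_runs([2.0, 4.0, 6.0], [50, 70, 90])
print(summary)
```

A smaller fitness variance across runs indicates a more stable algorithm, and a smaller mean convergence iteration indicates faster convergence.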

Conclusion
This paper proposes an improved genetic algorithm (IHGA) for recovery path planning of system resilience, which optimizes the initial population using the greedy idea and introduces the intra-group head-to-head mutation operator (IHMO). Three cases demonstrate that IHGA obtains the optimal recovery path in fewer iterations when solving recovery path planning of system resilience, and that it effectively improves system resilience. Comparison with BGA and MCGA shows that IHGA is superior to the other two algorithms in convergence, optimal solution, solution improvement, and stability. In addition, the advantages of IHGA are most significant under the harsh conditions of shorter repair time and larger, unbalanced demands for spare parts, so it can meet more stringent practical requirements and performs better in application.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by National Natural Science Foundation of China, grant number 61973282, and by Jiangsu University of Science and Technology, Reliability and System Engineering Open Group (JRSOG) Open Fund, grant number 2020002.