Research on degradation prediction of rolling bearing based on adaptive multi-GA-BP

Rolling bearings are widely used in industrial equipment, so studying their degradation trend is of great significance. This paper proposes an Adaptive Multi-population Genetic Algorithm (AMGA) for bearing degradation prediction. First, Kernel Principal Component Analysis (KPCA) is used to fuse time-domain and frequency-domain features of the vibration signal: a kernel maps the samples into a higher-dimensional space, where linear dimensionality reduction is performed. This effectively reduces the dimension of nonlinearly correlated variables and yields a trend signal that represents the Remaining Useful Life (RUL) characteristics of the rolling bearing. Then, AMGA is used to optimize the number of neurons, the initial weights, and the initial thresholds of a Back Propagation (BP) neural network prediction model. AMGA applies a chaos algorithm to the Genetic Algorithm (GA) to improve the diversity of the initial population. Meanwhile, it controls the communication frequency between populations by judging the similarity of their optimal solutions, which helps the search jump out of local optima and obtain the global optimal solution. Finally, the run-to-failure (whole-life) data of rolling bearings from the University of Cincinnati are used to analyze the performance of the algorithm. Compared with the traditional BP, KPCA-AMGA-BP improves the R2-score by 0.297 and the MAPE by 2.46%; compared with the optimized BP, it improves the R2-score by 0.218 and the MAPE by 0.46%.


Introduction
With the development of technology and large-scale industrial production, reliable long-term operation under multiple alternating operating states is essential for modern manufacturing. Rolling bearings are widely used in such machinery, which normally operates in high-speed and heavy-duty environments. Once damage occurs, the consequences are often catastrophic. 1 Therefore, predicting the degradation of rolling bearings is of great significance for preventing equipment failure and enabling condition-based maintenance.
Research on the residual life of rolling bearings began at the end of the 19th century. 2 To improve the reliability of predictive maintenance, references 3,4 use different methods to improve prediction accuracy. Several results on the residual-life prediction of rolling bearings have been achieved both domestically and internationally. Peng, Y., et al. 5 used the Least Squares Support Vector Machine (LS-SVM) to predict the remaining useful life of rolling bearings; because LS-SVM involves a penalty parameter, the choice of this parameter strongly affects the prediction. Fengtao, W., et al. 6 used the Gray Model GM(1,1) to predict the residual life of rolling bearings, and Lu, Y., et al. 7 used the Auto-Regressive Moving Average (ARMA) model. Kuncan, M., et al. 8 proposed an approach combining the one-dimensional local binary pattern (1D-LBP) method with a gray relational analysis (GRA) model to classify vibration signals. These studies show that machine learning methods can effectively improve the accuracy of remaining-life prediction for rolling bearings. However, as vibration signals become more complex, deep learning methods have begun to show advantages in processing them.
Presently, many works apply deep learning techniques to rolling bearing fault diagnosis, such as the convolutional neural network (CNN), 9,10 recurrent neural network (RNN), 11 transfer learning convolutional neural network (TLCNN), 12 adaptive deep transfer learning neural network, 13 and so on. These methods provide satisfactory diagnostic accuracy. For one-dimensional bearing vibration signals, however, a BP neural network trains faster than these methods because of its simpler structure.
In recent years, BP neural networks have been applied to the degradation prediction of rolling bearings. Zhang, R., et al. 14 used a dictionary learning algorithm to denoise vibration signals and a BP neural network to detect the fault state of rolling bearings. However, the results of a BP neural network are often affected by the hyperparameters of the training process, and there is no clear criterion for selecting them. Liu, X., et al. 15 and Wan, L., et al. 16 therefore improved the BP prediction model with Particle Swarm Optimization (PSO) and applied it to wind turbine blade strain prediction and rolling bearing fault prediction; this effectively improved the nonlinear fitting ability and efficiency of the single BP prediction model. Cheng, P., et al. 17 used the Genetic Algorithm (GA) to optimize the weights and thresholds of BP, which improved the efficiency and accuracy of human body shape prediction.
In actual industrial production, the working conditions of rolling bearings are complicated and varied, so effectively predicting rolling bearing faults under various working conditions is an important problem. To improve the efficiency and accuracy of rolling bearing degradation prediction, this paper first extracts time-domain and frequency-domain features from the data. Then, the Kernel Principal Component Analysis (KPCA) method is applied to reduce the dimension of these feature signals, eliminate multicollinearity between variables, and obtain the health index of the rolling bearing. Finally, the health index is input into the BP prediction model optimized by the Adaptive Multi-population Genetic Algorithm (AMGA-BP). An overview of the method is given in Figure 1.
To summarize, the contributions of this paper are as follows: (1) KPCA is used to fuse the time-domain and frequency-domain features extracted from rolling bearing signals to obtain an effective bearing degradation signal, because it can fuse multi-dimensional feature signals, improve the signal-to-noise ratio of the data, remove correlated indicators, and avoid input redundancy. (2) The chaos algorithm is combined with GA to improve the diversity of GA's initial population. Meanwhile, in a multi-population parallel scheme, the similarity of the optimal solutions of different populations is judged to control the communication frequency between populations, so as to jump out of local optima and obtain the global optimal solution. (3) The number of neurons, the initial weights, and the initial thresholds of the BP neural network are optimized by AMGA to increase the nonlinear approximation ability of BP.
The rest of this paper is organized as follows. The Bearing feature signal extraction section describes the theory of feature extraction; the Prediction model of AMGA-BP section describes the theory and improvements of the degradation prediction model AMGA-BP; the Experimental analysis section describes the experiments and compares AMGA-BP, GA-BP, BP, and LSTM; and the Conclusion section concludes the paper.

Time and frequency domain feature parameters extraction
The time-domain characteristic indexes include dimensional indexes such as the Mean value, Skewness, and Kurtosis, as well as dimensionless indexes such as the Form Factor and Margin Factor; the frequency-domain characteristic indexes capture information such as the distribution of the frequency spectrum, the magnitude of vibration energy, and changes in the position of frequency bands. 18 Feature extraction suppresses the noise in the original data and yields data with obvious fault features that reflect bearing performance degradation. The calculation formulas of the time-domain and frequency-domain features are given in Tables 1 and 2.
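Tables 1 and 2 are not reproduced here, so the sketch below assumes common textbook definitions of nine time-domain indicators and a few spectrum-based indicators. The function names and the exact feature set are illustrative and may differ from the paper's tables (in particular, the frequency-domain indexes P1 and P2 are not reconstructed). Each call turns one vibration record into one feature vector.

```python
import numpy as np

def time_domain_features(x):
    """Common time-domain indicators (assumed definitions, not the paper's Table 1)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    abs_mean = np.mean(np.abs(x))
    sqrt_amp = np.mean(np.sqrt(np.abs(x))) ** 2   # square-root amplitude
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,   # 3 for Gaussian noise
        "form_factor": rms / abs_mean,
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / sqrt_amp,
    }

def frequency_domain_features(x, fs):
    """Spectrum-based indicators (assumed definitions, not the paper's Table 2)."""
    x = np.asarray(x, dtype=float)
    amp = np.abs(np.fft.rfft(x - x.mean()))        # one-sided amplitude spectrum, DC removed
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = amp ** 2
    total = np.sum(power)
    centroid = np.sum(freqs * power) / total       # where the vibration energy is concentrated
    return {
        "mean_amplitude": amp.mean(),              # overall vibration energy level
        "frequency_centroid": centroid,
        "rms_frequency": np.sqrt(np.sum(freqs ** 2 * power) / total),
        "frequency_std": np.sqrt(np.sum((freqs - centroid) ** 2 * power) / total),
    }
```

With the sampling settings described in the experimental section (20 kHz, 20480 points per record), each record would be passed to both functions with `fs=20000`, and the resulting values stacked row by row into a feature matrix.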

Feature fusion
Reference 19 shows that KPCA is an unsupervised feature fusion method that can effectively remove redundant information from the original data and extract its effective information. KPCA is a nonlinear extension of PCA: it uses a nonlinear kernel function to map the input data into a high-dimensional feature space, turning a nonlinear problem in the low-dimensional space into a linear problem in the high-dimensional space, where principal component analysis is then carried out. 20 Because a single index can reflect only limited aspects of degradation, kernel principal component analysis is used to construct a new index that comprehensively reflects the deterioration of bearing performance.
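For N samples $X_k$ in the original space ($k = 1, 2, \dots, N$; $X_k \in \mathbb{R}^M$, where M is the number of variables), a nonlinear mapping $\varphi$ carries each sample into a high-dimensional feature space H. The covariance matrix $C^H$ of the mapped samples and its eigenvalue problem are

$$C^{H} = \frac{1}{N}\sum_{k=1}^{N}\varphi(X_k)\,\varphi(X_k)^{\mathsf T}, \qquad \lambda V = C^{H} V$$

where λ is an eigenvalue and V is the corresponding eigenvector.

The eigenvectors $V^m$ are normalized to unit length, $(V^m \cdot V^m) = 1$. The projection of the image $\varphi(x)$ of a sample x onto the m-th ($m = 1, 2, \dots$) eigenvector in H, i.e. the m-th kernel principal component $t_m$, can then be expressed as

$$t_m = \left(V^{m}\cdot\varphi(x)\right) = \sum_{i=1}^{N}\alpha_i^{m}\,k(X_i, x)$$

where $\alpha_i^m$ are the expansion coefficients of $V^m$, i.e. $V^m = \sum_{i=1}^{N}\alpha_i^m\varphi(X_i)$, obtained from the eigenvalue problem $N\lambda\alpha = K\alpha$ of the kernel matrix K with entries $K_{ij} = k(X_i, X_j) = (\varphi(X_i)\cdot\varphi(X_j))$. Because the mapped data $\varphi(X_k)$ are not necessarily zero-mean in the feature space, the kernel matrix is centered as

$$\tilde{K} = K - \mathbf{1}_N K - K\mathbf{1}_N + \mathbf{1}_N K\mathbf{1}_N$$

where $\mathbf{1}_N$ is the N × N matrix whose entries all equal 1/N. Let $\lambda_i$ be the eigenvalues of $\tilde{K}$; the first s eigenvalues, ranked by their contribution rate, are taken as the principal component features. Figure 2 shows the framework of KPCA.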
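A minimal NumPy sketch of the KPCA steps above: standardize the features, build the kernel matrix, center it, eigen-decompose, and project. The Gaussian (RBF) kernel and its default width `gamma = 1/M` are assumptions; the kernel used in the paper is not stated in this section.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca(X, n_components=1, gamma=None):
    """Return the first kernel principal components t_m and the contribution rates."""
    X = np.asarray(X, dtype=float)
    N, M = X.shape
    std = X.std(axis=0)
    std[std == 0] = 1.0                          # guard against constant features
    Xs = (X - X.mean(axis=0)) / std              # z-score each feature
    if gamma is None:
        gamma = 1.0 / M                          # assumed default kernel width
    K = rbf_kernel(Xs, Xs, gamma)
    one_n = np.full((N, N), 1.0 / N)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)        # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    contribution = eigvals / eigvals.sum()       # contribution rate of each component
    # scale alpha so that each feature-space eigenvector V^m has unit length
    alphas = eigvecs[:, :n_components] / np.sqrt(eigvals[:n_components])
    scores = Kc @ alphas                         # t_m evaluated at the training samples
    return scores, contribution
```

For a feature matrix with one row per record (for example, 9 time-domain plus 5 frequency-domain columns), `scores[:, 0]` would be the candidate trend signal. The sign of an eigenvector is arbitrary, so in practice the component may need to be flipped so that it increases with degradation.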

The prediction model of AMGA-BP
Parameter optimization based on GA. The Genetic Algorithm (GA) is an adaptive probabilistic optimization algorithm that has been shown to be effective for complex systems. Reference 21 used GA to obtain single- and multi-objective Pareto optimal solutions. Moreover, multi-objective GA and multi-objective Particle Swarm Optimization (PSO) 22 have been used to generate Pareto optimal solutions for several experiments of varying complexity. The basic principle of GA derives from Darwin's theory of biological evolution: by simulating evolutionary processes such as reproduction, crossover, and mutation, and applying a natural selection pressure that reflects the optimization objective, it obtains the optimal solution. 23 GA first generates an initial population $X = [X_1, X_2, \dots, X_N]$ of N individuals, which evolves through three main operations in each genetic iteration: selection, crossover, and mutation. Although GA can converge to the global optimum in theory, in practice it often converges to a local optimum. It is also inefficient, time-consuming, and unstable, so it needs to be improved.
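The three GA operations can be sketched as below. The operators chosen here (binary tournament selection, arithmetic crossover, Gaussian mutation, plus elitism) and all parameter values are assumptions for illustration, not the settings used in the paper.

```python
import numpy as np

def sphere(x):
    return float(np.sum(np.asarray(x) ** 2))

def simple_ga(fitness, dim, bounds, pop_size=40, n_gen=100, pc=0.8, pm=0.1, seed=0):
    """Minimize `fitness` with a basic real-coded GA: selection, crossover, mutation."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    pop = rng.uniform(low, high, size=(pop_size, dim))       # initial population X
    for _ in range(n_gen):
        fit = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argmin(fit)].copy()                    # best individual so far
        # selection: binary tournament (lower fitness wins)
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fit[i] < fit[j])[:, None], pop[i], pop[j])
        # crossover: arithmetic blend of consecutive parent pairs
        children = parents.copy()
        for k in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                w = rng.random()
                a, b = parents[k], parents[k + 1]
                children[k] = w * a + (1 - w) * b
                children[k + 1] = (1 - w) * a + w * b
        # mutation: Gaussian perturbation of randomly chosen genes
        mask = rng.random(children.shape) < pm
        children[mask] += rng.normal(0.0, 0.1 * (high - low), size=mask.sum())
        pop = np.clip(children, low, high)
        pop[0] = elite                                        # elitism: never lose the best
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmin(fit)], float(fit.min())
```

Because the elite individual is always carried over, the best fitness never gets worse from one generation to the next; premature convergence can still occur, which is the weakness the following improvements target.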
Parameter optimization based on adaptive multi-population GA. AMGA is proposed in this paper to balance the global and local optimization capabilities of GA and to improve the accuracy of the algorithm. The specific improvements are as follows: (1) To improve the diversity of the initial population, a chaos algorithm is applied to generate the initial population. The chaos algorithm (the logistic map) is expressed as

$$X_{t+1} = \mu X_t\left(1 - X_t\right), \qquad X_t \in (0, 1)$$

When μ = 4, $X_t$ reaches a completely chaotic state, which spreads the population over the whole search range and effectively improves the diversity of the initial population.
(2) A single population must find the global optimal solution quickly while avoiding local optima during iteration. To address this, a multi-population genetic algorithm (MGA) is adopted, in which several populations search for the optimal solution with different crossover and mutation probabilities. During iteration, optimal solutions are transferred among the populations to prevent premature convergence and trapping in local optima. However, migrating individuals between populations in every iteration can easily lead multiple populations into the same local optimum, so the similarity between populations is calculated in each iteration, and individuals migrate between populations only when their similarity is sufficiently low. The similarity is judged with two thresholds: A, on the vector distance between the populations' optimal solutions, and B, on the difference between their optimal fitness values. Low similarity means that the optimal solutions of the populations are far apart in space while their fitness values are similar; in this case, communication between populations can help them jump out of local optima. The initial values of A and B are set randomly, and their optimal values are found experimentally. Tables 3 and 4 show the optimal solutions of function 1 and function 2 obtained with different similarity thresholds; the true optimal value of both functions is 0. Figures 3 and 4 compare the optimization of function 1 and function 2 by MGA with different values of A and B and by traditional GA. The X-axis is the number of iterations, the Y-axis is the optimal value, and each curve shows how the model's optimal value evolves as the number of iterations increases.
The number of iterations MGA needs to find the optimum is significantly smaller than that of traditional GA. As shown in the figures, when A is close to 1, fewer iterations are needed to reach the optimal solution, i.e. the optimum is found fastest; the optimum is also found fastest when B is set to 60. Function 1 and function 2 are defined as follows, where n is the dimension of x:

Function 1 (Rastrigin): $$f_1(x) = 10n + \sum_{i=1}^{n}\left[x_i^2 - 10\cos(2\pi x_i)\right]$$

Function 2 (Sphere): $$f_2(x) = \sum_{i=1}^{n} x_i^2$$

Meanwhile, Tables 3 and 4 show that the error between the optimal value obtained by MGA and the true optimal value is small, so MGA performs better than traditional GA. However, the effect of MGA depends on the communication condition, and the best similarity judgment condition has to be obtained through multiple experiments. In this case, the parameters are set to A = 1 and B = 55.
(3) A limitation of MGA is that its communication conditions must be obtained through many experiments. To solve this problem, the similarity thresholds are made adaptive. Assuming that the populations are least similar when they are first generated, the similarity at the initial time is used as the threshold. As shown in Figure 5, AMGA with this initial threshold performs close to MGA with the best similarity thresholds found through many experiments.
The flow chart of AMGA is shown in Figure 6.
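The three improvements can be sketched as helper functions. This is not the full AMGA loop of Figure 6: it assumes a Euclidean norm for the "vector distance", averages the initial similarity over all population pairs to obtain the adaptive thresholds, and uses illustrative function names throughout.

```python
import numpy as np
from itertools import combinations

def rastrigin(x):
    """Function 1: minimum value 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

def sphere(x):
    """Function 2: minimum value 0 at x = 0."""
    return float(np.sum(np.asarray(x, dtype=float) ** 2))

def chaotic_population(pop_size, dim, low, high, mu=4.0, rng=None, burn_in=50):
    """Improvement (1): initial individuals from the logistic map x <- mu * x * (1 - x)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(0.01, 0.99, size=dim)    # seeds strictly inside (0, 1)
    for _ in range(burn_in):                 # discard transient iterations
        x = mu * x * (1.0 - x)
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = mu * x * (1.0 - x)               # next chaotic state
        pop[i] = low + (high - low) * x      # map (0, 1) onto the search range
    return pop

def best_of(pop, fitness):
    """Best individual of a population and its fitness (minimization)."""
    fits = np.array([fitness(ind) for ind in pop])
    k = int(np.argmin(fits))
    return pop[k], float(fits[k])

def should_migrate(best_a, fit_a, best_b, fit_b, A, B):
    """Improvement (2): exchange individuals only when the best solutions are
    far apart (distance >= A) but their fitness values are close (|diff| <= B)."""
    distance = np.linalg.norm(np.asarray(best_a) - np.asarray(best_b))
    return bool(distance >= A and abs(fit_a - fit_b) <= B)

def initial_thresholds(populations, fitness):
    """Improvement (3): use the similarity at t = 0 as the thresholds A and B.
    Averaging over all population pairs is an assumption of this sketch."""
    bests = [best_of(pop, fitness) for pop in populations]
    pairs = list(combinations(bests, 2))
    A = np.mean([np.linalg.norm(np.asarray(a[0]) - np.asarray(b[0])) for a, b in pairs])
    B = np.mean([abs(a[1] - b[1]) for a, b in pairs])
    return float(A), float(B)
```

In a full loop, each population would evolve with its own crossover and mutation probabilities, and `should_migrate` would be checked for each pair of populations after every generation, using either the fixed values (A = 1, B = 55) or the thresholds from `initial_thresholds`.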

Prediction model based on BP
BP is a typical multi-layer feedforward neural network trained by error back-propagation. Its learning process consists of two parts: forward propagation of the signal and back-propagation of the error. In forward propagation, the signal enters the network through the input layer and passes through the hidden layer to the output layer; the state of the neurons in each layer affects only the neurons in the next layer. If the output layer does not produce the expected output, the network switches to back-propagation and adjusts the weights and thresholds according to the deviation, so that the output of BP approaches the expected output. After a certain number of iterations, the error is reduced to within a very small range. 24 A three-layer BP is used to predict the residual life of the rolling bearing. As shown in Figure 7, the first layer is the input layer, which receives the historical degradation performance data $\{X_1, X_2, \dots, X_n\}$; the second layer is the hidden layer; and the third layer is the output layer, whose output variables are the future degradation data $\{Y_1, Y_2, \dots, Y_n\}$. The data pass from the input layer to the hidden layer through the weights $W_{ij}$ and from the hidden layer to the output layer through the weights $W_{jk}$, together with the activation function.
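A compact three-layer BP showing the forward and backward passes described above. The sigmoid hidden layer, linear output layer, thresholds implemented as additive biases, learning rate, and the sliding-window construction (a fixed number of past health-index values predicting the next value) are assumptions for illustration.

```python
import numpy as np

def make_windows(series, lag):
    """Supervised pairs: `lag` past health-index values -> the next value."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = np.asarray(series[lag:], dtype=float).reshape(-1, 1)
    return X, y

class ThreeLayerBP:
    """Input layer -> sigmoid hidden layer (W_ij, b_j) -> linear output layer (W_jk, b_k)."""

    def __init__(self, n_in, n_hidden, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ij = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b_j = np.zeros(n_hidden)            # hidden-layer thresholds
        self.W_jk = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b_k = np.zeros(n_out)               # output-layer thresholds

    def forward(self, X):
        # forward propagation: input -> hidden -> output
        self.h = 1.0 / (1.0 + np.exp(-(X @ self.W_ij + self.b_j)))
        return self.h @ self.W_jk + self.b_k

    def train(self, X, y, lr=0.1, epochs=2000):
        """Full-batch gradient descent; returns the MSE of the last epoch."""
        for _ in range(epochs):
            err = self.forward(X) - y                          # output deviation
            # error back-propagation (gradients of 0.5 * mean squared error)
            g_out = err / len(X)
            g_hid = (g_out @ self.W_jk.T) * self.h * (1.0 - self.h)
            self.W_jk -= lr * self.h.T @ g_out
            self.b_k -= lr * g_out.sum(axis=0)
            self.W_ij -= lr * X.T @ g_hid
            self.b_j -= lr * g_hid.sum(axis=0)
        return float(np.mean(err ** 2))
```

Here the random initial weights are exactly what the next subsection proposes to replace with values found by AMGA.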

BP prediction model optimized by AMGA
Although BP has a strong ability to handle nonlinear data, it converges slowly and easily falls into local optima. Its convergence mainly depends on the initial weights and thresholds. Given these characteristics, optimizing the initial weights and thresholds of BP with a nature-inspired heuristic algorithm is a popular and effective way to overcome these defects.
In this paper, AMGA is used to optimize the initial weights and thresholds of BP. Each individual in the population encodes all the weights and thresholds of the network, and these weights and thresholds are updated during every training run. Using the network's error as the fitness value, the population undergoes selection, crossover, and mutation. As evolution proceeds, the individuals keep refining the weights and thresholds of the network, which makes the BP prediction more accurate. The improved prediction model is shown in Figure 8.
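The encoding can be sketched as follows: each individual is a real-valued vector holding $W_{ij}$, the hidden thresholds, $W_{jk}$, and the output thresholds, and its fitness is the error of the network it defines. The gene layout and the use of the untrained network's MSE as fitness are assumptions; the number of hidden neurons, which AMGA also optimizes in the paper, is held fixed here.

```python
import numpy as np

def chromosome_length(n_in, n_hidden, n_out=1):
    """Genes needed for W_ij, hidden thresholds, W_jk and output thresholds."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def decode(chrom, n_in, n_hidden, n_out=1):
    """Split one individual into the network's initial weights and thresholds."""
    chrom = np.asarray(chrom, dtype=float)
    sizes = [n_in * n_hidden, n_hidden, n_hidden * n_out, n_out]
    if chrom.size != sum(sizes):
        raise ValueError("chromosome length does not match the network shape")
    W_ij, b_j, W_jk, b_k = np.split(chrom, np.cumsum(sizes)[:-1])
    return W_ij.reshape(n_in, n_hidden), b_j, W_jk.reshape(n_hidden, n_out), b_k

def fitness(chrom, X, y, n_hidden):
    """Error of the network defined by `chrom` (lower is better)."""
    W_ij, b_j, W_jk, b_k = decode(chrom, X.shape[1], n_hidden, y.shape[1])
    hidden = 1.0 / (1.0 + np.exp(-(X @ W_ij + b_j)))   # sigmoid hidden layer
    pred = hidden @ W_jk + b_k                          # linear output layer
    return float(np.mean((pred - y) ** 2))
```

Any minimizer over vectors of length `chromosome_length(n_in, n_hidden)` can use this fitness; the best individual then provides the starting weights and thresholds for ordinary BP training.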

Experimental data
The experimental data are provided by the University of Cincinnati, USA, and the dataset is available from https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository. Four Rexnord ZA-2115 double-row bearings 25 are installed on the shaft, as shown in Figures 9 and 10. Each bearing is equipped with acceleration sensors in mutually perpendicular directions, which satisfies the basic conditions for full-vector fusion. The motor speed is 2000 r/min and the sampling rate is 20 kHz; each record contains 20480 points (1.024 s), and 984 records were acquired at 10-min intervals (984 × 10 min in total). The vibration signal of bearing 1, which ended in an outer-ring failure, is selected; Figure 11 shows its time-domain waveform over the whole life, from normal operation to outer-ring failure.

Verification results and discussion
Time and frequency feature extraction. Because the amount of original data is large, time- and frequency-domain feature extraction is used to obtain features that carry a large amount of discriminative information, which makes the training and testing of the prediction model more reliable and accurate. Nine time-domain features and five frequency-domain features, shown in Figures 12 and 13, are extracted from the rolling bearing signal. Each index reflects the bearing degradation trend in a different way. Kurtosis, Root Mean Square, and P1 show obvious degradation trends, whereas the Form Factor, Crest Factor, Margin Factor, and P2 are too noisy to be sensitive to early faults, so further feature fusion is required.
Feature fusion. The bearing degradation process is complex and variable, so multiple characteristic parameters are needed to characterize the whole degradation process and to suppress the influence of parameters that are insensitive to bearing degradation. Compared with traditional feature extraction methods, combining time-domain and frequency-domain features improves the accuracy, adaptability, and robustness of bearing degradation trend prediction.
Time-frequency feature extraction yields 9 time-domain features and 5 frequency-domain features, but each feature contributes differently to the life characteristics of the rolling bearing. KPCA is used to obtain the principal components of the fused time- and frequency-domain features. The first principal component, which has the highest correlation with bearing life, is used as the trend signal of the bearing life feature; it is shown in Figure 14.
Degradation prediction. The obtained trend signal of the rolling bearing life characteristics is input into the BP prediction models; in the prediction plots, the blue line is the actual value and the red line is the predicted value. The AMGA-BP predictions are clearly much closer to the actual values. Table 5 compares MAE, MAPE, RMSE, and R2-Score for the different models, where MAE is the mean absolute error, MAPE is the mean absolute percentage error, and RMSE is the root mean square error. For these three indexes, values closer to 0 indicate a more accurate model, whereas for R2-Score, values closer to 1 indicate better performance. MAE, MAPE, RMSE, and R2-Score are obtained from equations (8)-(11):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{8}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{\hat{y}_i}\right| \tag{9}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{10}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2} \tag{11}$$

where $y_i$ is the predicted value, $\hat{y}_i$ is the actual value, $\bar{y}$ is the mean of the actual values, and n is the number of samples. The results of LSTM 20 were obtained by replicating that method. AMGA-BP outperforms the other methods on all of these evaluation indexes, which shows that the method is effective.
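The four indexes can be computed as below, a sketch using their standard definitions, with the actual values in the denominator of MAPE and in the total sum of squares of R2. The function name is illustrative.

```python
import numpy as np

def evaluate(y_pred, y_true):
    """MAE, MAPE (%), RMSE and R2-Score of predictions against actual values."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y_true))   # undefined if any actual value is 0
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}
```

For example, predictions `[1, 3, 3]` against actual values `[1, 2, 4]` give MAE = 2/3, MAPE = 25%, RMSE = sqrt(2/3), and R2 = 4/7.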

Conclusion
In this paper, KPCA is used to fuse the time-domain and frequency-domain features of rolling bearing vibration signals obtained in complex environments, which effectively yields the whole-life trend of the rolling bearing. Moreover, a KPCA-AMGA-BP prediction model is proposed, which achieves higher prediction accuracy than BP, GA-BP, and LSTM. AMGA compares the distance and fitness of the optimal solutions of two populations during iteration to decide whether the populations may exchange genes; this restricts the exchange conditions and makes exchanges between populations more effective during optimization. The proposed model uses AMGA to optimize the number of neurons, the initial weights, and the thresholds of BP, which increases the prediction accuracy of BP.
The experimental results show that the proposed method can accurately predict the future degradation trend of bearings within the allowable error range. Compared with similar methods, it shows advantages in terms of MAE, MAPE, RMSE, and R2-Score, and it can provide new ideas for related research in this field.