System-level performance prognosis based on data augmentation for air brake systems of in-service trains

To prognose the performance of air brake systems of in-service trains, a data augmentation method based on correlation analysis and improved multivariate support vector regression is proposed in this paper. Using black box theory and correlation analysis, brake cylinder pressure was extracted as the performance identification signal of the air brake system, and the input and output signals were used as auxiliary signals to construct a black box model of the air brake system. Meanwhile, to make full use of the obtained data, a data augmentation prognosis model based on improved multivariate support vector regression was established. Moreover, the optimal parameters of the prognosis model were selected by the particle swarm optimization algorithm, and RMSE and MAE were used as evaluation indicators. Finally, case studies were conducted on test-rig performance experiment data of the air brake system. The prognosis model was trained using the service 1-7N and emergency experiment data and verified using the service full brake-relief and stage brake-relief experiment data, and the modeling and calibration errors at the pressure-holding stage were both on the order of ±5 kPa, which demonstrates that the proposed method is accurate and effective.


Introduction
With the continuous development of rail transit networks, a large number of trains have been put into service.[1][2][3] As safety equipment of trains, the brake system plays a vital role in slowing down and stopping trains, and it has become an important constraint on further improvements in train speed and traction quality. However, due to harsh working conditions such as vibration, humidity, electromagnetic interference and frequent use, performance degradation of the brake system inevitably occurs, which in turn affects the operation safety, efficiency, quality and maintenance costs of trains. Fault prognosis, detection and diagnosis are effective ways to improve the active safety of a system and reduce accident risk. 4,5 Research on fault prognosis, detection and diagnosis of brake systems has been reported in recent years, 6,7 and the relevant methods fall mainly into three categories: model-based, data-driven, and signal processing-based. Ding and Zuo 5 proposed a performance degradation prognosis model based on relative characteristics and a long short-term memory network for components of brake systems. Niu and Zhao 8 proposed a fault detection and isolation method for locomotive brake systems based on a bond graph model. Zuo et al. 9 established a performance degradation monitoring model based on data fusion for a pneumatic brake system. Lu et al., 10 Zuo, 11 and Zhou et al. 12 proposed data-driven methods to detect and isolate faults of sensors, pneumatic units and brake cylinders, respectively. Seo et al. 13 conducted fault diagnosis for solenoid valves of railway brake systems using embedded sensor signals and physical interpretation. It can be seen that existing studies are largely limited to fault prognosis, detection and isolation for components of brake systems, and few studies on system-level performance prognosis for air brake systems of in-service trains have been reported.
[14][15][16] As the air brake system is a mechanical-electrical-pneumatic coupled, time-varying nonlinear system with a complex structure and multiple operating modes, it is hard to establish an accurate mathematical model of it, whereas data-driven methods focus on data characteristics. 17,18 Moreover, the air brake system generates time series data throughout its service period. Therefore, a data-driven prognosis method is more appropriate and was adopted in this paper. Considering the nonlinearity of the input and output time series of the air brake system under multiple operating modes, commonly used data-driven nonlinear prognosis methods such as artificial neural networks (ANN) and support vector regression (SVR) were considered. You et al. 19 proposed a neural network-based method for real-time health status prediction of electric vehicle batteries. Shen et al. 20 used multivariate SVR to predict the remaining life of rolling bearings, and experimental results showed that the method could obtain accurate predictions with few samples. Although ANN has a strong ability to approximate nonlinear mappings, it learns according to the empirical risk minimization criterion, which requires a large training sample and is prone to overfitting, degrading its generalization ability. In contrast, SVR is based on the structural risk minimization criterion, which minimizes both the empirical risk and the confidence interval. It has strong generalization ability, requires fewer samples and a short training time, and is robust to noise, which effectively overcomes the shortcomings of ANN. 21,22 Nevertheless, the kernel parameters have a significant impact on SVR-based prognosis models. The genetic algorithm 23 and particle swarm algorithm 24 have been used to optimize the kernel parameters and penalty factors to improve prognosis accuracy. However, SVR-based prognosis models also have the deficiency that the continuity and correlation between data points of a time series are not fully considered.
In this paper, a black box model of the air brake system was constructed, and a data augmentation prognosis model based on an improved multivariate support vector regression algorithm was established to enhance the utilization of the obtained data, with the particle swarm optimization (PSO) algorithm used to optimize the model parameters.
The remaining parts of this paper are organized as follows. Section 2 introduces the air brake system and its black box model. Section 3 presents the data augmentation prognosis method. Section 4 describes case studies that demonstrate the effectiveness of the proposed method. Finally, conclusions are drawn in Section 5, together with some perspectives on research and development.

Brake system of high-speed train
The microcomputer-controlled electropneumatic brake system (Figure 1(a)) is widely used in trains, and it is mainly composed of a driver controller, compressed air supply unit (CASU), electronic brake control unit (EBCU), pneumatic brake control unit (PBCU), and basic brake unit (BBU). 5 The driver controller generates brake command signals and transmits them to the EBCU for brake force calculation and distribution. The CASU supplies compressed air to the PBCU and other air-consuming equipment. The PBCU generates brake cylinder pressure according to the instructions of the EBCU. The BBU converts brake cylinder pressure into mechanical brake force. 25

Black box model of air brake system
The brake force of trains generally consists of electric brake force and air friction brake force. The electric brake force generated by the traction motors decreases as train speed decreases, whereas the air friction brake force is provided by the air brake system, which takes effect in service braking and fast braking at low speed as well as in emergency braking. Therefore, the air brake system is an indispensable safety system for slowing down and stopping trains. Since the pneumatic components of the air brake system are driven by compressed air, pressure can effectively reflect the performance of the air brake system. Studying how pressure evolves over time is therefore of great significance for the performance prognosis of the air brake system.
Because the air brake system has a complex structure, uses compressed air as its working medium and exhibits strong nonlinearity, and because its components are coupled with and influence each other, it is hard to establish an accurate mathematical model in practice, so a data-driven prognosis method is more suitable. In fact, despite its complex structure, the air brake system is a control system governed by a nonlinear mechanism, 26 and for a given input, changes within the system affect the final output. By introducing black box theory 27 and focusing on numerical characteristics, the input and output signals were used as the identification signals to construct the black box model of the air brake system.
As shown in Figure 1(b), the pneumatic input of the air brake system is the brake supply reservoir pressure $P_{\mathrm{SR}}$, the electric control input is the EP current $I_{\mathrm{EP}}$, and the output is the brake cylinder pressure $P_{\mathrm{BCP}}$. Moreover, $P_{\mathrm{BCP}}$ is an important variable reflecting the performance of the air brake system, 9 so the performance prognosis of the air brake system can be achieved by predicting the time series of $P_{\mathrm{BCP}}$.
To ensure the validity of the black box model, it was assumed that $P_{\mathrm{BCP}}$ is related to $P_{\mathrm{SR}}$ and $I_{\mathrm{EP}}$, and a correlation test based on the Spearman correlation coefficient was carried out to verify the rationality of this hypothesis. The calculation formula is as follows 28

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\left(n^{2}-1\right)} \tag{1}$$

where $n$ is the number of samples, and $d_i$ represents the rank difference between the two variables at the $i$th moment.
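To make the Spearman coefficient formula concrete, the sketch below ranks both series and applies the closed-form expression directly. It assumes there are no tied values (the closed form is exact only then); for tied data, `scipy.stats.spearmanr`, which averages tied ranks, would be the safer choice. The helper name `spearman_rs` and the toy data are hypothetical.

```python
import numpy as np

def spearman_rs(x, y):
    """Spearman rank correlation r_s = 1 - 6*sum(d_i^2) / (n(n^2 - 1)).

    Assumes no tied values in x or y; with ties this closed form is
    only an approximation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n != y.size or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    # argsort of argsort gives 0-based ranks; +1 makes them 1-based.
    rank_x = np.argsort(np.argsort(x)) + 1
    rank_y = np.argsort(np.argsort(y)) + 1
    d = rank_x - rank_y  # rank difference d_i at the i-th moment
    return 1.0 - 6.0 * np.sum(d**2) / (n * (n**2 - 1))

# Hypothetical toy series (not test-rig data): d = [0, -1, 1, -1, 1].
print(spearman_rs([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))  # 0.8
```

In practice the three filtered signals ($I_{\mathrm{EP}}$, $P_{\mathrm{SR}}$, $P_{\mathrm{BCP}}$) would be compared pairwise in this way, together with a significance test, to fill correlation tables such as Tables 1 to 3.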

Multivariate support vector regression
According to the black box model of the air brake system, the prognosis model has more than one input variable. Meanwhile, considering the nonlinearity of the input and output time series of the air brake system under multivariable operating conditions, the multivariate support vector regression (MSVR) algorithm was adopted. The basic idea of MSVR is to establish a nonlinear mapping between a multidimensional input vector and the output. By introducing a kernel function, the original data in the input space are mapped into a high-dimensional feature space, so that the nonlinear regression problem in the input space is transformed into a linear regression problem in the feature space. 29 The main steps of MSVR are as follows.
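For the training sample set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, $\mathbf{x}_i \in \mathbb{R}^{m_0}$ is the input vector, $y_i \in \mathbb{R}$ is the corresponding output, $m_0$ is the number of input variables, and $n$ is the number of training samples. The modeling process constructs a nonlinear regression hyperplane that fits the inputs and outputs of the training samples, and the nonlinear regression function is expressed as

$$f(\mathbf{x}) = \mathbf{w}^{T}\varphi(\mathbf{x}) + b \tag{2}$$

where $\mathbf{w}$ is the weight vector, $\varphi(\cdot)$ is the nonlinear mapping function, and $b$ is the bias. To obtain $\mathbf{w}$ and $b$, the training error of MSVR is defined by the $\varepsilon$-insensitive loss function in equation (3)

$$L_{\varepsilon}\left(f(\mathbf{x}), y\right) = \begin{cases} 0, & \left|f(\mathbf{x}) - y\right| \le \varepsilon \\ \left|f(\mathbf{x}) - y\right| - \varepsilon, & \left|f(\mathbf{x}) - y\right| > \varepsilon \end{cases} \tag{3}$$

where $f(\mathbf{x})$ and $y$ are the predicted and observed values of $P_{\mathrm{BCP}}$, respectively, and $\varepsilon$ is the width of the insensitive zone.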
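Secondly, a convex quadratic optimization problem is constructed:

$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\left\|\mathbf{w}\right\|^{2} + C\sum_{i=1}^{n} L_{\varepsilon}\left(f(\mathbf{x}_i), y_i\right) \tag{4}$$

where $\frac{1}{2}\|\mathbf{w}\|^{2}$ is the regularization term, which reduces the complexity of the model and thus overfitting, $\sum_{i=1}^{n} L_{\varepsilon}(f(\mathbf{x}_i), y_i)$ is the empirical risk term, and $C$ is the penalty factor, which balances the training error against the generalization ability.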
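The primal problem above can be evaluated numerically to see how the two terms trade off. The sketch below computes the $\varepsilon$-insensitive loss and the regularized objective, assuming an identity feature map $\varphi(\mathbf{x}) = \mathbf{x}$ so that the weight vector can be written out explicitly; the kernelized model never forms $\varphi$ explicitly. Function names and data are hypothetical.

```python
import numpy as np

def eps_insensitive_loss(y_pred, y_true, eps):
    """Zero inside the eps-tube, |f(x) - y| - eps outside it."""
    residual = np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float))
    return np.maximum(0.0, residual - eps)

def primal_objective(w, b, X, y, C, eps):
    """0.5*||w||^2 + C * sum of eps-insensitive losses.

    Assumes an identity feature map phi(x) = x, so f(x) = w^T x + b.
    """
    w = np.asarray(w, float)
    y_pred = np.asarray(X, float) @ w + b
    regularization = 0.5 * (w @ w)
    empirical_risk = C * np.sum(eps_insensitive_loss(y_pred, y, eps))
    return regularization + empirical_risk

# Hypothetical data: only the third sample leaves the tube (residual 1.0).
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, 3.0])
print(primal_objective([1.0, 1.0], 0.0, X, y, C=10.0, eps=0.1))  # 10.0
```

A larger $C$ weights the single out-of-tube residual more heavily relative to $\frac{1}{2}\|\mathbf{w}\|^2$, which is why $C$ is one of the parameters tuned later.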
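Then, the slack variables $\xi_i$ and $\xi_i^{*}$ are introduced, and equation (4) is transformed into a constrained quadratic optimization problem, whose objective is shown in equation (5)

$$\min_{\mathbf{w},\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\left\|\mathbf{w}\right\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.}\ \begin{cases} y_i - \mathbf{w}^{T}\varphi(\mathbf{x}_i) - b \le \varepsilon + \xi_i \\ \mathbf{w}^{T}\varphi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \ldots, n \end{cases} \tag{5}$$

To minimize over the primal variables $\mathbf{w}$, $b$, $\xi_i$, $\xi_i^{*}$ and maximize over the dual variables $\alpha_i$, $\alpha_i^{*}$, $\beta_i$, $\beta_i^{*} \ge 0$, the Lagrangian function is constructed as

$$\begin{aligned} L ={}& \frac{1}{2}\left\|\mathbf{w}\right\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) - \sum_{i=1}^{n}\alpha_i\left(\varepsilon + \xi_i - y_i + \mathbf{w}^{T}\varphi(\mathbf{x}_i) + b\right) \\ & - \sum_{i=1}^{n}\alpha_i^{*}\left(\varepsilon + \xi_i^{*} + y_i - \mathbf{w}^{T}\varphi(\mathbf{x}_i) - b\right) - \sum_{i=1}^{n}\left(\beta_i\xi_i + \beta_i^{*}\xi_i^{*}\right) \end{aligned} \tag{6}$$

According to the KKT conditions, 30 equation (6) is transformed into the dual problem in $\alpha_i$ and $\alpha_i^{*}$

$$\begin{aligned} \max_{\alpha,\,\alpha^{*}}\ & -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)\varphi^{T}(\mathbf{x}_i)\varphi(\mathbf{x}_j) - \varepsilon\sum_{i=1}^{n}\left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{n} y_i\left(\alpha_i - \alpha_i^{*}\right) \\ \text{s.t.}\ & \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) = 0, \quad 0 \le \alpha_i,\ \alpha_i^{*} \le C \end{aligned} \tag{7}$$

The nonlinear regression function is then obtained as

$$f(\mathbf{x}) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\varphi^{T}(\mathbf{x}_i)\varphi(\mathbf{x}) + b \tag{8}$$

Because computing the inner product $\varphi^{T}(\mathbf{x}_i)\varphi(\mathbf{x})$ explicitly in the high-dimensional feature space is computationally expensive, it is replaced in equation (8) by the kernel function $K(\mathbf{x}_i, \mathbf{x})$, and the nonlinear regression function is expressed as

$$f(\mathbf{x}) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)K(\mathbf{x}_i, \mathbf{x}) + b \tag{9}$$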
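The passage from the Lagrangian to the dual problem rests on the stationarity part of the KKT conditions. Assuming the standard $\varepsilon$-SVR Lagrangian with multipliers $\alpha_i, \alpha_i^{*}, \beta_i, \beta_i^{*} \ge 0$ as defined above, setting the partial derivatives with respect to the primal variables to zero gives

$$
\begin{aligned}
\frac{\partial L}{\partial \mathbf{w}} = 0 \;&\Rightarrow\; \mathbf{w} = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\varphi(\mathbf{x}_i),\\
\frac{\partial L}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) = 0,\\
\frac{\partial L}{\partial \xi_i} = 0 \;&\Rightarrow\; C - \alpha_i - \beta_i = 0 \;\Rightarrow\; 0 \le \alpha_i \le C,\\
\frac{\partial L}{\partial \xi_i^{*}} = 0 \;&\Rightarrow\; C - \alpha_i^{*} - \beta_i^{*} = 0 \;\Rightarrow\; 0 \le \alpha_i^{*} \le C.
\end{aligned}
$$

Substituting the expression for $\mathbf{w}$ back into the Lagrangian eliminates $\mathbf{w}$, $b$ and the slack variables and leaves the dual objective in $\alpha_i, \alpha_i^{*}$; substituting it into $f(\mathbf{x}) = \mathbf{w}^{T}\varphi(\mathbf{x}) + b$ gives a regression function that depends on the data only through inner products $\varphi^{T}(\mathbf{x}_i)\varphi(\mathbf{x})$, which is what allows the kernel substitution.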

Improved multivariate support vector regression based on data augmentation
Although the MSVR-based prognosis model has strong nonlinear mapping ability, during training it only considers the correspondence between the input vector and the output vector at the same time instant. In fact, according to the Takens delay embedding theorem, 31 there is a functional relationship between the future value of a time series and its previous $m_q$ values; that is, given a time series $X_n = (x_1, x_2, \ldots, x_n)$, the known data at and before time $t$ can be used to predict the value at time $t+1$, which is expressed as

$$x_{t+1} = F\left(x_t, x_{t-1}, \ldots, x_{t-m_q+1}\right) \tag{10}$$

where $m_q$ is the embedding dimension and $F(\cdot)$ is an unknown nonlinear mapping. Based on this idea, and to make full use of the acquired data for prognosis, data augmentation was carried out for the training set and testing set, and an improved multivariate support vector regression (IMSVR) algorithm was proposed. Specifically, instead of using the original input and output vectors directly, a new input vector X and output vector Y were constructed according to equations (11) and (12).
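The augmentation idea can be sketched as a delay embedding. The exact layout of X and Y in the source equations is not reproduced in this text, so the construction below is one plausible, hypothetical version: each row of X stacks the $m_q$ most recent samples of every recorded signal (for example $P_{\mathrm{SR}}$, $I_{\mathrm{EP}}$ and past $P_{\mathrm{BCP}}$), and the matching entry of Y is the next value of the target series, in the spirit of the one-step relation $x_{t+1} = F(x_t, \ldots, x_{t-m_q+1})$. The helper names `delay_embed` and `query_vector` are hypothetical.

```python
import numpy as np

def _as_columns(signals):
    """Return signals as a 2-D float array of shape (n_samples, n_signals)."""
    signals = np.asarray(signals, dtype=float)
    return signals[:, None] if signals.ndim == 1 else signals

def delay_embed(signals, target, m_q):
    """Build lagged (X, Y) pairs for one-step-ahead prediction.

    signals : (n,) or (n, k) array of recorded signals.
    target  : (n,) array of the series to predict (e.g. P_BCP).
    m_q     : embedding dimension (past samples kept per signal).

    Row j of X holds samples t-m_q+1 .. t (t = j + m_q - 1) of every signal,
    flattened time step by time step, oldest first; Y[j] = target[t + 1].
    """
    signals = _as_columns(signals)
    target = np.asarray(target, dtype=float)
    n = signals.shape[0]
    if n != target.shape[0] or n <= m_q:
        raise ValueError("need len(signals) == len(target) > m_q")
    rows = [signals[t - m_q + 1 : t + 1].ravel() for t in range(m_q - 1, n - 1)]
    return np.vstack(rows), target[m_q:]

def query_vector(signals, m_q):
    """Input vector for predicting sample n+1 from the last m_q samples."""
    return _as_columns(signals)[-m_q:].ravel()

series = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical normalized series
X, Y = delay_embed(series, series, m_q=2)
print(X)                                   # rows [1, 2], [2, 3], [3, 4]
print(Y)                                   # [3, 4, 5]
print(query_vector(series, m_q=2))         # [4, 5] -> used to predict x_6
```

Compared with pairing inputs and outputs at the same instant, each training row now carries $m_q$ time steps of context, which is how the augmented model exploits the continuity between neighboring samples.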
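The nonlinear regression function of IMSVR is obtained as

$$f(\mathbf{X}) = \sum_{i}\left(\alpha_i - \alpha_i^{*}\right)K(\mathbf{X}_i, \mathbf{X}) + b \tag{13}$$

Thus, the IMSVR-based prognosis model is expressed as

$$\hat{x}_{n+1} = \sum_{i}\left(\alpha_i - \alpha_i^{*}\right)K(\mathbf{X}_i, \mathbf{X}_{n+1}) + b \tag{14}$$

where $\mathbf{X}_i$ is the $i$th augmented training input vector, $\mathbf{X}_{n+1}$ is the augmented input vector formed from the data available up to point $n$, and $\hat{x}_{n+1}$ is the predicted value of the $(n+1)$th point.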
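Once the dual problem has been solved, prediction reduces to a weighted kernel sum. The sketch below evaluates $\sum_i (\alpha_i - \alpha_i^{*})K(\mathbf{x}_i, \mathbf{x}) + b$ for one query vector and checks the dual constraints. The coefficients are hypothetical placeholders rather than the output of a real solver (solving the dual QP is not shown), and a linear kernel stands in for the RBF kernel so the arithmetic is easy to follow; `kernel_predict`, `dual_feasible` and `linear_kernel` are hypothetical names.

```python
import numpy as np

def kernel_predict(X_train, alpha, alpha_star, b, x_query, kernel):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b for one query vector."""
    coef = np.asarray(alpha, float) - np.asarray(alpha_star, float)
    k = np.array([kernel(xi, x_query) for xi in np.asarray(X_train, float)])
    return float(coef @ k + b)

def dual_feasible(alpha, alpha_star, C, tol=1e-9):
    """Check sum(alpha - alpha^*) = 0 and 0 <= alpha, alpha^* <= C."""
    a = np.asarray(alpha, float)
    a_s = np.asarray(alpha_star, float)
    in_box = np.all((a >= -tol) & (a <= C + tol) & (a_s >= -tol) & (a_s <= C + tol))
    return bool(in_box and abs(np.sum(a - a_s)) <= tol)

def linear_kernel(u, v):
    """Stand-in kernel K(u, v) = u . v (the model itself uses an RBF kernel)."""
    return float(np.dot(u, v))

# Hypothetical support vectors and coefficients, for illustration only.
X_train = np.array([[0.0, 1.0], [1.0, 0.0]])
alpha, alpha_star = [0.5, 0.0], [0.0, 0.5]
print(dual_feasible(alpha, alpha_star, C=1.0))  # True
print(kernel_predict(X_train, alpha, alpha_star, 0.2, [2.0, 1.0], linear_kernel))
```

For one-step-ahead prognosis, `x_query` would be the augmented input vector assembled from the samples up to point $n$, and the returned value is the prediction for point $n+1$.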

Model parameter optimization and evaluation indicator
Parameter optimization of prognosis model. The kernel function has a great impact on the accuracy of the prognosis model; commonly used kernel functions include the d-order polynomial, sigmoid and radial basis function (RBF) kernels. The RBF kernel has the advantage that the center of each basis function corresponds to a support vector. Compared with the d-order polynomial kernel, it involves fewer parameters, which reduces the complexity of the model. Moreover, the sigmoid kernel is not a valid (positive semidefinite) kernel for some parameter values. Therefore, the RBF kernel function was selected, which is expressed as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-g\left\|\mathbf{x}_i - \mathbf{x}_j\right\|^{2}\right) \tag{15}$$

where $g$ is the kernel parameter.
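The RBF kernel can be computed for all pairs of samples at once. The sketch below assumes the form $\exp(-g\|\mathbf{x}_i - \mathbf{x}_j\|^2)$, so that a larger $g$ gives a narrower kernel; the value $g = 0.01$ in the example is the optimum reported later for the service 1-7N and emergency data, used here only as a plausible magnitude. The helper name is hypothetical.

```python
import numpy as np

def rbf_kernel_matrix(A, B, g):
    """Gram matrix K[i, j] = exp(-g * ||A[i] - B[j]||^2)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 to absorb round-off.
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-g * np.maximum(sq_dist, 0.0))

X = np.array([[0.0, 0.0], [3.0, 4.0]])    # hypothetical points, distance 5
K = rbf_kernel_matrix(X, X, g=0.01)
print(K)                                  # off-diagonal entries equal exp(-0.25)
```

The diagonal is always 1 and the matrix is symmetric; the vectorized squared-distance identity avoids a double Python loop when the training set grows.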
To further improve the accuracy of the prognosis model, the kernel parameter $g$ and the penalty factor $C$ must be selected reasonably during modeling. By comparing cross validation (CV), the genetic algorithm (GA) and particle swarm optimization (PSO), it was found that the PSO algorithm achieves better optimization results, [32][33][34] and it has been applied in the railway domain in recent years, 35 so the PSO algorithm was chosen to optimize the model parameters.
The basic principle of the PSO algorithm is that the individuals of a population search for the optimal solution through competition and cooperation; the algorithm is simple, easy to implement and has strong global convergence. 36 The implementation steps of the PSO algorithm are as follows.
Firstly, a set of random values is initialized as a swarm of particles. Each particle has two properties, velocity and position, and the velocity and position of the $i$th particle in the $s$-dimensional space are denoted as $\mathbf{v}_i = (v_{i1}, v_{i2}, \ldots, v_{is})$ and $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{is})$, respectively. Secondly, the particles search for the optimal solution iteratively. During each iteration, the best position found so far by each particle is recorded as pbest, and the best position found by the whole swarm is recorded as gbest; these are called the personal best and the global best, respectively. All particles in the swarm update their velocity and position using pbest and gbest, according to the following equations 34

$$\mathbf{v}_i(l+1) = w_v\,\mathbf{v}_i(l) + c_1 r_1\left(\mathrm{pbest}_i - \mathbf{x}_i(l)\right) + c_2 r_2\left(\mathrm{gbest} - \mathbf{x}_i(l)\right) \tag{16}$$

$$\mathbf{x}_i(l+1) = w_p\,\mathbf{x}_i(l) + \mathbf{v}_i(l+1) \tag{17}$$

where $\mathbf{v}_i(l)$ is the velocity of the $i$th particle at the $l$th iteration, $l$ is the iteration number, $w_v$ and $w_p$ are the inertia weights, $r_1$ and $r_2$ are random numbers in $(0, 1)$, $\mathbf{x}_i(l)$ is the position of the $i$th particle at the $l$th iteration, and $c_1$ and $c_2$ are learning factors. Over the iterations, the personal bests, the global best and the particle velocities and positions are continuously updated until the termination condition is reached, which completes the selection of the optimal parameters $g$ and $C$.
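The PSO loop described above can be sketched in a few lines. This is a minimal global-best PSO, not the authors' code: it follows the velocity and position updates with inertia weights $w_v$, $w_p$ and learning factors $c_1 = 1.5$, $c_2 = 1.7$, but, as assumptions, it defaults to $w_v = 0.7$ (the text later reports $w_v = 1$, under which velocities are not damped and the swarm may keep oscillating), clamps velocities to the search range, and clips positions to the box. The fitness `toy_validation_error` is a hypothetical stand-in for the validation error of a model trained with $(g, C)$, searched in $\log_{10}$ space (also an assumption).

```python
import numpy as np

def pso_minimize(fitness, lower, upper, n_particles=20, n_iter=200,
                 c1=1.5, c2=1.7, w_v=0.7, w_p=1.0, seed=0):
    """Global-best PSO over the box [lower, upper].

    Velocity: v <- w_v*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
    Position: x <- w_p*x + v, then clipped to the box.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    s = lower.size                                   # search-space dimension
    v_max = upper - lower                            # assumed velocity clamp
    x = rng.uniform(lower, upper, size=(n_particles, s))
    v = 0.1 * rng.uniform(-v_max, v_max, size=(n_particles, s))
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, s))
        r2 = rng.random((n_particles, s))
        v = w_v * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(w_p * x + v, lower, upper)
        val = np.array([fitness(p) for p in x])
        improved = val < pbest_val                   # personal-best update
        pbest[improved] = x[improved]
        pbest_val[improved] = val[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()   # global-best update
    return gbest, float(pbest_val.min())

def toy_validation_error(params):
    """Hypothetical stand-in for validation RMSE as a function of (log10 g, log10 C)."""
    log_g, log_c = params
    return (log_g + 2.0) ** 2 + (log_c - 1.5) ** 2

best, err = pso_minimize(toy_validation_error, lower=[-4.0, -1.0], upper=[0.0, 2.0])
g_opt, c_opt = 10.0 ** best
print(g_opt, c_opt, err)   # close to g = 0.01, C = 31.6 for this toy surface
```

In a real tuning run, the fitness function would train the regression model with the candidate $(g, C)$ and return its error on held-out data; the termination condition here is simply a fixed number of iterations, matching the generation limit used later.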
Evaluation indicator of prognosis model. To evaluate the accuracy of the prognosis model, the root mean squared error (RMSE) and mean absolute error (MAE), 37 which are widely used in time series prognosis, were selected as evaluation indicators, and the calculation formulas are expressed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y(k) - y_p(k)\right)^{2}} \tag{18}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{k=1}^{N}\left|y(k) - y_p(k)\right| \tag{19}$$

where $y(k)$ is the measured value of $P_{\mathrm{BCP}}$, $y_p(k)$ is the predicted value, and $N$ is the number of samples.
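The two indicators can be computed directly from the measured and predicted series. The sketch below uses hypothetical brake cylinder pressure values (not results from the paper); `rmse` and `mae` are illustrative helper names.

```python
import numpy as np

def rmse(y_meas, y_pred):
    """Root mean squared error over N samples."""
    e = np.asarray(y_meas, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(e**2)))

def mae(y_meas, y_pred):
    """Mean absolute error over N samples."""
    e = np.asarray(y_meas, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(e)))

# Hypothetical P_BCP samples in kPa; errors are [-3, 2, 0, 4].
y_meas = [500.0, 502.0, 498.0, 510.0]
y_pred = [503.0, 500.0, 498.0, 506.0]
print(rmse(y_meas, y_pred), mae(y_meas, y_pred))  # sqrt(7.25) ~ 2.693, 2.25
```

Because RMSE squares the errors, it is dominated by the largest deviations (such as those at stage switching), whereas MAE weights all errors equally; reporting both shows whether the error is concentrated in a few samples.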

Experiment sample acquisition and preprocessing
Considering that the air brake system operates under variable conditions, performance tests under three typical operating conditions (service 1-7N and emergency, service full brake-relief, and stage brake-relief) were conducted on the test-rig of the brake system of a high-speed train (Figure 2) to construct the training and testing sets and to validate the effectiveness of the data augmentation prognosis method. According to the black box model of the air brake system, the collected signals include the EP current, brake supply reservoir pressure and brake cylinder pressure, and the sampling frequency was set to 10 Hz.
For the collected time series data, on the one hand, interference from the working environment introduces measurement noise during sampling. On the other hand, the EP current, brake supply reservoir pressure and brake cylinder pressure differ in order of magnitude. Using the original data directly would reduce the accuracy of the prognosis model and could even cause the model to fail. Therefore, to eliminate the influence of noise and magnitude, five-point moving average filtering was applied to the collected original data. The filtered data were then normalized to the [0, 1] interval, and the conversion formula is expressed as

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{20}$$

where $x$ is the original data, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the original data, and $x^{*}$ is the normalized data. Figure 3 shows the performance test curves of the EP current, brake supply reservoir pressure and brake cylinder pressure under the three typical operating conditions.
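The two preprocessing steps can be sketched as follows. The text does not say whether the five-point moving average is centered or trailing, or how the series ends are handled; the sketch assumes a centered window computed with `np.convolve(..., mode="valid")`, which drops two samples at each end. The helper names are hypothetical.

```python
import numpy as np

def moving_average(x, window=5):
    """Centered moving average; 'valid' mode drops (window - 1) // 2 samples per end."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")

def minmax_normalize(x, x_min=None, x_max=None):
    """Map x to [0, 1] via x* = (x - x_min) / (x_max - x_min).

    x_min and x_max default to the extremes of x; pass stored values to
    apply the same scaling to another series.
    """
    x = np.asarray(x, dtype=float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    if x_max == x_min:
        raise ValueError("a constant signal cannot be min-max normalized")
    return (x - x_min) / (x_max - x_min)

raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # hypothetical raw samples
smooth = moving_average(raw)               # two values: 3.0 and 4.0
print(smooth, minmax_normalize([2.0, 4.0, 6.0]))
```

Because the "valid" window shortens every series by four samples, all three signals need the same filtering so that they stay time-aligned. Keeping the training set's $x_{\min}$ and $x_{\max}$ and reusing them for the testing data is a common choice that keeps both sets on the same scale.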

Verification work
Correlation analysis results. The correlations among the EP current, brake supply reservoir pressure and brake cylinder pressure under the three typical operating conditions are shown in Tables 1 to 3, respectively.
It can be seen from the results marked ** in Tables 1 to 3 that there is an extremely significant correlation among the EP current, brake supply reservoir pressure and brake cylinder pressure. Therefore, the hypothesis that the brake cylinder pressure is related to the brake supply reservoir pressure and EP current is supported, which further demonstrates the validity and feasibility of the black box model of the air brake system developed in this paper.

Prognosis results
Prognosis model based on IMSVR. Considering the variable operating conditions of the air brake system, and to give the prognosis model sufficient generalization capability, the performance test data under the service 1-7N and emergency operating conditions shown in Figure 3(a) were selected to construct the training set and used to train the IMSVR-based prognosis model. The values of g and C were optimized by the PSO algorithm. The key parameters of the PSO algorithm were set as follows: $c_1 = 1.5$, $c_2 = 1.7$, $w_v = 1$, $w_p = 1$, 200 termination generations, and a population size of 20. The optimal parameters for the service 1-7N and emergency operating conditions were g = 0.01 and C = 46.22. The prognosis model was trained on the constructed training set with the optimal parameters, and the resulting model was used to predict the testing set. For analysis purposes, the testing set was aligned with the training set, and the predicted and measured brake cylinder pressure time series under the service 1-7N and emergency operating conditions are shown in Figure 4.
As can be seen in Figure 4, the predicted brake cylinder pressure under the service 1-7N and emergency operating conditions follows the measured values well, with errors within ±5 kPa except at stage switching moments. In addition, the RMSE and MAE of the IMSVR-based prognosis model were 3.0160 and 2.8372, respectively, which shows that the performance prognosis model of the air brake system based on the proposed method has high accuracy.
Comparison results of prognosis model based on MSVR. To further validate the accuracy of the developed performance prognosis model, the conventional MSVR-based model without data augmentation was used to predict the brake cylinder pressure time series under the service 1-7N and emergency operating conditions. The optimal parameters were g = 0.01 and C = 6.94. The predicted and measured brake cylinder pressures under the service 1-7N and emergency operating conditions are shown in Figure 5. As can be seen in Figure 5, the predictions of the MSVR-based prognosis model follow the measured values less closely than those of the IMSVR-based prognosis model. The evaluation indicators of the MSVR-based prognosis model were also calculated: its RMSE and MAE were 68.12% and 51.23% higher, respectively, than those of the IMSVR-based prognosis model. Therefore, the prediction accuracy of the IMSVR-based prognosis model is significantly better than that of the MSVR-based prognosis model, which also shows that the data augmentation method effectively improves the accuracy of performance prognosis for the air brake system.
Calibration results. To verify the generalization performance of the IMSVR-based prognosis model, the performance test data shown in Figure 3(b) and (c) were selected as the calibration signals. The comparison curves between the predicted and measured brake cylinder pressure time series, and the error curves, under the service full brake-relief and stage brake-relief operating conditions are shown in Figures 6 and 7, respectively.
As can be seen in Figures 6 and 7, the predicted and measured brake cylinder pressure time series under both the service full brake-relief and stage brake-relief operating conditions are in good agreement, and the calibration errors, except at stage switching moments, are basically within ±5 kPa. In addition, the RMSE and MAE of the IMSVR-based prognosis model were 3.1507 and 2.4872 for the service full brake-relief calibration signal, and 1.3066 and 0.8855 for the stage brake-relief calibration signal, respectively.
These results show that the performance prognosis model of the air brake system achieves high calibration accuracy under both the service full brake-relief and stage brake-relief calibration signals, indicating that the data augmentation method proposed in this paper has good generalization performance.

Conclusion
This paper investigated a performance prognosis method for air brake systems of in-service trains. Compared with traditional prognosis strategies, the main improvements are that the method achieves high accuracy and is practically suitable for systems with complex structures and time-varying nonlinear characteristics.
(1) In view of the complex structure of the air brake system, brake cylinder pressure was extracted as the performance identification signal, and a black box model of the air brake system was constructed. Correlation tests were conducted using Spearman's correlation coefficient, and the validity of the black box model was verified with performance test data of the air brake system.
(2) To address the time-varying nonlinear characteristics of the brake cylinder pressure time series, and to make full use of the obtained data, a data augmentation prognosis model was developed based on the IMSVR algorithm, and the PSO algorithm was used to optimize its parameters to improve prognosis accuracy.
(3) Performance tests of the air brake system under different operating conditions were carried out. Analysis of the test data showed that the modeling error of the prognosis model established using the performance test data of different stages was within ±5 kPa, and the calibration errors under the two typical operating conditions of service full brake-relief and stage brake-relief were also within ±5 kPa, which demonstrates the validity and applicability of the proposed method.
Comparison with the prognosis results of the MSVR algorithm further verified that the proposed method has higher accuracy. The method can be applied to augment the available data and provides a reference for monitoring the service status of air brake systems. In the future, a fault warning algorithm for air brake systems can be designed by comparing predicted and measured values.

Figure 1. Black box model of air brake system. (a) Schematic diagram of brake system; (b) Schematic diagram of black box model.

Figure 2. Test-rig of brake system of high-speed train. (a) Test site; (b) Test console.

Figure 3. Performance test curves of the EP current, brake supply reservoir pressure and brake cylinder pressure. (a) Service 1-7N and emergency; (b) Service full brake-relief; (c) Stage brake-relief.

Figure 4. Prognosis results under service 1-7N and emergency operating conditions. (a) Comparison curves between the predicted and measured values; (b) Prognosis error.

Figure 5. Comparison prognosis results under service 1-7N and emergency operating conditions. (a) Comparison curves between the predicted and measured values; (b) Prognosis error.

Figure 6. Calibration results under service full brake-relief. (a) Comparison curves between the predicted and measured values; (b) Calibration error.

Figure 7. Calibration results under stage brake-relief. (a) Comparison curves between the predicted and measured values; (b) Calibration error.

Table 1. Correlation analysis results under service 1-7N and emergency operating conditions.
**Means that there is an extremely significant correlation.

Table 2. Correlation test results under service full brake-relief operating conditions.
**Means that there is an extremely significant correlation.

Table 3. Correlation test results under stage brake-relief operating conditions.
**Means that there is an extremely significant correlation.