Long-short term memory and gas path analysis based gas turbine fault diagnosis and prognosis

At present, gas turbine fault prediction mainly aims to predict the performance degradation trend of the whole system, while quantitative and thorough research on the performance health index (PHI) of each major component is lacking. To address this issue, a long-short term memory and gas path analysis (GPA) based gas turbine fault diagnosis and prognosis method is proposed, which couples the fault diagnosis and prognosis processes. The measurable gas path parameters (GPPs) and the health parameters (HPs) of each main component of the target engine are obtained through an adaptive modeling strategy and a gas path diagnosis method based on the thermodynamic model. The Long-Short Term Memory (LSTM) prediction model combines the measurable GPPs and the diagnosed HPs to predict the future HPs of each major component. Simulation experiments show that the proposed method can effectively diagnose and predict detailed, quantified, and accurate PHIs of the main components. The maximum root mean square error (RMSE) of the diagnosed component HPs does not exceed 0.193%. The RMSEs of the best prediction model for the HP prediction results of each component are 0.232%, 0.029%, 0.069%, and 0.043%, respectively.


Introduction
As a combustion engine, a gas turbine uses a continuous flow of hot gas as the working medium to drive the turbine wheel at high rotational speed, converting the chemical energy of the fuel into useful work. According to their purpose, gas turbines can be divided into micro gas turbines, small gas turbines, heavy-duty gas turbines (HDGTs), aero engines, and ship engines. 1 Since the 1950s, with their widespread application in the aviation industry and owing to advantages such as rapid maneuverability, high thermal efficiency, and environmental friendliness, gas turbines have received increasing attention from industrial power stations, oil and gas transmission pipelines, and the shipbuilding industry. 2 Because the gas turbine is the power core of the aviation industry, shipbuilding industry, and industrial power stations, ensuring its safe and stable operation is essential.
HDGTs mainly use natural gas as fuel and are widely used in gas-steam combined cycle power plants. According to statistics, as of 2015, the power generation of HDGTs accounted for about 22% of total global power generation, and this share is still increasing steadily. HDGT power generation is the third-largest power generation method in the world after coal power and nuclear power. In the 21st century and beyond, HDGTs will become the core power device of energy-efficient conversion and clean utilization systems. The huge demand for power station gas turbines has driven the rapid development of HDGTs. It is estimated that 12,591 new HDGT engines for power generation will be added in the 10 years from 2015, with a manufacturing cost exceeding 152.9 billion US dollars.
1 School of Energy and Mechanical Engineering, Shanghai University of Electric Power, Shanghai, China
2 College of Electronic and Information Engineering, Shanghai Dianji University, Shanghai, China
During operation, the engine works under extreme conditions (high temperature, high pressure, high speed, high heat flow, high-intensity combustion, high mechanical and thermal stress inside the engine, and a dirty ambient environment). As running time increases, its major components (such as compressors, combustion chambers, and turbines) suffer various kinds of performance degradation or damage, such as fouling, leakage, erosion, thermal deformation, and internal/foreign object damage. 3 With the rapid development of gas turbine technology, maintenance costs are becoming higher and higher. According to reports, global industrial gas turbine operation and maintenance costs exceeded US$18 billion in 2009 and are increasing rapidly. Of the life cycle cost of F-class gas turbine power stations, 15%-20% is spent on operation and maintenance. In particular, 10%-15% of the whole cost is spent on maintenance, and as science and technology advance, the shares of base station and fuel costs gradually decline while the share of maintenance costs progressively rises. 4 Nowadays, the world's major gas turbine manufacturers are developing the next generation of gas turbines (such as H-class and G-class), which have higher initial gas temperature, pressure ratio, and power. Furthermore, HDGTs have complex working environments and variable operating conditions. High parameters, intricate environments, and frequent changes in operating conditions increase the risk of failure. For this reason, as HDGTs advance, the requirements on their reliability become ever higher. Currently, gas turbine users at home and abroad usually adopt preventive maintenance as the routine maintenance strategy, which typically relies on the equivalent operating hours (EOH) provided by the manufacturer to determine whether a minor inspection, a hot gas path inspection, or a major overhaul is needed. 5 Whether engine shutdown and maintenance are planned or unplanned, the widespread under-maintenance and over-maintenance always lead to high operation and maintenance costs. 6 Alternatively, in light of the actual performance and health of the engine, the user can adopt corresponding maintenance strategies through monitoring, diagnosis, and prediction. Maintenance is then carried out according to the health of the engine to maximize service life, reduce operation and maintenance expenditure, and improve the availability and reliability of the engine. 7 Generally, fault diagnosis and prognosis are the two main processes that determine the reliability and efficiency of condition-based maintenance. 8 Since the gas turbine is a nonlinear coupled thermal system, changes in environmental factors (such as atmospheric temperature, pressure, and relative humidity) and operating conditions (such as startup, shutdown, and dynamic loading/unloading) cause major changes in the internal condition (such as performance) of the gas turbine thermal system. 9 This makes it very challenging to diagnose and predict the performance degradation, ageing, damage, and failure of the components of this strongly nonlinear thermodynamic system. [10][11][12][13]
In terms of fault prognosis, existing methods are usually based on a single thermodynamic monitoring parameter. For example, the performance status of an aero engine is mainly measured by the exhaust gas temperature margin (EGTM) of the high-thrust engine during the take-off phase of the aircraft. A regression prediction model relating the EGTM to the number of take-offs is built to predict the performance degradation trend of the engine. 14, 15 Ahsan et al. 16 used a data preprocessing pipeline of data segmentation (under different working conditions), normalization, smoothing, and selection to detect the turning point of initial performance degradation and the turning point of the transition from the normal degradation stage to the accelerated degradation stage; combined with a particle filter, a prediction model of the remaining service life of the turbofan engine was established. This kind of thermodynamic system time series regression (TSR) prediction method based on a single thermodynamic monitoring parameter can usually give only the performance degradation trend of the whole system, and cannot give detailed and quantified performance health indicators (PHIs) of each main component. How to further convert the component health parameters (HPs) extracted by fault diagnosis into component HPs that can be used for fault prognosis, so as to realize effective coupling between fault diagnosis and degradation prognosis, is a problem worthy of further discussion. Alozie et al. 17 proposed a turbofan engine fault prognosis method based on thermodynamic model decision-making. The method first diagnoses the HPs of each component in real time through the thermodynamic model, and then performs TSR modeling on the diagnosed component HPs to obtain a performance degradation model, which is used to predict the remaining service life. De Giorgi et al. 18 proposed a turbojet engine performance degradation prediction method based on two neural networks in series. First, the measurable gas path parameters (GPPs) are used as input to the first neural network to establish a turbojet engine performance prediction model; the output of this model is then used as input to the second neural network to establish a turbojet engine performance degradation model. The effectiveness of the method in predicting compressor fouling and turbine corrosion was discussed through simulation tests.
At present, research on gas turbine performance prediction focuses mainly on aero engines and mainly adopts traditional methods based on physical models (such as wear models) or data-driven neural networks. There are few studies on the performance prognosis of HDGTs for ground power generation, and research on performance prediction that combines gas path analysis (GPA) methods with deep learning networks is still lacking.
This paper studies a multi-dimensional TSR prediction method that integrates the gas path diagnosis (GPD) information of each main component, and proposes a long-short term memory and GPA-based gas turbine fault diagnosis and prognosis method. The measurable GPPs of the target gas turbine are collected during operation, and the GPA method is used to obtain the real component characteristic map of each component of the target engine. By comparison with the corresponding performance of gas turbine components considered ''healthy,'' the degree of deviation of the characteristic map is quantified to reveal the degradation of component performance. In terms of prediction, this paper combines the measurable GPPs collected over the running time with the component health parameters (HPs) diagnosed by the GPA method, and uses the Long-Short Term Memory (LSTM) neural network to predict the future performance degradation of each component of the gas turbine, so as to realize effective coupling of the fault diagnosis and prognosis processes.

The proposed method
Existing gas turbine fault prediction can obtain only the performance degradation trend of the whole machine, but cannot obtain the detailed and quantitative performance health status of each main component, which is not conducive to formulating appropriate and reasonable optimal control and maintenance strategies. Thus, a multidimensional TSR prediction method that integrates the GPD information of each major component urgently needs to be studied. A gas turbine fault diagnosis and prediction method must be able to decouple, on the basis of the thermodynamic mechanism, a thermodynamic system with a strongly nonlinear coupling relationship (input-state-output). The measured input and output parameters are used to solve for the internal state parameters that cannot be measured directly, and appropriate component health measures are introduced to quantify the health state, thereby achieving fault diagnosis and prediction. The fundamental problem is to study mathematical methods for the optimal identification of fuzzy data and for TSR. However, basic theoretical methods such as mechanism modeling, characteristic identification, and TSR still need continuous improvement. This means that much research remains to be done on the theory of gas path fault prediction and diagnosis. In this paper, a long-short term memory and GPA-based gas turbine fault diagnosis and prognosis method is proposed, as shown in Figure 1, to realize effective coupling of the fault diagnosis and prognosis processes.
The concrete steps are as follows. First, the thermodynamic model of the engine to be diagnosed is constructed by an adaptive modeling strategy based on the measurable GPPs obtained when the engine has just been put into operation or is healthy. In this way, the component characteristic maps of the thermodynamic model are matched with the real component characteristic maps of the machine over a wide range of operating conditions, which eliminates two uncertainties: (1) the uncertainty (i.e. engine-to-engine variability) caused by manufacturing and installation deviations between different gas turbines of the same type; and (2) the changes in the characteristics of each component caused by different disturbances and unknown initial conditions (e.g. different environmental conditions, operating conditions, maintenance, and overhaul). The adapted thermodynamic model is used as the driving model for the subsequent GPD, as shown in Figure 2.
Second, as the engine operates, the measurable GPPs of the gas turbine are used to adaptively adjust the component characteristic lines of the thermodynamic model so that it continues to match the component characteristic maps of the actual target engine, tracking the engine performance and health in real time. The component HPs (such as the flow characteristic index and efficiency characteristic index of the compressor and turbine) reflect the deviation of the component characteristic lines caused by performance degradation, from which the overall health of the gas turbine is obtained.
Finally, combining the measurable GPPs of the engine with the diagnosed HPs of the main components, a multi-input, multi-step, multi-output time series prediction model based on the LSTM network is designed to forecast the future HPs of each main component. The health parameters of each component at the previous time and the measurable GPPs at the current time are used as input vectors, and the health parameters of each component at the current time are predicted over multiple time steps, so as to realize effective coupling of the fault diagnosis and prognosis processes, as shown in Figure 3. The method proposed in this paper can effectively diagnose and predict detailed, quantified, and accurate performance and health indicators of each main component.
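The input construction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `make_windows`, the list-based data layout, and the exact alignment of each window with its prediction target are hypothetical choices made here.

```python
def make_windows(gpps, hps, n_steps):
    """Build (inputs, targets) for a multi-input, multi-step, multi-output model.

    gpps: per-time-step lists of measurable gas path parameters
    hps:  per-time-step lists of diagnosed component health parameters
    Each feature row at time t is GPP(t) followed by HP(t-1);
    the target of a window ending at time t is HP(t).
    """
    if len(gpps) != len(hps):
        raise ValueError("gpps and hps must cover the same time steps")
    # Feature rows exist only for t = 1 .. T-1, because HP(t-1) is required
    rows = [list(gpps[t]) + list(hps[t - 1]) for t in range(1, len(gpps))]
    inputs, targets = [], []
    for end in range(n_steps, len(rows) + 1):
        # rows[end - 1] belongs to time t = end, so the target is HP(end)
        inputs.append(rows[end - n_steps:end])
        targets.append(list(hps[end]))
    return inputs, targets


# Toy data (NOT from the paper): 6 time steps, 2 GPPs, 1 HP
toy_gpps = [[t, 10 * t] for t in range(6)]
toy_hps = [[100 + t] for t in range(6)]
X, y = make_windows(toy_gpps, toy_hps, n_steps=3)
print(len(X), len(X[0]), len(X[0][0]))  # windows, time steps, features
```

With 12 GPPs and 4 HPs, each window would have shape (n_steps, 16), which is the input shape an LSTM layer expects for one sample.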

Application and analysis
Taking a certain type of single-shaft HDGT as the research object and considering the ISO 2314 standard (the international standard for gas turbine acceptance testing), the equipment configuration diagram for thermodynamic modeling using the equivalent cooling flow processing method is shown in Figure 4.
Figure 5 shows the aging curve of an HDGT under full-power working conditions as a function of EOH.
The detailed gas turbine boundary condition parameters and measurable GPPs of this heavy-duty engine are listed in Table 1.
The detailed gas turbine component HPs of this HDGT are listed in Table 2.
The influence of different gas path component failure modes on component HPs is described in our previous work. 19 The component failure modes and the corresponding health parameter change ranges of the case study in this paper are shown in Table 3.
In our previous work, 19,20 two thermodynamic models of single-shaft power generation gas turbines were developed on the MATLAB/Simulink software platform; they are used here to test the effectiveness of the method proposed in this paper. One of them is a transient (dynamic) gas turbine thermodynamic model, as shown in Figure 6. Gas path faults are simulated by implanting different component HPs (SF) into the dynamic performance engine model. The other thermodynamic model is established by VC++ and MATLAB mixed programming, in which the working fluid thermal property calculation program is written in VC++ and compiled into a dynamic link library file suitable for calling from MATLAB. Hence, the dynamic performance model of the single-shaft power generation gas turbine, with component degradations (SF) implanted, serves as the target engine, and the steady-state performance model of the single-shaft power generation gas turbine serves as the engine performance model.
In addition, this paper collects the actual ambient pressure (kPa), temperature (°C), and relative humidity (%) conditions of the location. The atmospheric environment data sources are NCDC (National Climatic Data Center, https://www.ncdc.noaa.gov/) and https://en.tutiempo.net/climate. The sampling period is from August 19, 2018 to December 31, 2019, the sampling frequency is once every 3 h, and there are 3905 sets of data in total. To test the effectiveness of the GPA method in diagnosing the health parameters of each component under transient operating conditions of the gas turbine, the electrical load (kW) of the dynamic performance engine model was varied at the same time, as shown in Figure 7.
First, the related steady-state performance model of the object to be diagnosed is constructed by the adaptive modeling strategy based on the measurable GPPs when the engine (i.e. the dynamic performance engine model) is healthy. The specific principle of the adaptive modeling strategy is described in our previous research work. 19 Second, as the engine operates, the HPs of each main component are diagnosed in real time from the measurable GPPs of the engine (i.e. the dynamic performance engine model) by the GPD method based on the related steady-state thermodynamic model, as shown in Figure 8. Here, SFFC, SFEC, SFFT, and SFET represent the compressor flow index, compressor isentropic efficiency index, turbine flow index, and turbine isentropic efficiency index, respectively. The red line is the implanted fault trend of the flow capacity, and the black line is the implanted fault trend of the isentropic efficiency.
It can be seen from Figure 8 that the diagnosed HPs of each component are consistent with the trend of the HPs implanted in the major components of the target engine; the maximum root mean square error (RMSE) of the diagnosed component HPs does not exceed 0.193%.
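For reference, the RMSE between a diagnosed and an implanted health parameter series can be computed as in the sketch below. The paper does not give its exact formula, so this uses the standard definition and assumes that the HPs are expressed as percentage deviations, which would make the RMSE a percentage as well. The function name `rmse` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(diagnosed, implanted):
    """Root mean square error between two equally long series."""
    if len(diagnosed) != len(implanted) or not diagnosed:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(d - i) ** 2 for d, i in zip(diagnosed, implanted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values (NOT from the paper): HP deviations in percent
implanted = [0.0, -0.5, -1.0, -1.5]
diagnosed = [0.1, -0.4, -1.1, -1.4]
print(f"RMSE = {rmse(diagnosed, implanted):.3f}%")
```

Computing this value separately for each of the four health parameters and taking the largest gives a figure comparable to the maximum RMSE quoted above.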
Finally, combining the measurable GPPs of the engine with the diagnosed HPs of the main components, a multi-input, multi-step, multi-output time series prediction model based on the LSTM network is designed to predict the future HPs of each main component, so as to realize effective coupling of the fault diagnosis and prognosis processes, as shown in Figure 9. The steps of defining and implementing the LSTM prediction model are as follows: (1) Data normalization. Because the data are of different kinds, their orders of magnitude vary greatly. To avoid large network prediction errors caused by the large differences in magnitude between the input and output data, this paper uses the min-max method to normalize the data. Specifically, ''scaler=MinMaxScaler(feature_range=(0, 1))'' is used to convert the data to values in the range [0, 1].
(2) Model construction. This paper uses the Sequential model in Keras, constructed by ''model=Sequential();''. The first hidden layer of the model has 128 neurons, the hidden layer before the output layer has 32 neurons, and the output layer has 4 neurons for predicting the health parameters of each component. The construction method is ''model.add(LSTM(128, input_shape=(train_X.shape[1], train_X.shape[2]), return_sequences=False)); model.add(Dense(32)); model.add(Dense(4));''. The time step size and the number of LSTM layers are the factors studied for their effect on prediction accuracy. The input data consist of 12 measurable GPPs and 4 component HPs, so the input shape is n time steps by 16 features. In addition, when another LSTM layer is stacked, the preceding LSTM layer is set to return_sequences=True. To test the effectiveness of the multidimensional TSR forecasting method proposed in this paper, which integrates the GPD information of all major components, the first 3124 time series samples of the diagnosed component HPs are selected as the training samples of the LSTM time series prediction model, and the last 781 time series samples are selected as the test samples. For comparison, recurrent neural network (RNN) and LSTM neural network models that use only the measurable GPPs as the input vector are selected as the comparison algorithms. The effect of time step sizes of 4, 8, 16, 32, and 64 on the accuracy of time series prediction is also studied. The structures of the different deep neural network models are shown in Table 4.
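The data preparation steps above (min-max normalization and the chronological 3124/781 split) can be sketched without any library, as shown below. This is a simplified stand-in for the scikit-learn ''MinMaxScaler'' mentioned in the text, not the authors' code. The helper names are hypothetical, the data are synthetic placeholders, and fitting the scaling statistics on the training rows only is an assumption made here; the paper does not state which rows were used to fit the scaler.

```python
N_GPP, N_HP = 12, 4          # feature counts stated in the text
N_TRAIN, N_TEST = 3124, 781  # sample counts stated in the text


def chronological_split(rows, n_train):
    """Split time-ordered rows without shuffling: earliest rows train, latest test."""
    if not 0 < n_train < len(rows):
        raise ValueError("n_train must be between 1 and len(rows) - 1")
    return rows[:n_train], rows[n_train:]


def fit_min_max(rows):
    """Return the per-column minimum and range used for scaling to [0, 1]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # A constant column has zero range; fall back to 1.0 to avoid dividing by zero
    ranges = [(max(col) - min(col)) or 1.0 for col in columns]
    return mins, ranges


def min_max_transform(rows, mins, ranges):
    """Map each value to (value - column minimum) / column range."""
    return [[(v - lo) / rng for v, lo, rng in zip(row, mins, ranges)] for row in rows]


def min_max_inverse(rows, mins, ranges):
    """Undo min_max_transform, e.g. to report predicted HPs in original units."""
    return [[v * rng + lo for v, lo, rng in zip(row, mins, ranges)] for row in rows]


# Synthetic placeholder data (NOT the paper's data): 3905 rows of 16 features
rows = [[float(t + k) for k in range(N_GPP + N_HP)] for t in range(N_TRAIN + N_TEST)]
train, test = chronological_split(rows, N_TRAIN)
mins, ranges = fit_min_max(train)  # assumption: statistics from training rows only
train_scaled = min_max_transform(train, mins, ranges)
test_scaled = min_max_transform(test, mins, ranges)
print(len(train_scaled), len(test_scaled), len(train_scaled[0]))
```

One consequence of fitting on the training rows only is that test values may fall slightly outside [0, 1], as they do for this monotonically increasing synthetic series. Keeping the split chronological rather than shuffled preserves the time order that the prediction task depends on.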
The prognostic results of the different deep neural network models are shown in Table 5 and Figures 10 to 13. In Figures 10 to 13, the two-layer LSTM prediction models S-LSTM2 + 1 and S-LSTM2 + 16, which take the measurable GPPs and health parameters as input parameters with time steps of 1 and 16, respectively, and the RNN and LSTM prediction models that take only the measurable GPPs as input parameters are selected for comparison.
From Table 5 and Figures 10 to 13, it can be seen that, compared with the prognostic results of the RNN and LSTM models that use only the measurable GPPs as the input vector, the prognostic results of the proposed method are more accurate; the average RMSEs of the prognostic results for the component health parameters are 0.292%, 0.065%, 0.121%, and 0.067%, respectively. For the compressor flow parameter (SFFC), compressor efficiency parameter (SFEC), and turbine flow parameter (SFFT), the best deep learning prediction model is S-LSTM2 with a time step of 16. For the turbine efficiency parameter (SFET), the best deep learning prediction model is S-LSTM1 with a time step of 32. Considering the prediction results for all health parameters, the S-LSTM2 model with a time step of 16 gives the best overall prediction performance.

Conclusion and discussion
Existing gas turbine failure prediction methods lack detailed and quantified performance and health indicators for each main component, which makes it difficult to develop suitable and reasonable optimization control and maintenance strategies. Thus, it is imperative to research a multi-dimensional TSR forecasting method that integrates the GPD information of each major component. This paper proposes such a method, namely a long-short term memory and GPA-based gas turbine fault diagnosis and prognosis method, to realize effective coupling of the fault diagnosis and prognosis processes. The following conclusions can be drawn from the simulation experiments:
(1) The method proposed in this paper can obtain detailed, quantified, and accurate PHIs of each main component in the transient operation mode of the gas turbine, and can accurately predict the future component performance degradation (fault evolution process). It can provide theoretical guidance for drafting adequate and sound optimization control and maintenance strategies.
(2) Compared with the RNN and LSTM neural network models that use only the measurable GPPs as input vectors, this paper proposes a deep learning prediction network that takes the measurable gas path parameters at time t and the component health parameters at time t - 1 as input parameters, and its prediction results are more accurate. Using the diagnosed component health parameters as part of the input vector enables the LSTM network to better learn how the component health parameters develop over time and to capture the degradation trend of the main components. Considering the prediction results for all health parameters, the S-LSTM2 model with a time step of 16 gives the best prediction performance.
Figure 9. A long-short term memory and GPA based gas turbine malfunction diagnosis and prognosis method.

Author contributions
Hongyu Zhou, Jingchao Li and Yaofei Jin drafted the manuscript; Yulong Ying proposed the methodology; all the authors gave final approval for publication.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research work is supported by the National Natural Science Foundation of China (No. 51806135).

Data availability
The knowledge data samples for training and testing the deep learning model have been uploaded as part of the electronic Supplemental Materials.