Degradation state identification for hydraulic pumps using modified hierarchical decomposition and image processing

Monitoring the degradation state of hydraulic pumps is of great significance to the safe and stable operation of equipment. As an important step, feature extraction has always been challenging. The non-stationary and nonlinear characteristics of vibration signals are likely to weaken the performance of traditional features. The two-dimensional image representation of vibration signals can provide more information for feature extraction, but it is challenging to obtain sufficient information based on small-size images. To solve these problems, a method for feature extraction based on modified hierarchical decomposition (MHD) and image processing is proposed in this paper. First, a set of signals decomposed by MHD are converted into gray-scale images. Second, features from accelerated segment test (FAST) algorithm are applied to detecting the feature points of the gray-scale image. Third, the real part of Gabor filter bank is used to convolve the images, and the responses of feature points are used to calculate histograms that are regarded as feature vectors. The method for feature extraction fully acquires the multi-layered texture information of small-size images and removes the redundant information. Furthermore, support vector machine (SVM) and nondominated sorting genetic algorithm II (NSGA-II) are introduced to conduct feature selection and state identification. NSGA-II and SVM can conduct the joint optimization of these two goals. The details of the proposed method are validated using experimental data, and the results show that the highest recognition rate of our proposed method can reach 100%. The results of the comparison among the proposed method, local binary pattern (LBP), and one-dimensional ternary patterns (1D-TPs) certify the superiorities of the proposed method. It obtains the highest classification accuracy (99.7%–98%) and the lowest feature set dimension (13–10).


Introduction
The hydraulic pump, which offers high pressure and high efficiency, is the heart of a hydraulic system. As a component for transferring and converting energy, hydraulic pumps are widely used in transportation and industry. 1 Failure to troubleshoot hydraulic pumps in time threatens the safe operation and reliability of the hydraulic system and may even cause serious accidents and economic losses. 2 Therefore, degradation state identification for hydraulic pumps is conducive to the reliable and long-term operation of the system. However, the structure of a hydraulic pump is more complex than that of other rotating machinery such as bearings. The thermal-solid-fluid coupling inside the hydraulic pump conceals its fault information. 3 Consequently, a novel feature extraction method is needed for the degradation state identification of hydraulic pumps.
The traditional features of time-frequency domain, time domain, and frequency domain are widely used in the fields of fault diagnosis and prognosis. Applying frequency domain analysis and time domain analysis to non-stationary signals is not effective. 4 Although the time-frequency analysis can be used to extract the features of a non-stationary signal more effectively, its stability will be weakened in the presence of a signal with weak fault characteristics and nonlinear characteristics. 5 In recent years, some novel methods for feature extraction have been proposed in the fields of fault diagnosis and condition monitoring. Hongru Li et al. 6 extracted the power entropy and singular entropy of the modified composite spectrum from multi-channel vibration signals of the hydraulic pump and proposed the relative entropy method to fuse the initial features. Ying Jiang et al. 7 proposed a feature indicator called hierarchical entropy based on sample entropy and hierarchical decomposition. Since then, some scholars had combined the hierarchical decomposition with other complexity indicators and proposed some novel feature extraction methods. With the support of hierarchical decomposition, Bing Han et al. 8 developed the single-scale Lempel-Ziv complexity into hierarchical Lempel-Ziv complexity and applied it to fault feature extraction for rotating machinery. In the process of hierarchical decomposition, the length of time series is shortened. Modified hierarchical decomposition (MHD) proposed by Yongbo Li et al. 9 overcome this drawback. The moving-difference and moving-averaging procedure were used to improve hierarchical decomposition, and modified hierarchical permutation entropy was proposed based on permutation entropy and MHD. Cheng Yang et al. 10 extended MHD to multiple scales, and proposed hierarchical multi-scale symbolic dynamic entropy, which is a tensor feature extraction method based on MHD, multi-scale analysis and symbol dynamic entropy. 
The above studies related to the hierarchical decomposition show that it has the advantage of obtaining the information of signals from high- and low-frequency components, so that it can reflect the fault information more accurately than traditional multi-scale analysis.
Inspired by the rapidly developing image processing technology, some scholars also applied it to extracting fault features. Kaplan et al. 11 applied texture analysis to diagnosing bearing faults. The one-dimensional (1D) vibration signals were converted into two-dimensional (2D) gray-scale images. Local binary pattern (LBP), a classic texture descriptor, was used to extract image features. Hao Zheng et al. 12 used a corner detection method called features from accelerated segment test (FAST) to identify the feature points in gray-scale images. They then used the unoriented scale-invariant feature transform (unoriented-SIFT) to describe the feature points. Melih Kuncan et al. 13 developed the local ternary patterns for image processing into one-dimensional ternary patterns (1D-TP), which was used to extract bearing fault features from vibration signals. 1D-TP can be considered as a variant of image processing technology. Since deep learning can learn the deep abstract features of data and has a strong data expression, it has also been widely used in the field of mechanical fault diagnosis. Although deep learning can process 1D signals, images have been more widely used as its input. Jianyu Wang et al. 14 proposed a bearing fault diagnosis model transferred from an AlexNet model. The images created by eight time-frequency analysis methods, including short-time Fourier transform, and fast kurtogram, were used as the input of the model. Yongliang Bai et al. 15 proposed an algorithm to represent the frequency spectrum characteristics of vibration signals in image form, and a deep neural network (DNN) classified the images to diagnose the faults of wheelset-axlebox assemblies. Duy-Tang Hoang et al. 16 converted the vibration signals into gray-scale images, and a deep convolutional model automatically extracted features from the images and recognized the bearing faults. 
Compared with 1D signal, 2D image representation has three main advantages: (1) For machinery working in a noisy environment, vibration signals are usually added with noise. However, when the data is transformed into images, the added noise is considered as the illumination of the light to the image. Therefore, the effect of noise is suppressed. 17 (2) Image representation can provide a comprehensive, detailed, and nonlinear description of the data. 15 (3) From the perspective of human vision, 2D images are of course easier to distinguish. With a simple transformation of the signal, people can more easily classify each signal from the images. 18 Although the application of image processing technology is novel and effective in the fields of fault diagnosis and prognosis, there are still several problems that need to be considered. First, an image often contains a large number of pixels in the field of image processing, while a fault sample is likely to contain a small number of data points in the field of fault diagnosis, which makes a 1D signal into a small-size image. It contains less information, and its local and global information needs to be considered comprehensively. Second, it is challenging to obtain sufficient information based on small-size images. Third, deep learning requires a large number of data samples to meet good recognition results. To extract the degradation features of hydraulic pumps more effectively, it is promising to combine hierarchical decomposition with image feature extraction. In this paper, an approach based on MHD, FAST algorithm, and Gabor filter is proposed to extract features. FAST is a feature point detection algorithm widely used in computer vision. 19 The points with large difference in gray value of surrounding pixels are identified as feature points by FAST algorithm; thus, FAST is a local feature detector and suitable for signals with non-stationary characteristics. 
12 The role of the Gabor filter in this paper is to describe the detected feature points. As a powerful tool in the field of computer vision, the Gabor filter proposed by Dennis Gabor is one of the best approaches for texture analysis. 20 Because of its high sensitivity to describing directional features, it is suitable for calculating the gradient of feature points. Wei Jia et al. 21 used histogram of oriented gradients (HOG) descriptor improved by the Gabor filter to extract palmprint features. In this paper, a Gabor filter bank with one scale and several directions is applied to all feature points, and the response histogram is used as the feature vector of a gray-scale image.
In this study, after feature extraction, nondominated sorting genetic algorithm II (NSGA-II) 22 and support vector machine (SVM) 23 are introduced to conduct feature selection and identify different degradation states of a hydraulic pump. Overall, a new strategy based on MHD, FAST algorithm, Gabor filter, NSGA-II, and SVM is proposed to identify degradation states of hydraulic pumps. The results of two experiments show that the proposed method is effective and reasonable. Compared with LBP and 1D-TP, it can obtain the highest classification accuracy with the fewest features.
The following contents are also presented: The details of model establishment for the proposed degradation state identification method are presented in the second section. Experimental test results and discussion content are given in the third section. The fourth section summarizes the full paper and gives conclusions.

Feature extraction
Modified hierarchical decomposition (MHD). Compared with the conventional hierarchical decomposition, MHD proposed by Yongbo Li has better stability. 9 The calculation process of MHD includes four steps as follows:
Step 1: For a given time series $Q = \{q(i),\ i = 1, 2, \ldots, L\}$, an averaging operator $O_0$ and a high operator $O_1$ can be expressed according to equation (1) and equation (2).
Step 2: The operator $O_j^h$ ($j = 1$ or $0$) at the hierarchical layer $h$ can be calculated according to equation (3).
Step 3: For a given positive integer $h$ and unique vector $[v_1, v_2, \ldots, v_h]$, the integer $e$ is expressed according to equation (4)
$$e = \sum_{m=1}^{h} 2^{h-m} v_m \quad (4)$$
where $\{v_m,\ m = 1, \ldots, h\} \in \{1, 0\}$ determines the selection of the high or averaging operator at the $m$th layer.
Step 4: Calculate the component $Q_{h,e}$ at the node $e$ of the layer $h$ in the hierarchical structure according to equation (5).
To illustrate the MHD, an example is presented in Figure 1. The time series $Q$ is decomposed into three layers, and the last layer has eight components. Take the case of $h = 3$ as an example: $Q_{3,2}$, the component at node 2 of layer 3, is the low-frequency component of $Q_{2,1}$. When $e = 2$ and $h = 3$, the unique vector is $[v_1, v_2, v_3] = [0, 1, 0]$ according to equation (4), since $2 = 2^2 \cdot 0 + 2^1 \cdot 1 + 2^0 \cdot 0$. Consequently, $Q_{3,2}$ can be obtained according to equation (5). $Q_{3,3}$ is the high-frequency component of $Q_{2,1}$.
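The node indexing and the layer-by-layer structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The operator definitions are an assumption, because equations (1) to (3) are not reproduced here: at layer k the averaging operator is taken as (x[j] + x[j + lag]) / 2 and the high operator as (x[j] - x[j + lag]) / 2 with lag = 2^(k-1). This choice is at least consistent with the lengths quoted later in the paper (a 4095-point sample gives 4088-point components at layer 3). The function names are hypothetical.

```python
def node_index(v):
    """Node index e = sum over m of 2^(h-m) * v_m for a path vector [v_1..v_h]."""
    h = len(v)
    return sum(bit << (h - m) for m, bit in enumerate(v, start=1))


def path_vector(e, h):
    """Inverse of node_index: the unique 0/1 vector [v_1..v_h] of node e."""
    return [(e >> (h - m)) & 1 for m in range(1, h + 1)]


def mhd(signal, h_max):
    """Return {(layer, node): component}; (0, 0) is the original series.

    ASSUMPTION: moving-average / moving-difference operators with a lag of
    2^(layer-1) samples; the sign convention of the difference is also assumed.
    """
    nodes = {(0, 0): list(signal)}
    for layer in range(1, h_max + 1):
        lag = 2 ** (layer - 1)
        for e in range(2 ** layer):
            parent = nodes[(layer - 1, e >> 1)]   # dropping v_h gives the parent node
            sign = -1.0 if e & 1 else 1.0         # v_h = 1 selects the high operator
            nodes[(layer, e)] = [
                (parent[j] + sign * parent[j + lag]) / 2.0
                for j in range(len(parent) - lag)
            ]
    return nodes


components = mhd([float(i) for i in range(4095)], h_max=3)
print(node_index([0, 1, 0]), len(components), len(components[(3, 2)]))
```

With three layers the dictionary holds 1 + 2 + 4 + 8 = 15 components, which matches the count used in the feature extraction steps.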

FAST feature detection
As shown in Figure 2, a pixel p is selected from an image, and the calculation method of FAST is as follows: The gray value of p is represented as $I_p$, and an appropriate threshold t is set. There is a series of pixels on a discretized circle with p as the center and r as the radius. For instance, there are 16 pixels on the circle with r = 3 in Figure 2. Any pixel on the circle is represented as z. If the gray value of z (marked as $I_z$) is not greater than $(I_p - t)$, z is darker than p, and the status of p → z (expressed as $S_{p \to z}$) is d. Similarly, the status is b (brighter) if $I_z$ is not less than $(I_p + t)$, and s (similar) otherwise. The value of $S_{p \to z}$ is calculated according to equation (6)
$$S_{p \to z} = \begin{cases} d, & I_z \le I_p - t \\ s, & I_p - t < I_z < I_p + t \\ b, & I_p + t \le I_z \end{cases} \quad (6)$$
Equation (6) is applied to all pixels on the circle, and two sets are defined as follows
$$Y_b = \{z \mid S_{p \to z} = b\} \quad (7)$$
$$Y_d = \{z \mid S_{p \to z} = d\} \quad (8)$$
The center p is identified as a feature point if the number of elements in $Y_b$ or $Y_d$ is greater than n. The parameter n is assigned by the user.
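The detection rule described above can be sketched as follows. The sketch follows the counting rule exactly as stated in this section (more than n circle pixels that are all brighter or all darker); the original FAST detector of Rosten et al. additionally requires those pixels to be contiguous on the circle, which is not checked here. The circle offsets for r = 3 and all function names are assumptions made for illustration, and the image is a plain list of rows so that the example needs no external library.

```python
# Offsets (dx, dy) of the 16 pixels on the discretized circle of radius 3.
CIRCLE_R3 = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
             (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def status(i_p, i_z, t):
    """Status of a circle pixel z relative to the centre p: 'd', 's' or 'b'."""
    if i_z <= i_p - t:
        return "d"          # darker
    if i_z >= i_p + t:
        return "b"          # brighter
    return "s"              # similar


def is_feature_point(image, row, col, t, n, circle=CIRCLE_R3):
    """True if more than n circle pixels are brighter, or more than n are darker."""
    i_p = image[row][col]
    states = [status(i_p, image[row + dy][col + dx], t) for dx, dy in circle]
    return states.count("b") > n or states.count("d") > n


def detect(image, t, n, radius=3):
    """Scan every pixel whose circle lies fully inside the image."""
    rows, cols = len(image), len(image[0])
    return [(r, c)
            for r in range(radius, rows - radius)
            for c in range(radius, cols - radius)
            if is_feature_point(image, r, c, t, n)]


# A bright spot on a flat background: every circle pixel is darker than the centre.
img = [[100] * 7 for _ in range(7)]
img[3][3] = 200
print(detect(img, t=3, n=12))
```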

Gabor filter
In a two-dimensional space, the Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave, 24 which can be expressed as equation (9)
$$G(x, y; \lambda, \theta, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\,\frac{2\pi x'}{\lambda}\right) \quad (9)$$
and its real part is expressed as
$$G_r(x, y; \lambda, \theta, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(\frac{2\pi x'}{\lambda}\right) \quad (10)$$
$$x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta \quad (11)$$
where $i^2 = -1$, λ is the wavelength of the sinusoidal function, θ controls the direction of the parallel strips in the Gabor kernel, σ is the standard deviation of the Gaussian factor, and γ is the spatial aspect ratio, which controls the ellipticity of the Gabor function. If γ = 1, the image of the Gabor function is circular. A Gabor filter bank can be created by changing θ and fixing the other parameters. The angle $\theta_c$ is calculated according to equation (12)
$$\theta_c = \frac{(c - 1)\pi}{n'}, \quad c = 1, 2, \ldots, n' \quad (12)$$
where $n'$ is the number of directions.

The proposed feature extraction method
For a given signal segment $Q = \{q(i),\ i = 1, 2, \ldots, L\}$, the proposed feature extraction method consists of six steps, and an example is shown in Figure 3. In this example, Q is decomposed into three layers, and the Gabor filter bank has six different directions: (0, π/6, π/3, π/2, 2π/3, 5π/6).
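The real part of the Gabor kernel and the six-direction filter bank mentioned above can be sketched as follows. The sketch assumes the common form without phase offset or normalization factor, that is, a Gaussian envelope exp(-(x'^2 + γ^2 y'^2) / (2σ^2)) multiplied by cos(2π x'/λ), with x' = x cos θ + y sin θ and y' = -x sin θ + y cos θ, and equally spaced directions θ_c = (c - 1)π/n'. The parameter values are the ones reported later in the paper (Ksize = 11, λ = 2.13, σ = 4, γ = 1); the function names are hypothetical.

```python
import math


def gabor_real_kernel(ksize, lam, theta, sigma, gamma):
    """Real part of a Gabor kernel sampled on a ksize x ksize grid centred at 0."""
    half = ksize // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / lam))
        kernel.append(row)
    return kernel


def gabor_bank(n_dirs, ksize, lam, sigma, gamma):
    """Kernels that share (lam, sigma, gamma) and differ only in direction."""
    thetas = [c * math.pi / n_dirs for c in range(n_dirs)]
    return thetas, [gabor_real_kernel(ksize, lam, th, sigma, gamma) for th in thetas]


thetas, bank = gabor_bank(n_dirs=6, ksize=11, lam=2.13, sigma=4.0, gamma=1.0)
print(len(bank), len(bank[0]), len(bank[0][0]))
```

An odd kernel size keeps the kernel centred on a pixel, so the centre coefficient is exp(0)·cos(0) = 1 for every direction.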
Step 1: Apply MHD to Q. Then, $(2^{h_{\max}+1} - 1)$ hierarchical components can be obtained (Q is treated as a component), where $h_{\max}$ is the maximum value of the layer index h, which is also the total number of layers (Figure 3(b)).
Step 2: Transform each hierarchical component into a gray-scale image. First, each component is converted into an M × N matrix according to the method shown in Figure 4. The size of the matrix is chosen so that the difference between M and N is minimized; in other words, a square matrix is the ideal result. Note that several data points may be discarded. For example, for a given signal with 4088 data points, (M, N) is set to (68, 60), and 8 points are discarded. Then, all the elements of the matrix are normalized according to equation (13) and mapped to values between 0 and 255
$$a_i^{\mathrm{New}} = \frac{a_i - \min(A)}{\max(A) - \min(A)} \quad (13)$$
where $a_i$ is any element in matrix A, max(A) is the maximum value of the elements in matrix A, and min(A) is the minimum value of the elements in matrix A. Finally, each matrix is converted to uint8 type, and the gray-scale images are obtained (Figure 3(c)).
Step 3: FAST feature detection: The feature points in each gray-scale image are detected according to the method described in the subsection "FAST feature detection" (Figure 3(d)).
Step 4: A Gabor filter bank is created according to the method described in the subsection "Gabor filter". Then, each gray-scale image is convolved with the real part of the filter bank. Finally, the gradient magnitude $m_p$ and orientation $\theta_p$ corresponding to each feature point are calculated according to equations (14) and (15), respectively, where $c = 1, 2, \ldots, n'$, $G_r(\theta_c)$ is the real part of the filter bank obtained according to equations (10)–(12), $I_p$ is the feature pixel, and * denotes the convolution operation (Figure 3(e)).
Step 5: Create $n'$ bins with different $\theta_c$ and calculate the histogram within each gray-scale image (HI) according to equation (16) (Figure 3(f)).
Step 6: The histogram corresponding to the signal segment Q (HS) can be obtained by integrating the histograms of the $(2^{h_{\max}+1} - 1)$ gray-scale images according to equation (17) (Figure 3(g)). HS can therefore be considered as an $n' \times (2^{h_{\max}+1} - 1)$ dimensional vector.
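Step 2 and the resulting feature dimension can be illustrated with a short sketch. Several details are assumptions, because Figure 4 is not reproduced here: the matrix is filled row by row, surplus points are dropped from the end of the component, and the normalized values are multiplied by 255 and rounded to the nearest integer. The function name is hypothetical.

```python
def to_gray_image(component, rows, cols):
    """Reshape a 1D component into a rows x cols image with gray levels 0..255.

    ASSUMPTIONS: row-wise filling, surplus points discarded at the end,
    min-max normalization followed by scaling to 255 and rounding.
    """
    n = rows * cols
    if len(component) < n:
        raise ValueError("component is shorter than rows * cols")
    values = component[:n]
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant component carries no texture; map it to an all-zero image.
    pixels = [int(round(255.0 * (v - lo) / span)) if span else 0 for v in values]
    return [pixels[r * cols:(r + 1) * cols] for r in range(rows)]


# Example from the text: 4088 points, (M, N) = (68, 60), 8 points discarded.
image = to_gray_image([float(i) for i in range(4088)], rows=68, cols=60)

# Dimension of HS for n' = 6 directions and h_max = 3 layers.
n_dirs, h_max = 6, 3
print(len(image), len(image[0]), n_dirs * (2 ** (h_max + 1) - 1))
```

The last value printed is the length of the feature vector HS: 6 bins for each of the 15 images, that is, 90 features per signal segment.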

Feature selection and state recognition based on NSGA-II and SVM
After performing feature extraction, a feature pool that contains a large amount of information is created. However, the feature pool generally includes redundant information, which impairs the performance of degradation state identification. 25 In this paper, NSGA-II and SVM are introduced for joint optimization of feature selection and degradation state identification.

Nondominated sorting genetic algorithm-II
NSGA is one of the earliest multi-objective evolutionary algorithms. 26 NSGA-II is an improved version that addresses the shortcomings of NSGA, such as lack of elitism and high computational complexity. As an algorithm based on nondominated sorting, NSGA-II has the same basic process as the original genetic algorithm. 27 With an elitist strategy, a fast nondominated sorting procedure, a simple yet efficient constraint-handling method, and a parameterless approach, NSGA-II has found increasing application in the fields of fault prognosis and diagnosis for equipment. 28,29 The procedure of NSGA-II is summarized as follows:
Step 1: Initialize a random population, which consists of N individuals.
Step 2: Sort the population based on the nondominated sorting procedure and assign ranks to all individuals.
Step 3: Utilize selection, crossover, and mutation operators to create the offspring population $Q_t$ from the parent population $P_t$.
Step 4: Combine the offspring population and the parent population to form a whole and sort the whole according to the nondominated sorting procedure.
Step 5: Select nondominated solutions front by front in rank order ($F_1, F_2, \ldots, F_k, \ldots, F_n$) to generate a new population $P_{t+1}$. The selection continues until adding a certain front $F_k$ would make the size of $P_{t+1}$ exceed the population size N. Then select individuals from $F_k$ according to the crowding distance sorting until the size of $P_{t+1}$ is equal to N.
Step 6: Go to Step 2 if the stop condition is not met.
For a better understanding, Step 4 and Step 5 are illustrated in Figure 5. For further details about NSGA-II, refer to Deb et al. 22
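Steps 4 and 5 above can be sketched as follows. For brevity the sketch uses a plain nondominated sort rather than the fast bookkeeping procedure of Deb et al., and it assumes that every objective is to be minimized (for this paper one would use, for example, the negative accuracy and the number of features). All function names are hypothetical.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_sort(objs):
    """Split indices of objs into fronts F1, F2, ... (simple, not the fast version)."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i])
                                  for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs, front):
    """Crowding distance of each index in one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / (hi - lo)
    return dist


def select_next_population(objs, size):
    """Fill P(t+1) front by front; truncate the last front by crowding distance."""
    chosen = []
    for front in nondominated_sort(objs):
        if len(chosen) + len(front) <= size:
            chosen.extend(front)
        else:
            dist = crowding_distance(objs, front)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(ranked[:size - len(chosen)])
            break
    return chosen


objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]   # combined parents and offspring
print(nondominated_sort(objs), select_next_population(objs, 4))
```

Preferring large crowding distances in the last admitted front keeps the boundary solutions and spreads the survivors along the front.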

Support vector machine
As one of the classic machine learning techniques, SVM is widely used in pattern recognition. It learns by example to assign labels to objects. 30 SVM finds an optimal classification hyperplane so that the margin, that is, the sum of the distances from this hyperplane to the nearest sample in each class, is maximized. Take the binary classification problem as an example. For n linearly separable samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, $x \in \mathbb{R}^m$ (where m is the sample dimension, and $y_1, y_2, \ldots, y_n$ are classification labels), finding the hyperplane can be transformed into the solution of the optimization problem (18). As the only variable in problem (18), λ can be solved by quadratic programming. We assume that the optimal solution is $\lambda^* = (\lambda_1^*, \lambda_2^*, \ldots, \lambda_n^*)$. For the classification of nonlinear data, the commonly used method is to map the data into a high-dimensional space through the nonlinear mapping function φ and to construct the classification hyperplane in that space. The kernel function that replaces the dot product operation is defined in equation (19). With the nonlinear data mapped to a high-dimensional space, problem (18) can be transformed into expression (20), where C is the penalty factor. Consequently, the classification function f(x) is expressed as equation (21). In this paper, the radial basis function (RBF) is selected due to its good performance and universal application, 31 and the RBF kernel of samples $x$ and $x'$ can be expressed as follows
$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right) \quad (22)$$
where $1/(2\sigma^2) = f$ and f is the kernel parameter. The details of SVM are described in Hearst et al. 32

The proposed method for feature selection and state recognition
In this study, NSGA-II has two objective functions: the recognition accuracy and the total number of features fed into SVM. The recognition accuracy is the classification accuracy provided by SVM.
The chromosomes of individuals in the population are encoded in binary form: "0" means that the corresponding feature is not selected, and "1" means that it is selected. When NSGA-II runs, the algorithm maximizes the recognition accuracy and minimizes the total number of features simultaneously. A multi-objective optimization problem yields multiple optimal solutions, which form the "Pareto front." 33 The solution on the Pareto front with the highest classification accuracy is selected. In other words, this step yields the most accurate recognition result and its corresponding lowest-dimensional feature subset. The detailed process is shown in Figure 6.
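The binary encoding and the two objectives can be sketched as follows. This is a minimal illustration, not the authors' code: `accuracy_fn` is a hypothetical callable that stands in for training and testing the SVM on the selected feature columns, both objectives are expressed as quantities to minimize, and treating an empty feature subset as the worst possible individual is an assumption.

```python
def decode(chromosome):
    """Indices of the selected features, i.e. the positions of the '1' genes."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]


def objectives(chromosome, accuracy_fn):
    """Return (-accuracy, number of selected features), both to be minimized.

    accuracy_fn is a placeholder for the SVM evaluation on the selected columns.
    """
    selected = decode(chromosome)
    if not selected:
        # ASSUMPTION: an empty subset cannot be classified, so it gets the
        # worst value of both objectives.
        return (0.0, len(chromosome))
    return (-accuracy_fn(selected), len(selected))


def pick_solution(pareto_front):
    """From (chromosome, objectives) pairs, take the highest accuracy;
    ties are broken in favour of fewer features."""
    return min(pareto_front, key=lambda item: item[1])


def dummy_accuracy(selected):
    """Stand-in for the classifier; always reports 90 % accuracy."""
    return 0.9


print(objectives([1, 0, 1, 0], dummy_accuracy))
```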

Suitable parameters determination
There are several parameters in the algorithms described above that need to be determined: the hierarchical layer h; the radius r and two thresholds (t, n) of FAST feature detection; the shared parameters (λ, σ, γ) and the different directions θ ($\theta_c$, $c = 1, 2, \ldots, n'$) of the Gabor filter bank; the kernel parameter f of SVM; and several parameters of NSGA-II. Note that the size of the filter kernel needs to be determined before using a Gabor filter bank. For the selection of h, a value that is too small causes insufficient decomposition, and a value that is too large increases the computation time. We set h = 3. In our code, the parameter f of SVM is set to "auto," which enables f to change adaptively with the feature subset during the operation of NSGA-II. That is, $f = 1/N_f$, where $N_f$ is the number of features. According to the recommendation in Rosten et al., 34 we choose n = 12 and r = 4. The more angle parameters $\theta_c$ we set, the more detailed the description of the image becomes. However, this also increases the computation time. Furthermore, when the number of angles changes in the interval [4,6], it has little effect on the classification accuracy. 35 Consequently, we take the empirical value $(\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6)$ = (0, π/6, π/3, π/2, 2π/3, 5π/6). According to the recommendation in Zhang et al., 36 we choose γ = 1. The parameters of NSGA-II are set to the following relatively balanced values: the total number of evolutionary generations = 500, crossover probability = 0.6, mutation probability = 0.6, and population size = 30. Compared with the parameters that remain to be set, they have weaker effects on the recognition accuracy. Then, there are four more important parameters, t, λ, σ, and the Gabor kernel size (Ksize), that need to be fixed. In fact, the relationship between λ and σ can be expressed as equation (23), 37 where b is the half-response spatial frequency bandwidth of the filter.
Specifically, inspired by Bianconi et al., 35 we use equation (24) to express the relationship between λ and σ. Therefore, only three independent parameters still need to be assigned. The four parameters are selected as follows based on experimental results: t = 3, λ = 2.13, σ = 4, and Ksize = (11×11). The specific details are presented in the section "Experimental validation."

The proposed degradation state identification method for hydraulic pump
The proposed method consists of the following four steps:
Step 1: Acquire the vibration signals for various degradation states of the hydraulic pump and divide them into several samples.
Step 2: Utilize the method described in the subsection "The proposed feature extraction method" to extract features from each sample. With $n' = 6$ and $h_{\max} = 3$, the formula $n' \times (2^{h_{\max}+1} - 1)$ gives 6 × 15 = 90 features per sample.
Step 3: Integrate the features of all samples into a new dataset, and partition it into a testing dataset and a training dataset with a ratio of 1:1.
Step 4: Utilize the method described in the subsection "The proposed method for feature selection and state recognition" to identify the different degradation states of the hydraulic pump, and the best feature subset is also obtained.
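The 1:1 partition in Step 3 can be sketched as follows. The text only states the ratio, so splitting each degradation state separately (a stratified split), shuffling at random, and the fixed seed are assumptions made for illustration; the function name is hypothetical.

```python
import random


def split_half(labels, seed=0):
    """Return (train_indices, test_indices) with half of every class in each set.

    ASSUMPTION: the split is random and stratified by degradation state.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        half = len(indices) // 2
        train.extend(indices[:half])
        test.extend(indices[half:])
    return sorted(train), sorted(test)


# Six degradation states with 100 samples each, as in the experiment.
labels = [state for state in range(6) for _ in range(100)]
train_idx, test_idx = split_half(labels)
print(len(train_idx), len(test_idx))
```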

Experimental validation
A hydraulic pump test platform, shown in Figure 7, was set up for data acquisition. It consists of a cooling system, a control system, a signal monitoring, acquisition and display system, a pressure regulating system, and a drive system. The hydraulic pump for this study is an axial piston pump with the following parameters: type: L10VS028DFR, displacement at the rated working condition: 28 mL/r, rated pressure: 22 MPa, and rated rotation speed: 1480 r/min. Three acceleration sensors are installed in three mutually orthogonal directions, as shown in Figure 8. They acquire vibration signals at a sampling frequency of 50 kHz; each sampling lasts for 1 s, and the interval between two samplings is 30 s.
The single loose boot is studied in this paper because the loose boot is one of the common fault patterns of hydraulic pump. The gap between the plunger and the boot will increase when the loose boot occurs. To acquire vibration signals close to the actual situation, the normal plungers are replaced with failed plungers obtained after equipment maintenance. Five different degrees of the loose boot, as shown in Figure 9, are considered. Vernier caliper is used to measure the maximum radial distance between the boot and the plunger under the five different degrees. The five measurements considered as the loose degree are 0.12 mm, 0.18 mm, 0.3 mm, 0.42 mm, and 0.64 mm, respectively. Including the normal state which is considered as a special degradation state, a total of six different degradation states are considered. The data of each degradation state is divided into 100 samples, and each sample consists of 4095 data points. The time domain waveforms of several samples are shown in Figure 10. Furthermore, one sample of each degradation state is decomposed into two subsignals using MHD, and their time domain waveforms and gray-scale image representations are shown in Figure 11. When the figure is analyzed, it is difficult to separate the degradation states from each other using the time domain waveforms. However, all images appear to be differentiated from each other. Therefore, the conclusion that the image representation of the signal is useful is reinforced.

Experiment 1: Selection of parameters t, λ , σ, and Ksize
In this experiment, 600 data samples are directly converted into 600 images for testing. The threshold t affects the number of detected feature points. One sample of each degradation state is taken to analyze the relationship between t and the number of feature points, as shown in Figure 12. It can be seen that the number of feature points decreases as t increases. Too few feature points lose too much information, whereas too many feature points increase the computation time and introduce useless information. Therefore, our goal is to choose a t that is as great as possible while guaranteeing the recognition accuracy. In addition, when the t value is greater than 20, the number of feature points in each state is less than 200 except for the loose degree 0.64 mm, and there are fewer than 100 feature points in the normal state, which loses too much information. Therefore, the t value should be selected in the interval [1,20].
The Ksize value should generally be an odd number so that the filter kernel is centered on a pixel. As the Ksize value increases, neighbors in a larger neighborhood of a pixel participate in the convolution operation on the pixel. Some unrelated neighbors may participate if the Ksize value is too great. Likewise, some useful neighbors may be discarded if the Ksize value is too small. 38 The parameter σ should match the texture boundary. The window overlaps different texture regions if σ is too great. However, a σ that is too small causes window-position perturbations and makes the output unstable. 39 To verify the relationship between these three parameters and classification accuracy, we assign values to them for testing. The result is shown in Figure 13. It can be seen that the distribution of scattered points is not random but regular: the scattered points with high classification accuracy are clustered together. The following phenomena have been noticed:
(1) When Ksize is greater than 10, it has little effect on accuracy. The scattered points with high classification accuracy are basically in this range.
(2) The scattered points with high classification accuracy are mainly gathered in the open interval σ ∈ (2.5, 5).
(3) The ideal value of t is in the interval [1, 7.5].
In order to finally determine the values of the parameters, we set the value ranges of the three parameters as σ = [2, 6], t = [1, 7], and Ksize = [11,16]; their incremental step sizes are all 1. The test result is visualized in Figure 14. The two points with the highest classification accuracy can be easily found (surrounded by red dotted boxes). Therefore, the ideal parameter settings are t = 3, σ = 4, Ksize = (11×11), and λ = 2.13 according to equation (24).

Experiment 2: Verification for the necessity of modified hierarchical decomposition
The following four methods are used to identify the degradation states of the hydraulic pump.
Method 1: The steps of Method 1 are the same as those described in subsection 2.4 except that the MHD is not   From the above description of the four methods, it can be seen that the main differences between Method 1 and Method 2 are as follows: The image window in Method 2 is divided into small spatial regions called "cells," and several adjacent "cells" are merged as a "block." However, the image window in Method 1 is not divided into cells. Each method is implemented 10 times to reduce random effects. The highest classification accuracy obtained in each run is shown in Figure 15. Table 1 gives the statistical results of accuracy.
The following information is drawn from Figure 15 and Table 1: The classification accuracy obtained by Method 2 is higher than that obtained by Method 1. What is more noticeable is that the classification accuracy obtained by the proposed method is significantly higher than that obtained by Method 4. The proposed method achieves the highest classification accuracy (100%–98.7%). There are three reasons for the result. First, in Method 2, more detailed information can be obtained since the image is divided into cells. However, each sample only consists of 4095 data points, which results in a small gray-scale image. Therefore, the contribution of dividing the image into cells to the classification accuracy is limited. Second, dividing the image into cells and applying the MHD together improves the classification accuracy, but it also produces more redundant information. Third, for samples with a small number of data points, applying MHD and FAST feature detection without dividing the image into cells is more beneficial for obtaining useful information.

Experiment 3: Comparison among the proposed method, LBP, and 1D-TP
As mentioned in Kaplan et al., 11 when LBP is applied to texture feature extraction, all non-uniform patterns share a single bin in the histogram, and each uniform pattern is assigned its own bin. We set 8 neighbors for the central pixel. In this case, 59 features are acquired. For 1D-TP, we set 8 neighbors for the central point and the threshold β = 4.5. In this case, two feature sets, 256 low features and 256 up features, are obtained. 13 For a fair comparison, the four feature sets obtained by the three methods should have the same dimension. In this experiment, principal component analysis (PCA) 40 is introduced to reduce the dimensionality of the feature sets generated by the methods other than LBP, so that all feature sets are 59-dimensional.
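The figure of 59 LBP features for 8 neighbors can be checked with a short count. A pattern is called uniform when its circular 8-bit code contains at most two 0/1 transitions; each uniform pattern gets its own histogram bin and all remaining patterns share one bin. The function name below is hypothetical.

```python
def transitions(pattern, bits=8):
    """Number of 0/1 changes when the binary code is read as a closed circle."""
    b = [(pattern >> k) & 1 for k in range(bits)]
    return sum(b[k] != b[(k + 1) % bits] for k in range(bits))


uniform = [p for p in range(256) if transitions(p) <= 2]
n_bins = len(uniform) + 1      # one extra bin shared by all non-uniform patterns
print(len(uniform), n_bins)
```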
To facilitate the comparison of the identification results, the method for feature selection and state recognition described in the subsection "The proposed method for feature selection and state recognition" is also used after LBP (abbreviated as LBP-NSGA-II-SVM) and 1D-TP (abbreviated as 1D-TP-NSGA-II-SVM). Each method is run 15 times to reduce random effects. The classification accuracy and the corresponding number of features obtained in each run are shown in Figure 16. Table 2 gives the statistical results of accuracy; the minimum numbers of features corresponding to the highest accuracy and the lowest accuracy are shown in parentheses after them. First, the proposed method achieves the highest identification accuracy (99.7%–98%). Second, and more evidently, the proposed method requires the fewest features (13–10). Third, the other methods obtain lower identification accuracy (99.3%–95.8%) and require more features. Therefore, the superiority of the proposed method is confirmed. There are three reasons for the results. First, the proposed method for feature extraction combines the merits of FAST and the Gabor filter, so it can comprehensively reflect the local and global information of images. Second, the MHD enhances the integrity of the mined information. Third, in contrast, LBP and 1D-TP are insufficient in reflecting the information of signals.

Summary
To effectively extract the degradation features of hydraulic pumps, a novel method based on MHD and image feature extraction is proposed. The original signals are decomposed and transformed into gray-scale images. FAST algorithm and Gabor filter are applied to the images, and the histograms constructed by the filter response of the feature points are considered as feature vectors. They can fully reflect the local and global information of small-size samples. The effectiveness of the novel method and its superiorities in degradation feature extraction are validated using the experimental data. Furthermore, a new strategy for identifying the degradation states of hydraulic pumps is proposed based on our feature extraction method, NSGA-II and SVM. The experimental results show that the six different degradation states of the hydraulic pump can be successfully identified by the proposed method. This paper has 4 major contributions as follows: (1) The combination of FAST feature detection and Gabor filters can effectively obtain local and global information of the image. (2) MHD was introduced to increase the ability to obtain information. (3) A novel method based on our feature extraction method, NSGA-II and SVM was proposed, which was applied to degradation state identification for hydraulic pumps. (4) The superiority of the method proposed in this paper was demonstrated with experimental signals.
In this study, the experimental data comes from a hydraulic pump with prefabricated faults. The purpose of the experiment is to verify the effectiveness of the proposed method in identifying the degradation state of the hydraulic pump. For condition monitoring of an actual hydraulic pump, the real-time vibration data of the pump is acquired first. After the data is divided into samples, the feature representations are calculated according to the feature extraction method proposed in this paper. The features of the vibration data with prefabricated faults are extracted in advance. In this way, two sample sets are obtained. The sample set belonging to the actual hydraulic pump is used as the testing set, and the other is used as the training set. These two sample sets are fed to the NSGA-II-SVM proposed in this paper to conduct feature selection and state recognition, and the degradation state of the actual hydraulic pump can be identified. As the identification continues, the training set can be continuously updated, which is conducive to the subsequent identification.
In previous studies, we proposed several effective feature indicators. Fusing the degradation features proposed in this paper with them is the focus of further studies.