Intelligent diagnosis of rolling bearing compound faults based on device state dictionary set sparse decomposition feature extraction–hidden Markov model

Identification of rolling bearing fault patterns, especially compound faults, has attracted notable attention and remains a challenge in fault diagnosis. Intelligent diagnosis is an effective approach for compound faults of rolling element bearings, and effective fault feature extraction is a key step that largely determines the diagnosis result. When a compound fault arises in a rolling bearing, sparse decomposition can capture the complex impulsive characteristic components more effectively than other time-frequency analysis methods. Based on self-learned dictionaries that correspond to the characteristic modes of the device under different operating states, this article proposes an intelligent diagnosis method for rolling bearing compound faults based on device state dictionary set sparse decomposition feature extraction and the hidden Markov model. First, characteristic dictionaries of the rolling bearing under different operating conditions are extracted by a sparse decomposition self-learning method, and the state dictionary set of the rolling bearing is constructed. Then, the compound fault signals of the bearing are transformed into the sparse domain using the constructed dictionary set to extract sparse features. Finally, the extracted sparse features are used as training and testing vectors of the hidden Markov model, and satisfactory intelligent diagnosis results are obtained. The validity of the proposed method is verified by a compound fault experiment on rolling element bearings. In addition, comparison with other feature extraction and intelligent diagnosis methods shows the advantages of the proposed method, which provides a feasible and efficient solution for fault diagnosis of rolling bearing compound faults.


Introduction
The rolling element bearing is one of the most commonly used rotating components, and effective fault diagnosis methods are needed to avoid catastrophic accidents. The fast Fourier transform (FFT) and envelope demodulation are the two classical signal-processing methods for this purpose. In recent years, many new signal-processing methods for fault diagnosis of rolling element bearings have emerged, such as the wavelet transform (WT) [1], the tunable quality-factor wavelet transform (TQWT) [2], ensemble empirical mode decomposition (EEMD) [3], spectral kurtosis (SK) [4], and the modified fast kurtogram [5]. However, most of these techniques are effective only for a single fault of a rolling bearing. If faults occur in different parts of a rolling element bearing at the same time, most of them do not work effectively, for two reasons: (1) coupling effects usually exist among the compound faults, and (2) if the compound faults arise in a gearbox, the gear-meshing signal and its harmonics cause strong interference.
Fault diagnosis of rolling bearing compound faults is therefore both challenging and an active research area. A considerable body of literature has appeared in recent years, most of it focusing on classification of compound faults using intelligent algorithms. The support vector machine (SVM) was improved in [6], where a one-against-all multiclass support vector machine (MSVM) method was proposed, combined with heterogeneous feature models, and used in multiple combined fault diagnosis of bearings; the experimental results showed that the proposed method effectively improved the classification performance. A step-by-step compound fault diagnosis method for equipment based on majorization-minimization (MM) and constraint sparse component analysis (SCA) was proposed to separate the compound faults [7]. A deep decoupling convolutional neural network was proposed for intelligent compound fault diagnosis [8]. A method combining multi-scale feature extraction (MFE) and MSVM with particle parameter adaptive (PPA) was proposed for intelligent multiple-fault diagnosis [9]. To diagnose compound faults of locomotive roller bearings accurately, a hybrid intelligent diagnosis method was proposed [10], and the diagnosis results verified that it could accurately recognize the compound faults of the locomotive roller bearings. In [11], combined mode functions were used to select the intrinsic mode functions, in place of the maximum cross-correlation coefficient-based EEMD technique, together with convolutional neural networks (deep neural networks) as fault classifiers. A multi-fault diagnosis method for rolling bearing elements combining wavelet analysis with the hidden Markov model (HMM) was proposed [12]. An effective method for multi-fault diagnosis was presented that optimizes the signal decomposition levels using wavelet analysis and SVM [13]. A rolling bearing fault diagnosis method based on a combination of EEMD, weighted permutation entropy, and an improved SVM ensemble classifier was proposed [14]. The stationary WT and singular value decomposition were combined to form the stationary wavelet singular entropy, which was used to extract fault features of rolling bearing compound faults; the extracted features were passed to a kernel extreme learning machine classifier, and satisfactory results were obtained in experimental verification [15]. The multi-scale wavelet entropy of rolling bearing compound faults was computed and used as the training and test input of a kernel extreme learning machine classifier, and the results verified that the method had slightly better diagnosis accuracy than other related methods [16].
From the above literature, it can be concluded that there are two key steps in intelligent diagnosis of rolling bearing compound faults: feature extraction and an efficient intelligent classification algorithm. As a signal-processing method, sparse decomposition can capture the implicit characteristics of the vibration signal when a fault arises in rotating machinery, and it has been used widely in fault diagnosis of rotating machinery and other areas [2, 17-20]. However, most papers on sparse decomposition-based fault diagnosis of rolling bearings focus on a single fault, although the method has great application potential in feature extraction of rolling bearing compound faults. In recent years, many intelligent classification algorithms have emerged, such as the deep belief network (DBN) [21, 22], the generalized linear regression model [23], the hidden Markov random field [24], hybrid MLPNN-ICA [25], and hybrid CNN-MLP [26]. However, most of these intelligent methods are used mainly in image classification. By contrast, the HMM is an effective and mature intelligent algorithm that has been used widely in fault diagnosis of rotating machinery [27-29], and the authors of this article have also done some work on HMM [30]. Therefore, this article proposes an intelligent diagnosis method for rolling bearing compound faults based on device state dictionary set sparse decomposition feature extraction and HMM. First, characteristic dictionaries of the rolling bearing under different operating conditions are extracted by a sparse decomposition self-learning method, and the state dictionary set of the rolling bearing is constructed. Then, the compound fault signals of the bearing are transformed into the sparse domain using the constructed dictionary set to extract sparse features. Finally, the extracted sparse features are used as training and testing vectors of the HMM, and satisfactory intelligent diagnosis results are obtained.
The article is organized as follows. Section "The sparse feature extraction method based on device state dictionary set" is dedicated to the sparse feature extraction method based on the device state dictionary set. Section "HMM" and section "The flow chart of the proposed method" present the basic theory of HMM and the flow chart of the proposed method, respectively. The experiment, the analysis results, and the comparison and discussion are presented in section "Experiment," and the conclusion is given in section "Conclusion."

The sparse feature extraction method based on device state dictionary set

The construction method of sparse feature
Suppose there are $L$ known target classes, each with its own training samples. Let $D_i = [d_{i,1}, d_{i,2}, \ldots, d_{i,n_i}]$ represent the sub-dictionary of the $i$th target class, which is learned from the training samples of that class. All the sub-dictionaries are gathered into a larger redundant dictionary $D = [D_1, D_2, \ldots, D_L]$, so that the dictionary contains a sub-dictionary to express every target class. $D$ is named the dictionary set in order to distinguish it from $D_i$. Let $y$ be a test sample signal whose sparse coefficients under $D$ are $X = [X_1, X_2, \ldots, X_L]$, where $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}]$ are the sparse coefficients corresponding to $D_i$. The potential component associated with each sub-dictionary can be obtained from the sub-dictionary $D_i$ and its corresponding coefficients $X_i$

$$y_i = D_i X_i \tag{2}$$

Meanwhile, the test sample signal $y$ can be expressed as

$$y \approx \sum_{i=1}^{L} y_i \tag{3}$$

If the test sample $y$ belongs to class $i$, the sub-dictionary $D_i$ has strong adaptability to $y$. That is to say, $D_i$ is the sub-dictionary most likely to be activated to approximate or represent $y$, so the non-zero terms of the sparse coefficients are most likely to appear in $X_i$. In image and speech signal processing, sparse coefficients are often used directly as sparse features for classification. However, in mechanical signals, the dimension of the sparse coefficients is often high. In this article, a sparse feature construction method based on energy distribution is therefore proposed.
The test sample signal $y$ can be decomposed into the sum of a series of sub-components under the redundant dictionary set $D = [D_1, D_2, \ldots, D_L]$ using the following equation

$$y \approx \sum_{i=1}^{L} D_i X_i = \sum_{i=1}^{L} y_i \tag{4}$$

The energy of each sub-component is defined as follows

$$E_i = \|y_i\|_2^2 \tag{5}$$

The normalized energy shown in equation (6) is used as the sparse feature to prevent excessively large energy values

$$\tilde{E}_i = \frac{E_i}{\sum_{j=1}^{L} E_j} \tag{6}$$

The resulting normalized sparse feature vector is shown in equation (7)

$$F = [\tilde{E}_1, \tilde{E}_2, \ldots, \tilde{E}_L] \tag{7}$$

In summary, the flow chart of the sparse feature construction method based on dictionary learning is shown in Figure 1, and it can be divided into two main steps: the construction of the redundant dictionary set $D = [D_1, D_2, \ldots, D_L]$ and the construction of the sparse features $F = [\tilde{E}_1, \tilde{E}_2, \ldots, \tilde{E}_L]$. The shift invariant sparse coding (SISC) [31] method is used to construct the redundant dictionary set $D$, and the feature-sign search (FSS) [32] method, which is discussed in section "Fast algorithm for sparse decomposition," is used to calculate the sparse coefficients of $D_i$.
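The energy-based feature construction can be illustrated with a minimal Python sketch. It assumes that the energy of a sub-component is its squared $L_2$ norm and that normalization divides by the total energy over all sub-dictionaries; the function name `sparse_energy_features` and the toy dictionaries are hypothetical and are not taken from the article.

```python
import numpy as np

def sparse_energy_features(sub_dicts, coeffs):
    """Return the normalized energy feature vector F = [E~_1, ..., E~_L].

    sub_dicts: list of L arrays D_i, each of shape (signal_len, n_i)
    coeffs:    list of L arrays X_i, each of shape (n_i,)
    Assumption: E_i = ||D_i X_i||_2^2 and E~_i = E_i / sum_j E_j.
    """
    # Sub-component contributed by each sub-dictionary: y_i = D_i X_i.
    components = [D_i @ X_i for D_i, X_i in zip(sub_dicts, coeffs)]
    energies = np.array([np.sum(y_i ** 2) for y_i in components])
    total = energies.sum()
    if total == 0.0:
        # No sub-dictionary was activated; return an all-zero feature.
        return np.zeros_like(energies)
    return energies / total

# Toy example with two hypothetical sub-dictionaries (not learned by SISC).
D1 = np.eye(2)
D2 = np.eye(2)
X1 = np.array([1.0, 0.0])
X2 = np.array([0.0, 3.0])
F = sparse_energy_features([D1, D2], [X1, X2])
print(F)
```

Whatever the number of atoms per sub-dictionary, the feature vector has only $L$ entries, which is the dimensionality reduction the energy-based construction aims at.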

Fast algorithm for sparse decomposition
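Although there are many algorithms for solving sparse coefficients, such as matching pursuit (MP) [33] and basis pursuit (BP) [34], all of them are computationally expensive. The FSS method obtains an analytic solution by guessing the signs of the coefficients, which is more efficient than MP and BP. When the dictionary set $D$ is known, the objective function of FSS can be expressed as equation (8)

$$\min_{S} \; \|y - DS\|_2^2 + \beta \|S\|_1 \tag{8}$$

The problem in equation (8) is an $L_1$-regularized least squares problem. The absolute value $|s_j|$ in the $L_1$ norm can be replaced by a linear term if the sign of each element of the coefficients $S$ is known: when $s_j < 0$, $|s_j| = -s_j$; when $s_j > 0$, $|s_j| = s_j$; when $s_j = 0$, $|s_j| = 0$. Equation (8) is then transformed into an unconstrained quadratic optimization problem, which can be solved efficiently. The main steps of FSS are as follows:
1. Initialize each element of the sparse coefficients and its corresponding sign, that is, $s_i = 0$, $\theta_i = 0$, where $\theta_i \in \{-1, 0, 1\}$. The active set is initialized as activeset $= \{\}$.
2. Among all the coefficients with value 0, select $i = \arg\max_i |\partial \|y - DS\|_2^2 / \partial s_i|$. If $\partial \|y - DS\|_2^2 / \partial s_i > \beta$, set $\theta_i = -1$ and add $i$ to activeset; if $\partial \|y - DS\|_2^2 / \partial s_i < -\beta$, set $\theta_i = 1$ and add $i$ to activeset.
3. Feature-sign step. Let $\hat{D}$ be the sub-matrix of $D$ that contains only the columns in activeset, and let $\hat{S}$ and $\hat{\theta}$ be the corresponding sub-vectors of $S$ and $\theta$. Compute the analytic solution of the resulting unconstrained quadratic problem, $\hat{S}_{new} = (\hat{D}^T \hat{D})^{-1} (\hat{D}^T y - \beta \hat{\theta} / 2)$, and perform a discrete line search on the segment from $\hat{S}$ to $\hat{S}_{new}$, keeping the point with the lowest objective value. The 0 components of $\hat{S}$ are removed from activeset, and the signs are updated as $\theta_i = \mathrm{sign}(s_i)$.
4. Check the optimality conditions:
a. For non-zero coefficients: $\partial \|y - DS\|_2^2 / \partial s_i + \beta \, \mathrm{sign}(s_i) = 0, \; \forall s_i \neq 0$. If condition a is not satisfied, go back to step (3). Otherwise, check condition b.
b. For zero coefficients: $|\partial \|y - DS\|_2^2 / \partial s_i| \le \beta, \; \forall s_i = 0$. If condition b is not satisfied, go back to step (2). Otherwise, end the iteration process.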
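To make the $L_1$-regularized least squares problem and the two optimality conditions concrete, the following Python sketch solves the same objective with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is deliberately swapped in here as a simpler solver and is not the FSS procedure used in the article; the function names `ista` and `optimality_gap` and the toy dictionary are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, y, beta, n_iter=500):
    """Minimize ||y - D s||_2^2 + beta * ||s||_1 with ISTA (not FSS)."""
    # Lipschitz constant of the gradient 2 D^T (D s - y).
    lip = 2.0 * np.linalg.norm(D, 2) ** 2
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ s - y)
        s = soft_threshold(s - grad / lip, beta / lip)
    return s

def optimality_gap(D, y, s, beta):
    """Largest violation of the conditions for non-zero and zero coefficients."""
    grad = 2.0 * D.T @ (D @ s - y)
    nz = s != 0
    # Condition a: grad_i + beta * sign(s_i) = 0 for non-zero s_i.
    gap_a = np.max(np.abs(grad[nz] + beta * np.sign(s[nz])), initial=0.0)
    # Condition b: |grad_i| <= beta for zero s_i.
    gap_b = np.max(np.maximum(np.abs(grad[~nz]) - beta, 0.0), initial=0.0)
    return max(gap_a, gap_b)

# Toy example with an orthonormal dictionary (an assumption for illustration).
D = np.eye(3)
y = np.array([3.0, -0.2, 1.0])
s = ista(D, y, beta=1.0)
print(s, optimality_gap(D, y, s, beta=1.0))
```

A solver such as FSS reaches the same minimizer by working on the active set directly, so a check like `optimality_gap` can be applied to its output as well.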

HMM
An HMM [12] is a finite-state statistical model with a fixed number of states, and it is generally applicable to the analysis of non-stationary signals such as speech and time-varying noise. An HMM is a doubly embedded stochastic process: the underlying stochastic process is not directly observable and can be observed only through another stochastic process that produces the sequence of observations. Depending on whether the observations are discrete or continuous, HMMs are divided into the discrete hidden Markov model (DHMM) and the continuous hidden Markov model (CHMM). A DHMM can be described using the following parameters:
1. States. Let $N$ represent the number of states in the model. The states can be described as $S = \{S_1, S_2, \ldots, S_N\}$, and the state at time $t$ is denoted $q_t$.
2. Observation symbols. Let $M$ represent the number of distinct observation symbols per state, $V = \{v_1, v_2, \ldots, v_M\}$.
3. State transition probability distribution $A = \{a_{ij}\}$, where $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, with the following property: $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$, in which $i$ and $j$ denote the state indices.
4. Observation symbol probability distribution $B = \{b_j(k)\}$, where $b_j(k) = P(v_k \text{ at } t \mid q_t = S_j)$.
5. Initial state distribution $\pi = \{\pi_i\}$, where $\pi_i = P(q_1 = S_i)$.
In summary, an HMM $\lambda$ is defined by two model parameters, $N$ and $M$, the observation symbols, and three sets of probability measures: $A$, $B$, and $\pi$. The model $\lambda$ is expressed as

$$\lambda = (\pi, A, B)$$

In practical applications, the observations encountered are usually continuous. Although a continuous signal can be encoded into discrete symbols, much valuable information may be lost in the encoding process. In this case, a CHMM has an advantage over a DHMM. In a CHMM, a Gaussian mixture model is usually used to fit the probability distribution of the observations

$$b_j(o) = \sum_{m=1}^{M} c_{jm} \, \mathcal{N}(o; \mu_{jm}, U_{jm}) \tag{14}$$

In equation (14), $o$ is the observation vector, $M$ is the number of Gaussian elements, and $c_{jm}$ is the mixture coefficient of the $m$th Gaussian element in the $j$th state. $\mu_{jm}$ and $U_{jm}$ are the mean vector and covariance matrix of the $m$th Gaussian element in the $j$th state. In summary, a CHMM whose observation probability is a Gaussian mixture distribution can be described as shown in equation (15)

$$\lambda = (\pi, A, C, \mu, U) \tag{15}$$

The parameters can be estimated using the expectation-maximization (EM) algorithm [35].
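How a trained model assigns a likelihood to an observation sequence can be sketched with the scaled forward algorithm. The Python example below is a minimal illustration for a discrete HMM with initial distribution, transition matrix, and observation matrix; the function name `forward_log_likelihood` and the numerical values are hypothetical, and the Gaussian mixture emissions of a CHMM are not implemented here.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Return log P(obs | model) for a discrete HMM via the scaled forward pass.

    pi:  initial state distribution, shape (N,)
    A:   state transition matrix, shape (N, N), rows sum to 1
    B:   observation symbol probabilities, shape (N, M), rows sum to 1
    obs: sequence of integer symbol indices in [0, M)
    """
    pi, A, B = np.asarray(pi), np.asarray(A), np.asarray(B)
    alpha = pi * B[:, obs[0]]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through A, then weight by the emission.
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf  # the sequence is impossible under this model
        log_like += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_like

# Hypothetical two-state, two-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_log_likelihood(pi, A, B, [0, 1, 0]))
```

Working in the log domain with per-step rescaling keeps the computation stable for long sequences, where the raw forward variables would underflow.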

The flow chart of the proposed method
The flow chart of the proposed method is shown in Figure 2, and it contains four main steps:
1. State dictionary set construction. Apply the dictionary learning method to the training vibration data of the different running states of the rolling bearing to obtain the sub-dictionary $D_i$ corresponding to each running state. Then, fuse the sub-dictionaries $D_i$ to obtain the device state dictionary set $D = [D_1, D_2, \ldots, D_C]$.
2. Sparse feature extraction. Extract the sparse feature based on $D = [D_1, D_2, \ldots, D_C]$: each group of signals is decomposed into the sum of a series of sub-components, that is, $L = \{L_1, L_2, \ldots, L_C\}$, and the obtained sparse feature is then normalized to give the final sparse feature vector.
3. HMM training. The sparse feature sequences of the training samples of each running state are used to train the HMM of that state.
4. Fault identification. The sparse features of the test samples are input into each trained HMM, and the state of the test sample is determined by the maximum likelihood probability.

Experiment
The rolling element bearing compound fault experiment is carried out in this section to verify the effectiveness of the proposed method. Figure 3 shows the test rig, and the rolling bearing type used in the experiment is NU205, whose parameters are presented in Table 1. Four running states of the test bearings are implemented: normal (N), outer race and ball compound fault (OB), outer and inner race compound fault (OI), and outer race, inner race, and ball compound fault (OIB). Figure 4(a)-(c) shows the faults machined on the inner race, outer race, and rolling element of the test bearing, respectively, and the three kinds of compound faults are realized by different combinations of them. The bearing supporting the right end of the shaft is detachable so that the test bearing can be replaced conveniently during the tests. During the experiment, the outer ring is fixed and the inner ring rotates synchronously with the shaft. The acceleration sensor is installed near the test bearing and is used to collect the peak value of the corresponding vibration signal. The sampling frequency is set to $f_s = 8192$ Hz, and the rotating frequency of the shaft is $f_r = 13.3$ Hz. Equations (16)-(18) are used to calculate the characteristic frequencies of the inner race fault, the outer race fault, and the rolling element. In equations (16)-(18), $Z$ is the number of rolling elements, $d$ is the rolling element diameter, $D$ is the pitch diameter, and $\beta$ is the contact angle. The calculated values of $f_i$, $f_o$, and $f_b$ are 95.38, 64.61, and 5.38 Hz, respectively.
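As a rough illustration of how race fault frequencies follow from the bearing geometry, the Python sketch below uses the textbook kinematic formulas for a bearing with a fixed outer ring and a rotating inner ring. These formulas are an assumption and may differ in form from equations (16)-(18); the geometry values are placeholders rather than the entries of Table 1, so the printed numbers need not match the reported values, and the rolling element frequency is not covered.

```python
import math

def race_fault_frequencies(Z, d, D, beta_deg, f_r):
    """Return (inner race, outer race) fault frequencies in Hz.

    Textbook formulas (assumed, fixed outer ring):
      f_i = Z/2 * (1 + d/D * cos(beta)) * f_r
      f_o = Z/2 * (1 - d/D * cos(beta)) * f_r
    """
    ratio = (d / D) * math.cos(math.radians(beta_deg))
    f_inner = 0.5 * Z * (1.0 + ratio) * f_r
    f_outer = 0.5 * Z * (1.0 - ratio) * f_r
    return f_inner, f_outer

# Placeholder geometry (hypothetical, not taken from Table 1).
f_i, f_o = race_fault_frequencies(Z=12, d=7.5, D=39.0, beta_deg=0.0, f_r=13.3)
print(f"f_i = {f_i:.2f} Hz, f_o = {f_o:.2f} Hz")
```

A quick consistency check on formulas of this form is that the two race frequencies always sum to $Z f_r$.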
The time-domain waveforms of the four states of the test bearings and their corresponding envelope demodulation spectra are shown in Figure 5. Figure 5(a) and (b) shows the time-domain waveform of N and its envelope demodulation spectrum. It is evident that very few impulses arise in the time-domain waveform of N. In Figure 5(d), although the fault characteristic frequency (FCF) of the outer race is extracted, the FCF of the ball is not. In Figure 5(f) and (h), the spectral lines are chaotic, and the compound fault features of OI and OIB cannot be identified clearly. Ten groups of samples of each state are selected randomly as training samples and are used to learn the corresponding sub-dictionary $D_i$. The original signals of the four bearing states are analyzed by SISC, and the parameters are selected as follows: the atomic length is 256, the overlap rate is 0.25, the sparsity is 1, and the number of base atoms is 4. The sub-dictionaries of the four states are then obtained: $D_1$ represents the sub-dictionary of N, $D_2$ that of OB, $D_3$ that of OI, and $D_4$ that of OIB. The device state dictionary set $D = [D_1, D_2, D_3, D_4]$ is obtained by fusing the four of them. The basis learned for each state using SISC is presented in Figure 6.
Each sub-dictionary adapts much better to the samples of its own state, since it is obtained from the training samples of that state. In other words, the corresponding sub-dictionary is more easily activated when a test sample is expressed by the state dictionary set. After the dictionary set is obtained, the test sample can be decomposed into a series of sub-components $\{l_1, l_2, l_3, l_4\}$ using the sparse feature construction method introduced in the previous section. The normalized sparse feature $\{E_1, E_2, E_3, E_4\}$ is also obtained, and its values represent the energy distribution of the sample over the sub-dictionaries. Figure 7 shows the energy distribution of each class of bearing data on each sub-dictionary.
Each group of training samples is divided into 10 segments, and the length of each segment is 0.08 s. Each segment of the signal is sparsely decomposed, and its sparse features are extracted under the dictionary set. A $4 \times 10$ feature vector sequence is thus obtained from each training sample group. An HMM is trained using the feature sequences of each state, and four HMMs $\{\lambda_{NC}, \lambda_{ORF}, \lambda_{IRF}, \lambda_{REF}\}$ corresponding to the four bearing states are obtained. Sparse features are extracted from each group of test samples and input into each trained HMM. The likelihood probability is calculated by the Viterbi algorithm in turn, and the state of the test sample is determined by the maximum likelihood probability. There are 80 sets of test samples in total: No. 1-20 come from the N state, No. 21-40 from the OB state, No. 41-60 from the OI state, and No. 61-80 from the OIB state. Figure 8 shows the diagnosis results using different feature vectors: (a) sparse feature vectors (SFVs); (b) time-domain statistical feature vectors (TDSVs) such as AMP, P-P, and so on; and (c) wavelet packet energy (WPE). Misclassified samples are marked with black circles. In Figure 8(a), two N samples and one OIB sample are misclassified. In Figure 8(b), 9 samples are misclassified, and in Figure 8(c), 17 samples are misclassified. It is evident that the SFV gives a better classification result than the other two feature vectors.
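The segment-wise feature sequence and the maximum-likelihood decision can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `extract` is a hypothetical stand-in for the sparse feature extraction, the log-likelihood scores are made-up numbers rather than outputs of trained HMMs, and the accuracy helper only reproduces the arithmetic for 3 misclassified samples out of 80.

```python
import numpy as np

def feature_sequence(signal, extract, n_segments=10):
    """Split a signal into equal segments and stack one feature vector per segment.

    Returns an array of shape (n_features, n_segments), e.g. 4 x 10.
    """
    signal = np.asarray(signal, dtype=float)
    seg_len = len(signal) // n_segments
    segments = signal[: seg_len * n_segments].reshape(n_segments, seg_len)
    return np.stack([extract(seg) for seg in segments], axis=1)

def classify(log_likelihoods):
    """Return the state label whose model gives the largest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)

def accuracy(n_total, n_misclassified):
    """Fraction of correctly classified test samples."""
    return (n_total - n_misclassified) / n_total

# Hypothetical 4-dimensional feature extractor standing in for the sparse features.
def extract(seg):
    return np.array([seg.mean(), seg.max(), seg.min(), seg.std()])

seq = feature_sequence(np.arange(6550.0), extract, n_segments=10)
scores = {"N": -120.5, "OB": -98.2, "OI": -140.0, "OIB": -131.7}  # made-up values
print(seq.shape, classify(scores), accuracy(80, 3))
```

With 80 test samples and 3 misclassifications, the accuracy arithmetic gives 96.25%.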
The K-nearest-neighbor (KNN) algorithm and the back-propagation (BP) neural network are also applied to the above three kinds of features for comparison, and the diagnosis results are given in Table 2. The BP neural network adopts a three-layer structure, in which the number of nodes in the input layer equals the feature dimension, and the numbers of nodes in the hidden layer and the output layer are 10 and 4, respectively. When the same sparse feature is used as input, the diagnostic rates of KNN, BP, and HMM are 90%, 92.5%, and 96.25%, respectively, so the diagnostic accuracy of HMM is the highest. It can also be seen that HMM has the best diagnostic effect when the other feature vectors are used. Comparing different features under the same classifier shows that the diagnostic rate of the SFV is significantly higher than that of the TDSV and WPE features.

Conclusion
Identification of rolling bearing fault patterns, especially compound faults, has attracted notable attention and remains a challenge in fault diagnosis. As a signal-processing method, sparse decomposition can capture the implicit characteristics of the vibration signal when a fault arises in rotating machinery, and it has great application potential in compound fault diagnosis of rolling bearings. In this article, a method of constructing sparse features based on dictionary learning is proposed. The method learns a feature dictionary for each state of the device, fuses all the feature dictionaries to form the state dictionary set, and then extracts the sparse features of the different compound faults of the rolling bearing based on the obtained state dictionary set. The sparse features are used as the training and test input of the HMM, and satisfactory classification results are obtained. The concrete flow chart of the proposed method is given, and the validity of the proposed method is verified by a compound fault experiment on rolling bearings. In addition, to highlight the superiority of the sparse feature, its classification results are compared with those of the time-domain feature and the wavelet packet energy feature. The comparison results show that the sparse feature has high classification accuracy and stability.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this