A combined kernel-based fuzzy C-means clustering and spectral centroid for instantaneous frequency estimation

An improved instantaneous frequency estimation algorithm for rotating machines based on a kernel-based fuzzy C-means clustering (KFCM) algorithm used in association with a spectral centroid algorithm is proposed in this study. The clustering algorithm is used first to discriminate the time-frequency points from the sources of the reference axis and other points. The discrete time-frequency points related to the instantaneous rotation frequency of the reference axis are then located based on the values of the time-frequency matrix elements; on the basis of these elements, the instantaneous rotation frequency is then estimated using a spectral centroid algorithm. It is demonstrated that this method effectively reduces the effects of interference and noise while achieving higher estimation precision. To validate the proposed method, numerical simulations of multi-component signals and crossover signals are performed. The results of these simulations indicate that the method can realize instantaneous frequency estimation with high precision, even when the numerical responses are contaminated by Gaussian white noise. In addition, when this method is used to analyze the vibration signal of rotating machinery in the situation of a run-up procedure, remarkable speed estimation results are obtained.


Introduction
Vibration signal analysis is widely used in traditional condition monitoring and fault diagnosis applications. In early fault diagnosis, however, vibration signal analysis suffers from low signal strengths and environmental noise interference. Therefore, the feature information related to machine failure is embedded deeply within the vibration signals and is difficult to extract. Nevertheless, the features yielded by certain failures have shown a strong relationship with the rotation speed. During instantaneous varying workload situations, and particularly during the run-up and run-down procedures of the principal axes, it is relatively easy to locate the features of early machine failures. Therefore, condition monitoring of the vibration signals during speed-varying phases is extremely important.
Real-time acquisition of the rotation speed thus plays a critical role in fault diagnosis and monitoring of the characteristic frequencies of the system (e.g., mesh, shafts, bearings). 1 In practice, rotation speed measurement using a tachometer or a shaft encoder may be impossible in some mechanical systems because of the restrictions of disassembly requirements, huge machine sizes or environmental limitations. 2 Therefore, the extraction of precise rotation speeds from condition monitoring signals has been studied intensively in recent years. 3 Time-frequency analysis is a commonly used approach to deal with nonstationary signals from rotating machinery. The vibration signals are mapped into the corresponding time-frequency domain using the time-frequency analysis method; in this domain, the features of the machinery structure, including the rotation speed, vary with respect to time.
The most widely used time-frequency analysis approaches include the short time Fourier transform (STFT), 4,5 the Hilbert-Huang transform, 6,7 the continuous wavelet transform, 8,9 and the chirplet transform. 10,11 The instantaneous frequency estimation is performed immediately after the time-frequency analysis so that the instantaneous frequency is determined, thus enabling calculation of the rotation speed.
Each time-frequency analysis method has its own specific features. However, time-frequency analysis is prone to interference from hidden noise components that will lead to reduced precision in both frequency and speed estimation. As a result, it is necessary to post-process the results of time-frequency analyses. A cost function was introduced for adaptive selection of the ridge after a continuous wavelet transform 12 and a low computational cost implementation of the cost function ridge detection algorithm was explored via dynamic programming optimization. An improved algorithm that combined the continuous wavelet transform (CWT) with the crazy climber algorithm was proposed for peak detection applications. 13,14 A new method based on a novel ridge extraction method that took the fact that the time-frequency (TF) representation is both discrete in time and frequency into account was proposed, to be followed by an STFT. 15 An angular maximal distribution (AMAD) was constructed from a set of directionally smoothed pseudo-Wigner-Ville distributions to highlight the entire instantaneous frequency curve. 16 Furthermore, the TF ridge of an observation can be estimated from its AMAD using a maxima position detector. Another proposed method called the synchroextracting transform (SET) comes from a post-processing procedure used for the STFT and the polynomial chirplet transform. 17,18 This method can generate a more energy-concentrated TF representation and thus allow signal reconstruction.
The synchrosqueezing transform (SST) 19,20 and the  multisynchrosqueezing transform 21 can effectively provide high-resolution TF representation while retaining  a reversible capability. With the aim of compensating for the errors introduced by noise during estimation processes, we propose a novel rotation speed estimation approach composed of a kernel-based clustering algorithm with fuzzy Cmeans and a frequency spectrum centroid searching algorithm.
In this approach, the clustering algorithm is first used to determine the different energy distribution areas via the kernel-based fuzzy C-means algorithm. Second, noise and other interference signals are then ruled out using a different threshold that is determined during the clustering phase. Then, the spectrum centroid algorithm is used to find the discrete TF points for the reference and other axes. Finally, a fitting method is applied to obtain the continuous TF curve based on these discrete points. The proposed method is then tested using both simulated signals and experimental vibration data and it demonstrates a remarkable performance in both reducing interference and improving the precision of rotation speed estimation.
The remainder of this paper is organized as follows. The models for kernel-based fuzzy C-means clustering and the spectral centroid algorithm are established in Section 2. The implementation process for the method is also introduced in this section. In Section 3, numerical simulations are performed to demonstrate the proposed method and the results that are presented are compared with previous studies of the same topic. Detailed experimental data processing procedures and the results of analysis of the vibration signals are given in Section 4, thus demonstrating that high-precision results can be obtained using the proposed method. Finally, the conclusions drawn from the work are offered in Section 5.
Instantaneous frequency estimation using kernel-based fuzzy C-means (KFCM) and frequency centroid algorithm

KFCM clustering
The fuzzy C-means (FCM) algorithm, as a typical clustering algorithm, has been used in a wide range of engineering and scientific disciplines, including medical imaging, bioinformatics, pattern recognition, and data mining. 22 The principle of the clustering algorithm proposed in this article is based on optimization and is intended to minimize the Euclidean distances from all data points to their clustering centers when weighted by the fuzzy membership degree. 23 This algorithm will continue to modify the clustering centers and the matrix until the termination conditions are attained.
For a given data set X ¼ fx 1 ; x 2 ; :::x N g, where x i denotes a feature vector, the KFCM clusters all the feature vectors into C categories with respect to a fuzzy partition matrix U, where u ji indicates the rationality that x i belongs to the category j. The algorithm addresses the clustering task by minimizing the following optimization objective: where v j is the center of the j category, and ||x i 2v j || 2 denotes the norm operation used to represent the distance between x i and v j . For the u ji , because each x i should be grouped into a single cluster, we use the following constraint: However, the FCM algorithm contains certain defects since it does not operate well with heavy environmental noise or asymmetric data structures. 24 Therefore, several modified fuzzy clustering algorithms have been developed to address these drawbacks. The kernel method, because of its successful performance in support vector machine (SVM) applications, 25 is also introduced into the clustering algorithm. Using the different kernel functions, the original features are mapped into higher dimensional spaces that will endow the feature data with more properties. The kernel function to be used is the Gaussian kernel, which has the following form: 26 The Euclidean distance between two feature vectors x i and x j after application of the Gaussian kernel can be calculated as shown below: The objective function can then be transformed as where fðv i Þ is the image of v i in the kernel space. To minimize the above function, the membership matrix should be consistent with the following equation: Therefore, the clustering centers in the kernel space should have the following form: Thus, we obtain: and The new membership function is then given by To implement clustering on incomplete data, we have derived the following algorithm based on KFCM: (1) Fix C, T, m . 1 and e . 0 for some positive constant; (2) Initialize the memberships u 0 ji and the prototypes v i ; (3) For t =1,2..T, do: (a) Update all memberships u ji with equation (6); (b) Update all prototypes v j with equation (7); (c) Compute E=abs(J(t)2J(t21)); if E\ e, terminate.
First, we initialize the precision e, the fuzzy index m, the maximum iteration number T and the cluster number C. The appropriate kernel function is selected in this phase, and the centers of each cluster are also initialized during this phase.
In the second phase, the objective value, the optimal fuzzy partition matrix U and the centers in kernel space fðv j Þ are calculated repeatedly. If the maximum number of iterations or the required precision is reached, the algorithm then returns the center of each cluster.

Estimation of instantaneous frequency
The instantaneous rotating frequency is estimated through measurement of the vibration signals. To maintain the integrity of the signals, higher sampling frequencies are used in the implementation. However, use of the higher sampling frequency will inevitably lead to a large data set volume. In this case, we take advantage of the STFT, which allows the signal frequency contents of local sections to be determined as the signal changes over time and is thus a useful analysis tool for vibration signals. 27 The main inspirations for the proposed method and a flowchart for the KFCM technique are shown in Figure 1. Estimation of the instantaneous rotation speed using the KFCM method and the spectral centroid proceeds as follows: (1) The TF distribution of the vibration signal is acquired using the STFT. For a signal x(t), the STFT is defined as: where the symbol * denotes the complex conjugate, and gðtÞ is the window function with gðtÞ k k¼ 1. The power spectrum of the signals with respect to the STFT is denoted by where Pði; jÞð1 ł i ł M; 1 ł j ł N Þ.
(2) The Gaussian operator, Gði; jÞ, is used as a smoothing filter for the power spectrum result Pði; jÞ, where: Gði; jÞ ¼ (3) The filtered power spectrum is denoted by After the clustering process, we obtain C clusters, e.g., We denote the elements in cluster O 1 using P G k ði; jÞ and the cluster center value using P i . The upper and lower boundaries of O i are O im and O in , respectively. Given that the energy of the signal concentrates around the instantaneous frequency curves in the TF plane, 28 we can simply focus on the clusters with the relatively high center values. We can determine the time and frequency labels for the remaining clusters for each element in O.
(4) Ideally, for a time instant t i , the instantaneous rotation frequency should be consistent with the frequency with the maximum value in column t i of the TF matrix. However, with the presence of noise and other sources of interference, the maximum value in column t i may not match the instantaneous rotation frequency. The spectrum centroid algorithm is used to compensate for this deviation. This spectral centroid algorithm 29 is given by: After filtering and clustering, the matrix that embodies the time and frequency information is as shown in Table 1. The rotation frequency corresponding to t i is f j . In addition, P G k ði; jÞ is the value of the vibration signal that corresponds to time instant t i and frequency f j . To determine the frequency at time instant t i , we must first read through the P G k ði; jÞ values shown in Table 1 in a column-first manner. The spectrum centroid algorithm is then used to rectify the instantaneous frequency f j at time instant t i .
After all the instantaneous frequencies have been determined, the least squares method is used to fit the curve of the time instants and their corresponding frequencies. The instantaneous rotation speed can then be calculated using the relationship nðtÞ ¼ 60 f ðtÞ. Hz. An instantaneous frequency of 200 Hz is achieved at 1.5 s. The instantaneous frequency for x 2 ðtÞ at 1.5 s is 300 Hz. An instantaneous frequency of 400 Hz is achieved at 0.4 s. In addition, x 3 ðtÞ is defined as x 3 ðtÞ ¼ sinð2pð250t-sinð2pðt-0:5ÞÞÞÞ, and nðtÞ is a zero mean white noise term. The signal-to-noise ratio (SNR) in this case is 0.3 and is given by

Multi-component signal
A s and A n are the root mean square values of the signal and the noise, respectively. Here, we assume that x 1 ðtÞ is the signal component to be extracted because it simulates the vibration signal generated by reference axis acceleration. The process used to extract this component is described in the following. Figure 2 depicts the amplitude of a signal that has been corrupted by Gaussian noise. The STFT method is applied to the multi-component signal and the STFT spectrum results are given in Figure 3. As Figure 3 shows, there are three different signals with some additional noise signal frequencies around them.
The following explains how the proposed method based on the KFCM algorithm and the spectrum centroid algorithm is used to extract the instantaneous frequency given by the TF matrix from the STFT. First, we perform Gaussian smoothing on the matrix to eliminate noise. Next, the KFCM algorithm is applied to all elements of the processed matrix and clusters all elements into C classes. Then, the different categories that represent the powers of different signal components are extracted through the upper and lower thresholds of the various datasets. These datasets are obtained from the clustering results. Figure 4 shows the results of clustering. Here, three types of data sets are generated through clustering. We then use the simulated component x 1 ðtÞ as an example and extract the instantaneous frequency. Figure 5 shows the result.    Finally, using the spectral centroid method, the instantaneous centroid is obtained at a specific time point. Figure 6 shows the process of instantaneous frequency extraction of the reference axis signal when using the fusion algorithm. As the figure shows, the result from the fusion method agrees well with the theoretical instantaneous frequency, demonstrating high accuracy.
For contrast, the dynamic optimization method 12 is also applied to perform the same calculation. As shown in Figure 7, when the dynamic optimization method algorithm is used to estimate the specific instantaneous frequency directly, the results will show some deviation and false instantaneous frequency information appears in some areas, particularly at the beginning and the end of the signal.
To quantify the accuracy of this identification of the instantaneous frequency, the root mean square (RMS) error is used as an accuracy index (Index of Accuracy) and is defined as follows: wheref ðnÞ is the estimated value of the instantaneous frequency f ðnÞ, and N is the number of points contained in the calculation. After the calculations, the RMS error for the dynamic optimization technique was found to be 3:84%, while the RMS error for the novel algorithm was0:71%. Therefore, the novel algorithm shows a better result for extraction of the instantaneous frequency from the vibration signal in terms of deviation from the actual value and also maintains good  estimation accuracy when compared with the dynamic optimization method.

Crossover signal
The proposed method also works well with crossover signals. Consider the following signal: yðtÞ ¼ y 1 ðtÞ + y 2 ðtÞ + nðtÞ ð 20Þ The signal is composed of one quadratic chirp component and one sinusoidal frequency component. The instantaneous frequency for y 1 ðtÞ at 0.4 s is 100 Hz. An instantaneous frequency of 200 Hz is achieved at 1.5 s, thus simulating the vibration signal generated by the reference axis acceleration. In addition, y 2 ðtÞ is defined as y 2 ðtÞ ¼ sinð2pð150tÞÞ, thus simulating the crossover signal as an interference signal. Similarly, nðtÞ represents a Gaussian white noise term. The waveform and the corresponding STFT spectrum are shown in Figure  8 and Figure 9, respectively. As shown above, the two frequency components of the signal are cross-aliased.
Similarly, Gaussian smoothing and KFCM clustering were performed on the TF matrix in turn. The results are shown in Figure 10. The different categories that represent the powers of the different signal components were then extracted using the thresholds of the various datasets. Here, we use the simulated component y 1 ðtÞ as an example to extract the instantaneous frequency. The result is given in Figure 11.
As shown in Figure 12 and Figure 13, more false frequency values are estimated when using the dynamic optimization method. Additionally, when compared with the proposed method, the dynamic optimization method will also yield lower accuracy.    For the cross-signal, the RMS error of the dynamic optimization method was 1:34%, while the RMS error for the novel algorithm was 0:69%.
Based on the simulation results, the accuracy of instantaneous frequency estimation for the different signals is shown in Table 2.
When compared with the dynamic optimization method, the proposed method has greater precision, and the instantaneous frequency estimation results are better than those obtained using the traditional method.

Experimental validation
Experiments were performed on a rotor test bench to verify the correctness and the efficiency of the proposed algorithm. During the experiments, the rotational acceleration was constant. The vibration signals were acquired using an Integrated Electronics Piezo-Electric accelerometer. The experimental installation of this accelerometer is shown in Figure 14. We installed the accelerometer in the horizontal direction on the test bench to obtain the vibration acceleration signal. The sensitivity of this sensor is 100 mV/g.
The rotational speed of the rotor is acquired from a torque and rotation speed meter with a reference pulse. The resolution of this meter is 1024 pulses per round. The device is mounted on the output shaft to measure the actual rotation speed, as shown in Figure 15.
Because the vibration signals do not remain stable over time and are thus vulnerable to modulation, the signal is pre-processed to alleviate noise interference, as   shown in Figure 16. The STFT is then used to obtain the TF curve, which is shown in Figure 17. The rotation speed curve (the yellow curve) over time is indicated in figure. The other curves are caused by the internal mechanical structures. The actual rotation speed curve and the estimated rotation speed curve acquired using the KFCM and the spectrum centroid algorithm are shown in Figure 18. Figure 18 and Figure 19 show that these two curves are well matched and the deviation between the two curves is approximately 1.55%, which verifies the remarkable performance of our proposed algorithm.

Conclusion
A new method to estimate rotation speed from an extracted instantaneous frequency has been proposed in this paper. The method combines KFCM clustering with the spectrum centroid algorithm. To mitigate noise effects, the KFCM clustering process is introduced to cluster the different instantaneous frequencies from the     TF matrix based on the fact that the signal energy concentrates around the instantaneous frequency curves in the TF plane. To correct the required frequency, the spectrum centroid algorithm is implemented to perform instantaneous frequency extraction. Numerical examples demonstrate that the proposed method can identify the instantaneous frequencies of time-varying signals effectively with high accuracy when compared with other existing methods. An experimental analysis on a test bench indicated that, as an additional benefit of the proposed method, we can also determine the complex rotation speed with high precision. In future work, we intend to apply image processing methods to instantaneous frequency extraction to process signal components that have the same energy.