Rotating machinery fault diagnosis by deep adversarial transfer learning based on subdomain adaptation

Rotating machinery fault diagnosis is very important for industrial production. Many intelligent fault diagnosis technologies have been successfully applied and have achieved good results. Because machine damage usually occurs under different working conditions, and manually labeling data at large scale is too expensive, domain adaptation has been developed for fault diagnosis. However, current methods mostly focus on global domain adaptation, and the application of subdomain adaptation to fault diagnosis is still limited. In this study, a deep transfer learning method is proposed for rotating machinery fault diagnosis, in which subdomain adaptation and adversarial learning are introduced to align the local and global feature distributions, respectively. Experiments are performed on two rotating machinery datasets to verify the effectiveness of this method. The results show that the method transfers well in both directions between domains and improves diagnostic performance.


Introduction
Rotating machinery is used in various industries, and timely and effective diagnosis of its faults is very important. 1 With the maturing of the concepts and technologies of big data and intelligent manufacturing, intelligent fault diagnosis methods have been widely applied. 2,3 Deep learning is also widely used for fault diagnosis because of its powerful feature learning ability. [4][5][6] The success of these methods depends on a large amount of labeled data being available for supervised learning, and they require the training and testing data to come from the same distribution. However, manually labeling data at large scale is too expensive, and sometimes such data cannot be collected in practice. Moreover, because the working conditions of a machine usually change between tasks, there is generally a distribution difference between training and testing data in real industrial settings. Predictive models trained with deep representations on one dataset therefore do not generalize well to novel tasks.
Transfer learning can mine domain-invariant essential features and structures between two different but related domains, which enables supervised information such as labeled data to be transferred and reused between domains. 7 Recently, transfer learning methods have been increasingly used in rotating machinery fault diagnosis. 8 Shao et al. 9 used a pre-trained network and a fine-tuning strategy to achieve fast and accurate fault state classification. He et al. 10 proposed a deep transfer auto-encoder for fault diagnosis with small samples under different working conditions. However, these methods still require labels from the target domain.
Domain adaptation is a representative transfer learning method that learns domain-invariant features without using target labels in order to bridge the source domain and the target domain. 11 Because deep networks can learn more transferable features, embedding domain adaptation methods into deep learning can better match the distributions across domains. 12,13 Minimizing the domain distribution discrepancy is the most popular approach in cross-domain fault diagnosis, and the maximum mean discrepancy (MMD) is commonly used as the metric. 14,15 Li et al. 16 proposed a domain adaptation architecture and minimized the multi-kernel MMD between two domains to realize cross-domain fault diagnosis. Wen et al. 17 also used MMD with a three-layer sparse autoencoder for bearing fault diagnosis. Zhang et al. 18 used a domain-adaptive convolutional neural network that combined MMD with a fine-tuning strategy for fault diagnosis under different working conditions.
Recently, adversarial domain adaptation, which integrates adversarial learning and domain adaptation, has been successfully embedded in deep networks to minimize the domain discrepancy through an adversarial objective involving a domain discriminator. 19 Guo et al. 20 combined adversarial adaptation with MMD minimization to design a deep convolutional domain adaptation network for bearing fault diagnosis. Besides, Li et al. 21 applied adversarial learning to rotating machinery fault diagnosis. They also combined adversarial training with a parallel data strategy to distinguish machine health conditions. 22 However, these domain adaptation methods only align the global distributions of the source and target domains and disregard crucial category-level information. As a result, because there is no label information for the target domain, there is no guarantee that samples from different domains but of the same category will be mapped close together in the feature space. Subdomain adaptation can accurately align the class-conditional distributions. As shown in Figure 1, 23 when the categories of mechanical faults are close, such as different fault diameters of the same fault type, global domain adaptation brings the overall distributions of the source and target domains close together, but the features of different fault categories remain too close to be classified accurately. After subdomain adaptation, different mechanical fault categories are divided into subdomains, and the same mechanical fault can be aligned accurately across domains. Subdomain adaptation can exploit the dependency between features and categories to capture the underlying multi-mode structures of the data distributions. 24 Xu et al. 25 proposed a conditional domain adaptation method based on a domain adversarial neural network for cross-domain fault diagnosis. Yu et al. 26 proposed a simulation data-driven domain adaptation approach that aligns the marginal and conditional distributions between simulation data and real monitoring data.
In this study, a deep transfer learning method based on subdomain adaptation and adversarial learning is proposed for rotating machinery fault diagnosis. We use domain confusion to align the global feature distribution and the local maximum mean discrepancy to align the local feature distributions. We also pre-train the feature extractor on the ImageNet dataset and use a fine-tuning strategy to speed up training and improve accuracy. The experimental results show that this method greatly improves the ability to extract transferable features between the two domains and improves the diagnostic performance.
In the remainder, Section 2 introduces the problem formulation and domain adaptation. Section 3 presents the proposed method in detail, and the verification and research of the proposed algorithm are carried out in Section 4. In the end, Section 5 provides a conclusion of this paper.

Problem formulation
In this study, the relationship between relevant fault subcategories is considered in the rotating machinery fault transfer diagnosis problem. To explain the problem, some symbols and definitions are first given in this section. Let the source and target domain samples be drawn from the data distributions $P(x^s)$ and $P(x^t)$, respectively, where $x_i^s$ and $x_i^t$ are signals collected under different operating conditions or at different monitoring locations, $y_i^s \in \mathbb{R}^{n_c}$ and $y_i^t \in \mathbb{R}^{n_c}$ are the corresponding machine health condition labels, $n_s$ and $n_t$ are the numbers of fault samples in the source and target domains, and $n_c$ is the number of rotating machinery fault categories. In particular, the machine health condition label spaces of the source and target domains are identical in this study. However, because of the differences in operating conditions or sensor installation locations, the feature distributions of the fault monitoring data in the two domains differ greatly, that is, $P(x^s) \neq P(x^t)$.

Domain adaptation
As previously mentioned, recent studies show that deep transfer learning can reduce the shift between the distributions of monitoring signals from two different domains while learning transferable representations of machinery fault features. However, these methods mainly focus on aligning the overall distributions of the two domains and ignore the relationship between subdomains of the same class in different domains. In fact, when machinery fault monitoring signals are collected under different operating conditions or at different sensor installation locations, not only do the overall distributions of the two domains differ, but the relationship between subdomains of the same class in different domains also varies. For this reason, this study proposes a novel deep adversarial transfer learning method. The method uses adversarial learning and subdomain adaptation to realize transfer fault diagnosis for rotating machinery, aligning the global and local feature distributions of the source and target domains simultaneously. In this way, the relationships between subdomains of the same class in different domains are taken into account in machinery transfer fault diagnosis.

Network architecture
In the field of rotating machinery fault diagnosis, the purpose of this study is to design a deep transfer learning network that reduces not only the shift between the overall distributions of the source and target domains but also the shift between the local distributions within the same category of the two domains, while learning feature representations that transfer between domains. The architecture of the proposed deep adversarial transfer learning method based on subdomain adaptation is shown in Figure 2. The method consists of two feature extractors $F_s$ and $F_t$, a domain discriminator $D$, and a classifier $C$, and it operates on both the source domain and the target domain. The feature extractors learn high-level representations. The local maximum mean discrepancy is applied to these high-level features to align the local feature distributions. The classifier is designed for health state classification, and the domain discriminator provides the domain confusion loss used to align the global feature distribution.
Feature extractor. In this study, machinery fault monitoring signals are collected by accelerometers under different operating conditions or at different installation locations. Because a vibration signal is non-stationary and time-varying while the machine is running, this study first applies the short-time Fourier transform (STFT) to obtain the time-frequency spectrum of the raw signals, which is then fed into the feature extractors as input. 27,28 Because of the different input spaces in fault diagnosis, two feature extractors $F_s$ and $F_t$ with identical network structures are applied to the source domain sample $x^s$ and the target domain sample $x^t$, respectively. Moreover, considering the success of residual networks in image recognition, a 50-layer deep residual network is used as the feature extractor. 29 Deep neural networks are able to learn hierarchical representations from images, and the knowledge embedded in a pretrained model's weights can be transferred to a new task. The key attribute of an ImageNet-like dataset is therefore that it enables the model to learn features that extend to other tasks in the problem domain. In this study, the feature extractors $F_s$ and $F_t$ are pretrained on the ImageNet dataset, and their weights are shared. ImageNet pre-training can also accelerate convergence on the target task and reduce over-fitting. Finally, the high-level feature representations of the input raw vibration signals from different operating conditions or installation locations are obtained as $x_f^s = F_s(x^s)$ and $x_f^t = F_t(x^t)$.
Classifier. The classifier $C$ takes the high-level feature representations $x_f^s$ and $x_f^t$ as input to diagnose rotating machinery health conditions. It consists of a fully-connected layer and an output layer. Concretely, the fully-connected layer adopts ReLU as its activation unit, while the output layer uses the Softmax function. In this way, the final classification of the machinery health condition can be carried out.
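The STFT preprocessing described in the feature extractor paragraph can be illustrated with a minimal sketch. It uses only the Python standard library and a naive DFT per frame; the window type, frame length, hop size, test signal, and the function names `hann` and `stft_magnitude` are assumptions made for illustration and are not settings reported in this study.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + k] * window[k] for k in range(frame_len)]
        spectrum = []
        for b in range(frame_len // 2 + 1):
            # Naive DFT of one windowed frame (O(N^2), acceptable for a sketch).
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * b * k / frame_len)
                      for k in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
fs, f0 = 8000, 1000
x = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(512)]
spec = stft_magnitude(x)

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda b: spec[0][b])
print(len(spec), len(spec[0]), peak_bin * fs / 64)
```

The resulting frames-by-bins array is the kind of time-frequency image that would then be resized and passed to the feature extractor; in practice an FFT-based routine would replace the naive DFT.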
Domain discriminator. The domain discriminator $D$ is a two-class classifier trained with adversarial learning, consisting of one fully-connected layer and an output layer. Similarly, the fully-connected layer uses ReLU as its activation unit, while the output layer uses the Softmax function to identify whether the high-level representation features come from the source or the target domain. Furthermore, in order to mine domain-invariant features, adversarial learning is employed to train the domain discriminator.

Optimization objective
In this study, the proposed deep adversarial transfer learning method comprises four optimization objectives. For the classifier to identify the health status of the source domain correctly, supervised learning with the labeled source data is critical; this is objective 1. To align the global feature distribution, we use adversarial learning, which gives objectives 2 and 3. Objective 2 trains the domain discriminator to recognize whether features come from the source domain or the target domain. Objective 3 aims to confuse the domain discriminator. To align the local feature distributions, we use LMMD as objective 4. These four optimization objectives are introduced in detail below.
Objective 1. To recognize rotating machinery health conditions, the proposed network should be able to learn discriminative features from the labeled source domain samples. Therefore, source supervision is used to minimize the classification error. Concretely, the cross-entropy loss is taken as the source supervision loss, which is defined as follows, where $x^s_{C,i,j}$ is the $j$-th output element of the $i$-th source domain sample in the classifier module, $y_i^s$ is the corresponding rotating machinery health condition label, $n_c$ is the number of health condition categories, and $\mathbb{1}[\cdot]$ is an indicator function.
Objective 2. The domain discriminator is first applied to recognize whether the high-level features come from the source domain or the target domain. Therefore, domain recognition is introduced to minimize the domain recognition error between the two domains. Similarly, the cross-entropy loss is taken as the domain recognition loss, which is defined as follows, where $x^s_{D,i,j}$ and $x^t_{D,i,j}$ are the $j$-th output elements of the $i$-th source domain sample and target domain sample, respectively, in the domain discriminator module, and $d_i$ is the corresponding ground-truth domain label.
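As a minimal sketch of the source supervision loss in Objective 1, the following computes a mean softmax cross-entropy over labeled source samples. It assumes that the classifier outputs are raw (pre-Softmax) scores and that labels are given as class indices; both choices, along with the function names, are assumptions for illustration rather than the exact form used in this study.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one sample's class scores."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def source_supervision_loss(outputs, labels):
    """Mean cross-entropy between softmax(outputs[i]) and class index labels[i]."""
    loss = 0.0
    for scores, y in zip(outputs, labels):
        probs = softmax(scores)
        # The indicator function selects only the log-probability of the true class.
        loss -= math.log(probs[y])
    return loss / len(labels)


# Hypothetical scores for two source samples and three health conditions.
outputs = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
print(source_supervision_loss(outputs, labels))
```

A confident correct prediction contributes a loss near zero, while uniform scores over $n_c$ classes contribute $\log n_c$.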
Objective 3. To learn domain-invariant features, adversarial learning is employed to train the domain discriminator. In general, a gradient reversal layer is applied to maximize the domain recognition loss and thereby extract domain-invariant features, as in a generative adversarial network. 30 However, this can make the discriminator converge too quickly and cause the gradients to vanish. In this study, we instead employ the domain confusion loss as the adversarial loss. The cross-entropy between the discriminator's predicted domain label and a uniform distribution over the binary domain labels is minimized, which encourages the prediction to be as close as possible to uniform. The loss thus seeks domain invariance by confusing the two domains. Hence, the third optimization objective is the domain confusion loss given in equation (3). Generally, equations (2) and (3) need to be minimized simultaneously with respect to the representation and the domain classifier parameters. Nevertheless, these two losses are directly opposed. A fully domain-invariant feature extractor implies that the domain discriminator performs poorly, whereas a high-performance domain discriminator implies that the features learned by the feature extractors are not domain invariant. Therefore, rather than optimizing the parameters globally, we update the two objectives iteratively, each given the fixed parameters from the previous iteration. In this way, the loss ensures that the adversarial discriminator views the two domains equally.
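The opposition between Objectives 2 and 3 can be seen in a small numerical sketch. It assumes that the discriminator outputs Softmax probabilities over two domains, encoded here as 0 for source and 1 for target, and that the confusion loss is the cross-entropy against a uniform distribution over the two labels. The function names and example values are hypothetical.

```python
import math


def domain_recognition_loss(probs, domain_labels):
    """Cross-entropy of discriminator outputs against the true domain labels."""
    return -sum(math.log(p[d]) for p, d in zip(probs, domain_labels)) / len(probs)


def domain_confusion_loss(probs):
    """Cross-entropy between a uniform target (0.5, 0.5) and the discriminator outputs."""
    return -sum(0.5 * math.log(p[0]) + 0.5 * math.log(p[1]) for p in probs) / len(probs)


labels = [0, 1]                        # first sample is source, second is target
sharp = [[0.9, 0.1], [0.2, 0.8]]       # discriminator separates the domains well
confused = [[0.5, 0.5], [0.5, 0.5]]    # discriminator cannot tell the domains apart

for name, probs in (("sharp", sharp), ("confused", confused)):
    print(name, domain_recognition_loss(probs, labels), domain_confusion_loss(probs))
```

The recognition loss is lower for the sharp discriminator, while the confusion loss reaches its minimum of $\log 2$ when the outputs are uniform, which is why the two objectives are updated alternately rather than jointly.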
Objective 4. Maximum mean discrepancy (MMD) 15 is a non-parametric distance estimate that is widely used in rotating machinery fault diagnosis to measure the difference between the target and source distributions. However, previous MMD-based deep transfer learning algorithms for fault diagnosis focused only on the global distributions and neglected the relationship between subdomains of the same class in different domains. Considering this relationship, it is of great significance to align the distributions of the related subdomains of the same class in the two domains. Therefore, the local maximum mean discrepancy (LMMD) is introduced to align the distributions of the related subdomains. It assumes that each sample belongs to each class according to a weight $w^c$, and it is defined as follows, 23 where $w_i^{sc}$ and $w_j^{tc}$ are the weights of $x_i^s$ and $x_j^t$ belonging to category $c$, and $z^l$ is the activation of the $l$-th layer ($l \in L = \{1, 2, \ldots, |L|\}$). For sample $x_i$, $w_i^c$ is computed as follows, where $y_{ic}$ is the $c$-th entry of vector $y_i$. In the source domain, the true label $y_i^s$ is a one-hot vector and is used to calculate $w^{sc}$. In the target domain there is no labeled data, but the output of the network is a probability distribution that describes the probability of assigning $x_i$ to each category. Therefore, $\hat{y}_i^t$ is used as this probability to compute $w^{tc}$ for the target domain.
Once these optimization objectives are built, the network is trained with the stochastic gradient descent (SGD) algorithm. Because the parameters of the feature extractors and classifiers in the two domains are shared, three modules are used: the feature extractor, the domain discriminator, and the classifier, whose parameters are denoted as $\theta_F$, $\theta_D$, and $\theta_C$, respectively. Instead of optimizing the parameters globally, we optimize the objective in stages using an unconstrained optimization in which equation (8) only updates $\theta_F$ and $\theta_C$, and equation (9) only updates $\theta_D$. Here $\lambda$ and $\gamma$ are the penalty coefficients for $L_{dr}$ and $L_{sa}$. These updates ensure that a domain-invariant representation can be learned. Based on equations (8) and (9), the parameters are updated in each training epoch by gradient steps, where $\eta$ is the learning rate.
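A minimal sketch of a class-weighted discrepancy in the spirit of LMMD is given below. It assumes the kernel-expanded form from the cited subdomain adaptation work, 23 with weights $w_i^c = y_{ic} / \sum_j y_{jc}$, a single Gaussian kernel with a fixed bandwidth, and features taken from one layer only. The function names, the kernel parameterization, and the toy data are assumptions for illustration.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """k(a, b) = exp(-||a - b||^2 / bandwidth); this parameterization is an assumption."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / bandwidth)


def class_weights(label_probs):
    """w_i^c = y_ic / sum_j y_jc; a class with zero total mass gets zero weights."""
    n_classes = len(label_probs[0])
    sums = [sum(row[c] for row in label_probs) for c in range(n_classes)]
    return [[row[c] / sums[c] if sums[c] > 0 else 0.0 for c in range(n_classes)]
            for row in label_probs]


def weighted_kernel_sum(za, wa, zb, wb, c, bandwidth):
    """sum_i sum_j wa[i][c] * wb[j][c] * k(za[i], zb[j])."""
    return sum(wa[i][c] * wb[j][c] * gaussian_kernel(za[i], zb[j], bandwidth)
               for i in range(len(za)) for j in range(len(zb)))


def lmmd(zs, ys, zt, yt_pred, bandwidth=1.0):
    """Average over classes of the weighted squared mean-embedding distance."""
    ws, wt = class_weights(ys), class_weights(yt_pred)
    n_classes = len(ys[0])
    total = 0.0
    for c in range(n_classes):
        ss = weighted_kernel_sum(zs, ws, zs, ws, c, bandwidth)
        tt = weighted_kernel_sum(zt, wt, zt, wt, c, bandwidth)
        st = weighted_kernel_sum(zs, ws, zt, wt, c, bandwidth)
        total += ss + tt - 2.0 * st
    return total / n_classes


# Toy features: source class 0 at (0, 0), source class 1 at (1, 1).
zs = [[0.0, 0.0], [1.0, 1.0]]
ys = [[1.0, 0.0], [0.0, 1.0]]              # one-hot source labels
yt_pred = [[1.0, 0.0], [0.0, 1.0]]         # target predictions (probabilities)
aligned = lmmd(zs, ys, [[0.0, 0.0], [1.0, 1.0]], yt_pred)
swapped = lmmd(zs, ys, [[1.0, 1.0], [0.0, 0.0]], yt_pred)
print(aligned, swapped)
```

In the swapped case the two domains contain exactly the same feature points overall, so a global MMD would be zero, yet the class-weighted discrepancy is clearly positive because the classes are mismatched. This is the situation that subdomain alignment is meant to detect.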
In this way, the model can learn the domain invariant features from the two domains in the training process. Then the trained model can predict the unlabeled target samples according to these features.

Experimental study
Dataset descriptions
CWRU dataset. The CWRU dataset comes from the Case Western Reserve University Bearing Data Center. 31 Acceleration sensors are used to monitor bearings with single-point faults introduced by electro-discharge machining (EDM), and the dataset is widely used in fault diagnosis research. The experimental device is shown in Figure 4.
In this study, the spur gearbox dataset is used to test the accuracy of our proposed method. It has eight different health conditions under low and high load at speeds of 30, 35, and 40 Hz. The signal under each health condition is collected at a sampling frequency of 66.67 kHz, and the acquisition time is 4 s. The label information of the dataset is shown in Table 1. To verify the advantage of our method, transfer was carried out across different loads and different speeds. STFT is used for the time-frequency image representation of the vibration signals. For each health condition, 200 images are taken in each of the two domains, totaling 1600 images per domain.

Compared methods and training details
The proposed method is compared with other deep transfer learning methods: Deep Domain Confusion (DDC), 33 Deep Adaptation Network (DAN), 34 Domain-Adversarial Neural Networks (DANN), 35 Deep CORAL (D-CORAL), 36 and our previous work DADA-TL. 37 DDC embeds MMD into an adaptation layer to learn domain-invariant features. DAN uses multi-kernel MMD to align different distributions and learn transferable features. DANN uses adversarial training to make the domain discriminator unable to distinguish the source and target domains, thereby improving domain adaptability. Deep CORAL uses the CORAL loss to match the source and target domains. We use ResNet-50 as the feature extractor for all of the above methods. We follow standard evaluation protocols for unsupervised domain adaptation and compare the average accuracy of each method over three random experiments. For all MMD-based methods and our proposed method, we adopt a Gaussian kernel with the bandwidth set to the median pairwise squared distance on the training data.
We use the PyTorch framework to implement all transfer learning methods and fine-tune the ResNet models provided by PyTorch. The models have been pretrained on the ImageNet 2012 dataset. The layers of the feature extractor are fine-tuned, and the layers of the domain discriminator and classifier are trained from scratch via back propagation. Therefore, the learning rates of the domain discriminator and classifier are set to ten times that of the feature extractor. We use the learning rate annealing strategy of DANN: 35 the learning rate is adjusted during SGD with a momentum of 0.9 according to $\eta = \eta_0 / (1 + \alpha p)^{\beta}$, where $p$ changes linearly from 0 to 1 over the course of training, $\eta_0 = 0.005$, $\alpha = 10$, and $\beta = 0.75$. To suppress noisy activations at the early stages of training, instead of fixing the penalty coefficients, we change them from 0 to 1 by a progressive schedule: $2 / (1 + e^{-10p}) - 1$. 35 This gradual strategy substantially reduces parameter sensitivity and simplifies model selection.
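The two schedules in the training details can be written down directly. The sketch below assumes the DANN-style forms $\eta = \eta_0 / (1 + \alpha p)^{\beta}$ for the learning rate and $2 / (1 + e^{-10p}) - 1$ for the penalty coefficients, with $p$ the training progress in $[0, 1]$; the function names and the number of sampled progress values are hypothetical.

```python
import math


def learning_rate(p, eta0=0.005, alpha=10.0, beta=0.75):
    """Annealed learning rate for training progress p in [0, 1]."""
    return eta0 / (1.0 + alpha * p) ** beta


def penalty_coefficient(p, gain=10.0):
    """Progressive schedule that rises smoothly from 0 at p=0 towards 1 at p=1."""
    return 2.0 / (1.0 + math.exp(-gain * p)) - 1.0


# Sample both schedules at a few hypothetical progress values.
for step in range(0, 11, 2):
    p = step / 10.0
    base_lr = learning_rate(p)
    # Layers trained from scratch use ten times the feature extractor rate.
    head_lr = 10.0 * base_lr
    print(p, base_lr, head_lr, penalty_coefficient(p))
```

The penalty coefficient starts at exactly zero, so the adaptation losses have no influence on the first updates, and it saturates close to one well before training ends.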
Results and analysis
CWRU dataset. There are a total of twelve transfer tasks under different load scenarios, where $T_{uv}$ takes the data at u HP as the source domain and the data at v HP as the target domain. Figure 5 shows the classification accuracy on this dataset under different loads. The proposed method is superior to the other comparison methods in all transfer tasks. The accuracy of some tasks even reaches 100%, and the average accuracy is 99.7%. It can be noticed that the accuracy of the other deep transfer learning methods decreases considerably when the difference between the two domains increases, as in $T_{03}$ and $T_{30}$. In contrast, the proposed method greatly improves the accuracy of transfer tasks with a large difference between the two domains. These results illustrate the key significance of subdomain adaptation in cross-domain fault diagnosis and show that the proposed method can learn more transferable fault features to realize domain adaptation effectively. Furthermore, the confusion matrix of the proposed method for task $T_{30}$ is shown in Figure 6. Misclassification occurs only for the ball fault, and most of the misclassified samples are confused only with respect to fault diameter. The other seven health conditions are classified exactly. This verifies the effectiveness and superiority of the method in cross-domain fault diagnosis tasks, where it improves accuracy in the presence of domain differences.
There are eight transfer tasks under different sensor locations, where $T_{uDF}$ takes the data at u HP collected at the drive end as the source domain and the data collected at the fan end as the target domain. The classification accuracy under different sensor positions is shown in Figure 7. Although the CWRU dataset is usually simple for fault diagnosis, domain adaptation between different sensor positions can be more challenging. The accuracy of the other methods under different sensor positions is lower than that under different loads, but the proposed method performs well in all tasks. The accuracy of all transfer tasks is above 99.6%, and the average accuracy is 99.9%. In particular, for transfer from the fan end to the drive end under the same load, the performance of the other transfer learning methods is clearly lower than that from the drive end to the fan end. After subdomain adaptation is added, the performance improves markedly. These results indicate that the method has broad application prospects in fault diagnosis.
PHM 2009 challenge dataset. To further verify the advantage of our method, the PHM 2009 dataset is used. Its multi-class mixed faults make the fault diagnosis transfer task more challenging. Table 2 shows the accuracy on this dataset under different speeds, and Table 3 shows the classification accuracy under different loads. The notation 30-35 means that the data collected at 30 Hz is used as the source domain and the data collected at 35 Hz as the target domain. The notation 30L-H means that the data with low load at 30 Hz speed is used as the source domain and the data with high load at 30 Hz speed is used as the target domain. The results show that in the hybrid fault diagnosis transfer tasks, the performance of global domain adaptation is not satisfactory and drops significantly across different transfer tasks. Especially for transfer under different loads, the best average accuracy of the global domain adaptation methods is lower than 70%. The proposed method using subdomain adaptation greatly increases the classification accuracy. The average accuracy is 85.9% under different loads and 97.5% under different speeds. In general, the above results all illustrate that our method can effectively realize fault diagnosis. In addition, we take the 40L-H task as an example to compare the performance of the domain confusion loss and the gradient reversal layer, as shown in Figure 8. The result shows that the domain confusion loss yields higher accuracy than the gradient reversal layer and is more suitable for our network architecture.
To analyze the performance of our method intuitively, the t-SNE 38 technique is applied to project the features extracted by the feature extractor from the two domains onto a two-dimensional map. We visualized the 40L-H task, and the results are displayed in Figure 9. Global domain adaptation methods focus on marginal domain adaptation. When the categories of mechanical faults are close, the fault features in the source and target domains are not aligned very well, and some features are hard to classify. Although these methods reduce the distribution difference, the result is still not satisfactory. For our proposed method, the fault features of the same category in the two domains are aligned very well. Fault features of the same category in the two domains are very close, and fault features of different categories are well separated. Subdomain adaptation can capture more fault category information, which effectively improves the performance of cross-domain fault diagnosis. The results suggest that the proposed method is more effective in reducing the distribution discrepancy between the two domains and intuitively illustrate its high performance.

Conclusions and future work
This paper presents a deep adversarial transfer learning method for rotating machinery fault diagnosis. Unlike previous methods, it uses domain confusion and the local maximum mean discrepancy to align the global distributions and the subdomain distributions simultaneously. In comparisons with other transfer learning methods on two rotating machinery datasets, our proposed method improves the ability to extract domain-invariant and transferable features and greatly improves accuracy. Furthermore, in the transfer tasks under various complex working conditions, the proposed method achieves the best results. This indicates that the method is an effective way to address the problem of unlabeled data in practical industrial applications. The method can also be extended to fault detection in other mechanical systems. Future work can focus on the imbalanced data problem and on more transfer scenarios. In addition, because different time-frequency analysis methods affect the accuracy of the model, 39,40 we will investigate them in the future.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.