Fault data enhancement and real-time diagnosis using optimized ViT++ algorithm for electric drive system

With the rapid development of electric drive technology for new energy vehicles, fault data identification for key electric drive components has become crucial to improving the stability and safety of electric systems. However, traditional fault data recognition methods have many limitations in dealing with complex and variable operating fault situations. To address this problem, this paper proposes a deep learning model, Vision Transformer Plus (ViT++), which is based on the self-attention mechanism and combined with data enhancement strategies, for fault identification in new energy vehicle electric drive systems. Fault types are identified by transforming the electric drive system fault data into an image matrix and performing feature extraction and learning with the ViT model. To validate the effectiveness of the proposed method, we conducted extensive cross-experiments using a large amount of actual fault data from key electric drive components and applying a data enhancement strategy. The experimental results show that the fault data recognition method based on the ViT algorithm has higher accuracy and robustness than traditional convolutional neural network (CNN)-based methods. The proposed method therefore helps improve the accuracy and efficiency of fault data identification for key components in electric vehicles and supports the analysis of electric drive system faults.


Introduction
With increasing global concern for environmental protection and energy sustainability, the research and development of new energy vehicles has become an important direction for the global automotive industry. A new energy vehicle is a vehicle that uses a new type of power system, such as an electric motor, battery, or fuel cell, instead of the traditional internal combustion engine as its driving power [1,2]. The development of new energy vehicles is of great significance for improving environmental quality, promoting energy transformation, and realizing the sustainable development of the automobile industry. Strengthening research on new energy vehicles can further improve vehicle performance and range, reduce manufacturing cost, promote the popularization and application of new energy vehicles, and create a greener and cleaner way of traveling. The electric drive system is the core component of a new energy vehicle, and whether it can provide power efficiently and reliably has attracted widespread attention [3,4]. The rapid development of electric drive technology for new energy vehicles makes fault data identification for key components of the electric drive system an important challenge in improving the stability and reliability of electric vehicles [5,6].
However, traditional fault data identification methods have certain limitations in dealing with complex and variable operating fault situations. This article proposes a novel approach to address these issues: fault data enhancement and identification for electric vehicle drive systems based on the improved Vision Transformer Plus (ViT++), which introduces a new function, Quiet Attention, an innovative adaptation of the traditional Softmax function. By leveraging the self-attention mechanism of the ViT++ model, we aim to automatically learn and extract key fault features from the raw data of the electric drive system [7,8]. In addition, we introduce data enhancement strategies to enrich fault data features and improve the robustness and generalization of the model [9,10]. Our focus is on the potential of the ViT++ algorithm for fault data recognition in electric vehicle drive systems. By converting fault data into an image matrix and utilizing the ViT++ model for feature extraction and learning, we aim to achieve more accurate and efficient fault type recognition. The introduction of data playback and image preprocessing techniques allows the fault data to be detected to be sampled randomly, further enhancing the model's adaptability to different real-world fault situations. To evaluate the effectiveness and superiority of the proposed method, we conducted in-depth experiments using real datasets containing many faults in critical components of electric drives. The experimental results demonstrate the superior performance of the ViT++ algorithm compared to traditional Convolutional Neural Network (CNN)-based methods and provide valuable insights into key features and patterns in fault identification.
The main contributions can be summarized as follows:
1. We propose an architecture for fast and intelligent diagnosis of electric drive fault data under multiple operating conditions.
2. We propose a practical data enhancement algorithm to increase the robustness of the system for fault identification.
3. We fix a flaw in the attention formula used in machine learning and apply the improved ViT++ algorithm to electric drive fault data identification.
4. We propose a sliding window-based data recognition method to replace the long short-term memory (LSTM) neural network approach, which improves the response speed of the system, enlarges the receptive field of recognition, and strengthens the recognition of relations between faults.
The rest of this paper is organized as follows: Section "Related work" reviews the current research related to electric drive system fault diagnosis. Section "Proposed method" introduces an efficient architecture for data enhancement and data identification and proposes the ViT++ algorithm. Section "Experimental results" shows the experimental results. Finally, conclusions are drawn in Section "Conclusions and future work."

Related work
In recent years, the study of fault data enhancement and intelligent identification methods for electric drive systems of new energy vehicles has attracted widespread attention, focusing mainly on several aspects [17,18]. Reviewing and synthesizing these aspects provides a comprehensive understanding of research on fault data enhancement and intelligent identification methods for electric drive systems of new energy vehicles and, at the same time, provides a theoretical basis and research motivation for the research framework and methodology of this paper. It also helps to point out the shortcomings of current research and provides new directions and possibilities for future research. Based on this, this paper aims to solve the problems faced in the fault diagnosis of electric drive systems in new energy vehicles, to ensure timely and accurate identification of electric drive system faults, and to provide useful academic and practical references for innovative development in the new energy vehicle field.

Proposed method
This section consists of two parts. The first part explains the overall architecture of the intelligent fault data diagnosis system for the electric drive system. The second part introduces the visualization and application of the fault data of the electric drive system, proposes a fault data recognition model based on the ViT++ algorithm, and explains the data enhancement method.

System architecture design
This paper focuses on the architecture of an intelligent diagnosis system for electric drive system fault data, as shown in Figure 1. The system architecture includes four main modules: fault components, fault tree, data storage and analysis, and fault diagnosis.
The fault components, taken from the electric drive system of a certain model of new energy vehicle, are the basic source of research data. From all the faults, nine common fault types are sorted out: Motor Encoder Error (MEE), Battery Over-discharge (BOD), Battery Pack Voltage Fault (BPVF), Motor Gear Scratch (MGS), Motor Startup Speed Abnormal (MSSA), Speed Sensor Fault (SSF), Hub Motor Speed Abnormal (HMSA), Gearbox Gear Wear Fault (GGF), and Bearing Inner Ring Fault (BIRF). These faults have a significant impact on the safety and stability of the electric drive system and are therefore the focus of this paper. To collect a large amount of real fault data, the faulty components must be retrofitted with sensors whose analog signals are converted to digital signals by an Analog-to-Digital Converter (ADC). As the amount of fault data grows, this paper adopts the fault tree approach to increase the system's expandability and facilitate data analysis and storage. A fault tree is a graphical logical analysis method for describing and analyzing the likelihood and impact of system failures. The tree structure is unfolded layer by layer: each node represents a failure event or cause, and logical gates indicate the relationships between events. Fault tree analysis assesses system reliability, identifies the causes of faults and the major fault paths, and guides preventive and maintenance measures to improve system safety and reliability. In this paper, the faults are grouped into three main categories: Motor Component Failure (MEE, MGS, MSSA, HMSA), Power Battery Failure (BOD, BPVF), and Other Component Failure (SSF, GGF, BIRF).
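The three fault categories listed above can be written down as a small lookup structure. The following is a minimal sketch, not the authors' implementation: the names `FAULT_TREE` and `category_of` are hypothetical, and the structure only captures the category-to-fault grouping, not the logical gates of a full fault tree.

```python
# Hypothetical two-level representation of the fault categories named in the text.
# A real fault tree would also carry logical gates (AND/OR) between events.
FAULT_TREE = {
    "Motor Component Failure": ["MEE", "MGS", "MSSA", "HMSA"],
    "Power Battery Failure": ["BOD", "BPVF"],
    "Other Component Failure": ["SSF", "GGF", "BIRF"],
}


def category_of(fault_code):
    """Return the top-level category that contains the given fault code."""
    for category, codes in FAULT_TREE.items():
        if fault_code in codes:
            return category
    raise KeyError(f"unknown fault code: {fault_code}")


if __name__ == "__main__":
    for code in ("MEE", "BOD", "BIRF"):
        print(code, "->", category_of(code))
```

Keeping the grouping in one table means a new fault type can be added to a category without touching the lookup logic, which is in the spirit of the expandability argument made for the fault tree.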
The process from electric drive system components through data acquisition and processing to data diagnosis is shown in Figure 1. First, many faulty electric drive system components need to be found. Then the fault data are obtained through the data acquisition system (during which the analog signals must be converted to digital signals by the ADC) [21]. Data analysis and fault diagnosis are the focus of this research. The causes of faults are analyzed from the perspective of the data characteristics and the fault types, and the fault diagnosis algorithm quickly finds faults of the new energy vehicle electric drive system in massive real-time data and reminds the user in time to carry out vehicle maintenance, which is important for the development of new energy vehicles.

Proposed ViT++ algorithm
The ViT algorithm is an image classification model based on the attention mechanism. Although it has achieved significant results on many image tasks, we found a large gap between the state-of-the-art (SOTA) performance of the ViT algorithm on the dataset of a public competition and its performance on the data studied in this paper. To address this problem, we improved the ViT algorithm with optimizations of the attention mechanism and data enhancement. After the improvement, we successfully applied the proposed algorithm to the fault data diagnosis of the electric drive system and achieved satisfactory results. The optimized algorithm is called ViT++, and the detailed optimization process is described below.
1. In the Vision Transformer algorithm, the traditional Softmax function is usually used to calculate the attention weights that determine the correlation between different image regions [22,23]. However, the Softmax function may produce unstable values and incur high computational complexity when facing many image regions. To improve this situation, we introduce a new function called "Quiet Attention" to replace the traditional Softmax function and improve the performance and efficiency of the model. The standard attention mechanism is

$$\mathrm{ATT}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$

where:
Q: Query matrix, which represents the features or information to be queried. In image tasks, it is usually a feature vector representing the image, obtained through one or more layers of the Transformer.
K: Key matrix, which represents the key information of the features. Like the query matrix, it is a feature vector obtained through one or more layers of the Transformer.
V: Value matrix, which represents the value information of the features. Again, it is a feature vector obtained through one or more layers of the Transformer.
softmax: The softmax function, which normalizes the attention scores to obtain the attention weights.
$QK^{T}$: The product of the query matrix and the transpose of the key matrix, used to compute the attention score.
$\sqrt{d}$: The square root of the dimension d of the feature vector, used to scale the attention score to prevent it from being too large or too small.
ATT(Q, K, V): The feature vector computed by the attention mechanism, obtained by weighting and summing the value matrix V according to the attention weights.
Q, K, and V originate from the same input sequence. They are, however, not identical, because each is obtained through a different projection. Each layer nevertheless starts from the same annotated embedding vectors. The $QK^{T}$ term is used to find correlations between token vectors at different positions, essentially constructing a correlation matrix (dot products scaled by $1/\sqrt{d}$) in which each row and column corresponds to a token position. A softmax operation is then performed on each row of this square matrix, and the resulting probabilities are used as mixing weights for the value vectors in the V matrix. The probability-mixed V is summed with the input vectors, and the result is passed to the neural network for further processing. Multi-head attention performs the above process multiple times per layer in parallel. Essentially, this approach divides the embedding vectors, with each head using information from the entire vector to annotate a (non-overlapping) segment of the output vector. This is the concatenation operation from the original Transformer paper.
The attention mechanism calculates the similarity scores between the query matrix and the key matrix, normalizes the scores with the softmax function to obtain the attention weights, and finally weights and sums the value matrix according to the attention weights to obtain the final feature vectors. Such an attention mechanism lets the model focus on important features and information, thus achieving better performance in tasks such as image classification.
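The score, normalize, and weighted-sum steps described above can be illustrated with a minimal sketch in plain Python. It assumes a single attention head, no learned projections, and small toy matrices whose values are purely illustrative; the helper names `matmul` and `attention` are hypothetical and do not come from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V, applied row by row."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]            # transpose of K
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]      # one probability row per query token
    return matmul(weights, v), weights


# Toy example: 3 tokens with embedding dimension 2 (illustrative values only).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, w = attention(Q, K, V)

if __name__ == "__main__":
    for row in w:
        print([round(p, 3) for p in row])
```

Each row of `w` is a probability distribution over token positions, so every output vector is a convex mixture of the rows of `V`.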
3. A new function, Quiet Attention, also called $\mathrm{softmax}_1$, is introduced as an innovative adaptation of the traditional softmax function:

$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)}, \qquad \mathrm{softmax}_1(x)_i = \frac{\exp(x_i)}{1 + \sum_{j} \exp(x_j)}$$

Adding 1 to the denominator allows the output vector as a whole to tend to 0. Without this term, the values shrink only a little, and the normalization process compensates for the shrinkage.
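Compare the traditional softmax with the new, improved $\mathrm{softmax}_1$ when all entries of the input vector $x = (x_1, \dots, x_k)$ tend to $-\infty$ (taking the entries equal):

Pre-optimization: $\lim_{x_1, \dots, x_k \to -\infty} \mathrm{softmax}(x)_i = \frac{1}{k} > 0$

Optimized: $\lim_{x_1, \dots, x_k \to -\infty} \mathrm{softmax}_1(x)_i = 0$

The derivative of $\mathrm{softmax}_1$ is positive, so there is always a non-zero gradient, and its outputs sum to a value between 0 and 1, so the output does not get out of control. The following property is also satisfied:

$$\frac{\mathrm{softmax}_1(x)_i}{\mathrm{softmax}_1(x)_j} = \frac{\mathrm{softmax}(x)_i}{\mathrm{softmax}(x)_j}$$

That is, the relative values in the output vector remain unchanged.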
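The behaviour described for Quiet Attention can be checked numerically. The sketch below is an illustration under the assumption that $\mathrm{softmax}_1$ simply adds 1 to the softmax denominator, as the text states; the function names and the stabilizing shift are choices made here, not details taken from the paper.

```python
import math


def softmax(xs):
    """Traditional softmax: exp(x_i) / sum_j exp(x_j)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax1(xs):
    """Quiet Attention sketch: exp(x_i) / (1 + sum_j exp(x_j))."""
    # Shift by m = max(max(xs), 0) for numerical stability.
    # After the shift, the "1" in the denominator becomes exp(-m).
    m = max(max(xs), 0.0)
    exps = [math.exp(x - m) for x in xs]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]


if __name__ == "__main__":
    very_negative = [-20.0] * 4
    print("softmax  sum:", sum(softmax(very_negative)))    # stays at 1
    print("softmax1 sum:", sum(softmax1(very_negative)))   # close to 0

    x = [1.0, 2.0, 3.0]
    print("ratio softmax :", softmax(x)[0] / softmax(x)[1])
    print("ratio softmax1:", softmax1(x)[0] / softmax1(x)[1])
```

With strongly negative scores, the traditional softmax still distributes a total weight of 1 across the positions, whereas `softmax1` lets the whole row approach 0, while the ratio between any two entries is the same for both functions.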
The data used in this paper are special in several respects: the length of the diagnosed data is inconsistent, the possible fault features occur at random positions in the time series, the amount of data to be diagnosed is large, and the possibly faulty components in the electric drive system are also random. A comprehensive and fast method for processing the data and identifying the features is therefore necessary, and we designed a time series-based equal-step fault diagnosis sliding window (T-DSW), as shown in the left half of Figure 2. The idea of the T-DSW design is as follows:
1. Figure 2 shows the image plotted for a total of 3 s of data, with nine types of fault state data, three sliding windows as a group (Fbox, Mbox, Bbox), and five reference quantities on the timeline $(t_1, t_2, t_3, t', t'')$, where $t' = \frac{t_1 + t_2}{2}$ and $t'' = \frac{t_2 + t_3}{2}$.
2. The experiments designed in this paper use intensive data acquisition to generate 2D image features, where the image's dimensions are $H \times W$ (height and width). Although the height H can be fixed, the width W can theoretically be of infinite length, which leads to a form of dense compression when the data are displayed on the image and affects the presentation of detailed image features.
3. Each time we diagnose data, we must simultaneously acquire the data of the nine fault states. From the visualization point of view, we intercept the fault state data image of the current period from the data area covered by the sliding window Fbox from time $t_1$ to $t_2$, by the sliding window Mbox from time $t'$ to $t''$, and by the sliding window Bbox from time $t_2$ to $t_3$. The final result is a nine-grid image, as shown in the middle part of Figure 2.
Each grid corresponds to a different fault state data image, and such a visualization helps to observe and analyze the fault state data intuitively.
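The window layout of T-DSW can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the sampling rate, the concrete values of t1, t2, and t3, the row-major grid order, and all function names are assumptions.

```python
# Fault state order as listed in the Figure 2 caption.
FAULT_STATES = ["MEE", "BOD", "BPVF", "MGS", "MSSA", "SSF", "HMSA", "GGF", "BIRF"]


def tdsw_windows(t1, t2, t3):
    """Return the (start, end) interval in seconds of each sliding window."""
    t_prime = (t1 + t2) / 2         # t'
    t_double_prime = (t2 + t3) / 2  # t''
    return {
        "Fbox": (t1, t2),
        "Mbox": (t_prime, t_double_prime),
        "Bbox": (t2, t3),
    }


def crop(samples, start, end, rate_hz):
    """Slice the samples recorded in [start, end) seconds at a fixed sampling rate."""
    return samples[int(start * rate_hz):int(end * rate_hz)]


def nine_grid(items):
    """Arrange nine per-state items into a 3 x 3 grid (row-major order assumed)."""
    if len(items) != 9:
        raise ValueError("expected exactly nine fault states")
    return [items[i:i + 3] for i in range(0, 9, 3)]


if __name__ == "__main__":
    windows = tdsw_windows(0.0, 1.5, 3.0)   # hypothetical reference times over 3 s of data
    signal = list(range(300))               # 3 s of one state signal at an assumed 100 Hz
    for name, (start, end) in windows.items():
        print(name, start, end, len(crop(signal, start, end, 100)))
    for row in nine_grid(FAULT_STATES):
        print(row)
```

Because Mbox straddles the boundary between Fbox and Bbox, a fault feature that falls near t2 is still seen whole by at least one window.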
Based on the above improvement scheme, data enhancement is needed to improve the recognition rate. The above are the key algorithms and core design methods involved in this paper. In the next section, we verify the effectiveness of the design scheme through experiments.

Experimental results
The dataset used in this paper is sourced from a precision instrument company in Jiangsu province that specializes in the maintenance and fault handling of new energy vehicles. We collaborate closely with this company in the field of data analysis and mining algorithms. The objective of this research is to enhance the efficiency of fault diagnosis in the electric drive systems of new energy vehicles. Moreover, the data used in this study is highly competitive within the industry.
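To validate the performance of the proposed method, Visual Geometry Group-16 (VGG-16), InceptionV3, and Residual Network-101 (ResNet-101) are used for comparison. Among them, VGG-16 is a deep CNN architecture for image classification and object recognition tasks, InceptionV3 is designed to improve the performance of image classification and object detection, and ResNet-101 is a 101-layer-deep network architecture designed to improve the performance of computer vision tasks such as image classification, object detection, and semantic segmentation. In addition, we used training accuracy, testing accuracy, and loss as experimental evaluation metrics. Training accuracy is the ratio of the number of samples correctly classified by the model on the training data to the total number of training samples, testing accuracy is the proportion of samples correctly classified by the model on the test dataset, and loss measures the gap or error between the model prediction and the actual target. Our goal is to minimize the value of the loss function and thus improve the performance of the model.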
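The accuracy and loss metrics can be made concrete with a small sketch. The paper does not state which loss function is used, so the cross-entropy loss below is an assumption chosen because it is common for classification; the labels and probabilities are made-up illustrative values.

```python
import math


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def cross_entropy(probabilities, actual):
    """Mean negative log-probability assigned to the true class (assumed loss)."""
    total = sum(-math.log(p[a]) for p, a in zip(probabilities, actual))
    return total / len(actual)


if __name__ == "__main__":
    # Illustrative two-class example: 0 = normal, 1 = faulty.
    true_labels = [1, 0, 0, 1]
    pred_labels = [1, 0, 1, 1]
    probs = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
    print("accuracy:", accuracy(pred_labels, true_labels))
    print("loss:", cross_entropy(probs, true_labels))
```

The same `accuracy` function gives training accuracy when applied to the training split and testing accuracy when applied to the test split.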
The proposed method transforms the fault diagnosis problem from discrete data to visual image recognition, thus providing clearer insights into the dynamic visualization of data changes. Among the numerous image recognition models, the Transformer model has demonstrated superior recognition capabilities in various domains such as text, speech, and image. We initially considered backbones of image classification models commonly used in academia and industry, such as VGG, Inception, and ResNet, and compared them with the Transformer model. The Transformer model demonstrated strong performance in our experiments, prompting us to choose the ViT neural network algorithm based on the Transformer backbone. After choosing the Transformer model, we observed that its performance varied with the training parameter settings. The performance of these models depends on factors such as the chosen model structure, the dataset, and the training parameters. In our parametric experiments with the Transformer model, we found that the input patch size of the image directly affects performance. The ViT++ model supports various patch sizes, including 14 × 14, 16 × 16, and 32 × 32. We evaluated the performance of the ViT++ model under these different patch input conditions. To ensure the comparability of the experiments, we set the number of epochs to 50 based on our experimental experience. The dataset is divided into training data and test data at a ratio of 6:4.
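To see how the patch size changes the input to the Transformer, the sketch below counts the patches (tokens) per image for the three patch sizes and splits a sample list at the 6:4 ratio. The 224 × 224 input resolution is an assumption made here because it is divisible by all three patch sizes; the paper does not state the input resolution, and the function names are hypothetical.

```python
def num_patches(height, width, patch):
    """Number of non-overlapping patch x patch tiles in a height x width image."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def split_6_4(items):
    """Split a list into training and test parts at a 6:4 ratio (no shuffling)."""
    cut = len(items) * 6 // 10
    return items[:cut], items[cut:]


if __name__ == "__main__":
    side = 224  # assumed input resolution, not taken from the paper
    for patch in (14, 16, 32):
        print(f"patch {patch} x {patch}: {num_patches(side, side, patch)} tokens")
    train, test = split_6_4(list(range(10)))
    print(len(train), len(test))
```

Larger patches give fewer tokens per image, so the attention matrix is smaller; in practice the samples would usually be shuffled before splitting.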
In our experiments, we tested the performance of the proposed scheme on a deep learning service and conducted experimental verification with multiple deep learning image classification models (VGG-16, InceptionV3, ResNet-101, ViT++). We exclude the time consumed on the communication channel, as it depends heavily on network traffic. The parameter settings of the experimental environment are shown in Table 1. The experiments are divided into two main parts, the recognition of multi-class fault spliced images and the recognition of single-class fault images, as follows.
1. Recognition of multi-class fault spliced images with only two types of classification results: faulty or normal. We feed the processed multi-class fault spliced images into the VGG-16, InceptionV3, ResNet-101, and proposed ViT++ models to verify the recognition effect of each.
Table 2 shows that InceptionV3 performs worse than the other models in terms of both training accuracy and testing accuracy. It also exhibits higher loss values. VGG-16 slightly outperforms InceptionV3 in training accuracy, testing accuracy, and loss, but its overall recognition accuracy remains below 80%. The ResNet-101 model achieves a high training accuracy of 99.1%, but its testing accuracy is only 78.8%, leaving a large gap between training accuracy and testing accuracy. Additionally, the loss value of the ResNet-101 model is not the minimum in its column. Across the whole of Table 2, the ViT++ algorithm consistently achieves over 90% in both training accuracy and testing accuracy. It also has an advantage in terms of loss within the same column. This shows the superiority of the ViT++ model, whose accuracy improves as the patch size increases, with the best performance observed at a patch size of 32 × 32.
The multi-model training accuracy and loss results for multi-class fault spliced images in Figure 4 show that the ViT++ 14/16/32 models converge fastest and that the ResNet-101 and ViT++ models reach an accuracy of more than 90%. Combining accuracy and
Meanwhile, the proposed ViT++ algorithm achieves SOTA results on each category individually and achieves higher recognition rates as the patch size increases.
The above results demonstrate the superior performance of the proposed ViT++ algorithm. This is because the proposed ViT++ deep learning architecture, unlike traditional CNNs, takes a completely different approach to processing image data: it uses the self-attention mechanism to capture global and local relationships in an image, which is more suitable for the massive and complex data with multiple feature dimensions studied in this paper. Specifically, we can make the following analysis:

Conclusions and future work
In this paper, we propose a data enhancement and real-time diagnosis method based on the optimized ViT++ algorithm for fault data identification problems in electric drive systems. We have achieved remarkable results by applying the proposed ViT++ algorithm to fault data diagnosis of the electric drive system. By improving the attention mechanism in the ViT++ algorithm and introducing a data enhancement strategy, we were able to identify faults in electric drive systems more accurately and efficiently, improving the intelligent diagnosis ability of fault data in electric drive systems. However, some aspects still need to be further discussed and improved. Although we have introduced a data enhancement strategy to enrich the characteristics of the fault data, overfitting or underfitting problems remain possible. More data enhancement methods can be further explored, and experiments can be conducted to verify their effectiveness. Complex and variable working conditions and environments may affect the electric drive system in practical applications. Therefore, we need to consider the robustness of the model to ensure high accuracy even under different conditions.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figure 1. Overall architecture of intelligent diagnosis system for electric drive system fault data.

Figure 2. Data augmentation and graphical visualization of electric drive system state data. The sliding windows Fbox, Mbox, and Bbox acquire the electric drive operating state data during a specific period to generate and intercept block images of P × P (32 × 32) size; a total of nine states (MEE, BOD, BPVF, MGS, MSSA, SSF, HMSA, GGF, BIRF) are stitched into a 3 × 3 nine-grid image and displayed on the real-time fault diagnosis and monitoring interface of the electric drive system.

1. Traditional image classification methods mainly rely on CNNs. In contrast, the ViT++ algorithm breaks the limitations of traditional CNNs in terms of image size by introducing the Transformer model, which converts image data into sequence data and uses the self-attention mechanism to learn global features. This enables the ViT++ algorithm to handle images of

Figure 4. Multi-model training accuracy and loss results for multi-class fault spliced images.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Quzhou Science and Technology Key Research Project: Research on key technologies for intelligent diagnosis and predictive maintenance of faults in new energy vehicle integrated electric drive systems based on big data (2022K105); Research on intelligent detection methods for electromagnetic interference attacks in industrial IoT (2023K252); Research on intelligent visual networking platform for pump station clusters used in urban sewage lifting (2023K248); and by the Natural Science Foundation of Zhejiang Province: Multi-agent based hierarchical cooperative control strategy for torque distribution of distributed drive electric vehicle (LY21E050001).

Table 1. Configuration parameters of the experimental environment.

Table 2. Comparison of recognition results of multi-class fault spliced images with multi-model experimental effects.

Table 3. Comparison of recognition results of single class fault spliced images with multi-model experimental results.

2. The proposed ViT++ algorithm can learn global features of images and contextual information through the self-attention mechanism instead of being limited to a local receptive field. This enables the ViT++ algorithm to perform well in understanding the overall content and contextual relationships of an image and to better capture important features in an image.
3. The proposed ViT++ algorithm achieves efficient feature extraction through the Transformer model, which achieves accuracy comparable to that of large CNNs with relatively few model parameters. This makes the ViT++ algorithm more efficient under resource constraints and provides new ideas for designing lightweight models.