Surface defect detection and classification of Si3N4 turbine blades based on convolutional neural network and YOLOv5

Owing to mechanical vibration, high-temperature creep, and other factors, Si3N4 turbine blades are prone to surface defects, and traditional algorithms cannot detect and classify these defects simultaneously. To address these problems, an algorithm based on a convolutional neural network is proposed for detecting and classifying surface defects on Si3N4 turbine blades. Its detection and classification network is an optimized YOLOv5 network: to achieve a higher level of feature fusion, the PAN and FPN structures of the Neck part are replaced by a BiFPN structure. We establish a dataset of Si3N4 turbine blade images and expand it by data augmentation. The algorithm detects and classifies surface defects with an accuracy of up to 97.4% and a detection time as low as 16 ms. It addresses the heavy workload, long processing time, and low accuracy of traditional detection methods, provides a feasible approach for the quality inspection of Si3N4 turbine blades, and has practical engineering value.


Introduction
Si3N4 turbine blades have excellent properties such as a low coefficient of thermal expansion and good heat, corrosion, and wear resistance. In particular, they retain high strength, high hardness, and low density at 1400°C, 1-3 which can effectively reduce aircraft engine weight and fuel consumption. They are therefore widely used in the aerospace and military industries. 4,5 Owing to mechanical vibration, high-temperature creep, and other factors, Si3N4 turbine blades are prone to surface defects such as snowflakes, pits, and scratches. Pit defects are caused by abnormal abrasive particles acting as sharp heads, whereas the upper grinding plate, acting as a blunt head, produces crack defects. 6 Snowflake defects arise from changes in the crystal structure of the material. 1 Incorrect machining pressure and unbroken hard abrasive particles during finishing produce graze and scratch defects. 7 These defects shorten the service life and degrade the performance of Si3N4 turbine blades. 8,9 Traditional inspection of silicon nitride turbine blades relies on manual detection, which is inefficient and highly subjective. Meanwhile, traditional algorithms cannot detect and classify defects simultaneously, [10][11][12] which slows production and hinders improvements in product quality. With advances in image processing, nondestructive testing is gradually being applied to surface defect detection on silicon nitride turbine blades. 13 Nondestructive testing assisted by image processing 14 causes no secondary damage to Si3N4 turbine blades. It is therefore important to study and design a method for detecting and classifying surface defects on Si3N4 turbine blades. 15,16 Such methods play an increasingly important part in stabilizing product quality, 17 reducing workers' labor intensity, 18,19 and raising the level of industrial automation.
Building on the innovations of YOLOv4 and YOLOv5 in data augmentation, convolutional neural networks (CNNs), adaptive anchor box calculation, and adaptive image scaling, researchers worldwide have studied the YOLO (You Only Look Once) algorithm in depth. Malta et al. 20 proposed a CNN-based task-assistance model: they built a data set of automobile engine parts, trained the neural network on it, and used augmented reality glasses within the system architecture for recognition; the algorithm accurately detects the parts in a real-time video stream. Neuhauser et al. 21 proposed a CNN-based method for classifying and detecting surface defects on extruded aluminum. A camera continuously captures the extruded profile, a YOLOv5 network distinguishes flawless surfaces from defective ones, and the defective surfaces are marked and classified. The method offers fast detection, high classification accuracy, and high labeling accuracy, and it meets the needs of industrial production equipment for surface defect detection. Redmon et al. 22 proposed YOLO, an object detection system based on a single neural network that recasts object detection as a regression problem: given an input image, the bounding boxes of the targets and their classes are regressed directly at multiple positions in the image, which significantly increases speed while maintaining detection accuracy. Zhang et al. 23 proposed a CNN-based method for recognizing bearing fault modes. Combined with ensemble empirical mode decomposition for adaptive decomposition, the intrinsic mode function components were screened and image features were extracted by the CNN, improving the classification accuracy of bearing fault modes by 4.26%.
Building on this research on surface defect detection and classification algorithms, we realize surface defect detection and classification for Si3N4 turbine blades with an improved YOLOv5 algorithm.
To detect surface defects on Si3N4 turbine blades accurately and classify them efficiently, a defect detection and classification method based on a CNN and an optimized YOLOv5 is proposed. A deep CNN extracts semantic features of the surface defects, and the PAN and FPN structures of YOLOv5 are replaced by a BiFPN structure. Images are fed into the network for training and validation, and the images in the test set are used to test the trained network. The proposed method offers high accuracy and fast recognition, and it provides guidance for improving the detection and classification of surface defects on Si3N4 turbine blades.
The article is structured as follows. Section 2 introduces the Si3N4 turbine blade surface defect detection and classification platform: Section 2.1 covers the construction of the detection environment, and Section 2.2 the fabrication of the Si3N4 turbine blade surface defect data set. Section 3 describes surface defect image processing: Section 3.1 introduces the principle of data augmentation, and Section 3.2 covers data set partitioning and annotation. Section 4 presents the detection and classification method for surface defects: Section 4.1 introduces the defect detection and classification process, Section 4.2 describes the preparation of the YOLOv5 network partition, Section 4.3 examines the model network structure, Section 4.4 describes the improvement of the detection model, and Section 4.5 introduces the BiFPN structure. Section 5 covers loss calculation and performance analysis: Section 5.1 introduces the model loss, Section 5.2 the training process, and Section 5.3 the model evaluation. Section 6 presents the experimental results and analysis, and Section 7 concludes.

Si3N4 turbine blade surface defect detection and classification platform
Detection environment setup
The surface cleanliness of Si3N4 turbine blade samples affects image quality and therefore the accuracy of the experimental results. 24 To obtain high-quality defect images, we built a Si3N4 turbine blade surface defect detection and classification system. As shown in Figure 1, it consists mainly of a PC workstation, a stereo microscope, an industrial camera, an image acquisition card, a support platform, a coaxial light, a circular shadowless light, and a light-shielding box (camera obscura). Given the characteristics of silicon nitride turbine blades, the light sources are a UD115W LED circular shadowless light source and a C20W coaxial light source. The industrial camera is a Mer2-2000-19u3m, and the stereo microscope uses an HN-2520-20M-C1/1x lens. The PC workstation runs Windows 10 and is equipped with an Nvidia GTX1080Ti GPU and an Intel i7-7700 CPU. Compared with other defect detection systems, this system performs much better.
The working process of detecting surface defects on silicon nitride turbine blades with this system can be divided into the following steps: (1) installing and debugging the device and placing the Si3N4 turbine blade on the workbench.

Fabrication of surface defect data set of Si3N4 turbine blades
The foundation of defect detection and classification is a high-quality data set. The diversity and balance of sample types should be guaranteed when making the data set, 25 and the data set must meet certain standards to achieve satisfactory experimental results, because the recognition results of a trained CNN are biased toward the composition of its training data.
We prepared 500 Si3N4 turbine blade samples and used the image acquisition system to collect 1000 surface defect images of them as the data set. Five defect types, to be detected by the YOLOv5 network, are mainly collected: pits, scratches, cracks, snowflakes, and grazes, with 200 images of each. The five surface defects are shown in Figure 2. Pit defects show local spalling of the surface microstructure, leaving the blade surface uneven. 26 Scratch defects are linear and shallow. Crack defects are annular or straight, with an obvious grain. Snowflake defects appear as large, net-like white spots or pitting. Graze (abrasion) defects are banded and show no obvious depression.
Surface defect image processing of Si3N4 turbine blade

Data enhancement
As a CNN becomes deeper, the number of parameters to be learned increases and the model structure becomes more complex. When the data set is small, the excess parameters fit the particular characteristics of individual samples instead of what the samples have in common, resulting in over-fitting. 27 In Si3N4 turbine blade surface defect detection, defect images are scarce, so the database is small, and labeled defect images are difficult to obtain in large numbers. A small database causes the convolutional neural network to over-fit. Data augmentation 28 is therefore introduced: the original images are transformed to generate new samples, which expands the dataset to five times its original size. The augmented images, shown in Figure 3, improve the generalization ability of the machine learning model. Data augmentation adds new training data on the basis of the original dataset by transforming training images with domain-specific techniques, creating new defect images that belong to the same categories as the originals. Resizing images and scaling pixel values further extends the dataset and improves its representativeness.
Common data augmentation methods used to enlarge the training set and improve generalization include random cropping, image flipping, rotation, noise addition, and affine transformation.
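The transforms above can be sketched as follows. This is a minimal NumPy sketch, not the authors' pipeline: Table 1 is not reproduced here, so the choice of four transforms (horizontal flip, vertical flip, 90° rotation, Gaussian noise), which together with the original image give a five-fold expansion, is an assumption, as are the function names and the YOLO-style normalized box format (x_center, y_center, width, height). The point it illustrates is that geometric transforms must move the bounding boxes with the pixels, whereas photometric ones such as noise leave the boxes unchanged.

```python
import numpy as np

def hflip(image, boxes):
    """Mirror an (H, W, C) image left-right; boxes are normalized (xc, yc, w, h)."""
    out = boxes.copy()
    out[:, 0] = 1.0 - out[:, 0]              # x centre mirrors about 0.5
    return image[:, ::-1].copy(), out

def vflip(image, boxes):
    """Mirror top-bottom."""
    out = boxes.copy()
    out[:, 1] = 1.0 - out[:, 1]              # y centre mirrors about 0.5
    return image[::-1].copy(), out

def rot90_ccw(image, boxes):
    """Rotate 90 degrees counter-clockwise, as np.rot90 with k=1 does."""
    xc, yc, w, h = boxes.T
    out = np.stack([yc, 1.0 - xc, h, w], axis=1)   # width and height swap
    return np.rot90(image, k=1).copy(), out

def gaussian_noise(image, boxes, rng, sigma=10.0):
    """Add zero-mean Gaussian noise; geometry, and hence boxes, unchanged."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8), boxes.copy()

def augment_fivefold(samples, seed=0):
    """Return each (image, boxes, class_ids) sample plus four transformed copies."""
    rng = np.random.default_rng(seed)
    out = []
    for image, boxes, cls in samples:
        out.append((image, boxes, cls))
        for transform in (hflip, vflip, rot90_ccw):
            new_img, new_boxes = transform(image, boxes)
            out.append((new_img, new_boxes, cls))
        new_img, new_boxes = gaussian_noise(image, boxes, rng)
        out.append((new_img, new_boxes, cls))
    return out

# Example: one 4x6 RGB image with a single crack box (class ID 2).
img = np.zeros((4, 6, 3), dtype=np.uint8)
boxes = np.array([[0.25, 0.5, 0.1, 0.2]])
augmented = augment_fivefold([(img, boxes, np.array([2]))])
print(len(augmented))  # 5
```

Class IDs are carried through untouched, since none of these transforms changes what kind of defect is in the image.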
The data augmentation methods adopted and their purposes are listed in Table 1.

Data set partitioning and annotation
About 5000 surface defect images are obtained by data augmentation and divided into training, validation, and test sets in a ratio of 7:2:1. Note that there is no fixed rule for this ratio; nevertheless, in machine learning and deep learning it is very important to divide the training and test sets reasonably, and a typical training-to-test ratio is 7:3 or 8:2. For this dataset, we tried ratios of 6:2:2 and 5:3:2, but training performed poorly, with low mAP and precision. We finally adopted 7:2:1, which did not degrade the performance of the proposed YOLOv5 algorithm. The sample distribution of the data set is shown in Table 2.
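A reproducible 7:2:1 random split can be sketched as below. The seed, the use of a single random permutation, and the function name are illustrative assumptions, since the paper does not state how images were assigned to subsets. With 5000 images the split yields 3500, 1000, and 500 images.

```python
import numpy as np

def split_indices(n, ratios=(7, 2, 1), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/val/test by integer ratios."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]          # any rounding remainder goes to test
    return train, val, test

train, val, test = split_indices(5000)
print(len(train), len(val), len(test))  # 3500 1000 500
```

One design choice worth considering: if the split is made after augmentation, transformed copies of the same original image can land in different subsets, which makes validation and test scores optimistic; splitting by original image first and augmenting afterwards avoids this.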
The training set is used to learn from the samples and estimate the model parameters, so that the model reflects reality and can predict future or otherwise unseen data. The validation set is mainly used to tune the model's hyperparameters and evaluate its capability, and the test set evaluates the model's generalization capability. 29 Data augmentation, also called data enhancement, makes a limited amount of data yield value equivalent to more data without collecting substantially more. With a small data set, a neural network is likely to overfit even when trained properly: because the training set does not reflect the characteristics of the data, test accuracy is noticeably low. To avoid over-fitting, data augmentation is adopted to expand the data set. Data augmentation not only improves training efficiency and classification performance but also enhances the model's localization and generalization abilities.
In object detection, the original images must be annotated to define the detection targets. This system uses the LabelImg image annotation tool to mark each defect in the Si3N4 turbine blade images. The tool defines the type and number of annotations and records the location of each annotated object. Defect location boxes are drawn as follows: (1) observe the defect image, (2) select the corresponding defect type, and (3) draw the bounding box around the defect. The integer IDs of the defect classes are 0 for pit, 1 for scratch, 2 for crack, 3 for snowflake, and 4 for graze defects.
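LabelImg can save annotations in YOLO format, one text line per box: the integer class ID followed by the box centre and size, each normalized by the image width or height. The helper below, whose name and six-decimal formatting are illustrative assumptions, converts a pixel-coordinate box (x_min, y_min, x_max, y_max) into such a line using the class IDs listed above. It sketches the format only and is not code from the paper.

```python
# Class IDs as listed in the text above.
CLASS_IDS = {"pit": 0, "scratch": 1, "crack": 2, "snowflake": 3, "graze": 4}

def to_yolo_line(defect, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive size")
    xc = (x_min + x_max) / 2 / img_w      # normalized centre x
    yc = (y_min + y_max) / 2 / img_h      # normalized centre y
    w = (x_max - x_min) / img_w           # normalized width
    h = (y_max - y_min) / img_h           # normalized height
    return f"{CLASS_IDS[defect]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

print(to_yolo_line("crack", (100, 50, 300, 150), 640, 480))
# 2 0.312500 0.208333 0.312500 0.208333
```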
Detection and classification method for surface defects of Si3N4 turbine blades

Defect detection and classification process
Based on a CNN and the YOLOv5 network, the PAN and FPN structures of the Neck part are replaced by a BiFPN structure to obtain the defect detection and classification method of this system. When pit, snowflake, scratch, graze, and crack defects occur simultaneously, the optimized YOLOv5 algorithm can still detect and classify them while maintaining good performance. The detection process is shown in Figure 4. The defect detection process is as follows: (1) defect images are collected and expanded by data augmentation, the images are labeled and classified, and the optimized YOLOv5 algorithm is trained on them.

Table 2. Sample distribution of the data set (number of images).
Data set           Pit    Scratch   Crack   Snowflake   Graze   Total
Training set       695    683       675     720         727     3500
Verification set   221    196       213     189         181     1000
Test set           89     96        103     106         106     500

Preparation of the YOLOv5 network partition
The YOLOv5 algorithm is a single-stage object detection algorithm. Because the surface defects are small and of many types, the network structure of YOLOv5 has been analyzed. To improve the training speed and detection accuracy of the algorithm, the main improvement ideas are shown in Figure 5. Together, these improvements raise the overall performance of YOLOv5.
The input module. Compared with earlier models, YOLOv5 adds data augmentation, adaptive anchor box calculation, and adaptive image scaling at the training stage. Data augmentation enlarges the data set, adaptive image scaling adjusts the surface defect images to the network input size, and adaptive anchor calculation sets the anchor boxes according to the sizes of the surface defects.
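Adaptive image scaling is commonly implemented as letterboxing: the image is resized by a single ratio that preserves its aspect ratio, and only the padding needed to reach a multiple of the network stride is added. The function below computes that geometry only (no pixel resampling). The 640-pixel target, the stride of 32, and the function name are assumptions for illustration, not values from this paper.

```python
def letterbox_geometry(h, w, target=640, stride=32, minimal=True):
    """Return (scale, (new_w, new_h), (pad_x, pad_y)) for letterbox resizing.

    The image is scaled by one ratio so that its longer side fits `target`,
    and the padding is split evenly between opposite sides. With
    minimal=True the padding is reduced modulo `stride`, so (for a target
    that is a multiple of the stride) each padded side becomes the smallest
    multiple of the stride that holds the resized image.
    """
    scale = min(target / h, target / w)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_w, pad_h = target - new_w, target - new_h
    if minimal:
        pad_w %= stride
        pad_h %= stride
    return scale, (new_w, new_h), (pad_w / 2, pad_h / 2)

# A 1080 x 1920 frame: scale 1/3, resized to 640 x 360, padded to 640 x 384.
print(letterbox_geometry(1080, 1920))
```

Minimal padding keeps the network from spending computation on large grey borders while still satisfying the stride constraint of the downsampling layers.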
The Backbone part. The Backbone is a high-performance classification network composed of the Focus structure, the BottleNeckCSP structure, and the SPP structure. In the baseline network, Focus adopts an interlaced sampling and concatenation structure: the input images are sliced into interleaved sub-images. The CSP structure divides the original input into two branches and applies convolution to halve the number of channels, which allows the model to learn more features.

The Neck part. To obtain higher-level semantic information, the feature layers of the last three scales are selected for further processing. In the SPP structure, the input feature layer is max-pooled, and the pooled results are stacked and convolved three times; in this way the SPP structure isolates the most significant context features. PANet has only one top-down path and one bottom-up path. Building on the PAN structure, BiFPN adds an extra edge where the input and output nodes are at the same level, fusing more features without much extra cost. As a consequence, this part gathers information from the images and extracts features.
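The SPP block can be sketched in NumPy as stride-1 max pooling at several kernel sizes, with each pooled map concatenated to the input along the channel axis so the spatial size is unchanged. The kernel sizes (5, 9, 13) are the usual YOLOv5 defaults and are assumed here; the convolutions that normally precede and follow the pooling are omitted, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_same(x, k):
    """Stride-1 k x k max pooling on a (C, H, W) array; H and W are preserved."""
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return windows.max(axis=(-2, -1))

def spp(x, kernels=(5, 9, 13)):
    """Concatenate the input with its max-pooled versions along channels."""
    x = np.asarray(x, dtype=np.float64)   # float so that -inf padding is valid
    return np.concatenate([x] + [max_pool_same(x, k) for k in kernels], axis=0)

feature = np.arange(2 * 8 * 8).reshape(2, 8, 8)
print(spp(feature).shape)  # (8, 8, 8): 2 channels x 4 branches, same 8 x 8 grid
```

Padding with negative infinity ensures the border pixels never win the maximum, so each output value is a true maximum over the part of the window that lies inside the feature map.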
The head part. The anchor box mechanism of the output layer is the same as in YOLOv4. The output side, which produces the detection results, has two branches: a classification branch and a regression branch. To further improve detection accuracy, the main improvements concern the training loss function and NMS. The loss function of YOLOv5 consists of three parts: category loss, position loss, and confidence loss. This part outputs the feature maps and prediction boxes.

Check the model network structure
The YOLOv5 network consists mainly of the Backbone, the Neck, and the Head, as shown in Figure 6. The Backbone mainly includes Focus, CSP, and SPP. The Focus structure slices a 608 × 608 × 3 input image into a 304 × 304 × 12 feature map and then outputs a 304 × 304 × 32 feature map through a 32-channel convolution layer. The CSP structure splits the feature map into two parts: one part passes through convolution, and the other is fused with the result of that convolution. The SPP structure fuses multi-scale features by max pooling with kernels of different sizes.
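The Focus slicing step can be checked with a few lines of NumPy: taking every second pixel at the four row/column offsets and stacking the results along the channel axis turns 608 × 608 × 3 into 304 × 304 × 12 without discarding any pixel. The following 32-channel convolution is represented here by a random 1 × 1 projection, a hypothetical stand-in; the real layer uses learned weights (typically a larger kernel with batch normalization and activation).

```python
import numpy as np

def focus_slice(x):
    """Space-to-depth slicing of an (H, W, C) image into (H/2, W/2, 4C)."""
    return np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]], axis=-1
    )

rng = np.random.default_rng(0)
image = rng.random((608, 608, 3))
sliced = focus_slice(image)                # (304, 304, 12)
weights = rng.standard_normal((12, 32))    # hypothetical 1x1 projection weights
features = sliced @ weights                # (304, 304, 32)
print(sliced.shape, features.shape)  # (304, 304, 12) (304, 304, 32)
```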
The Neck lies between the Backbone and the Head and adopts the FPN and PAN structures, which are mainly used to generate a feature pyramid. The feature pyramid strengthens the model's detection of objects at different scales, so that the same object can be recognized at different sizes and scales.
The Head, the final detection part, outputs feature maps at three scales for detecting small, medium, and large objects. Each grid cell has three prediction boxes, each containing the object confidence and the object location. Non-maximum suppression (NMS) 30 eliminates repetitive and redundant prediction boxes and retains the prediction box with the highest confidence, completing object detection.
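Greedy NMS can be sketched as follows: repeatedly keep the highest-scoring remaining box and discard every remaining box whose IoU with it exceeds a threshold. Boxes are assumed to be in (x1, y1, x2, y2) pixel form, and the 0.45 threshold is an assumed default, not a value from the paper. In practice NMS is applied separately for each defect class.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]       # indices by descending confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thresh]   # drop boxes that overlap too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```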

Improvement of detection model
The key to improving model performance is to fuse features of different scales. To enhance feature fusion across layers, the Neck part of YOLOv5 adopts the FPN and PAN structures, which facilitate multi-scale prediction.
The FPN structure establishes a top-down path for feature fusion, and the fused maps, which carry higher-level semantic information, are used for prediction, greatly improving accuracy. Higher-level feature maps carry stronger semantic information, which benefits object classification, whereas lower-level feature maps carry stronger location information, which benefits object localization. FPN therefore enriches the semantic information of the prediction feature maps but loses some location information. The PAN structure builds on FPN by adding a bottom-up path that passes location information to the prediction feature maps, so that they carry both high-level semantic information and location information, improving the accuracy of object detection. This demonstrates the effectiveness of bidirectional fusion. Because the bidirectional fusion of the PAN structure is relatively simple, it is replaced by the more complex BiFPN structure. The three structures are shown in Figure 7.

BiFPN structure
The BiFPN structure is shown in Figure 7(c). The blue part is the top-down path, which conveys the semantic information of high-level features; the red part is the bottom-up path, which conveys the location information of low-level features; and the purple part is a new edge added between the input node and the output node of the same level. Compared with the PAN structure, BiFPN makes three major design changes: (1) Nodes with only one input edge are deleted. Such a node performs no feature fusion, so it contributes little to a network that aims to fuse different features, and deleting it simplifies the bidirectional network. (2) To fuse more features without much extra cost, an extra edge is added between the original input node and the output node when they are at the same level. (3) PAN has only one top-down path and one bottom-up path. To achieve higher-level feature fusion, BiFPN instead treats each bidirectional path as one feature network layer and repeats the same layer several times.
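The three design changes above can be sketched as one three-level BiFPN layer in NumPy. Several details are assumptions not stated in this paper: the fast normalized fusion (non-negative weights divided by their sum plus a small ε) comes from the original EfficientDet BiFPN; the levels P3-P5 each have half the resolution of the previous one and equal channel counts; resampling uses nearest-neighbour upsampling and 2 × 2 max-pool downsampling; the weights are fixed rather than learned; and the convolution that normally follows each fusion is omitted. All names are hypothetical.

```python
import numpy as np

EPS = 1e-4

def fuse(features, weights):
    """Fast normalized fusion: sum_i w_i * F_i / (eps + sum_j w_j), w_i >= 0."""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU on weights
    return sum(wi * f for wi, f in zip(w, features)) / (EPS + w.sum())

def upsample2(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) map with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def bifpn_layer(p3, p4, p5, weights=None):
    """One BiFPN layer over three levels; every fusion weight defaults to 1."""
    weights = weights or {}

    def w(name, n):
        return weights.get(name, [1.0] * n)

    # Top-down path. P5 would have a single input here, so that node is deleted.
    p4_td = fuse([p4, upsample2(p5)], w("p4_td", 2))
    p3_out = fuse([p3, upsample2(p4_td)], w("p3_out", 2))
    # Bottom-up path. p4 re-enters p4_out through the extra same-level edge.
    p4_out = fuse([p4, p4_td, downsample2(p3_out)], w("p4_out", 3))
    p5_out = fuse([p5, downsample2(p4_out)], w("p5_out", 2))
    return p3_out, p4_out, p5_out

# Assumed 80/40/20 grids with 64 channels; the layer is repeated three times.
rng = np.random.default_rng(0)
p3, p4, p5 = (rng.random((64, s, s)) for s in (80, 40, 20))
for _ in range(3):
    p3, p4, p5 = bifpn_layer(p3, p4, p5)
print(p3.shape, p4.shape, p5.shape)  # (64, 80, 80) (64, 40, 40) (64, 20, 20)
```

In a trained network each repeated layer has its own learned fusion weights and convolutions; sharing the default weights here only keeps the sketch short.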

Loss
The loss function of the YOLOv5 algorithm consists of three parts: category loss, position loss, and confidence loss. Category loss and confidence loss are computed with cross entropy, as shown in formulas (1)-(3), where $S^2$ is the number of grid cells; $B$ is the number of predicted bounding boxes per grid cell; $I_{ij}^{obj}$ indicates whether the $j$-th prior box of the $i$-th grid cell contains a target to be predicted; $I_{ij}^{noobj}$ indicates whether the $j$-th bounding box of the $i$-th grid cell contains no target; $\lambda_{obj}$ and $\lambda_{noobj}$ are the weight coefficients for grid cells with and without a target; $C_i$ and $\hat{C}_i$ are the predicted and actual confidence values; $c$ is the target category predicted by the bounding box; $p_i(c)$ is the predicted probability that the target detected in the $i$-th grid cell belongs to category $c$; and $\hat{p}_i(c)$ is the corresponding actual probability. The GIoU loss is adopted for the position loss. Intersection over Union (IoU), an important indicator for evaluating object detectors, measures the overlap between the ground-truth box and the predicted box, as shown in formulas (4)-(6), where $A$ is the area of the ground-truth box and $B$ is the area of the predicted box. If IoU were used directly as the position loss, it would be 0 whenever the prediction box and the ground-truth box do not overlap; IoU then cannot reflect the distance between the two boxes, and the gradient of the loss is 0, so the loss cannot be optimized. GIoU therefore introduces $A_c$, the area of the smallest box enclosing both the prediction box and the ground-truth box, and $U$, the area of the union of the two boxes. The larger the GIoU of the two boxes, the more they overlap, so the network is optimized toward high overlap between the prediction box and the ground-truth box.
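Under the standard GIoU definitions, $\mathrm{IoU} = |A \cap B| / |A \cup B|$, $\mathrm{GIoU} = \mathrm{IoU} - (A_c - U)/A_c$, and the position loss is $1 - \mathrm{GIoU}$. The sketch below implements these for two (x1, y1, x2, y2) boxes; the function names and box format are assumptions. The usage lines show why GIoU helps: two non-overlapping pairs both have IoU = 0, yet the farther pair gets a larger loss, so the loss still provides a signal that pulls the predicted box toward the ground truth.

```python
def giou(box_a, box_b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter                    # U
    iou = inter / union
    # A_c: area of the smallest axis-aligned box enclosing both boxes.
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (area_c - union) / area_c

def giou_loss(pred, target):
    """Position loss 1 - GIoU: 0 for a perfect match, approaching 2 far apart."""
    return 1.0 - giou(pred, target)

target = (0, 0, 1, 1)
print(round(giou_loss((2, 0, 3, 1), target), 4))  # 1.3333 (IoU is 0)
print(round(giou_loss((4, 0, 5, 1), target), 4))  # 1.6 (IoU is still 0)
```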

Model training
To test the model's performance on the dataset, the established data set is fed into the optimized YOLOv5 network for training. The number of epochs is set to 300, the batch size to 16, and the initial learning rate to 0.0001, and the Adam optimizer is used for iterative optimization. The curves of each loss function during training are shown in Figure 8. As the number of iterations increases, the neural network is gradually optimized and the localization accuracy of the predicted boxes keeps increasing.
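For reference, one Adam update can be written in NumPy as below, using the initial learning rate of 0.0001 stated above. The values β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the common defaults from the original Adam paper and are assumed here because the paper does not list them; a real training loop would typically also apply learning-rate scheduling and weight decay, which are omitted. The sketch also computes the number of iterations implied by 3500 training images (the 7:2:1 split of 5000), a batch size of 16, and 300 epochs, and the toy quadratic objective is purely illustrative.

```python
import math
import numpy as np

LR, EPOCHS, BATCH_SIZE = 1e-4, 300, 16      # values stated in the text
BETA1, BETA2, EPS = 0.9, 0.999, 1e-8        # assumed Adam defaults

def adam_step(param, grad, m, v, t, lr=LR):
    """One Adam update; t is the 1-based step count."""
    m = BETA1 * m + (1 - BETA1) * grad          # first-moment estimate
    v = BETA2 * v + (1 - BETA2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - BETA1 ** t)                # bias corrections
    v_hat = v / (1 - BETA2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + EPS)
    return param, m, v

iters_per_epoch = math.ceil(3500 / BATCH_SIZE)   # 219
print(iters_per_epoch * EPOCHS)                  # 65700

# Toy check: minimize f(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([1.0, -1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 1001):
    w, m, v = adam_step(w, w, m, v, t)
```

Because Adam normalizes each step by the running gradient magnitude, each parameter moves by roughly the learning rate per step here, so after 1000 steps the toy weights have shrunk from 1 to roughly 0.9.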

Model evaluation
Combining the true defect category with the learner's predicted category yields four cases: true positive, false positive, true negative, and false negative. A true positive means a defect is correctly detected; a false positive means a region without a defect is incorrectly detected as a defect; a true negative means a region without a defect is correctly identified as defect-free; and a false negative means a defect is missed, that is, incorrectly judged to be defect-free.
Precision, recall, and mAP are used to evaluate the defect extraction ability of the neural network. Precision is the proportion of samples identified as Si3N4 turbine blade surface defects that are actually surface defects. Recall is the proportion of actual Si3N4 turbine blade surface defect samples that are correctly identified. Plotting precision against recall gives the precision-recall (P-R) curve. The average precision (AP) of a category is the area enclosed by its P-R curve and the coordinate axes, and the mAP is the mean of the AP values over all categories. The calculation formulas are shown in (7)-(10).
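The metrics can be sketched with the standard definitions Precision = TP/(TP + FP) and Recall = TP/(TP + FN), AP as the area under the P-R curve, and mAP as the mean AP over classes. Formulas (7)-(10) are not reproduced in this text, so the sketch uses all-point interpolation (precision made monotonically non-increasing before integrating), which is one common convention and an assumption here; deciding whether a detection counts as a true positive (for example IoU ≥ 0.5 with an unmatched ground-truth box) is also assumed to happen beforehand.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(scores, is_tp, n_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_tp: 1 if the detection matched a
    ground-truth box, else 0; n_gt: number of ground-truth boxes of this class.
    """
    order = np.argsort(scores)[::-1]              # rank by descending confidence
    hits = np.asarray(is_tp)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1 - hits)
    recall = tp / n_gt
    precision = tp / (tp + fp)
    r = np.concatenate([[0.0], recall, [1.0]])    # sentinel end points
    p = np.concatenate([[1.0], precision, [0.0]])
    p = np.maximum.accumulate(p[::-1])[::-1]      # precision envelope
    idx = np.where(r[1:] != r[:-1])[0]            # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(per_class_ap):
    return float(np.mean(per_class_ap))

ap_a = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], n_gt=4)
ap_b = average_precision([0.9], [1], n_gt=1)
print(ap_a, ap_b, mean_average_precision([ap_a, ap_b]))  # 0.625 1.0 0.8125
```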

Experimental results and analysis
Five thousand images were used in the defect detection system to verify the algorithm, and the values of each evaluation index are shown in the corresponding results table. Snowflake defects cover larger areas than the other defects and have obvious, easily identified features, so all of their evaluation indexes are high. Pit defects resemble snowflake defects, but because pits are small in area, uniform in shape, and have inconspicuous features, they are difficult to detect and classify accurately, and all of their evaluation indexes are relatively low. Because YOLOv5 uses adaptive image scaling, the size of the input image does not need to be considered.
Data augmentation not only increases the amount of training data but also improves the model's generalization ability, and the noisy samples it adds improve the model's robustness. For a dataset of 1500 defect images, the values of each evaluation index are shown in Table 4. The recall of the optimized YOLOv5 model is 95.6%, meaning that 95.6% of the surface defects are correctly detected and classified. Compared with the results on the augmented dataset, however, the mAP is lower and the overall performance on the 1500-image dataset is poorer. For small sample datasets, therefore, data augmentation plays a significant role in improving model performance.
The optimized YOLOv5 algorithm maintains good performance when different defects appear simultaneously. Different defect images in the dataset are fused by image augmentation into new, representative surface defect images, the model is trained on them, and the training results are shown in Figure 9. When multiple defects are present in a surface defect image at the same time, all of them are still marked, which shows that the model can recognize and mark multiple defects in a single image.
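The paper does not specify how defect images are fused. One common way to build multi-defect training images is a 2 × 2 mosaic in the spirit of the Mosaic augmentation used by YOLOv4 and YOLOv5; the simplified sketch below assumes that approach. It tiles four equally sized images without the random centre, scaling, and cropping of the real Mosaic, and rescales each image's normalized YOLO boxes into the quadrant it occupies. The function name and the (class_id, xc, yc, w, h) label layout are hypothetical.

```python
import numpy as np

def mosaic_2x2(images, box_sets):
    """Tile four (H, W, C) images into a (2H, 2W, C) image and remap YOLO boxes.

    box_sets[k] holds rows of (class_id, xc, yc, w, h) normalized to image k;
    the returned boxes are normalized to the mosaic.
    """
    if len(images) != 4 or len(box_sets) != 4:
        raise ValueError("expected exactly four images and four box sets")
    top = np.concatenate(images[:2], axis=1)
    bottom = np.concatenate(images[2:], axis=1)
    mosaic = np.concatenate([top, bottom], axis=0)
    remapped = []
    for k, boxes in enumerate(box_sets):
        row, col = divmod(k, 2)                      # quadrant of image k
        b = np.asarray(boxes, dtype=np.float64).reshape(-1, 5).copy()
        b[:, 1] = (b[:, 1] + col) / 2                # x centre
        b[:, 2] = (b[:, 2] + row) / 2                # y centre
        b[:, 3:5] /= 2                               # width and height halve
        remapped.append(b)
    return mosaic, np.concatenate(remapped, axis=0)

# Four 4x4 images, each with one centred box whose class ID equals its index.
imgs = [np.full((4, 4, 3), k, dtype=np.uint8) for k in range(4)]
labels = [np.array([[k, 0.5, 0.5, 0.5, 0.5]]) for k in range(4)]
mosaic, boxes = mosaic_2x2(imgs, labels)
print(mosaic.shape, boxes.shape)  # (8, 8, 3) (4, 5)
```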
Because this detection system is designed to detect and classify defects on Si3N4 turbine blades, the Faster RCNN and YOLOv5 algorithms are introduced for comparative experiments to evaluate the optimized YOLOv5 algorithm fully. Precision, recall, mean average precision, and detection speed are selected as evaluation indexes, so that the detection and classification abilities of the different algorithms can be evaluated systematically. The comparison results are shown in Figure 10 and Table 5.
As can be seen from the figure, the optimized YOLOv5 algorithm identifies the defects more accurately. As can be seen from the table, its precision, recall, and mean average precision are higher than those of the other algorithms, indicating that it identifies and classifies defect images better. In terms of detection speed, the optimized YOLOv5 algorithm is the fastest, at as little as 16 ms, 69.6% faster than the YOLOv5 algorithm and far faster than the Faster RCNN algorithm, which better meets the requirements of defect detection and classification for silicon nitride turbine blades. Among these algorithms, the optimized YOLOv5 achieves the best mAP, while YOLOv3 and YOLOv5 perform worst.
The shortcoming of the optimized YOLOv5 algorithm is its dependence on data set size. Unlike few-shot deep learning algorithms, YOLOv5 needs the support of a large data set. With a small data set, the neural network will inevitably overfit even with proper training: because the training set does not reflect the characteristics of the data, test accuracy is noticeably low, and the model performs poorly. With large data sets, however, the model achieves better defect recognition and classification.

Conclusion
Mechanical vibration, high-temperature creep, and other factors cause surface defects on Si3N4 turbine blades, and traditional algorithms cannot detect and classify these defects at the same time. This paper therefore proposes a method for detecting and classifying surface defects on Si3N4 turbine blades based on an optimized YOLOv5 network and a CNN. A surface defect detection system for Si3N4 turbine blades is built according to the characteristics of the defect images. The Neck part of YOLOv5 is optimized by replacing the PAN and FPN structures with the more complex bidirectional-fusion BiFPN structure, which improves detection accuracy and speed. The results show that the method performs excellently in detecting and classifying surface defects on Si3N4 turbine blades: the defects are accurately and completely localized, the classification accuracy is 97.4%, and the detection time is as low as 16 ms, which meets industrial requirements.
Further research on the model lies in continually improving its structure in light of newly proposed algorithms. Since the Transformer was proposed for machine translation, it has gradually replaced CNNs and RNNs as the leading feature extractor in many tasks. Dispensing with recurrence entirely, the Transformer uses the attention mechanism to model the dependencies between inputs and outputs, and it is a substantial improvement over recurrent neural networks. However, the Transformer is weaker than YOLOv5 at capturing local information, and its encoding of position information also has problems. A future research direction is to combine the YOLOv5 algorithm with the Transformer to obtain a training model with stronger adaptive ability.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.