Fuzzy–synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets

In previous work, imbalanced datasets composed of more benign samples (the majority class) than malicious ones (the minority class) have been widely adopted in Android malware detection. These imbalanced datasets bias learning toward the majority class, so that minority class examples are more likely to be misclassified. To solve this problem, we propose a new oversampling method called fuzzy–synthetic minority oversampling technique, which is based on fuzzy set theory and the synthetic minority oversampling technique method. As the sample size of the majority class increases relative to that of the minority class, fuzzy–synthetic minority oversampling technique generates more synthetic examples for each minority class example in the fuzzy region, where the minority examples have a low degree of membership to the minority class and are more likely to be misclassified. Using the new synthetic examples, the classifiers build larger decision regions that contain more minority examples and are no longer biased toward the majority class. Compared with the synthetic minority oversampling technique and Borderline–synthetic minority oversampling technique methods, fuzzy–synthetic minority oversampling technique achieves higher accuracy on both the minority class and the entire datasets.


Introduction
Malware (malicious software) continues to increase with the rapid development of mobile networks, especially on the Android platform and services. In November 2015, the McAfee Labs Threats Report 1 announced that in the first three quarters of the year, mobile malware incidents increased by approximately 4 million, more than twice the count for the same period of 2014, and the total number of mobile malware incidents reached approximately 10 million. Android malware can infect and harm Android platforms and services through various methods such as malicious websites, spam, malicious SMS messages, and malware-bearing advertisements. Android malware causes security threats such as phishing, banking Trojans, spyware, bots, root exploits, SMS fraud, premium dialers, and fake installers. Moreover, the most frequent malware behaviors are escalating privileges, taking remote control, incurring financial charges, and stealing personal information. 2 Therefore, Android malware detection is a necessary and pressing task. There are two effective types of approaches: static analysis by decompiling the source code and dynamic analysis by monitoring application execution at runtime. 3 The datasets for most Android malware detection experiments are composed of both benign and malicious applications. The benign applications are collected from Google Play Store, and the malware applications are collected from malware share websites or by researchers themselves. Because it is difficult to create a comprehensive malware collection, the datasets used in the experiments are typically imbalanced: the number of benign applications is larger than the number of malware applications. [4][5][6] However, this imbalance problem is usually not considered seriously.
Dataset imbalance is not a new problem; it occurs in many real-world situations including intrusion detection, risk management, text categorization, and information filtering. 7 In these problems, just as in Android malware detection, it is important to correctly identify the minority class. There is a strong learning bias toward the majority class when using imbalanced datasets. As a result, the minority class examples are more likely to be misclassified; 8 therefore, the minority class should be given more attention. Machine learning from imbalanced datasets has been widely researched. Overall, the approaches for solving the imbalanced dataset problem can be divided into two types: resampling methods and imbalanced learning algorithms. 9 Resampling tries to balance the class distribution inside the training data through both oversampling method (adding examples to the minority class to approach the majority class size) and undersampling method (removing examples from the majority class to approach the minority class size). The most famous oversampling method is the synthetic minority oversampling technique (SMOTE). 10 Later, studies have proposed modifications to existing methods that have achieved good performances on balanced datasets or proposed new algorithms to resolve the imbalance problem.
In the SMOTE method, the minority class is oversampled to generate synthetic minority examples. For each minority example, the k-nearest neighbors (KNNs) in the minority class are calculated. Then, some neighbors are randomly selected depending on the oversampling rate, which is the integer ratio of the sample size of the majority class to that of the minority class. Subsequently, new minority synthetic examples are generated along the lines between the minority example and its selected nearest neighbors. These synthetic minority examples increase the number of examples in the minority class, balance the distribution of the datasets, and build larger decision regions that contain more minority class examples. The results of experiments show that the SMOTE approach improves the accuracy of classifiers for the minority class. 10 The imbalance ratio (IR) characterizes the imbalance degree of the dataset. It is equal to the sample size of the majority class divided by that of the minority class. 11 The IR is considered to be an important factor that affects the classification accuracy of the minority class. However, recent studies have indicated that the IR is not the only factor: for some datasets with a high IR, standard classifiers still achieve high minority class accuracy. Other factors also reduce minority class classification accuracy in imbalanced datasets, such as overlap between classes 12,13 and the presence of many minority examples within the majority class. 14 When multiple factors occur together in imbalanced datasets, the accuracy of minority class classifications can be seriously affected. Therefore, data distribution can have a large impact on the quality of imbalanced datasets and on classifier performance. The membership degree of the examples to the classes is computed to measure the distribution of the datasets. 15 Accordingly, data distribution and the membership degree of the examples to the classes lead to the main goals of our study.
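As a concrete illustration of the IR described above, the ratio can be computed directly from the class labels. This is a minimal sketch, not the authors' code; the function name and label conventions are our own.

```python
import numpy as np

def imbalance_ratio(y, majority_label=0, minority_label=1):
    """IR = sample size of the majority class / sample size of the minority class."""
    n_major = int(np.sum(y == majority_label))
    n_minor = int(np.sum(y == minority_label))
    return n_major / n_minor

# 90 benign (majority) applications vs 10 malware (minority) applications
y = np.array([0] * 90 + [1] * 10)
print(imbalance_ratio(y))  # → 9.0
```

The floor of this value, ⌊IR⌋, is what SMOTE-style methods typically use as the per-example oversampling rate.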
Our main goal is to solve imbalanced dataset distributions by adding new minority examples and to improve the accuracy of the minority class. The most important task is to construct a principle for oversampling the minority class. This principle should act as a guide to help determine which minority examples should be used to generate new synthetic examples and how many new synthetic examples should be generated for each minority example.
The concept of membership degree was first proposed in fuzzy set theory: it reflects the degree of uncertainty about whether an example belongs to a set, and it permits the gradual assessment of the membership of the examples in a set. 16 Membership degree quantifies the relationship of each example to a given dataset in a range [0.0, 1.0]. When the value of the membership degree of an example equals 1.0, that example is sure to belong to the dataset. When the value of membership degree is between 0.0 and 1.0, the example is fuzzy and only partially belongs to the dataset. Fuzzy set theory provides a methodology for data analysis; here, we extend fuzzy set theory to the task of Android malware detection in imbalanced datasets.
According to fuzzy set theory, minority examples in imbalanced datasets that have a low membership degree to the minority class can easily be misclassified. However, the decision region of the minority class can be broadened to ensure correct classification by creating new minority examples that are similar to the minority class. 17 The result is that classifiers are no longer biased away from the minority class. Based on the above concepts, a fuzzy region is defined to contain minority examples with low membership degrees, for which new synthetic examples need to be generated. To generate such examples, the following two questions should be answered: (1) What is the range of the fuzzy region? (2) How many new synthetic examples should be generated for each minority example in the fuzzy region?
We propose a new oversampling method called fuzzy-SMOTE, based on fuzzy set theory and SMOTE, to solve the imbalance problem in Android malware detection. Fuzzy-SMOTE calculates the membership degree of each minority example to the minority class. In addition, it defines a fuzzy region that contains the minority examples with low membership degrees, which are more likely to be misclassified. Combined with the IR, fuzzy-SMOTE generates more synthetic examples for each minority example with a low membership degree. The new synthetic examples broaden the decision region of the minority class and reduce classifier bias toward the majority class. The results indicate that fuzzy-SMOTE improves performance both on the accuracy of the minority class and on entire datasets.
The remainder of this article is organized as follows: section ''Related work'' reviews related works on Android malware detection methods and imbalanced learning. Section ''Methodology'' presents the details of the research methodology. Section ''Experiment and discussion'' introduces and discusses the experiments, and section ''Conclusion'' provides conclusions.

Related work

Methods for Android malware detection
Android malware detection methods are divided into three categories: static analysis, dynamic analysis, and hybrid analysis. 3 Static detection methods analyze the decompiled source code at the binary level or the API level without executing the Android applications. Dynamic detection methods analyze application behaviors at runtime by monitoring behaviors indicative of Android malware activity. Hybrid analysis methods combine both static and dynamic techniques.
Static analysis methods comprise rule policy 18,19 and machine learning methods. [4][5][6] The machine learning detection process consists of feature extraction, feature selection, and machine learning. In the feature extraction part, permissions, 20 APIs, 4 combinations of permissions and APIs, 6 system calls, 21 signatures, 22 and component information 23 are extracted as features from the decompiled source files. In the feature selection part, the common methods are Information Gain (IG), 6,24 CHI, 6,25 and Fisher Score. 25,26 Machine learning classification methods include KNN, naive Bayes (NB), decision tree (DT), logistic regression (LR), support vector machines (SVMs), AdaBoost, and k-means. [4][5][6]24 In contrast, dynamic analysis methods focus on application behavior collection and relational analysis. 27 The behavior objects include semantic-based approaches, 28 data flow, 29 system call sequences, 21 inter-application communication (IPC), 30 and privilege escalation. 31 Imbalanced datasets have been widely used in previous studies. In Cen et al., 6 five training datasets were used in which the ratios of malicious applications to benign applications were 0.0%, 0.23%, 4.79%, 0.25%, and 8.07%. In Aafer et al., 4 the dataset included 16,000 benign and 3987 malware applications. In Sanz et al., 5 the dataset included 1811 benign and 249 malware applications, and Jang et al. 32 included 109,193 benign and 9990 malware applications. Although the malware detection accuracy was high, reaching 99% in Cen et al., 6 the researchers did not pay attention to the imbalance problem or take special measures to solve it. In our work, however, we aim to explore the nature of the imbalance problem and its solution, and we propose a new oversampling method to ensure detection performance on imbalanced datasets.

Methods for imbalanced learning
The methods for solving the problem of imbalanced datasets are divided into two types: data resampling methods and imbalanced learning algorithms. 9 The data resampling methods aim to balance the given dataset by adding or removing samples, while imbalanced learning algorithms focus on modifying existing machine learning mechanisms or creating new algorithms to improve the detection rate of the minority class.
Resampling methods consist mainly of two types: oversampling and undersampling. 33 Oversampling duplicates existing examples or generates new examples for the minority class, while undersampling removes examples randomly from the majority class. Oversampling and undersampling methods have also been combined to solve the imbalance problem. 10 Specifically, random oversampling is a non-heuristic method that balances the class distribution by randomly replicating minority class examples. 34 The most famous of these methods is SMOTE, 10 which generates new (synthetic) minority examples between each minority example and its nearest neighbors. This method makes decision regions larger and less specific; consequently, classifiers achieve better performance. Han et al. 35 presented Borderline-SMOTE, which oversamples only the minority examples near the borderline. Borderline-SMOTE achieved a better true positive (TP) rate and F-measure than the SMOTE and random oversampling methods. Chawla et al. 36 integrated SMOTE and the standard boosting method to create SMOTEBoost, which synthesizes minority class examples and indirectly changes the updating weights to compensate for skewed distributions. Bunkhumpornpat et al. 37 proposed Safe-Level-SMOTE, which oversamples minority instances along the same line at a safe level; the safe level is defined based on the nearest-neighbor minority instances. Guo and Viktor 38 proposed DataBoost-IM, which generates synthetic examples for both the majority and minority classes. Qiong et al. 39 proposed the genetic algorithm-based synthetic minority oversampling technique (GASMOTE), which improves SMOTE by using a genetic algorithm (GA) to set different sampling rates for different minority samples. Moreover, SMOTE has been integrated with numerous machine learning 9,40-42 and deep learning algorithms. 43

Undersampling methods also take many forms, including random undersampling, 34 inverse random undersampling, 44 and the EasyEnsemble and BalanceCascade undersampling strategies. 45

Imbalanced learning algorithms are divided into four types: cost-sensitive methods, kernel-based learning methods, active learning methods, and ensemble methods. 46,47 (1) Cost-sensitive methods focus on the costs of misclassified examples using different cost matrices and have outperformed various other empirical sampling methods. 48 Cost-sensitive methods can be categorized into three types: those that use misclassification costs as a form of dataspace weighting, those that use cost-minimization techniques combined with ensemble methods to improve performance, and those that indirectly incorporate cost-sensitive functions into classification paradigms to train classification models. 46 Several cost-sensitive boosting methods based on the AdaBoost algorithm have been proposed. Sun et al. 49 changed AdaBoost's weight-updating strategy by introducing cost items and proposed three cost-sensitive boosting methods: AdaC1, AdaC2, and AdaC3. Song et al. 50 proposed BABoost, which assigns higher weights to the misclassified examples in the minority class. Cost-sensitive DTs 51 and cost-sensitive neural networks 52 have also been widely studied for imbalanced learning. (2) Kernel-based learning methods such as SVM mainly center on statistical learning and can achieve relatively robust classification results when addressing imbalanced datasets. 9 SVM adopts different error cost terms to shift the decision boundary away from positive examples to guarantee that negative examples are classified correctly. 53,54 (3) Active learning methods are often integrated into kernel-based learning methods; for example, SVM can be used to select the most informative examples from unknown training data while retaining the kernel-based methods. 46 (4) Ensemble methods are effective in averaging prediction errors and reducing bias and error variance. Most current ensemble methods follow similar procedures for imbalanced datasets: resampling and voting. 55 Most of these ensembles are based on known strategies from bagging, boosting, or random forests. Moreover, the diversity of ensemble approaches determines the final prediction accuracy of an example. Consequently, creating algorithm strategies that ensure and enlarge diversity is a critical task. 56

Methodology

SMOTE

Chawla et al. 10 proposed the SMOTE approach, which can improve classifier minority class accuracy on imbalanced datasets. SMOTE performs oversampling by creating synthetic samples for the minority class rather than duplicating existing minority class samples. The synthetic samples are generated along the lines between each minority sample and neighbors selected randomly from its k nearest minority class neighbors. The generated synthetic samples are then added to the dataset when training the machine learning classifier model.
The principles of the SMOTE algorithm are as follows: (1) find the KNNs of each sample x in the minority class based on Euclidean distance; commonly, k is set to 5. (2) Calculate the difference between the minority example x and a nearest neighbor x̂ selected randomly from the k minority class nearest neighbors. (3) Multiply the difference by a random number between 0 and 1 and add it to the original sample x. A synthetic sample is thus generated as

x_new = x + rand(0, 1) × (x̂ − x)

More synthetic samples of the minority class can be obtained using the above steps. The new synthetic samples can maintain the distribution of the original minority class and balance the distribution of the entire training set. Therefore, SMOTE causes classifiers to create a new decision boundary that is no longer biased to the majority class.
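The three SMOTE steps above can be sketched in a few lines of NumPy. This is an illustrative implementation rather than the authors' code; the function name and parameters are our own.

```python
import numpy as np

def smote(X_min, k=5, n_synthetic=100, rng=None):
    """Generate synthetic minority samples along the lines between each
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each sample itself
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(n)              # pick a minority sample x
        j = nn[i, rng.integers(k)]       # pick one of its k neighbours x_hat
        gap = rng.random()               # random number in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because every synthetic point is a convex combination of two minority samples, the new points stay inside the region already occupied by the minority class.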
SMOTE can improve classifier performance on imbalanced datasets. Meanwhile, improving SMOTE has attracted much research attention; consequently, several improved algorithms have been proposed, including Borderline-SMOTE and SMOTEBoost. 35 Borderline-SMOTE oversamples only the minority examples near the borderline based on SMOTE. SMOTEBoost is a combination of SMOTE and the boosting procedure that creates synthetic examples for the minority class and thus indirectly changes the updating weights to compensate for skewed distributions. These methods improve classifier performance on the minority class and overall performance on imbalanced datasets; however, they neither consider the distribution of the imbalanced dataset nor specifically consider the easily misclassified minority examples. Therefore, we propose an improved algorithm, called fuzzy-SMOTE, which is based on the membership degree concept from fuzzy set theory and the SMOTE method.

Fuzzy-SMOTE
For balanced datasets, most machine learning methods build an impartial decision boundary to achieve good overall performance. However, in the original imbalanced dataset space, this decision boundary tends to be biased toward the majority class because fewer minority examples exist to train the classification model. 8 To better identify the minority class, we concentrate on oversampling the minority examples. Therefore, we explore the distribution of the minority and majority classes and generate additional synthetic samples for those minority examples that are most easily misclassified. This is the difference between our method and the existing oversampling methods.
In the fuzzy-SMOTE method, we first calculate the membership degree of each minority example to the minority class based on fuzzy set theory; then, we define a fuzzy region in which the minority class examples have low membership degree to the minority class and are easily misclassified; and finally, by combining the imbalance factor, we create synthetic samples for the minority examples in the fuzzy region and add them to the original training set.
Suppose that the training set is X = {x_1, x_2, ..., x_N} with size N; the majority class is X_p = {x_p1, x_p2, ..., x_pN+} with size N+; and the minority class is X_q = {x_q1, x_q2, ..., x_qN−} with size N−. The IR, defined as the size of the majority class divided by the size of the minority class, characterizes the degree of imbalance of the dataset:

IR = N+ / N−

The IR is related to the oversampling rate of the training minority class. The oversampling rate K denotes the number of synthetic samples to be generated:

K = ⌊IR⌋

where ⌊IR⌋ is the value of IR rounded down. The fuzzy-SMOTE method is executed in the following steps.

Step 1: membership degree. Let the classes C_p and C_q be the distributions of the majority and minority classes, respectively; let V_p and V_q be the centroids of classes C_p and C_q, respectively; and let m_Ci(x_j) be the membership degree of an example x_j to a given class C_i, 15 defined as

m_Ci(x_j) = 1 / Σ_{k ∈ {p, q}} (||x_j − V_i|| / ||x_j − V_k||)^(2/(m−1))

where ||x_j − V_i|| and ||x_j − V_k|| are the Euclidean distances of example x_j to the centroids V_i and V_k (k is p or q). Here, m denotes the fuzziness of the membership to each class, and it is set to 2 in our work. m_Ci(x_j) denotes the membership degree with which an example belongs to a class, and its range is [0.0, 1.0]. In this work, x_j belongs either to class C_p or to class C_q, such that its total membership to both classes is

m_Cp(x_j) + m_Cq(x_j) = 1

A large value of m_Ci(x_j) indicates that example x_j has a high membership degree to class C_i and can therefore easily be classified to C_i. As shown in Figure 1, the membership degree of x_a to class C_q is larger than that of x_b; consequently, x_a is easier to classify to C_q.
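For the two-class case with fuzziness m = 2, the membership degree defined in step 1 can be computed as follows. This is a minimal sketch with illustrative names; it assumes the example does not coincide exactly with a centroid (which would give a zero distance).

```python
import numpy as np

def membership(x, V_p, V_q, m=2.0):
    """Fuzzy membership degrees of example x to classes C_p and C_q,
    given their centroids V_p and V_q. The two degrees sum to 1.0."""
    d_p = np.linalg.norm(x - V_p)  # Euclidean distance to majority centroid
    d_q = np.linalg.norm(x - V_q)  # Euclidean distance to minority centroid
    exp = 2.0 / (m - 1.0)
    m_p = 1.0 / ((d_p / d_p) ** exp + (d_p / d_q) ** exp)
    m_q = 1.0 / ((d_q / d_p) ** exp + (d_q / d_q) ** exp)
    return m_p, m_q
```

For example, with centroids V_p = (0, 0) and V_q = (4, 0), the point (3, 0) lies three times closer to V_q than to V_p and gets m_q = 0.9, m_p = 0.1.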
Step 2: fuzzy region. Suppose example x_qj is a minority class example. When m_Cq(x_qj) < 0.5 and m_Cp(x_qj) > 0.5, example x_qj is closer to C_p than to C_q and is likely to be misclassified to C_p. Therefore, we define a fuzzy region to contain the minority examples that have low membership degrees to the minority class and are likely to be misclassified. It is important to determine the range of the fuzzy region. When the range is small, the classification region biased to the majority class changes little. In contrast, when the range is large, some redundant examples are generated. We explore the proper range of the fuzzy region through the experiments.
In addition, examples in which the value of m C q (x qj ) is smaller should be given more attention, and additional synthetic samples should be generated for them.
Step 3: oversampling rate. For every minority example x_qj of the training minority class in the fuzzy region, the oversampling rate K_qj is calculated from the overall oversampling rate K and the membership degree m_Cq(x_qj): the smaller m_Cq(x_qj) is, the larger the value of K_qj. Therefore, more synthetic examples are generated for the minority examples with small membership degrees.
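The exact formula for K_qj is lost in this extracted text, so the helper below is only a plausible stand-in: it preserves the one stated property (K_qj grows as m_Cq(x_qj) shrinks) but its scaling is an assumption, not the paper's definition.

```python
def oversampling_rate(K, m_q):
    """Per-example oversampling rate K_qj for a fuzzy-region minority example.
    ASSUMPTION: the exact formula is not recoverable from the source text;
    this stand-in only preserves the stated monotone property that K_qj
    increases as the membership degree m_q decreases."""
    return max(1, round(2 * K * (1.0 - m_q)))
```

Any monotonically decreasing function of m_q that scales with K would satisfy the text's description equally well.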
Step 4: oversampling. First, we find the KNNs of each sample x_qj in the minority class based on Euclidean distance; here, k is set to 5, as in SMOTE. Second, we randomly select one neighbor x̂_qj from the KNNs and calculate the difference diff_j between x̂_qj and x_qj as in formula (9). Then, diff_j is multiplied by a random number between 0 and 1. Finally, a new synthetic minority example is generated between x_qj and its selected neighbor x̂_qj, as shown in formula (10):

diff_j = x̂_qj − x_qj, j = 1, 2, ..., K_qj (9)

x_new,j = x_qj + rand(0, 1) × diff_j (10)

The above procedure is repeated for each minority example in the fuzzy region until Σ K_qj synthetic minority examples have been generated. Then, the new synthetic minority examples are added to the original training set, which is sent to the classifier and used to train classification models.
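Putting steps 1 to 4 together, the whole procedure might look like the sketch below. This is our own illustrative implementation, not the authors' code, and the per-example rate K_qj uses an assumed stand-in formula, since the paper's exact expression is not recoverable here.

```python
import numpy as np

def fuzzy_smote(X_min, X_maj, K, fuzzy_range=0.7, k=5, rng=None):
    """Sketch of fuzzy-SMOTE: oversample only the minority examples whose
    membership to the minority class lies in the fuzzy region [0, fuzzy_range]."""
    rng = np.random.default_rng(rng)
    V_q, V_p = X_min.mean(axis=0), X_maj.mean(axis=0)   # class centroids
    d_q = np.linalg.norm(X_min - V_q, axis=1)
    d_p = np.linalg.norm(X_min - V_p, axis=1)
    m_q = 1.0 / (1.0 + (d_q / d_p) ** 2)                # membership, fuzziness m = 2
    # k nearest minority neighbours of every minority sample
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for j in np.where(m_q <= fuzzy_range)[0]:           # fuzzy region only
        K_qj = max(1, round(2 * K * (1.0 - m_q[j])))    # ASSUMED stand-in rate
        for _ in range(K_qj):
            nb = nn[j, rng.integers(k)]                 # random neighbour x_hat
            gap = rng.random()                          # random number in [0, 1)
            synthetic.append(X_min[j] + gap * (X_min[nb] - X_min[j]))
    return np.asarray(synthetic, dtype=float).reshape(-1, X_min.shape[1])
```

The returned synthetic samples would then be appended to the original training set before fitting the classifier.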

Experiment evaluation
To evaluate the performance of our proposed method, the datasets are divided into training sets and testing sets using 10-fold cross-validation. In this method, a full dataset is split into 10 independent folds. In turn, nine folds are used to train the classification models, and the remaining fold is applied as the testing set to validate and assess the model. Consequently, each fold is used once as a testing set. Formally, the complete cross-validation estimate is the average of the 10-fold estimates computed in a loop. 57 While 10-fold cross-validation can be computationally expensive, it does not require as much data as a fixed arbitrary test set and is quite suitable for small datasets. 58 Traditionally, with balanced datasets, machine learning classifiers are evaluated by overall accuracy. However, to meet the special situations of imbalanced datasets, other evaluation measures are adopted to provide comprehensive assessments. 10,56 In our work, the negative class (malicious applications) is the minority class, and the positive class (benign applications) is the majority class. The evaluation metrics are defined as follows:

acc+ = TP / (TP + FN)
acc− = TN / (TN + FP)
overall-acc = (TP + TN) / (TP + FN + TN + FP)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F-measure = (1 + β²) × recall × precision / (β² × precision + recall)
G-mean = √(acc+ × acc−)

where acc+ denotes classifier accuracy for the majority class (positive examples), and acc− denotes classifier accuracy for the minority class (negative examples). Thus, overall-acc represents the overall accuracy of the classifier on the complete dataset, and precision is the percentage of predicted positive applications that are actually benign. Here, recall is the percentage of correctly classified positive examples. In addition, F-measure integrates both recall and precision as a measure of the effectiveness of the classification models; when both recall and precision are high, the F-measure value is also high. The parameter β reflects the relative importance of recall and precision and is usually set to 1.0. 36 Furthermore, G-mean evaluates the degree of inductive bias in terms of positive and negative accuracy.
When the difference between acc + (recall) and acc À is small, the value of G-mean is high. 59
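All of these metrics can be computed directly from the four confusion-matrix counts. The helper below is an illustrative sketch with our own names, following the paper's convention that positive = majority (benign) and negative = minority (malware).

```python
import math

def imbalanced_metrics(tp, fn, tn, fp, beta=1.0):
    """Evaluation metrics for imbalanced classification.
    Positive = majority (benign) class, negative = minority (malware) class."""
    acc_pos = tp / (tp + fn)                  # majority-class accuracy (= recall)
    acc_neg = tn / (tn + fp)                  # minority-class accuracy
    overall = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    recall = acc_pos
    f_measure = ((1 + beta ** 2) * recall * precision
                 / (beta ** 2 * precision + recall))
    g_mean = math.sqrt(acc_pos * acc_neg)     # square root of acc+ x acc-
    return dict(acc_pos=acc_pos, acc_neg=acc_neg, overall=overall,
                precision=precision, recall=recall,
                f_measure=f_measure, g_mean=g_mean)
```

For instance, with TP = 90, FN = 10, TN = 8, FP = 2, the minority-class accuracy acc− is 0.8 even though the overall accuracy is about 0.89, which is exactly the gap these metrics are designed to expose.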

Experiment and discussion
This section presents the detailed methodology and results of the experiments. For each method, the imbalanced evaluation values (such as acc + , acc À , F-measure, and G-mean) are the average values of the 10-fold cross-validation experiments.

Data source
We used 10 datasets in the experiments, as described in Table 1, including the DroidDream 4 datasets. Finally, the benign samples and the malware sets were integrated to form the S1-S8 datasets.
The Android permission mechanism is used to restrict application access to system resources and guarantee runtime security. Permissions are declared in an AndroidManifest.xml file and requested when the applications perform sensitive behaviors through a corresponding set of APIs. Malware and benign applications tend to request permissions differently. Malware applications often request higher-risk permissions such as SEND_SMS, RECEIVE_SMS, and READ_SMS. Therefore, to some degree, malware can be differentiated from benign applications based on the permissions listed in the AndroidManifest.xml file. Previous works have demonstrated that permission features can identify whether an application is malicious. [63][64][65] Therefore, in our work, we extract the permissions from decompiled AndroidManifest.xml files and use them as features for detecting malware. In each dataset, the permission features are the union of permissions declared by both benign and malware applications. The numbers of features in the different datasets are listed in Table 1.
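As an illustration of this feature-extraction step, the declared permissions can be read from a decompiled AndroidManifest.xml with the Python standard library and turned into a binary feature vector over the dataset's permission union. This is a minimal sketch, not the authors' tooling.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def extract_permissions(manifest_path):
    """Return the set of permissions declared via <uses-permission> elements
    in a decompiled AndroidManifest.xml file."""
    root = ET.parse(manifest_path).getroot()
    return {elem.get(ANDROID_NS + "name") for elem in root.iter("uses-permission")}

def to_feature_vector(permissions, feature_union):
    """Binary feature vector over the (sorted) union of dataset permissions."""
    return [1 if p in permissions else 0 for p in sorted(feature_union)]
```

An application requesting SEND_SMS and INTERNET, for example, maps to 1s at those positions and 0s everywhere else in the union.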

Accuracy bias to the majority class
For imbalanced datasets, machine learning algorithms create a decision boundary biased to the majority class. Consequently, the minority class examples are more likely to be misclassified. 8 In this section, we use the S1, S2, S4, S6, and S7 datasets to explore the accuracy bias to the majority class in imbalanced datasets. The minority classes are similar in these datasets except for their sizes, so the datasets are comparable. We used SVM as the classifier. The results are shown in Table 2. As shown in Table 2, as the IR grows, the classification accuracy of the minority class (acc−) decreases, while the classification accuracy of the majority class (acc+) and the overall accuracy of the dataset (overall-acc) increase. This trend arises because the classifiers tend to be biased to the majority class. When the IR is larger than 18, the value of acc− decreases by more than 20%. In this case, misclassifications of the minority examples (malware applications) may result in serious consequences. Meanwhile, recall and precision reflect the correct classification of the majority class (positive), and their values increase as the IR increases. The F-measure value also increases because the F-measure integrates recall and precision. In addition, G-mean is the square root of acc− × acc+. When the difference between acc− and acc+ is small, the value of G-mean is large. As the IR increases, the difference between acc− and acc+ also increases; therefore, the value of G-mean decreases.

Performance for fuzzy region
In section ''Fuzzy-SMOTE,'' the fuzzy region of the fuzzy-SMOTE method is defined to contain the minority class examples that are easily misclassified and should be given more attention. In this section, we explore the performance of the fuzzy region at different ranges. First, we calculate the membership degree m_Cq(x_qj) of each minority example to the minority class. Using the S1 and S8 datasets as examples, the distribution diagrams of the membership degree are shown in Figure 2. The results show that the membership degrees of some minority examples are concentrated in the range [0.0, 0.5]; these examples are more likely to be misclassified to the majority class. Therefore, these minority examples should be included in the fuzzy region. This part of the experiments explored the proper range for the fuzzy region. We then used fuzzy-SMOTE to create the synthetic samples and used SVM as the classifier. The algorithms with different range parameters were executed 10 times, and the most frequently occurring results were taken as the final results.
The G-mean performances of SVM using fuzzy-SMOTE with different fuzzy region ranges are presented in Figure 3. Figure 3(a) shows that as the upper bound of the fuzzy region increases toward 0.7, the number of synthetic samples generated by fuzzy-SMOTE increases. However, after the range of the fuzzy region exceeds 0.7, the numbers change very little. Similarly, Figure 3(b)-(d) shows that as the fuzzy region range increases, the classifier achieves better performance on acc−, F-measure, and G-mean. However, after the range of the fuzzy region exceeds 0.7, the results of acc− and G-mean change very little. We can conclude that the proper range of the fuzzy region is [0.0, 0.7]. Moreover, fuzzy-SMOTE does not need to create synthetic samples for minority class examples whose membership degrees are high because they are already highly likely to be classified correctly.
Comparison with existing oversampling methods

SMOTE 10 and Borderline-SMOTE 56 have been shown to improve classifier accuracy for the minority class. In this section, we compare several sample synthesis methods, including SMOTE, Borderline-SMOTE, and fuzzy-SMOTE with the fuzzy region [0.0, 1.0]. The oversampling rate of each training dataset is ⌊IR⌋ in the SMOTE and Borderline-SMOTE methods; the goal is to balance the numbers of the minority and majority classes in the training set. For the fuzzy-SMOTE method, the oversampling rate of each minority example is determined by its membership degree to the minority class. After oversampling the original training sets using the different synthesis methods, we applied SVM as the classifier.
The comparison results are illustrated in Tables 3-7. As shown in Tables 4 and 7, all the oversampling methods improve acc− and G-mean. Fuzzy-SMOTE achieves the best performance on acc− and G-mean, and it improves the acc− score on S7 by 16%. On datasets S2, S3, S5, S7, S8, and S9, Borderline-SMOTE achieves higher scores than SMOTE on acc− and G-mean. Meanwhile, both Borderline-SMOTE and SMOTE sacrifice performance on acc+ and F-measure. On datasets S2, S3, and S8, the losses to acc+ and F-measure are smallest for fuzzy-SMOTE. Moreover, as shown in Table 3, fuzzy-SMOTE usually generated the smallest number of synthetic samples for datasets S1-S8. Overall, fuzzy-SMOTE is superior to SMOTE and Borderline-SMOTE because it improves acc− the most while using the fewest synthetic samples.
Combining the evaluation results from Tables 3-7, for most datasets, Borderline-SMOTE outperforms SMOTE because it improves acc− while requiring fewer synthetic samples. These results may have occurred because Borderline-SMOTE oversamples only the minority examples near the borderline, which are the ones most likely to be misclassified.

Comparison with machine learning methods
The DT C4.5, NB, SVM, and AdaBoost have all been applied as classifiers in imbalanced dataset experiments. 10,35,36,56 In the experiments described in the previous subsection, we showed that combining fuzzy-SMOTE and SVM is better than combining SVM with other oversampling methods. In this section, we explore how well fuzzy-SMOTE works with other machine learning methods. The principle of NB is to calculate the posterior probability of each class for the testing samples; the class with the highest posterior probability is the outcome of the prediction. The C4.5 algorithm is constructed using a recursive partitioning operation that splits the examples into successive subsets based on information gain. SVM constructs a hyperplane to divide the sample space into two parts: a positive part and a negative part. AdaBoost learns a series of weak classifiers from the training dataset and then conjoins the weak classifiers to create a boosted classifier. This section compares the performances of these methods with fuzzy-SMOTE oversampling. The performances of the four basic machine learning methods with fuzzy-SMOTE are presented in Tables 8-11. First, comparing the classifier performances with and without fuzzy-SMOTE, we can conclude that these four machine learning methods achieve better performance on acc− and worse performance on acc+ when using fuzzy-SMOTE. The greatest improvement of acc− occurred with fuzzy-SMOTE and SVM on S10; the improvement was 39.18% compared to the basic SVM method without fuzzy-SMOTE. In summary, fuzzy-SMOTE can improve the accuracy of classifiers on the minority class and their overall performance on G-mean, although accuracy on the majority class is sacrificed. Comparing the machine learning methods on the whole, NB had the worst performance. Without an oversampling method, AdaBoost outperforms SVM on acc−, F-measure, and G-mean, whereas when using fuzzy-SMOTE, SVM outperforms AdaBoost.
When the oversampled training set is used to train new classification models, the trained models pay more attention to the enhanced minority class. Consequently, the decision boundary of the minority class is enlarged, and the classifier is no longer biased toward the majority class. The outcome is that classifiers trained with fuzzy-SMOTE classify more minority class examples correctly.