Multi-modal prediction of breast cancer using particle swarm optimization with non-dominating sorting

Cancer is the second leading cause of death worldwide, accounting for almost one in six deaths. Breast cancer is one of the most common forms of cancer in women and has the second highest mortality rate in the world. Various scientific studies have been conducted to combat this disease, and machine learning approaches have been an extremely popular choice. Particle swarm optimization has been identified as one of the most powerful and efficient techniques for the diagnosis of breast cancer, guiding physicians towards timely and accurate treatment. Multi-modal prediction methods are used to make decisions depending upon different scenarios and aspects, whereas non-dominating sorting is useful for sorting different objects based on differing requirements. The main novelty of this work is the proposal of a multi-modal algorithm for breast cancer prediction. The work encompasses the use of particle swarm optimization, non-dominating sorting and multi-classifier techniques, namely, the k-nearest neighbour method, fast decision tree and kernel density estimation. Finally, Bayes’ theorem is applied to revise the results and achieve optimum accuracy in breast cancer prediction. The proposed model, which combines particle swarm optimization and non-dominating sorting with the classifier techniques, helps to select the most significant features relevant to breast cancer prediction. The selected features define the objective of the problem model. The proposed model is implemented on the WBCD and WDBC breast cancer data sets publicly available from the UCI machine learning data repository. The metrics considered are sensitivity, specificity, accuracy and time complexity. The experimental results of the study are evaluated against the state-of-the-art algorithms, namely, genetic algorithm kernel density estimation and particle swarm optimization kernel density estimation, and the results justify the superiority of the proposed model.


Introduction
Breast cancer is one of the leading causes of death among women in the world today. [1][2][3][4] It is one of the most invasive types of cancer, and statistics reveal that approximately 40,000 women die from breast cancer every year in the United States alone. In India, as per the reports of 2012, around 144,937 women had breast cancer, of whom 70,218 died. 4 It is thus an enormously serious health concern for women worldwide. Breast cancer develops from breast tissue and results in a lump in the breast, changes in the shape of the breast, breast skin abnormalities, fluid discharge from the nipple and many other repercussions. Accurate prediction of breast cancer plays a vital role, as it leads to timely and conclusive decision-making and gives patients the opportunity to avail of sophisticated modern healthcare treatments. Among the ever increasing number of techniques introduced for cancer diagnosis, data mining and artificial intelligence (AI)-based diagnosis are a popular and significant choice among researchers for the prediction and discovery of tissues affected by cancer. Data mining techniques are used to extract information from large, heterogeneous, time series and complex data sets, aiding various diagnostic and therapeutic services in the health care industry. The data sets in these approaches are fragmented, distributed and then analysed to yield predictive results for accurate diagnosis.
AI is similarly a widely used technique for the detection and diagnosis of various types of cancer. AI tools are mostly used for classification and clustering of gene data of malignant cells, mutated cells and cancer-affected tissues, and such approaches play a significant role in cancer treatment, as indicated by Agrawal and Agarwal. 5 Beni and Wang 6 mentioned that swarm intelligence algorithms can be used for forecasting scenarios as well as for monitoring and diagnosis of cancer. Various machine learning techniques and optimization methods integrated with swarm intelligence have been successfully used for predicting and diagnosing cancer-affected tissues. This study proposes a new method based on swarm intelligence for the detection and diagnosis of breast cancer.
Jele et al. 7 and Teague et al. 8 proposed fine needle aspiration (FNA) biopsy, a simple and quick procedure for extracting a sample of a breast lesion with a fine needle. Chen et al. 9 noted that machine learning techniques are needed for diagnosis and also for finding possible errors made by physicians due to time-constrained examinations. The study also states that classification and machine learning methods work correctly when the probability densities are known. When the density of the data is not available, it can be estimated with the help of kernel density estimation (KDE). Sheikhpour et al. 13 cited kernel density as a widely useful technique for data estimation. Kernel estimation depends on selection of the most relevant features, which further improves the performance. The main advantage of KDE is that it increases the prediction accuracy by supplying the class conditional densities of the data to the Naïve Bayes (NB) classifier.
It is acknowledged that feature selection is crucial, yet it has the associated challenge of consuming a fair amount of space and search time. Computation thus plays a very important part in the performance of a feature selection method. Bolón-Canedo et al. 10 classified feature selection into three types, namely, wrapper, embedded and filter methods. [11][12][13] Of these three techniques, the wrapper is identified as capable of providing the highest accuracy, as mentioned by Lal et al. 14 Consistent with the same computational aspect, Wang et al. 15 proposed particle swarm optimization (PSO) as a powerful computation method applied in a large number of applications. Similarly, Jia et al. 16 proposed the use of a particle swarm intelligence approach with statistical and discrete high-quality feature subsets for estimation, establishing it as one of the best computing techniques predominantly used in various applications today. 17 The combination of PSO and feature selection algorithms has immense capability to produce optimized prediction results in cancer diagnosis.
This study introduces a new particle swarm optimization integrated with a non-dominating object-based feature selection method (PSO-NDS). The selected features are then subjected to three classification algorithms, namely, the k-nearest neighbour (KNN) method, fast decision tree and KDE, which predict breast cancer. The data sets used in the study are the WBCD and WDBC data sets, publicly available from the UCI machine learning data repository. The evaluation metrics used are prediction rate, accuracy, sensitivity, specificity and time complexity. The main novelty of the present work is as follows: (1) the use of a multi-classifier, which reduces the possibility of error in the results generated; (2) the implementation of the PSO algorithm in association with non-dominating sorting (NDS) and a multi-classifier comprising KNN, decision tree and KDE for the prediction of breast cancer; and (3) the use of Bayes' theorem to further revise the results and obtain the best predictions.
The rest of this article is arranged as follows. Section 'Feature selection methods' elaborates on the feature selection methods, section 'PSO using non-dominating method' describes the classification techniques, and the proposed PSO-based multi-classification model is elaborated in section 'Working and processing of PSO-NDS preprocessing'. The experimental setup and result comparisons are presented in section 'Results and discussion'. The article concludes with conclusions and future directions in section 'Conclusion'.

Feature selection methods
Feature selection is used to remove irrelevant features from a data set so that accurate information can be inferred from the analysis performed on it. For example, the biological platform for feature selection is based on gathering similar gene expressions. This section presents a brief overview of feature selection in biological and cancer-related research. Hira and Gillies mentioned that the objective of feature selection and extraction is to avoid over-fitting of the data before further analysis. 18 As mentioned in that study, feature selection methods are divided into three groups, namely, filter, wrapper and embedded methods, as shown in Figure 1.
The filter method extracts features without any prior learning. The wrapper method uses prior learning with evaluation, and the embedded method is a combination of filter feature selection and embedded classification technique. Choosing the most appropriate feature selection method can thus be a difficult task. Various feature selection methods and types of classification are shown in Table 1.
Jafari et al. 19 mentioned in their study that the T-test feature selection technique finds the maximal difference of means with minimal variability. Hall 20 proposed a correlation-based feature selection method that finds features highly correlated with the class yet uncorrelated with each other. Rau et al. 21 identified Bayesian network feature selection for determining the causal relationship with each class while ensuring that the classes themselves may not have any relationship. Yang et al. 22 highlighted that information gain could be used to measure common features by performing comparisons across all the classes. Ooi and Tan 23 proposed genetic algorithms (GAs) for finding a smaller set of features that produces the highest accuracy. Guyon et al. 24 proposed the support-vector machine (SVM) technique for feature elimination and pointed out that SVM classifiers could be a good choice, as they omit irrelevant features using a weighted approach.
Jiang et al. 25 proposed random forests, which create decision trees from diverse samples of the original data and average their outputs to improve the accuracy. Ma et al. 26 identified that in the least absolute shrinkage and selection operator (LASSO) method, features are classified based on zero and nonzero coefficients. Anushaa and Sathiaseela 27 proposed the NLMOGA feature selection method, which is based on constraint selection from a sub-population. This method involves finding the most suitable or closest set of objects from the group and uses the Pareto front method to minimize the inner classes. In this method, computation cost and time complexity are quite high. Khan and Baig 28 proposed a multi-objective GA (non-dominated sorting genetic algorithm (NSGA-II)) for resolving multi-objective feature and feature-subset problems, in which data with a large number of attributes are used and non-relevant features are eliminated. In the proposed NSGA-II approach, features are evaluated using ID3, and the experiment is conducted with the help of NDS. Two techniques are used, namely ID3 and the Pareto front. The NSGA-II method was applied to different applications such as salary prediction and DNA sequences, yielding a highly satisfactory prediction accuracy (95.2).
Zeng et al. 29 proposed an optimization method that uses genetic algorithm kernel density estimation (GA-KDE), named support tracker machines (STuMs). The method sweeps out irrelevant information, yielding better accuracy. Naseer et al. 30 proposed a hybrid approach using ant colony optimization (ACO) and multi-classification techniques. This hybrid approach used a filter-based classifier to enhance the prediction accuracy. The experiment was conducted using four different classifiers over 11 data sets, and the hybrid method results were then compared with the PSO and GA results. The evaluation results appear to be similar to those of the PSO method, with 95.27% and 95.99% accuracy, respectively. Sakri et al. 31 proposed a PSO feature selection method based on three classifiers, namely, k-nearest neighbour (KNN), NB and fast decision tree. This method had four processing steps: data acquisition, data preprocessing, classification with and without feature selection, and finally comparative analysis. The method yielded lower prediction accuracy than the other methods. Sheikhpour et al. 13 proposed particle swarm optimization kernel density estimation (PSO-KDE) for breast cancer detection. The main intent of this method was to increase the accuracy, thereby reducing errors. The method produced an optimal accuracy level but used only one classifier, which left open the possibility of erroneous predictions in comparison to other popular methods. It is thus observed from the related research that the majority of studies using PSO feature selection produced optimal results compared to the other methods, yet had associated challenges pertaining to their application on only linear methods. On the contrary, prediction methods are expected to consider all dimensions of prediction. KDE is a readily available package which can be used easily for classification and is hence quite popular. KDE also works very accurately in the case of bimodal or highly skewed distributions and is especially used for estimates in discriminant analysis. The predictions are also more accurate, as the misclassification rates are reduced.
Reddy et al. 32 proposed a novel approach, deep neural network and support value (DNNS), for the prediction of breast cancer using large-scale data sets from a reputed hospital in India. The accuracy, precision and recall values resulting from the proposed methodology were compared with state-of-the-art techniques. The accuracy (97.21%) of the DNNS-based approach, although better than that of the other traditional approaches, was not extremely promising. Ramadan et al. 33 proposed a computer-aided diagnosis (CAD) system for the detection of breast cancer. The framework involved the use of mammogram data, which were classified for the purpose of disease prediction. The study highlighted the various features and factors contributing towards the detection of breast cancer using CAD systems. The various CAD methods were compared and the receiver operating characteristic (ROC) was calculated. However, the CAD results were not found reliable enough for CAD to be confidently considered as a standalone technique for breast cancer diagnosis. The study finally indicates the need for deep learning and similar approaches to enhance the performance of CAD systems and generate more accurate detection results.
The study by Mohammed et al. 34 analysed breast cancer data using machine learning techniques, namely, KNN, SVMs, NB and various other classification methods. The results of the classifiers were validated and compared using two popular data sets: Wisconsin Breast Cancer (WBC) and the publicly available Breast Cancer data set. The study primarily emphasized issues pertaining to the handling of imbalanced data sets, and resampling of the data was performed to resolve them. To evaluate the approach, 10-fold cross-validation was also performed, and the efficiency of the classifiers was gauged using the true positive and false positive values, ROC, standard deviation and accuracy metrics. The comparative analysis identified sequential minimal optimization (SMO) as the better classifier after data resampling on the WBC data set. In the case of the Breast Cancer data set, the J48 algorithm generated better results after resampling. The study by Hou et al. 35 evaluated four machine learning algorithms for predicting breast cancer among Chinese women. The data set included breast cancer cases and healthy patient data, the latter considered as controls, for the modelling, training and testing of the machine learning models. The metrics used for evaluation were area under the curve (AUC), sensitivity, specificity and accuracy. The results justified the superiority of the XGBoost algorithm in comparison to the other approaches. The various classification techniques, namely, PSO, GA, ACO and ant colony optimization classifier ensemble (ACO-CE), and the relevant classifier accuracy results are depicted in Table 2. 13,30

PSO using non-dominating method
The main drawbacks of the previous methods shown in Table 2 are that they do not use multi-classifiers, which increases the possibility of high error, and that they do not include re-verification of the predicted results. The present study combines PSO, NDS and multi-classifier techniques, namely the k-nearest neighbour method, fast decision tree and KDE, and finally Bayes' theorem for revising the predicted results with the help of a degree of belief for accuracy. The main objective of the study is to increase the accuracy, reduce the errors and revise or recheck the predicted results.
The multi-classification technique is used to increase the accuracy in different directions. The non-dominating method is used to rank the selected features based on good points. Bayes' theorem is used to verify the results with a high degree of belief. The proposed PSO-NDS model is depicted in Figure 2. The PSO method has three parts: the input, the processing and the output.
The proposed method utilizes a non-linear method, and hence the training samples $(x_i, y_i)$, $i = 1, \ldots, N$, from the input are mapped into a high-dimensional feature space by the mapping function $\Phi$, and the inner products in that space are computed with the kernel function $k$. The non-linear approach uses the dual Lagrangian $L_D(\alpha)$; equation (2) is used to train the data in multidirectional ways and is used in feature selection and multi-classification.
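For orientation, the kernel function and the dual Lagrangian mentioned above are commonly written in the following textbook form for a kernel support-vector formulation. This is an assumed standard expression, not necessarily the exact notation of the article's equations (1) and (2); the multipliers $\alpha_i$ and the sample count $N$ are symbols introduced here for illustration.

```latex
k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle

L_D(\alpha) = \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j),
\qquad \alpha_i \ge 0, \quad \sum_{i=1}^{N} \alpha_i y_i = 0
```

Because only $k(x_i, x_j)$ appears in $L_D(\alpha)$, the mapping $\Phi$ never has to be evaluated explicitly.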

PSO
PSO is used to find the objectives, position and velocity of each particle at time t. Each object (cancer cell) is treated as a particle with a position $X = \{x_1, \ldots, x_n\}$ and a velocity $V = \{v_1, \ldots, v_n\}$. The search for the optimal solution moves the particles, changing each position based on two factors: the starting position of the particle and the best position of the particle. The velocity and position of the particle are calculated using the following equations
$$v(t+1) = \omega\, v(t) + C_1 r_1 \left(PB - X(t)\right) + C_2 r_2 \left(GB - X(t)\right) \quad (3)$$
$$X(t+1) = X(t) + v(t+1) \quad (4)$$
where $\omega$ is the inertia weight, $v$ is the velocity, $X$ is the position, $C_1$ and $C_2$ are the learning factors, $r_1$ and $r_2$ are random numbers in [0, 1], PB is the personal best performance and GB is the best performance of the group. The basic processing steps of the PSO algorithm are initialization, evaluation, finding the position of each particle, finding the best positions, and updating the velocity and position, after which the evaluation is repeated until the stopping criterion is met.
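The velocity and position updates described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the parameter values (w = 0.7, c1 = c2 = 1.5) and the sphere test objective are all hypothetical choices made for the example.

```python
import random


def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one velocity and position update to every particle."""
    new_positions, new_velocities = [], []
    for x, v, pb in zip(positions, velocities, pbest):
        nv, nx = [], []
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            # Inertia term + pull towards personal best + pull towards group best.
            vel = w * v[d] + c1 * r1 * (pb[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
            nv.append(vel)
            nx.append(x[d] + vel)
        new_velocities.append(nv)
        new_positions.append(nx)
    return new_positions, new_velocities


def minimise(objective, dim=2, n_particles=10, iterations=50, seed=0):
    """Run a basic PSO loop and return the best position found."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    gbest = min(pbest, key=objective)[:]
    for _ in range(iterations):
        positions, velocities = pso_step(positions, velocities, pbest, gbest, rng=rng)
        for i, p in enumerate(positions):
            if objective(p) < objective(pbest[i]):
                pbest[i] = p[:]  # the particle improved on its own best
        gbest = min(pbest + [gbest], key=objective)[:]
    return gbest


def sphere(p):
    """Toy objective (assumed for the demo): squared distance from the origin."""
    return sum(c * c for c in p)
```

Calling `minimise(sphere)` returns a position whose objective value is never worse than that of the best initial particle, since the group best is only replaced by better candidates.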

NDS
Population and objective of the function are calculated based on non-dominating objects. 28 The NDS technique is used to evaluate each subset of the data. Based on the features, if a subset of the data is tidy, it is passed on for evaluation and each subset is assigned a fitness value. In addition, an initial distance is used to calculate the feature subset. The crowding distance is calculated to find out how close the objects are to their neighbours. A large average crowding distance produces enhanced diversity. The population selection is based on ranking and crowding distance; hence, the decision is taken based on the crowding distance. The selected individuals are then passed to the crossover and mutation operators to produce the next generation (gen). The objective function values of the current cancer objects are sorted based on non-domination, and N individuals are selected, where N is the population size or objective function size. The final population size depends on the crowding distance pertaining to the feature subset in the cancer cell. The overall operation and flow of the NDS technique are shown in Figure 3. The sharing of distance and non-dominating object sharing are derived in equation (5), where $d_{i,j}$ is the distance between two individual objects and $\sigma_{share}$ is the distance allowed between two objects.
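The ranking into non-domination fronts and the crowding distance described above can be illustrated as follows. This is a minimal sketch, assuming every objective is to be minimised and that each individual is represented by a tuple of objective values; the function names are hypothetical and the simple quadratic-time sorting is chosen for readability rather than speed.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Return a list of fronts; each front is a sorted list of indices into points."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(points, front):
    """Crowding distance per index in one front; boundary points get infinity."""
    distance = {i: 0.0 for i in front}
    n_obj = len(points[front[0]])
    for m in range(n_obj):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal on this objective, nothing to add
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            distance[ordered[k]] += gap / (hi - lo)
    return distance
```

For example, with `points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]` the first three points form the first front, and the middle point of that front receives a finite crowding distance while the two extremes receive infinity, which favours keeping the ends of the front.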

Multi-classification
The multi-classification method, also termed multinomial classification, is used to test the accuracy on a given data set with different identifications and labels. The multi-classifier uses different classifiers, wherein each classifier uses different features for its prediction. Each feature of the data sets usually has different instances or sub-features. When multi-classifiers are used, these sub-features are also analysed, which enhances the prediction results and the prediction rate. The advantage of using a multi-classifier thus lies in the improvement of prediction rate and prediction results. However, it also increases the time complexity, because the various features are analysed at different levels.
The classification is based on various instances, and the steps involved in multi-classification are as follows:
Step 1: loading the data set.
Step 2: splitting the data set into test and training sets.
Step 3: training the KNN classifier, fast decision tree classifier and kernel density classifier.
Step 4: using the classifiers to predict the test data.
Step 5: measuring the accuracy.
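The five steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: a small hard-coded toy set stands in for WBCD/WDBC, a single KNN classifier stands in for the three classifiers, and all function names and the 75/25 split are hypothetical choices.

```python
import math
import random
from collections import Counter

# Toy stand-in for a real data set: two well-separated clusters, labels 0 and 1.
TOY_ROWS = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
            (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3)]
TOY_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the indices and cut them into a training and a test part."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def run_pipeline(k=3, seed=0):
    rows, labels = TOY_ROWS, TOY_LABELS                      # Step 1: load
    train_x, train_y, test_x, test_y = train_test_split(     # Step 2: split
        rows, labels, seed=seed)
    # Step 3: KNN is a lazy learner, so "training" is keeping train_x/train_y.
    predicted = [knn_predict(train_x, train_y, q, k)         # Step 4: predict
                 for q in test_x]
    return accuracy(predicted, test_y)                       # Step 5: measure
```

In a fuller version, Step 3 would fit the decision tree and the kernel density classifier on the same training split and Step 4 would collect one prediction per classifier.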
KNN classifier. The KNN algorithm is a non-parametric classification and regression technique mentioned by Sakri et al. 30 In this study, the input consists of the k nearest training examples in the feature space, where k is a positive integer. The output is the class most common among those k nearest neighbours. If k = 1, the object is simply assigned to the class of its single nearest neighbour.
Fast decision tree. Fast decision tree classification is used for decision-making from large data sets, as mentioned by Manapragada et al. 36 The algorithm helps in decision-making 37-46 without compromising on accuracy and, in addition, increases the space complexity. This study implements the fast decision tree with conditional independence. The conditional-independence information gain (IG) is defined as
$$IG(S, X) = E(S) - \sum_{x} \frac{|S_x|}{|S|} E(S_x)$$
where S is the set of training instances, X is an attribute and x is one of its values, E is the entropy and $S_x$ is the subset of instances for which X takes the value x. The entropy is defined as
$$E(S) = -\sum_{i=1}^{|C|} P_S(C_i) \log_2 P_S(C_i)$$
where $P_S(C_i)$ is the percentage of instances in S that belong to class $C_i$ and |C| is the number of classes.
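The entropy and information gain used to choose a splitting attribute can be computed as in the sketch below. It implements the standard entropy-based information gain on one categorical attribute; the conditional-independence estimate used by the fast decision tree is not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting labels on the attribute values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        # Weight each subset's entropy by the share of instances it holds.
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder
```

An attribute that separates the classes perfectly, such as `information_gain(["a", "a", "b", "b"], ["M", "M", "B", "B"])`, yields the full entropy of the labels, whereas an attribute unrelated to the class yields a gain of zero.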
Kernel density classifier. Sheikhpour et al. 13 proposed the kernel density method, which focuses on identifying past conditions similar to those at prediction time. This method directly estimates the density of the data without any assumption. Consider $\{x_t\}$ as independent and identically distributed training data with unknown density $p(x)$. The density at a point is estimated as
$$\hat{p}(x') = \frac{1}{Nh} \sum_{t=1}^{N} K\left(\frac{x' - x_t}{h}\right)$$
where $x_t$ is a training instance, $x'$ is the newly arriving data point, N is the number of instances, h is the length of the interval and K is the kernel function.
This multi-classification technique increases the accuracy of the prediction, as the prediction rate is measured in different scenarios. The first classifier is the KNN classifier, which classifies the affected cells or particles according to their adjacent cells. If a cell is affected, it is marked as 1 and the surrounding cells are scanned. Similarly, all the affected cells are measured in each iteration of the scanning process. The second classifier is the fast decision tree, which increases the prediction rate in large data sets. When the data set or the prediction range of the surface increases, the prediction accuracy decreases automatically. The fast decision tree increases the space complexity of the prediction and gains information from the large space based on the conditional independence of objects. Each positive cell is predicted using the trained data and subsets of affected cells. At the same time, as the prediction space increases, the false rate of the predictions over the various instances reduces. The third classifier is the density classifier, which classifies the cancer cells based on past training experience. Once the affected cells are predicted, the corresponding density of the cells is measured without any assumptions. These three classifiers thus contribute immensely to increasing the prediction rate and accuracy. The most important advantage of using the multi-classifier lies in the increase of prediction rate and prediction accuracy in comparison to the traditional methods. Its disadvantage is the increase in time complexity in the prediction and analysis of the features.
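A kernel density classifier of the kind described above can be sketched for a single feature as follows. The Gaussian kernel, the bandwidth h = 0.5, the one-dimensional data and the function names are all assumptions made for illustration; the article does not state which kernel or bandwidth was used.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(x_new, samples, h):
    """Estimate the density at x_new from 1-D samples with bandwidth h."""
    return sum(gaussian_kernel((x_new - x_t) / h) for x_t in samples) / (len(samples) * h)


def kde_classify(x_new, samples_by_class, h=0.5):
    """Pick the class with the largest prior times class-conditional density."""
    total = sum(len(s) for s in samples_by_class.values())
    scores = {
        label: len(s) / total * kde(x_new, s, h)
        for label, s in samples_by_class.items()
    }
    return max(scores, key=scores.get)
```

With training values such as `{"benign": [1.0, 1.5, 2.0], "malignant": [6.0, 6.5, 7.0]}`, a new value near 1 is assigned to the benign class because that class-conditional density dominates there. Multiplying by the class prior is the Bayes decision rule that the class-conditional densities feed into.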

Bayes' theorem
The main use of Bayes' theorem is updating the prediction probability and increasing the belief rate using the predicted results of affected cells. Mathematically, Bayes' theorem is represented as follows
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
For proposition A and evidence B, P(A), the prior, is the initial degree of belief in A. The quotient P(B | A)/P(B) represents the support B provides for A. P(A | B), the posterior, is the degree of belief having accounted for B.
In this work, A and B are correlated, incorporating the accuracy generated by the implementation of the multi-classifier and PSO. The error rate for each data set is also calculated with the help of the formula given in reference 13.
In the proposed work, A is considered as the prediction of breast cancer and is treated as the high-level belief. B is the corresponding evidence of the prediction.
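The way Bayes' theorem revises a degree of belief can be shown with a short worked example. The probabilities used here are hypothetical values chosen only to make the arithmetic easy to follow, and the function name is an assumption; P(B) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: an initial belief of 0.5 that a case is malignant,
# a classifier that flags 90% of malignant and 10% of benign cases.
belief_after_first = posterior(0.5, 0.9, 0.1)
# A second, independent positive prediction revises the belief again.
belief_after_second = posterior(belief_after_first, 0.9, 0.1)
```

Each agreeing piece of evidence raises the posterior, which is the sense in which the predicted results are revised with a higher degree of belief.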

Working and processing of PSO-NDS preprocessing
The WBCD 47 and WDBC 48,49 data sets are used in this study for learning and analysis. WDBC presents 569 instances with 30 features and WBCD presents 699 instances. The data are collected from various cases as digitized images of human breast tissue. The data are analysed using predefined parameters such as velocity, starting point and objective function. The analysis and preprocessing technique followed is based on the PSO-KDE approach 13 and is described in the following section.

Processing of PSO-NDS
The proposed multi-modal model consists of non-dominated sorting, PSO and multi-classification. Bayes' theorem is used for analysing the various factors. The processing steps of PSO-NDS are as follows:
Step 1: the parameters of PSO and the instances of particles are first defined.
Step 2: the particle positions and velocities in the search space are initialized.
Step 3: the objective function of each particle is calculated using PSO-NDS.
Step 4: using PSO-NDS, the objective functions and the individual performance of the objects are updated.
Step 5: the velocity and position of the particles are updated.
Step 6: if the desired number of iterations is not reached, return to Step 3.
Step 7: the features and accuracy are presented.
The algorithm of PSO using NDS with multi features is shown in Algorithm 1.
The PSO-NDS with multi-classification simulation determines the accuracy, and Bayes' theorem is used with the relevant factors to maximize the objective function and the feature subset. This helps to maximize the accuracy of the prediction. In PSO using NDS, the pbest of a particle is not compared only with its potential offspring; rather, the pbest of the entire population of N particles and the N offspring of these particles are combined to form a temporary population of 2N particles. The non-dominated sorting method is applied to these 2N particles, sorting the entire population into non-domination fronts. Here, the first front is the non-dominated set in the current population, the second front is dominated only by individuals in the first one, and the process continues in a similar pattern. The individuals in each front are assigned a fitness value based on the front to which they belong. For example, the individuals of the first front are assigned a fitness value of 1, individuals of the second front are assigned a value of 2, and so on. Apart from the fitness value, two parameters, crowding distance and niche count, are computed for each individual to obtain the best distribution of non-dominated solutions. The overall steps of the proposed method are summarized as follows:
Step 1: initialize the parameters.
Step 2: initialize the population.
Step 3: the objective function is calculated using PSO-NDS.
Step 4: selection, mutation and crossover are applied to the particles; the velocity and population are updated (t = t + 1) and the best objective function is provided.
Step 5: accuracy is calculated using multi-classification techniques.
Step 6: the satisfied objective function is generated.
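The combination of N personal bests with N offspring into a temporary population of 2N, followed by front-based fitness assignment, can be sketched as below. This is a simplified illustration, not the authors' Algorithm 1: the two objectives (error rate and number of selected features, both minimised) are assumed for the example, the function names are hypothetical, and ties inside a front are broken by index rather than by crowding distance or niche count.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def front_ranks(objs):
    """Map each index to its front number (1 = non-dominated front)."""
    rank, remaining, level = {}, set(range(len(objs))), 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        }
        for i in front:
            rank[i] = level  # fitness value equals the front number
        remaining -= front
        level += 1
    return rank


def select_survivors(pbest_objs, offspring_objs, n):
    """Combine pbest and offspring (2N in total) and keep the n best-ranked."""
    combined = list(pbest_objs) + list(offspring_objs)
    rank = front_ranks(combined)
    order = sorted(range(len(combined)), key=lambda i: (rank[i], i))
    return [(i, rank[i]) for i in order[:n]]


# Hypothetical (error rate, number of features) pairs for N = 2 particles.
pbest = [(0.10, 5), (0.20, 3)]
offspring = [(0.05, 6), (0.30, 8)]
survivors = select_survivors(pbest, offspring, n=2)
```

Here the last offspring is dominated by every other candidate and falls into the second front, so it is the first to be discarded when the population is cut back to N.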

Results and discussion
The proposed PSO-NDS model was used for the prediction of breast cancer. For experimentation purposes, the WBCD and WDBC data sets were used. For performance evaluation, parameters such as the number of iterations, velocity and initial position of particles are considered. The WBCD and WDBC data sets are randomly divided into subsets and used for training. The number of iterations is set to 20, and the experiment is conducted with the above parameters. The superiority of the PSO-non-dominating method over the traditional methods is evident from the generated results. The PSO-non-dominating method produces better results in identifying the benign and malignant features or sub-features. The predictions are also more accurate and are computed considering each feature as per its dominant properties. Finally, the results of PSO-NDS are compared with those of PSO-KDE 13 and GA-KDE. 29 The experiment is validated using accuracy, specificity and sensitivity. The performance of PSO-NDS, PSO-KDE and GA-KDE is analysed on training data sets to obtain the ideal feature subset and classification. The accuracies of PSO-KDE and GA-KDE were almost similar but inferior to that of the proposed method. Sheikhpour et al. 13 proposed PSO-KDE for breast cancer detection. The main intent of this method was to increase the accuracy, thereby reducing errors. The method produced an optimal accuracy level but used only one classifier, which left open the possibility of erroneous predictions in comparison to other popular methods. It is thus observed from the related research that the majority of studies using PSO feature selection produced optimal results compared to the other methods. However, there exist associated challenges pertaining to its application on linear methods. In the case of GA-KDE, the GA and the non-parametric KDE-based classifier are hybridized to compute the optimal bandwidth and also the subset of features.
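The three validation measures named above follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the label "M" for malignant as the positive class, the function names and the ten example labels are hypothetical and are not taken from the experiments.

```python
def confusion_counts(actual, predicted, positive="M"):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn


def metrics(actual, predicted, positive="M"):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels: 4 malignant and 6 benign cases.
actual = ["M", "M", "M", "M", "B", "B", "B", "B", "B", "B"]
predicted = ["M", "M", "M", "B", "B", "B", "B", "B", "B", "M"]
scores = metrics(actual, predicted)
```

In this example one malignant case is missed and one benign case is flagged, so the accuracy is 8/10, the sensitivity 3/4 and the specificity 5/6. The functions assume that both classes occur in the true labels; otherwise a denominator would be zero.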
The experiment is conducted over various trials and the average accuracy is calculated. Table 3 and Figure 4 show the accuracy of PSO-NDS in comparison to other state-of-the-art methods, with the relevant features mentioned.
PSO-NDS achieved 98.28% and 98.8% accuracy for feature counts of 5-6 and 15-20, respectively, as shown in the figure for the WBCD and WDBC data sets. The proposed work improved the accuracy compared to the other studies. The accuracy of prediction on the WBCD and WDBC data sets is generally high. In a real-world scenario, however, when data sets with huge amounts of data are subjected to scanning and prediction, the prediction rate and accuracy naturally decrease. In the present work, Bayes' theorem is applied to raise the accuracy. The supporting factors and evidence are verified using the revision mechanism, and false positive predictions are reduced using Bayes' theorem. The increase in prediction accuracy is achieved by reducing the number of features and considering only the most relevant ones in the analysis, as shown in Table 4. The average sensitivity of PSO-NDS is also compared with that of PSO-KDE and GA-KDE. The performance evaluations are shown in Table 4 and Figure 5. Similarly, Table 4 and Figure 6 show the comparative values of specificity, wherein the performance of the proposed method proves to be better.
In Figure 5 and Table 4, the sensitivity of prediction is increased for both the WBCD and WDBC data sets. In these data sets, a minimum of eight features is considered for prediction. With 12, 18 and 24 features, however, the false positives increase. In Figures 4 and 5, a feature that is not selected is not used for prediction and is not affected.
In the data sets used, as well as in the case of real-time data, the number of features decreases automatically and the positive test cases increase; otherwise, the prediction rate decreases. Figure 6 shows a similar decrease in specificity (correctly identified negatives). Table 5 shows the error rate computed for the WBCD and WDBC data sets. The error rates are less than 1.0. For the predicted results, both sensitivity and specificity are achieved with a maximum of 12 features in the proposed work. However, the number of prediction factors or features is larger in the WDBC data set, and in the case of huge real data sets the error rate may increase.
The other factors that help to achieve the accuracy are the supporting factors, such as the objective function of the particles, the multi-classification prediction parameter and the accuracy, all of which are considered for the prediction of results. For the supporting rate and evidence of belief, around 10 factors are considered, and based on these 10 factors the predictions for breast cancer are supported, as shown in Table 6. In Bayes' theorem, the probability measures the 'degree of belief'. The prediction is based on a single feature, collected evidence or a conditional probability which updates the belief. It basically shows how a 'degree of belief', when expressed as a probability, changes based on the available evidence.

Conclusion
The proposed work emphasizes predicting breast cancer with an optimal level of performance using a multi-modal model. The data sets used in the study are WBCD and WDBC, which are publicly available for research purposes. The proposed multi-modal classification model consists of various techniques, namely, the NDS method, a multi-classifier and Bayes' theorem, to accurately classify the breast cancer data sets. The proposed multi-modal classification model is called PSO-NDS. The error factors were reduced using supporting factors with evidence of various factors. The proposed work is applied in an n-dimensional space. The proposed PSO-NDS model, when implemented on the WBCD and WDBC data sets, generated an optimum level of accuracy (98.8%, 98.6%). The sensitivity and specificity achieved were 98.8% and 97.12%, and 99.8% and 98.3%, respectively, which are quite promising. The future direction of research would involve prediction and detection of cancer cells using Internet of Things (IoT) devices.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.