Support vector regression model with variant tolerance

Most works on Support Vector Regression (SVR) focus on kernel or loss functions, with the corresponding support vectors obtained using a fixed-radius ε-tube, affording good predictive performance on many datasets. However, the fixed radius prevents the adaptive selection of support vectors according to the characteristics of the data distribution, compromising the performance of SVR-based methods. Therefore, this study proposes an "Alterable ε_i-Support Vector Regression" (Aε_i-SVR) model by applying a novel ε, named "Alterable ε_i," to the SVR model. Based on the sparsity of the data points at each location, the model solves for a different ε_i at the corresponding position, and thus zooms the ε-tube in or out by changing its radius. Such a variable ε-tube strategy diminishes the influence of noise and outliers in the dataset, enhancing the prediction performance of the Aε_i-SVR model. To solve the complex problem of optimizing the ε_i associated with every location, we suggest a novel non-deterministic algorithm that determines them iteratively. Extensive experimental results demonstrate that, compared with the baseline methods, our approach improves accuracy and stability on both simulated and real data.


Introduction
Support Vector Machines (SVM) are a powerful machine learning technique grounded in statistical learning theory and a promising tool for the function approximation problem based on the Structural Risk Minimization (SRM) principle.[3][14] The corresponding regression scheme is termed Support Vector Regression (SVR) and often performs better than competing machine learning algorithms. SVR solves practical problems by introducing the prediction equation f(x) = w^T φ(x) + b, where the predictor predicts new observations in the feature space and φ(·) is a transformation function applied to the covariate space. By introducing kernel functions, SVR attains an excellent nonlinear function regression capability and has therefore been successfully applied to nonlinear prediction systems.
Over the past few decades, various studies have focused on the kernel function k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ and the loss function L(y − f(x)).[6][17] The literature presents several kernel functions, such as the random radial basis function (RRBF) kernel,[18] the composite wavelet kernel,[19] the Hermite orthogonal polynomial kernel,[20] and the robust low-rank multiple kernels.[21] Through these kernel functions, the SVR model transforms the original data into a nonlinear finite-dimensional kernel space to avoid the "Dimensional Disaster" (i.e., the curse of dimensionality) and improve classification and prediction accuracy.
Considering conventional loss functions, these include the Huber loss function-based SVR,[3] the quadratic loss function-based LS-SVR,[22] the Maximum Likelihood Optimal and Robust Support Vector Regression model,[23] and the L1-norm SVR.[24] Motivated by ε-SVR, Pritam[25] introduced an ε-PSVR model utilizing an ε-penalty loss function that applies different penalty rates to the data points depending on whether they lie inside or outside the ε-tube. Moreover, Gupta and Gupta[26] proposed the asymmetric ν-twin support vector regression (Asy-ν-TSVR), which finds two non-parallel hyperplanes to construct the corresponding decision functions given two different ε-insensitive loss functions, namely the lower- and upper-bound functions. Besides, Balasundaram and Meena[27] presented an ε-insensitive asymmetric Huber function-based ε-AHSVR model, where an ε-insensitive loss function is integrated with the Huber loss function to compensate for the high complexity of the latter. Cheng and Lu[28] developed an ε-insensitive square loss function-based Bayesian ε-SVR model that adopts the structural risk minimization principle through the ε-insensitive square loss function while providing point-by-point probability predictions and allowing the optimal hyperparameters to be determined by maximizing the Bayesian model evidence.
Most innovations on loss functions rely on the ε-insensitive loss function because, among all points in the training set, it ignores the points inside the ε-tube. Therefore, the prediction model is essentially determined by the support vectors located on the edges or outside the tube, enhancing the generalization ability of ε-SVR. Here, the ε-insensitive loss function is given by:

L_ε(y − f(x)) = max(0, |y − f(x)| − ε),

where ε > 0. Figure 1 shows some important loss functions.
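As a concrete illustration, the ε-insensitive loss above can be sketched in a few lines of Python (the numeric values are arbitrary examples, not taken from the paper):

```python
def eps_insensitive_loss(y, f_x, eps):
    """Standard eps-insensitive loss: zero inside the eps-tube,
    linear penalty |y - f(x)| - eps outside it."""
    return max(0.0, abs(y - f_x) - eps)

# A point inside the tube contributes nothing to the empirical risk:
inside = eps_insensitive_loss(1.0, 1.05, eps=0.1)   # 0.0
# A point outside the tube is penalised by its distance to the tube wall:
outside = eps_insensitive_loss(1.0, 1.30, eps=0.1)  # ~0.2
```

This zero region is exactly what makes the solution sparse: only points on or outside the tube wall influence the fitted model.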
Employing ε in the ε-SVR enhances robustness to errors, with the error tolerance at each position being ε. Such an approach also allows the solution vector of the ε-SVR model to become sparse.
However, these new loss functions only focus on the ε-tube radius value or on optimizing the hyperparameters to minimize the empirical risk. When complex datasets are involved, a single value of ε cannot capture the support vectors at different locations of high-dimensional data. Indeed, an ε-tube with a fixed radius limits the choice of support vectors for each position, preventing the model from utilizing the full information of the training set.
To address the above challenges, this work proposes the "Alterable ε_i-Support Vector Regression" (Aε_i-SVR) based on an alterable ε-insensitive loss function. Similar to the standard ε-SVR, the Aε_i-SVR exploits the loss function computed with the Alterable ε_i to measure the empirical risk, together with a regularization term (1/2)w^T w. This loss function is given by:

L_{ε_i}(y_i − f(x_i)) = max(0, |y_i − f(x_i)| − ε_i),

where ε_i > 0, i = 1, 2, ..., n. The value of each Alterable ε_i depends on the distribution sparsity at the location of the data points, allowing Aε_i-SVR to capture support vectors for each position adaptively. Figure 2 illustrates the proposed ε_i loss function with different values of ε.
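The only change relative to the standard loss is that the tube radius becomes a per-sample quantity. A minimal sketch (the residuals and radii below are hypothetical values chosen for illustration):

```python
def alterable_eps_loss(y, f_x, eps_i):
    """Alterable eps_i-insensitive loss: same shape as the standard loss,
    but the tube radius eps_i varies with the position of the sample."""
    return max(0.0, abs(y - f_x) - eps_i)

residuals = [0.05, 0.30, 0.30]   # |y_i - f(x_i)| at three positions
radii     = [0.10, 0.10, 0.40]   # per-position eps_i (hypothetical values)
losses = [alterable_eps_loss(r, 0.0, e) for r, e in zip(residuals, radii)]
# The identical residual 0.30 is penalised under eps_i = 0.10
# but tolerated under the wider radius eps_i = 0.40.
```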
The ε-tube constraints of the standard ε-SVR and the Aε_i-SVR are, respectively,

|y_i − f(x_i)| ≤ ε and |y_i − f(x_i)| ≤ ε_i, i = 1, 2, ..., n.

Hence, this paper introduces the concept of an adaptive ε_i, which allows Aε_i-SVR to exploit the information provided by the training set, adapt better to the characteristics of the data distribution, and improve the prediction effect. Although the kernel function can map the data to a high-dimensional Hilbert space and effectively deal with high-dimensional data, the hyperplane is not flexible enough to fit every point in the space when making predictions in the high-dimensional feature space. In contrast, the hyper-surface obtained after training the Aε_i-SVR model on the training set affords a better prediction performance than the hyperplane. However, the overall shape of this surface varies with the value of ε_i at each location, posing a large computational burden when choosing and solving for the ε_i. Hence, to overcome these challenges, this paper suggests a novel non-deterministic algorithm for determining ε_i.
Specifically, the proposed SVR model improves the previously discussed methods, keeping the standard ε-SVR loss expression and the RBF kernel function while introducing many penalty values ε_i through an iterative process, thus providing a variable ε-tube. Opposing the standard ε-tube, the variable ε-tube of the Aε_i-SVR model applies a different penalty depending on the data distribution. This enhances the method's flexibility in removing outliers, mitigates the interference of noise on the prediction curve, and finally achieves better prediction results.
The remainder of this paper is organized as follows. Section 2 compares the proposed Aε_i-SVR model against the ε-SVR-based model for both linear and nonlinear scenarios. Section 3 derives Aε_i-SVR, and Section 4 introduces a novel iterative fluctuation self-selection algorithm. Sections 5 and 6 present and discuss the key comparisons based on real-world benchmarks. Finally, Section 7 concludes this work and provides some future research directions.

Theory of alterable ε_i-support vector regression
This section briefly reviews the standard ε-Support Vector Regression model and compares it with the Alterable ε_i-SVR in both linear and nonlinear scenarios. Given a training set T = {(x_i, y_i) : x_i ∈ R^n, y_i ∈ R, i = 1, 2, ..., n}, C is a pre-specified value, b is the offset term, and ξ_i, ξ_i^* are the slack variables that represent the upper and lower constraints on the system outputs.

Linear support vector regression model
To estimate the linear function f(x) = w^T x + b, where w is the weight vector and b is the offset term, the standard ε-Support Vector Regression minimizes

min_{w, b, ξ, ξ^*} (1/2)‖w‖^2 + C Σ_{i=1}^n (ξ_i + ξ_i^*)

subject to

y_i − w^T x_i − b ≤ ε + ξ_i,
w^T x_i + b − y_i ≤ ε + ξ_i^*,
ξ_i ≥ 0, ξ_i^* ≥ 0, i = 1, 2, ..., n.

The Aε_i-SVR model minimizes the same objective subject to

y_i − w^T x_i − b ≤ ε_i + ξ_i,
w^T x_i + b − y_i ≤ ε_i + ξ_i^*,
ξ_i ≥ 0, ξ_i^* ≥ 0, i = 1, 2, ..., n,

thus obtaining a support vector machine adapted to each position by iteratively determining a distinct ε value per position. At a location with a dense data point distribution, the Aε_i-SVR model increases the number of support vectors by shortening the ε-tube radius.
Conversely, at a location with a sparse data point distribution, our method increases the ε-tube radius, decreasing the number of support vectors. This way, the orientation and displacement of the hyperplane can be changed to fit the data points better and to exhibit better predictive performance.
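To make the sparsity-dependent radius idea concrete, one simple heuristic (not the paper's Algorithm 1, which determines the ε_i iteratively; this is only an illustrative sketch) scales a base radius by the mean distance to the k nearest neighbours, so sparse regions get a wider tube and dense regions a narrower one:

```python
import numpy as np

def density_scaled_radii(X, base_eps=0.1, k=3):
    """Illustrative sketch: widen the tube where data are sparse,
    narrow it where data are dense, keeping the average radius at base_eps."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude self-distance
    # Mean distance to the k nearest neighbours as a sparsity proxy.
    knn_mean = np.sort(D, axis=1)[:, :k].mean(axis=1)
    return base_eps * knn_mean / knn_mean.mean()
```

With three clustered points and one isolated point, the isolated (sparse) point receives a noticeably larger radius than the clustered (dense) ones.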

Non-linear support vector regression model
To estimate the nonlinear function f(x) = w^T φ(x) + b, where φ(·): R^n → H is a nonlinear mapping from the input space to a higher-dimensional Hilbert space, the standard ε-Support Vector Regression minimizes

min_{w, b, ξ, ξ^*} (1/2)‖w‖^2 + C Σ_{i=1}^n (ξ_i + ξ_i^*)

subject to

y_i − w^T φ(x_i) − b ≤ ε + ξ_i,
w^T φ(x_i) + b − y_i ≤ ε + ξ_i^*,
ξ_i ≥ 0, ξ_i^* ≥ 0,

while the Aε_i-SVR model minimizes the same objective subject to the constraints with ε replaced by ε_i. The geometric interpretation of the nonlinear SVR is presented in Figure 4(a), highlighting that ε-SVR performs well on some data points but poorly on others. This is probably because all data points are treated equally and have the same tolerance due to the limitation of the fixed-radius ε-tube. Nevertheless, real datasets tend to be unevenly distributed, containing both sparsely and densely distributed data points. The fixed radius prevents adaptively selecting the support vectors according to the distribution characteristics of the data, compromising the predictive performance of ε-SVR.
Considering Aε_i-SVR, in Figure 4(b), our model distinguishes locations with sparse and dense data point distributions and calculates distinct tolerances at the different positions, so that the ε-tube radius changes depending on the location. Changing the ε-tube radius allows the prediction curve to be more flexible, thus producing better prediction results. Note that this section presents only a schematic diagram of Aε_i-SVR; its performance in fitting real datasets is evaluated in Section 5.

Solution of alterable ε_i-support vector regression
Given a training set T = {(x_i, y_i) : x_i ∈ R^n, y_i ∈ R, i = 1, 2, ..., m}, w ∈ R^n is the weight vector, b is a real constant, ξ_i, ξ_i^* are the slack variables that represent the upper and lower bound constraints of the system output, and C > 0 is a pre-specified value. To solve equation (4), we formulate the Lagrangian as follows:

L = (1/2)‖w‖^2 + C Σ_{i=1}^m (ξ_i + ξ_i^*)
  − Σ_{i=1}^m α_i^−(ε_i + ξ_i − y_i + w^T x_i + b)
  − Σ_{i=1}^m α_i^+(ε_i + ξ_i^* + y_i − w^T x_i − b)
  − Σ_{i=1}^m (η_i ξ_i + η_i^* ξ_i^*),

where α_i^+ ≥ 0 and α_i^− ≥ 0 (i = 1, 2, ..., m) are the Lagrange multipliers and η_i, η_i^* ≥ 0.
According to the above KKT conditions, the Wolfe dual of the primal problem in equation (4) can be obtained as follows:

max_{α^+, α^−} −(1/2) Σ_{i=1}^m Σ_{j=1}^m (α_i^− − α_i^+)(α_j^− − α_j^+) x_i^T x_j − Σ_{i=1}^m ε_i (α_i^− + α_i^+) + Σ_{i=1}^m y_i (α_i^− − α_i^+)

subject to

Σ_{i=1}^m (α_i^− − α_i^+) = 0, 0 ≤ α_i^−, α_i^+ ≤ C.

If there is a nonlinear relationship between y and x, we can linearize it by mapping x to a higher-dimensional feature space[29] through the mapping φ: R^n → H. The estimated regressor is then given by:

f(x) = Σ_{i=1}^m (α_i^− − α_i^+) φ(x_i)^T φ(x) + b,

and equation (13) is rewritten with x_i^T x_j replaced by φ(x_i)^T φ(x_j), subject to the same constraints. According to the Mercer condition, φ(x_i)^T φ(x_j) can be converted into a positive definite kernel k(x_i, x_j). Given the excellent fitting properties of the radial basis function (RBF), we employ it as the kernel function:

k(x_i, x_j) = exp(−‖x_i − x_j‖^2 / (2σ^2)).

This transformation reduces the computational complexity of the optimization problem, which can therefore be reformulated via equation (16). The resulting problem, equation (21), is a quadratic programming problem, where α_i^− and α_i^+ denote the Lagrange multipliers estimated when the ε_i and the constant C are given. Here, we use the quadratic programming package in Python to solve for α_i^− and α_i^+. Using equation (14), the Aε_i-SVR function can be represented as:

f(x) = Σ_{i=1}^m (α_i^− − α_i^+) k(x_i, x) + b.

Evidently, the influence of ε on the support vectors has been transformed into an influence on the values of (α_i^− − α_i^+). Concerning the training set, the samples in the set S = {(x_i, y_i) : x_i ∈ R^n, y_i ∈ R, i = 1, 2, ..., s} that satisfy the condition (α_i^− − α_i^+) ≠ 0 are the support vectors of the SVR and must be located on the edges or outside the tube. Here, we alter ε_i at each position in the training set, thereby changing the number of support vectors at the corresponding location. In this case, given the training set, the adaptive ε_i is computed through the iterative algorithm (Algorithm 1) to determine the best support vectors at every position. Based on the KKT conditions, the offset b can be computed from each support vector as:

b_i = y_i − Σ_{j=1}^m (α_j^− − α_j^+) k(x_j, x_i) − ε_i, for 0 < α_i^− < C,
b_i = y_i − Σ_{j=1}^m (α_j^− − α_j^+) k(x_j, x_i) + ε_i, for 0 < α_i^+ < C.
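Once the multipliers are available from the QP solver, the RBF kernel and the resulting predictor are straightforward to express. A minimal sketch (the multiplier values passed in would come from the solver; the values used below are placeholders):

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """RBF kernel k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def svr_predict(x, X_sv, alpha_diff, b, sigma=1.0):
    """f(x) = sum_i (alpha_i^- - alpha_i^+) k(x_i, x) + b.
    alpha_diff holds the differences (alpha_i^- - alpha_i^+) for the
    support vectors X_sv, as obtained from the dual solution."""
    return sum(a * rbf_kernel(xs, x, sigma)
               for a, xs in zip(alpha_diff, X_sv)) + b
```

A point evaluated against itself has kernel value 1, and with all multiplier differences equal to zero the predictor reduces to the offset b, matching the formula above term by term.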
Finally, we take the average value of b over all support vectors:

b = (1/N_SV) Σ_{i ∈ SV} b_i,

where N_SV is the number of support vectors.

Iterative fluctuation self-selection algorithm
The developed framework comprises three steps. First, the data is divided into K subsets; K−1 subsets are randomly selected as training samples and the rest are test samples, while the kernel parameter σ, the adaptive ε_i, and the constant C are given. The second step determines the starting position of the iteration by fitting a standard ε-SVR, after which the residuals e_i = y_i − f(x_i) are calculated. In this case, data points outside the tube satisfy |e_i| ≥ ε + ξ_i. Here, F is the fluctuation coefficient, which provides a fluctuation range for each ε value and increases the randomness of the ε values to prevent overfitting. Moreover, we set a criterion value Q (Q = 0.1): when |e_i / y_i| > Q, |e_i| is taken as the corresponding ε_i for Aε_i-SVR. After this screening, E_original = (ε_1, ε_2, ..., ε_n) is obtained. The last step acquires the optimal E_best = (ε_1^best, ε_2^best, ..., ε_n^best) through the iterative fluctuation self-selection algorithm, which is summarized in Algorithm 1.
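The screening step of the second stage can be sketched as follows. This is our reading of the description above, so treat it as an assumption rather than the paper's exact implementation; in particular, the fallback value used when the relative-error criterion is not met is assumed to be the initial ε:

```python
def screen_initial_radii(residuals, y, eps, Q=0.1):
    """Screening sketch: start from the residuals e_i = y_i - f(x_i) of a
    standard eps-SVR fit; where the relative error |e_i / y_i| exceeds Q,
    adopt |e_i| as the initial per-position radius, otherwise keep eps."""
    return [abs(e) if y_i != 0 and abs(e / y_i) > Q else eps
            for e, y_i in zip(residuals, y)]
```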
In Algorithm 1, MAE is used as the reliability index, and the fluctuation coefficient F (0 < F < 1) is exploited to improve the model's generalization ability. By appropriately setting F, E_up and E_down are generated, and the best set E_best is then selected among E_up, E_mid, and E_down. If F is set too large, we may miss a satisfactory E_best; if it is set too small, the iteration process lasts longer.
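One iteration of this up/mid/down selection can be sketched as follows. The multiplicative perturbation by (1 ± F) is an assumption about how E_up and E_down are generated from E_mid; the scoring function is supplied by the caller (in the paper it is the training MAE of the refitted model):

```python
def fluctuation_step(E_mid, F, score):
    """One step of the fluctuation search: perturb every eps_i up and down
    by the factor F, score each candidate set (lower is better, e.g. MAE),
    and keep the best of the three."""
    candidates = {
        "up":   [e * (1.0 + F) for e in E_mid],
        "mid":  list(E_mid),
        "down": [e * (1.0 - F) for e in E_mid],
    }
    scores = {name: score(E) for name, E in candidates.items()}
    best = min(scores, key=scores.get)
    return candidates[best], scores[best]
```

Repeating this step moves the radii toward a local minimum of the reliability index, which matches the monotone improvement reported over the iterations.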
Figure 5 illustrates the relationship between the reliability index and the number of algorithm calls, revealing that the prediction accuracy of the Aε_i-SVR model improves as the number of iterations increases.
Algorithm 1: The iterative fluctuation self-selection algorithm

Moreover, the following three common criteria (SSE/SST, RMSE, and MAE) are presented to verify the effectiveness of the Aε_i-SVR model.
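The three criteria can be computed directly from the predictions; the definitions below follow their standard forms:

```python
import math

def sse_sst(y_true, y_pred):
    """SSE/SST: residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return sse / sst

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For all three, smaller values indicate better predictive performance, which is how the comparisons in the result tables should be read.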

All datasets are described below:
All datasets contain 400 training and 100 test samples. Table 1 reports the performance of the Aε_i-SVR, ε-SVR, and LS-SVR models under several evaluation metrics. SSE/SST, RMSE, and MAE were calculated on seven simulation datasets with different data distribution characteristics, and the optimal parameter settings of ε-SVR per dataset are listed in Table 1. To compare the performance more objectively, the values of C and σ of the competitor models are aligned with those of ε-SVR. Table 1 highlights that, without changing the parameters, the proposed Aε_i-SVR model presents a significant improvement in the SSE/SST, RMSE, and MAE metrics compared with the ε-SVR and LS-SVR models. Specifically, Aε_i-SVR obtains the lowest RMSE value (0.103) on artificial Dataset 1, indicating that the proposed model consistently outperforms ε-SVR and LS-SVR and demonstrates an appealing generalization ability. Next, we calculated the reduction in SSE/SST, RMSE, and MAE for Aε_i-SVR relative to the ε-SVR and LS-SVR models. The corresponding results are illustrated in Figure 6, highlighting that the proposed model has an enhanced predictive performance compared with the competitors, with the most obvious improvement in the SSE/SST index. On Dataset 1, the SSE/SST value of Aε_i-SVR is 0.074, an improvement of 7.5% and 8.6% relative to ε-SVR and LS-SVR, respectively, reflecting the importance of the "Alterable ε_i." Therefore, the experiments imply that assigning a variable ε depending on the data distribution characteristics is superior to treating each data point in the training set equally.

Real data application
We also apply our proposed method to real benchmark datasets, namely Triazines, Servo, NO2, Chwirut, Auto Mpg, Nelson, and Boston Housing. These datasets were downloaded from the UCI repository[33] (https://archive.ics.uci.edu/) and NLSRD (https://www.itl.nist.gov/div898/strd/nls/nls_main.shtml) and are commonly used to evaluate regression methods. Further information on the seven datasets is reported in Table 2. In this trial, we challenged the proposed SVR against existing SVR models, namely the ε-SVR model, the Huber SVR model, and ε-PSVR. For all datasets, the feature vectors were normalized to the range [0, 1], and a 10-fold cross-validation method was used. The reported ε-PSVR results were obtained with the same datasets and processing.[25] In Figure 7, we present the iterative results on three real benchmark datasets, highlighting that Algorithm 1 converges after about 30 iterations.
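The preprocessing described here, min-max normalization of the features followed by 10-fold cross-validation, can be sketched as follows (a generic sketch, not the paper's exact pipeline):

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature column into [0, 1], as done for the benchmarks."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X - lo) / span

def kfold_indices(n, k=10, seed=0):
    """Shuffled (train, test) index pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
            for i in range(k)]
```

Each of the k folds serves as the test set exactly once, so every sample is used for both training and evaluation across the procedure.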
To comprehensively verify the effectiveness of the proposed Aε_i-SVR model, this section evaluates our method using the SSE/SST, RMSE, and MAE metrics and compares it against current advanced methods, as presented in Tables 3-6. The experiments are conducted on seven real datasets, with 90% of the data used for training and 10% for testing. During the trials, we adopted the original parameter setup suggested by each paper. Table 3 reports the prediction performance of all competitor models, highlighting that, with the same parameters C and σ, Aε_i-SVR yields the lowest RMSE (2.505) on the Auto Mpg dataset, superior to ε-PSVR (2.553), Huber SVR (2.566), and ε-SVR (2.567). Besides, the SSE/SST metric of Aε_i-SVR shows a noticeable improvement on the Boston Housing dataset (0.123), which is much lower than that of ε-SVR (0.229) and ε-PSVR (0.228). After analyzing all three evaluation metrics, we conclude that Aε_i-SVR attains better performance on the seven real datasets than SVR variants whose ε-tube has a single fixed value, or two different values for inside and outside. Furthermore, although the Aε_i-SVR model performs better on all datasets, the performance gain over the competitor methods varies. Therefore, Tables 4-6 report our model's improvement for each evaluation index relative to the other models.
Tables 4-6 highlight that, across the three evaluation metrics, the developed model attains the highest gains on the Boston Housing dataset, affording improvements of 46.29%, 7.64%, and 15.45% on the SSE/SST, RMSE, and MAE metrics, respectively. This is because our model focuses on handling inconsistently distributed data, which is the case for the Boston Housing dataset. It should be noted that Aε_i-SVR also performs well on the other datasets, with improvements of around 1% that in some cases exceed 10% compared with existing methods. Next, to demonstrate the model's fit on a two-dimensional plane, we randomly select one of the multiple independent variables as the x-axis and the dependent variable as the y-axis.
Figures 8-10 reveal that the Aε_i-SVR model outperforms ε-SVR on all datasets, with the Boston Housing dataset presenting the most pronounced prediction gap. Particularly, when the Indus value is 0.2, many data points are located in the lower half and the distribution is relatively dense. When the Indus value is 0.5, there are more data points in the upper half and the distribution is sparse. ε-SVR is unable to adapt to these data distribution characteristics because locations where the data are sparse and dense are treated equally; consequently, its fit on Boston Housing at positions 0.2 and 0.5 is poor. However, since Aε_i-SVR can vary the ε-tube radius depending on the sparsity level, it exhibits a more flexible predictive performance, allowing it to fit the data points better at both sparse and dense locations. Furthermore, the Chwirut dataset is a two-dimensional dataset with low dimensionality and a relatively uniform data point distribution. On this dataset, both Aε_i-SVR and ε-SVR achieve a relatively satisfactory fit, with Figure 11 highlighting the minor difference between the two fitted lines due to the thinness of the ε-tube.
The experimental results demonstrate that Aε_i-SVR outperforms the other SVR models because it assigns a different penalty value to each data point in the dataset, which can be changed during the iteration process based on the sample information, thus achieving better predictions.

Parameter sensitivity study
This experiment uses a 10-fold cross-validation sampling strategy to evaluate the prediction performance of Aε_i-SVR under different parameter setups. Six independent experiments were conducted to investigate the sensitivity of the Aε_i-SVR and ε-SVR models while altering the C and σ values on the Auto Mpg, Boston Housing, and NO2 datasets. The ranges of the C and σ values are chosen adjacent to their optimal values, and only one parameter is changed at a time, with the other taking its optimal value. The corresponding results are illustrated in Figure 13.
In each fold, the average value over the test set is taken as the final evaluation index. As observed, the prediction performance of the Aε_i-SVR model is better than that of the ε-SVR model under different C and σ values. Compared with the standard ε-SVR model, Aε_i-SVR is less sensitive to the C value, as it maintains a good prediction effect even over a large C range, while its sensitivity to σ is the same as that of the standard SVR model. If the time complexity of the standard SVR model is O(·), the time complexity of the proposed Aε_i-SVR model is M·N·O(·), where M is the number of iterations set in Algorithm 1 and N is the number of training samples. When the sample size is large, this significantly increases the computation time. Therefore, when selecting the Aε_i-SVR model, we can set a large search step for the C value and a standard search step for σ. This setting offsets some of the time spent iteratively solving for ε_i, balancing training time and prediction performance.
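The coarse-C, standard-σ search suggested above can be sketched as follows (the grid bounds and steps are illustrative assumptions, not the paper's values):

```python
# Coarse grid for C (the model is less sensitive to it) and a standard
# grid for sigma, trading a little accuracy in C for much less search time.
C_grid = [2.0 ** p for p in range(-2, 9, 2)]     # large step: -2, 0, ..., 8
sigma_grid = [2.0 ** p for p in range(-4, 5)]    # standard step: -4, ..., 4

# Total number of (C, sigma) pairs each requiring a full iterative fit:
n_candidates = len(C_grid) * len(sigma_grid)     # 54
```

Halving the number of C candidates roughly halves the number of M·N-cost fits, which is where the time saving comes from.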

Conclusion and future works
This paper proposes a novel "Alterable ε_i" model, which adapts to the distribution characteristics of the dataset by determining ε_i at each position. Experimental results on several simulated and real datasets demonstrate that Alterable ε_i obtains an obvious performance improvement over state-of-the-art models due to its non-fixed ε-tube radius, with a significant advantage on certain datasets. Overall, the proposed Aε_i-SVR model outperforms the standard SVR model for the following reasons. First, the developed iterative fluctuation self-selection algorithm can find the appropriate ε penalty parameter for each location. Second, Aε_i contributes to the removal of noise and outliers in the dataset through the variable ε-tube formed by the free parameters. Third, Aε_i successfully extracts important data information through the free parameters.
Moreover, flexible bounds can also lead to better prediction results in classification learning tasks. The ability of the Alterable ε_i, proposed here for regression, to distinguish data points with inconsistent distribution sparsity is quite coherent with the requirements of classifying data. Therefore, in the future, we aim to extend Alterable ε_i to classification problems, such as image classification.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figure 1. Loss functions corresponding to several different models.

Figure 2. The proposed alterable ε_i loss function with different values of ε.

Figure 3. An intuitive geometric interpretation of the linear SVR. Normally, the standard ε-SVR model has a tube with a fixed radius ε which, although it avoids over-fitting, cannot obtain all support vectors. Indeed, the fixed-radius ε-tube provides the same tolerance for all data points, whether closer to or further away from the tube wall, making the ε-SVR model insensitive to the local data distribution. To overcome this problem, we propose the Alterable ε_i-SVR model, which, based on the training dataset, captures the appropriate support vectors at each position.

Figure 5. Summary of the steps performed by the iterative fluctuation self-selection algorithm.

Figure 7. Convergence curves of Algorithm 1 on real-world datasets versus the number of iterations.

Figure 8. Comparison of model fits on the Auto Mpg dataset.

Figure 9. Comparison of model fits on the Boston Housing dataset.

Figure 10. Comparison of model fits on the NO2 dataset.

Figure 11. Comparison of model fits on the Chwirut dataset.

Figure 13. Performance evaluation of Aε_i-SVR and ε-SVR on three datasets. First row (a-c): the testing accuracy of the two models with different σ. Second row (d-f): the training time of the two models with different C.
Algorithm 1 (fragment):
11: Compute f(x) with E = E_up, E = E_mid, and E = E_down according to Eq.
12: Calculate the training MAE_{E=E_up}, MAE_{E=E_mid}, and MAE_{E=E_down}
13: if MAE_{E=E_down} < MAE_{E=E_up} and MAE_{E=E_down} < MAE_{E=E_mid} then
14:     E_best = E_down
15: else if MAE_{E=E_down} ≥ MAE_{E=E_up} and MAE_{E=E_up} < MAE_{E=E_mid} then
16:     E_best = E_up

Table 1. Performance comparison of the Aε_i-SVR, ε-SVR, and LS-SVR models on the simulation datasets.

Table 2. Description of seven real-world benchmark datasets.

Table 3. Performance comparison of the Aε_i-SVR, ε-SVR, Huber SVR, and ε-PSVR models on real-world benchmark datasets.

Table 4. Percentage reduction of Aε_i-SVR compared with other models on SSE/SST.

Table 5. Percentage reduction of Aε_i-SVR compared with other models on RMSE.

Table 6. Percentage reduction of Aε_i-SVR compared with other models on MAE.