Support vector regression model for flight demand forecasting

Flight demand forecasting is a critical component of airline revenue management because it directly influences the booking limits that determine airline profits. Traditional flight demand forecasting models generally take only the day of the week (DOW) and the cumulative (add-up) bookings at the current data collection point (DCP) as model input, and use linear regression, exponential smoothing, pick-up, and similar models to predict the final bookings of flights. These models can be regarded as time series forecasting models based on the interval between the current date and the departure date. They fail to consider how early bookings change during the pre-sale period of a specific flight and generalize weakly, so they adapt poorly to random changes in flight bookings. The support vector regression (SVR) model, which is derived from machine learning, adapts well to nonlinear random changes in data and can learn the random disturbances of flight bookings. In this paper, flight bookings are automatically divided into peak, medium, and off (PMO) according to the season attribute. The SVR model is trained on vectors composed of historical flight bookings and the cumulative bookings at the DCPs in the early stage of the flight pre-sale period. Compared with the traditional models, this increases the prior information available about a flight. We collect 2 years of domestic route booking data from an airline in China before COVID-19 as the training and testing datasets, and divide these data into three categories: tourism, business, and general. The numerical results show that the SVR model significantly improves forecasting accuracy and reduces RMSE compared with the traditional models. Therefore, this study provides a better choice for flight demand forecasting.


Introduction
Flight demand forecasting is a critical component of airline revenue management because it directly influences the booking limits that determine airline profits. It is also a critical factor for airlines in creating value from the past to the future. 1 For example, flight demand forecasting can be used for airline economic estimates, airline operating cost and productivity measures, airline planning, and airline schedule optimization, and especially for airport resource planning, air traffic control, etc. 2 Thus, flight demand forecasting has become one of the basic technologies of aviation industry innovation. The earliest research on this problem accompanied overbooking applications: Beckmann and Bobkoski 3 focus on no-shows and cancellations, and Taylor 4 focuses on the booking behaviors that determine show-ups. Distribution models for passengers' arrival processes were the main goal of researchers in these early years. Lyle 5 modeled demand as a Gamma distribution with Poisson random errors, which gives a negative binomial distribution for total demand; similar studies include Martinez and Sanchenz. 6 Belobaba 7 argues that the normal probability distribution gives a good continuous approximation to aggregate flight demand distributions in booking data. This conclusion is widely accepted and applied in the revenue management practice of many airlines. On the other hand, the data contained in historical booking records are censored by the booking and capacity limits imposed on past demand. Detruncation is therefore a necessary step before demand forecasting. The theoretical work on detruncation started with the Expectation-Maximization (EM) algorithm (Dempster et al.). 8 Swan 9 addressed the downward bias of censoring on late data early on and suggested simple statistical remedial measures. Other detruncation algorithms include booking curve (BC) and projection detruncation (PD) (Skwarek 10), with more recent research by Zeni. 11 Weatherford 12 shows that EM is the method that minimizes mean absolute deviation. In this paper, we use the widely accepted EM method to detruncate the booking data before demand forecasting.
In addition to passenger arrival distribution models and detruncation, specific flight demand forecasting models are another focus of research, and they are the goal of this paper. In airline applications, research on flight demand forecasting for setting seat limits began with Littlewood, 13 who first used the exponential smoothing model to predict two-class booking results. Since then, traditional mathematical prediction models have been applied to flight demand forecasting, such as linear regression, time series, moving average, and two kinds of pick-up models. Wickham 14 summarized the traditional models before 1995. Later studies paid more attention to improving flight prediction accuracy and analyzing the impact on flight revenue, such as Zeni 10 and Fig. 15,16 Many follow-up studies are still based on improving traditional models, such as Boyd 17 and Zaki. 18 Around the turn of the century, many researchers turned to origin and destination (O&D) demand forecasting, 19 or studied passenger choice models from the perspective of passenger behavior for customized dynamic pricing, 20 but these are not the focus of this paper. The review paper by Larry 21 is the best current summary of flight forecasting.
Flight demand forecasting is difficult and costly, and the results are sometimes unsatisfactory, because flight booking data vary widely, fluctuate strongly with the season, are heavily affected by holidays and special events, and are nonlinear. Forecasting algorithms designed for relatively stable seasonal patterns, such as ARIMA, 22 therefore struggle to achieve good results on the flight forecasting problem. Forecasting accuracy has an important influence on the setting of flight seat limits and in turn significantly affects flight revenue. Therefore, despite decades of development, flight demand forecasting remains a continuing concern of researchers and practitioners, and it is necessary to explore and develop high-accuracy flight demand forecasting models and innovative methods. The support vector regression (SVR) proposed by Vapnik 23 is a machine learning method based on statistical learning theory. Compared with traditional methods, the SVR model is more suitable for the airline context, because it can take the demand observed earlier in the sales period of the same flight as a vector input, so that more of the information available to date can be used in demand forecasting. Based on SVR, Anurag 24 presents a meteorological drought prediction and Quan 25 presents a water temperature prediction, which provide useful insights into the application of SVR in prediction. This paper compares the SVR model with linear regression, the pick-up model, and the exponential smoothing method, and finds that the SVR model improves forecasting accuracy, which provides a better choice of algorithm for airlines in demand forecasting.

Flight booking data analysis
Although flight booking data fluctuate widely, they show periodic seasonal changes. Table 1 shows the statistics of the final bookings of representative flights of a Chinese airline in 2018 and 2019, before any effect of the COVID-19 pandemic. We partitioned routes into tourism routes, business routes, and ordinary routes, with eight different flights on each type of route. From Table 1, we can see that all three kinds of data sets vary over a large range, which will also result in a large forecasting standard deviation.
Without loss of generality, Figure 1 compares the aggregated monthly data of a business route for 2018 and 2019. Although the overall bookings fluctuate widely, different years still show similar seasonal fluctuations. For example, the bookings from March to August are significantly higher than those in other months, so this period can be considered the peak season (P). January and November-December are lower and can be considered the off season (O). February and September are in between and can be considered the medium season (M). Owing to the short-term impact of the Chinese National Day, October was higher than September, but it still did not reach the level of the peak season. February has similar characteristics because it includes the Chinese Spring Festival.
For the same business route as in Figure 1, Figure 2 shows the trend of average bookings by day of the week (DOW) in the same data set. Bookings and DOW are strongly correlated. The average bookings on the first 3 days of the week are significantly higher than on other days, and are lowest on Sunday. The above analysis makes clear that, although flight bookings are relatively dispersed, they follow seasonal characteristics overall, showing peak season (P), medium season (M), and off season (O), and that the DOW characteristic is pronounced. A similar analysis of the bookings of tourism routes and ordinary routes shows the same seasonal characteristics and strong DOW correlation. These characteristics indicate that we need to establish different demand forecasting models for different classifications, and to use classified data sets for model parameter training.

Organization of data
Chinese airlines generally start selling seats on domestic flights about 180 days before departure and continue selling until departure. This period is called the pre-sale period. Throughout the pre-sale period, airlines generally use an automated program to collect flight booking data from the computer reservation system (CRS) within the scope of authorization. The farther from the departure date, the longer the collection interval; the closer to the departure date, the shorter the interval. In the last 3 days, collection is performed every 24 h. We define each collection as a data collection point (DCP). The DCP setting in this paper is shown in Table 2, which defines nine DCPs in total. For example, DCP 8 means the collection day is 35 days before departure, and the final DCP 0 represents the bookings after departure; the bookings at DCP 0 are the forecast target.
According to the DCP setting in Table 2, the flight bookings can be arranged as in Table 3, which mixes historical flights that have departed with future flights that are still in the pre-sale period. Assuming the current date is 2 June 2019, the flights before June 2nd have departed, and we have their cumulative (add-up) bookings at all nine DCPs. The flight on June 2nd departs today, so its cumulative bookings at the last DCP, DCP 0, have not yet been collected and the cell is blank. The flight on the fifth will depart after 4 days, so its last four DCPs are blank, and so on. The final bookings at DCP 0 should be predicted from the last non-blank DCP or from all of the non-blank DCPs.

Symbol definition
In order to introduce the traditional flight demand forecasting models, 21 we first define the relevant symbols as follows.
Suppose there are $N$ records in a flight bookings dataset such as Table 3. Let $n$ denote the flight sequence number, ordered by date from earliest to most recent, $n = 1, \ldots, N$, and let $i$ denote the flight DCP ID, $i = 0, \ldots, 8$. $DCP_{n,i}$ denotes the cumulative (add-up) bookings of flight $n$ at its $i$-th DCP. If the flight order $n$ is not specified, $DCP_i$ can be used directly to denote a designated flight at DCP $i$ or its cumulative bookings at DCP $i$. For the $n$-th flight at DCP $i$, the cumulative bookings obtained so far at each DCP constitute a vector $\mathbf{DCP}_{n,i} = [DCP_{n,8}, DCP_{n,7}, \ldots, DCP_{n,i}]$, whose length is $8 - i + 1$.
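The vector of observed DCPs can be illustrated with a short Python sketch. The row layout (a list indexed by DCP ID, with `None` for a DCP not yet collected) and the booking numbers are hypothetical and chosen only to show how the length $8 - i + 1$ arises.

```python
from typing import List, Optional

def observed_dcp_vector(row: List[Optional[int]]) -> List[int]:
    """Return [DCP_{n,8}, ..., DCP_{n,i}] for one flight.

    row[k] holds the cumulative bookings at DCP k (row[8] is the earliest
    collection, row[0] the final one); None marks a DCP still in the future.
    """
    vector = []
    for dcp_id in range(len(row) - 1, -1, -1):  # walk from DCP 8 down to DCP 0
        if row[dcp_id] is None:
            break  # every later DCP is still blank
        vector.append(row[dcp_id])
    return vector

# Hypothetical flight currently at DCP 4: DCP 3 to DCP 0 are blank.
flight = [None, None, None, None, 61, 48, 35, 20, 12]
vec = observed_dcp_vector(flight)
print(vec, len(vec))
```

For a flight at DCP 4 the vector has $8 - 4 + 1 = 5$ entries, which is the variable-length input that the later models draw on.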
The booking data set can be divided into two parts, a training set and a testing set, according to the current date.

Detruncation
Because detruncation is a necessary step before demand forecasting, we use the EM algorithm to correct constrained data before prediction. Suppose we have $C + O$ observations for a given DCP $i$ and given flights, $DCP_{1,i}, \ldots, DCP_{C+O,i}$, $i = 0, \ldots, 8$, of which $C$ observations are constrained because the flight had at least one class closed. Since each DCP uses the same EM algorithm for detruncation, we omit the DCP subscript $i$, ignore the time series aspect of the observations, and treat $DCP_1, \ldots, DCP_{C+O}$ as an unordered set of observations generated by an i.i.d. process.
STEP 0. (Initialize): Initialize $\mu$ and $\sigma$ to $\mu^{(0)}$ and $\sigma^{(0)}$, and set $k = 1$. Let $\delta > 0$ be a small number, to be used as a stopping criterion.
STEP 3. (Convergence test): IF $|\mu^{(k)} - \mu^{(k-1)}| < \delta$ and $|\sigma^{(k)} - \sigma^{(k-1)}| < \delta$ THEN STOP; ELSE continue iterating.
For each DCP of each flight, after detruncating each potentially restricted class, we sum the detruncated bookings of all classes to obtain the aggregate flight bookings at every DCP. This paper predicts the final bookings of each flight in the testing set from the cumulative bookings at every DCP, combined with the DOW and PMO characteristics of the flight. In this paper, all the following formulas assume that the flight to be predicted is at DCP $i$, and the predicted final bookings of the $t$-th flight in the testing set are denoted $\widehat{DCP}_{t,0}$.
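The expectation and maximization steps between STEP 0 and STEP 3 are not reproduced in this text. The Python sketch below fills them in with the usual EM updates for normally distributed demand that is right-censored at a booking limit; this is an assumption about the intended procedure, not the paper's stated formulas. The function names, the initial values, and the sample data are hypothetical.

```python
import math
from statistics import mean, pstdev

def censored_moments(b, mu, sigma):
    """Return E[X | X >= b] and E[X^2 | X >= b] for X ~ Normal(mu, sigma^2)."""
    z = (b - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    tail = max(0.5 * math.erfc(z / math.sqrt(2.0)), 1e-12)  # P(Z >= z), guarded
    hazard = pdf / tail
    first = mu + sigma * hazard
    second = mu * mu + sigma * sigma + sigma * (b + mu) * hazard
    return first, second

def em_detruncate(unconstrained, limits, delta=1e-6, max_iter=500):
    """Estimate (mu, sigma) and the expected demand behind each constrained value."""
    data = list(unconstrained) + list(limits)
    n = len(data)
    # STEP 0 (assumed initialization): moments of the raw, still-censored data.
    mu, sigma = mean(data), max(pstdev(data), 1e-6)
    for _ in range(max_iter):
        sum_x = float(sum(unconstrained))
        sum_x2 = float(sum(x * x for x in unconstrained))
        # Assumed E-step: replace each constrained value by conditional moments.
        for b in limits:
            first, second = censored_moments(b, mu, sigma)
            sum_x += first
            sum_x2 += second
        # Assumed M-step: re-estimate mu and sigma from the completed data.
        new_mu = sum_x / n
        new_sigma = math.sqrt(max(sum_x2 / n - new_mu ** 2, 1e-12))
        # STEP 3: stop when both parameters change by less than delta.
        done = abs(new_mu - mu) < delta and abs(new_sigma - sigma) < delta
        mu, sigma = new_mu, new_sigma
        if done:
            break
    detruncated = [censored_moments(b, mu, sigma)[0] for b in limits]
    return mu, sigma, detruncated

# Hypothetical bookings: five unconstrained flights, three closed at a limit of 130.
observed = [80, 95, 100, 110, 120]
closed_at = [130, 130, 130]
mu_hat, sigma_hat, filled = em_detruncate(observed, closed_at)
print(round(mu_hat, 1), round(sigma_hat, 1), [round(v, 1) for v in filled])
```

Each constrained observation is replaced by a value above its booking limit, so the estimated mean exceeds the mean of the raw censored data, which is the downward bias that detruncation is meant to correct.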
Traditional models for flight demand forecasting

Pick-up model
The traditional models select historical flights in the training set that share the DOW and PMO characteristics of the target flight, and estimate the model parameters from them. The pick-up model calculates the average booking increment from $DCP_{s,i}$ to $DCP_{s,0}$ over historical flights and uses it as the forecast booking increment from DCP $i$ to the departure day for flights of the same kind. The final demand forecast for the future $t$-th flight at DCP $i$ is its current cumulative bookings plus the forecast booking increment, as shown in equation (7).
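The pick-up calculation described above can be sketched in a few lines of Python. The sketch assumes the forecast is $\widehat{DCP}_{t,0} = DCP_{t,i} + \frac{1}{M}\sum_{s}(DCP_{s,0} - DCP_{s,i})$ over the selected historical flights, which matches the verbal description; the row layout and numbers are hypothetical.

```python
def pickup_forecast(current_bookings, history, dcp_id):
    """Forecast final bookings as current bookings plus the mean historical pick-up.

    history: rows for departed flights of the same DOW/PMO class, where
    row[k] is the cumulative bookings at DCP k and row[0] is the final value.
    """
    increments = [row[0] - row[dcp_id] for row in history]
    return current_bookings + sum(increments) / len(increments)

# Two hypothetical departed flights and a target flight with 95 bookings at DCP 4.
history = [
    [150, 140, 130, 120, 100, 80, 60, 40, 20],
    [130, 125, 115, 100, 90, 70, 50, 30, 10],
]
forecast = pickup_forecast(95, history, dcp_id=4)
print(forecast)
```

The historical pick-ups from DCP 4 are 50 and 40 bookings, so the target flight is forecast at 95 plus their mean of 45.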

Linear regression model
The linear regression model defines $f(x) = a + bx + \varepsilon$, where $\varepsilon$ is the system residual, and the estimates of $a$ and $b$ are obtained from the training set, as shown in equations (8)-(10).
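A minimal sketch of the regression fit follows, assuming ordinary least squares with the cumulative bookings at DCP $i$ as $x$ and the final bookings as $y$. The estimation formulas themselves are not reproduced in this text, so the closed-form estimates and the sample values below are an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (a, b) for y = a + b * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical training pairs: bookings at DCP i and final bookings at DCP 0.
bookings_at_dcp = [50, 60, 70, 80]
final_bookings = [110, 130, 150, 170]
a, b = fit_line(bookings_at_dcp, final_bookings)
prediction = a + b * 65  # forecast for a target flight with 65 bookings at DCP i
print(a, b, prediction)
```

Because the model uses a single scalar input, it sees only the current DCP, which is the limitation the SVR input vector is meant to address.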

Exponential smoothing model
The exponential smoothing model can be abbreviated as equation (11). The $N$ rows of observations must be strictly sorted by date from earliest to latest, and $\partial \in (0, 1)$ is the smoothing constant, which can take different values according to the seasonal variation. Experience shows that a small value of $\partial$, such as 0.25, favors a rapid response to recent flight disturbances and can be used when switching from off season to peak season. In a relatively stable period, it can be set to 0.45.
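The effect of the smoothing constant can be illustrated with a short Python sketch. Equation (11) is not reproduced in this text, so the recursion below is an assumption: the constant is taken to weight the previous smoothed value, which is the convention under which a small constant reacts quickly to recent observations, as the paragraph states. The series is hypothetical.

```python
def exponential_smoothing(values, weight_on_history):
    """Smooth a date-ordered series; weight_on_history plays the role of the
    smoothing constant and multiplies the previous smoothed value (assumed form)."""
    smoothed = values[0]
    for value in values[1:]:
        smoothed = weight_on_history * smoothed + (1 - weight_on_history) * value
    return smoothed

# Hypothetical booking increments rising as the season turns from off to peak.
increments = [40, 50, 60]
responsive = exponential_smoothing(increments, 0.25)
stable = exponential_smoothing(increments, 0.45)
print(responsive, stable)
```

With the constant at 0.25 the smoothed value tracks the rising series more closely than with 0.45, consistent with using the smaller value during a season change.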
SVR model for flight demand forecasting
Support vector machines (SVM) introduce the idea of mapping nonlinear low-dimensional data into a high-dimensional space so that a linear model can be built to classify the data. In this way, a linear model can be built on nonlinear data, and this is the main idea of SVR. Unlike the linear regression model, however, SVR tolerates a deviation of at most $\varepsilon$ between the predicted value $f(x)$ and the real value $y$. In other words, we construct a band of width $2\varepsilon$, and training samples that fall within this band are considered correctly predicted. Given a training sample $\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, we want a regression function $f(x)$, equation (12), that represents most of the data with a linear formula while leaving as few data points as possible outside the band. Here $w$ is the normal vector of the linear equation, which determines the direction of the hyperplane; $b$ is the offset; and $\phi(x)$ is the feature vector obtained by mapping $x$ to the high-dimensional space. We can then build an optimization model, equations (13) and (14), in which minimizing the objective term $\frac{1}{2}\|w\|^2$ maximizes the margin of the model and $C$ is a regularization coefficient.
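The displayed formulas for the regression function and the optimization model are not reproduced in this text. A sketch of the standard $\varepsilon$-insensitive SVR formulation, written with the symbols defined above, is given here for orientation. The slack variables $\xi_i$ and $\hat{\xi}_i$ are names introduced for illustration, and the equation numbers are inferred from the references in the surrounding text.

```latex
\begin{align}
f(x) &= w^{T}\phi(x) + b \tag{12}\\
\min_{w,\,b,\,\xi,\,\hat{\xi}} \quad & \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{m}\left(\xi_i + \hat{\xi}_i\right) \tag{13}\\
\text{s.t.} \quad & f(x_i) - y_i \le \varepsilon + \xi_i, \notag\\
& y_i - f(x_i) \le \varepsilon + \hat{\xi}_i, \notag\\
& \xi_i \ge 0,\ \hat{\xi}_i \ge 0,\quad i = 1, \ldots, m. \tag{14}
\end{align}
```

Samples whose prediction error stays within $\varepsilon$ need no slack and contribute nothing to the penalty term, which corresponds to the band of width $2\varepsilon$ described above.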
Equation (13) is a convex quadratic programming problem; its dual problem, equation (15), can be obtained by Lagrangian relaxation,
where $\lambda_i$ and $\hat{\lambda}_i$ are the dual variables.
After solving equation (15), the formulas for $w$ and $b$, equations (16) and (17), are obtained from the KKT (Karush-Kuhn-Tucker) conditions. Substituting equations (16) and (17) into equation (12), the SVR can be expressed in terms of the kernel function $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. Since the SVR model has good modeling ability for nonlinear data, DOW, PMO, and all DCPs before the current DCP $k$ can be used as the model input $x_i$, and $DCP_0$ can be used as the model output $y_i$, which still achieves a good modeling effect. The definition of $x_i$ is given in equation (19), where $DOW_1, \ldots, DOW_t$ denote the average booking data in the past $t$ weeks, and $P$, $M$, and $O$ denote the average booking data for the peak season, medium season, and off season, respectively.
It is difficult to calculate $\phi(x_i)^T \phi(x_j)$ in real time. In this paper, we choose the Laplacian kernel as the kernel function, following the suggestion of related experts, as defined in equation (20).
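The kernel form of the SVR prediction can be sketched in plain Python. Two assumptions are made because the corresponding formulas are not reproduced in this text: the Laplacian kernel is taken in its common form $K(x_i, x_j) = \exp(-\|x_i - x_j\| / \sigma)$ with the Euclidean norm, and the prediction is taken as $f(x) = \sum_i (\hat{\lambda}_i - \lambda_i) K(x_i, x) + b$. The support vectors, coefficients, and bias are hypothetical values, not fitted results.

```python
import math

def laplacian_kernel(x, z, sigma=1.0):
    """Assumed Laplacian kernel: exp(-||x - z|| / sigma) with the Euclidean norm."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
    return math.exp(-distance / sigma)

def svr_predict(x, support_vectors, coefficients, bias, sigma=1.0):
    """Kernel expansion of the SVR; each coefficient stands for the difference
    of the two dual variables of one support vector (assumed sign convention)."""
    total = sum(
        c * laplacian_kernel(sv, x, sigma)
        for sv, c in zip(support_vectors, coefficients)
    )
    return total + bias

# Hypothetical support vectors built from early-DCP bookings of two flights.
support_vectors = [[12.0, 20.0, 35.0], [10.0, 30.0, 50.0]]
coefficients = [8.0, -3.0]
estimate = svr_predict([12.0, 20.0, 35.0], support_vectors, coefficients, bias=120.0, sigma=25.0)
print(round(estimate, 3))
```

The kernel is evaluated directly on the input vectors, so the high-dimensional mapping never has to be computed explicitly.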

Data set
This paper uses the complete booking data of 24 round-trip flights in 2018 and 2019 on three representative routes provided by an airline in China. These routes cover business, tourism, and ordinary routes. The flight data from January 2018 to 31 October 2019 are taken as the training set, comprising 29,363 flights, and the flight data from 1 November 2019 to 31 December 2019 as the testing set, comprising 2192 flights. The data set contains approximately 10% incomplete or dirty data. Note that we selected data from before the COVID-19 epidemic for the simulation, to test the accuracy of the model under normal conditions.

Performance evaluation index
In this paper, RMSE (Root Mean Square Error) and Accuracy are selected to evaluate the prediction effect, as shown in equations (21) and (22). Since the prediction at each DCP is independent, the formulas do not carry a DCP index, without loss of generality.
Here $\hat{y}_t$ represents the demand forecasting result of the $t$-th flight, $x_t = DCP_{t,0}$ represents the final unconstrained bookings of the same flight $t$, and $t \in \{1, \ldots, T\}$ is the index over the testing data set.

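As an illustration of the first evaluation index, the Python sketch below computes RMSE in its standard form, $\sqrt{\frac{1}{T}\sum_{t}(\hat{y}_t - x_t)^2}$, over a testing set. The Accuracy index is left out because its formula cannot be recovered from this text. The forecast and actual values are hypothetical.

```python
import math

def rmse(forecasts, actuals):
    """Root mean square error between forecasts and final unconstrained bookings."""
    squared_errors = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical forecasts for four testing-set flights that each ended at 100 bookings.
forecasts = [101, 99, 103, 97]
actuals = [100, 100, 100, 100]
error = rmse(forecasts, actuals)
print(round(error, 3))
```

Larger individual errors dominate the result because they are squared before averaging, so RMSE penalizes occasional large misses more than a mean absolute error would.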
Results and discussion
Impact of PMO characteristics. We first analyze the impact of PMO characteristics on the forecasting results of traditional methods. The training set is sorted from high to low by passenger load factor. The first 20% is marked as P, the last 20% as O, and the remaining 60% as M. We set the weights of P, M, and O to 3, 2, and 1, respectively. For each testing-set flight to be predicted, we select the corresponding subset of historical flights from the training set according to DOW and calculate the average PMO weight of that subset. If the average is greater than or equal to 2.5, we set the PMO characteristic of the flight to be predicted to P; if the average is less than or equal to 1, we set it to O; otherwise, we set it to M. We partition all flights according to DOW characteristics without considering route characteristics.
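The labeling rule can be sketched in Python as follows. The sketch reads the first threshold (average weight of at least 2.5) as assigning P, rounds the 20% cut to a whole number of flights, and uses hypothetical load factors; these choices are assumptions made for illustration.

```python
WEIGHTS = {"P": 3, "M": 2, "O": 1}

def label_pmo(load_factors):
    """Label each training flight: top 20% by load factor P, bottom 20% O, rest M."""
    order = sorted(range(len(load_factors)), key=lambda k: load_factors[k], reverse=True)
    cut = round(0.2 * len(order))
    labels = [None] * len(order)
    for rank, index in enumerate(order):
        if rank < cut:
            labels[index] = "P"
        elif rank >= len(order) - cut:
            labels[index] = "O"
        else:
            labels[index] = "M"
    return labels

def pmo_of_target(history_labels):
    """Assign a PMO class to a target flight from the same-DOW historical subset."""
    average = sum(WEIGHTS[label] for label in history_labels) / len(history_labels)
    if average >= 2.5:
        return "P"
    if average <= 1:
        return "O"
    return "M"

# Hypothetical load factors of ten training flights.
load_factors = [0.95, 0.60, 0.88, 0.72, 0.81, 0.55, 0.79, 0.91, 0.68, 0.75]
labels = label_pmo(load_factors)
print(labels, pmo_of_target(["P", "P", "M"]))
```

With the weights 3, 2, and 1, an average of exactly 1 is reached only when every flight in the historical subset is labeled O.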
The flights are partitioned in two ways for forecasting: by DOW characteristics only, and by DOW + PMO characteristics. Since there is no event or holiday with significant impact within the flight date range of the testing set, holiday flights are removed from the selected subset when flights are classified by DOW only. Holidays need no special treatment when the data are classified by DOW + PMO, because PMO automatically classifies flights with a high load factor. The input form of SVR is shown in equation (19). There are eight flights per day for each type of route, and the simulated prediction is performed at each DCP of the eight flights. The predicted flight date range is 60 days in total.
Table 5 compares SVR with the traditional demand forecasting methods on three representative route types: tourism, business, and general. The accuracy of the SVR algorithm is higher than that of the traditional methods at every DCP, and its RMSE is lower. SVR therefore has a clear advantage regardless of route type and regardless of how far the forecasting DCP is from the departure DCP. For the traditional methods, the prediction accuracy and RMSE on ordinary routes are clearly worse than on business and tourism routes, mainly because of the large percentage standard deviation of ordinary routes, as shown in Table 1. We also see that, for every method, the closer the DCP is to the departure date, the better the forecast. However, the SVR forecasting accuracy and RMSE improve markedly more as the departure DCP approaches. This is because the traditional demand forecasting models use only the cumulative bookings at the current DCP $i$ as model input, so they contain less prior information, whereas the SVR forecasting model can take the bookings observed at all earlier DCPs of the same flight as a vector input, so that more of the information available to date can be used in demand forecasting. Far from departure, the prior information available to the SVR model is similar to that of the traditional models. As the departure date approaches, the SVR model obtains more prior information, and the improvement in forecasting accuracy grows.

Conclusions
The flight demand forecasting problem has particularly important applications in airline route network planning, flight scheduling, human resource scheduling, and especially revenue management. This paper analyzes the seasonal and DOW characteristics of airline flight bookings, divides flight bookings into peak, medium, and off according to season attributes, and carries out flight demand forecasting in combination with the DOW attribute. The traditional models use only the cumulative bookings of historical flights at the current DCP, together with the DOW and PMO characteristics, for model parameter learning. In contrast, we regard flight demand forecasting as a two-dimensional sequence prediction problem and train the SVR model on vectors composed of historical flight booking data and the cumulative bookings at each DCP in the early stage of the flight pre-sale period. Owing to the strong adaptive learning ability of the SVR model on nonlinear data with large fluctuations, the SVR-based flight demand forecasting model uses more prior information about a flight than the traditional models and achieves a substantial improvement in both forecasting Accuracy and RMSE. The data set in this paper consists of 2 years of data on three types of routes (tourism, business, and general) of an airline in China, unaffected by COVID-19. This study provides a better choice for flight demand forecasting.
As shown in Table 3, flights up to June 1 form the training set and flights from June 2 onward form the testing set. There are $M$ rows in the training set and $T$ rows in the testing set, where $M + T = N$. Let $s = 1, \ldots, M$ index the training set, and let $t = 1, \ldots, T$ index the testing set.
Treating $DCP_1, \ldots, DCP_{C+O}$ as i.i.d. observations, we assume that $DCP_1, \ldots, DCP_C$ are constrained at booking limits $b_1, \ldots, b_C$, so that $DCP_1 = b_1, \ldots, DCP_C = b_C$. The remaining $O$ observations are unconstrained. The EM algorithm is then applied to these observations step by step.

Figure 2. Trend chart of the average value of DOW bookings for a business route.

Table 1. Statistical analysis table of flight booking data.
Figure 1. Trend chart of the total monthly bookings for a business route.

Table 3. Bookings data form table.

Table 4
Advantages of SVR. In this subsection, we test the advantages of SVR over traditional forecasting methods. We use the Laplacian kernel in SVR. The traditional methods are applied as airlines use them in practice, i.e., each prediction is the result with minimum RMSE in parameter learning among the pick-up, regression, and smoothing methods. The traditional forecasting methods use DOW + PMO to select the training subset, while SVR applies no selection to the complete observation set and does not separate flights by DOW or by the seasonal characteristic PMO. All characteristics are learned by the SVR model itself.

Table 5. Comparison table of SVR and traditional methods.