Novel Optimization-Based Parameter Estimation Method for the Bass Diffusion Model

The Bass model is the most popular model for forecasting the diffusion process of a new product. However, its controlling parameters are unknown in practice and must be determined in advance, and various techniques have been used to estimate them. In this study, a novel optimization-based parameter estimation (OPE) method for the Bass model is proposed within the theoretical framework of system dynamics (SD). To do this, the SD model of the Bass differential equation is first established, and the corresponding optimization model is then formulated by taking the controlling parameters as design variables and the discrepancy between the adopter function and its reference values as the objective function. The SD optimization model is solved using the VENSIM software, and its effectiveness and accuracy are demonstrated by two examples: one with an exact solution and another based on actual user diffusion data from China Mobile. The results show that the OPE method estimates the controlling parameters more accurately than the nonlinear weighted least squares method and the genetic algorithm. Moreover, the confidence intervals of the estimated parameters and the goodness of fit of the optimal results are reported to further demonstrate the accuracy of the OPE method.


Introduction
The mathematical model of innovation diffusion developed by Bass in 1969 has received much attention from the research community because it helps researchers better understand the market response to new products, assess strategies for introducing new products, and make model-based decisions (F. M. Bass, 1969). What distinguishes the Bass model from other growth models is that Bass explicitly incorporated key behavioral assumptions from Rogers' theory of the diffusion of innovations (Rogers, 1962). For this reason, the Bass model has been widely applied to economic markets to quantitatively forecast new product sales (Fan et al., 2017; Moe & Fader, 2002), product life cycles (Guo, 2014), 3G mobile subscriptions in China (Lim et al., 2012), the diffusion of mobile telephony in China (Liu et al., 2012), oil production in the Eagle Ford Shale (Tunstall, 2015), marketing innovations to older consumers (Pannhorst & Dost, 2019), wind power development factors (She et al., 2019), and so on. More recently, the Bass diffusion model was extended to describe the number of consumers on B2C commerce platforms (Li et al., 2020). The classic Bass model includes three controlling parameters: the coefficient of innovation, the coefficient of imitation, and the total market potential. These parameters are unknown in advance in practice and must be estimated from experimental or empirical data; this is the so-called parameter estimation problem. Successful application of the Bass model therefore necessarily involves determining these parameters (Mahajan et al., 1991). Many different techniques have been used for this estimation. For example, Bass first used the ordinary least squares approach to estimate the parameters of the diffusion model (F. M. Bass, 1969), and least squares has since become a very popular technique for parameter estimation (Dennis & Schnabel, 1996).
Marković and Jukić estimated the parameters of the Bass model by nonlinear weighted least squares (NLS) fitting and theoretically proved the existence of the least squares estimate (Marković & Jukić, 2013). Mahajan et al. (1986) developed a simple algebraic estimation procedure for the Bass diffusion model of new product acceptance. Yang and Wu built a Bass diffusion model of China Mobile user data using the NLS method and the genetic algorithm (AL) and reported that the AL was more suitable than the NLS approach for parameter estimation, especially when few data are available for building a diffusion model of a growing product (Yang & Wu, 2005). In addition, Schmittlein and Mahajan developed a maximum likelihood estimation approach for the Bass diffusion model of new product acceptance (Schmittlein & Mahajan, 1982). Meng and He employed the ant colony algorithm (ACA) for parameter estimation in the Bass model and reported that the ACA performs better than the NLS and the AL (Meng & He, 2009). Satoh proposed a discrete Bass model based on a difference equation and gave an ordinary least squares procedure for estimating its parameters (Satoh, 2001). However, the discrete Bass model can be applied only when a discrete equation with an exact solution can be derived. Recently, Grasman and Kornelis estimated the parameters of the Bass model using the numerical solution of the Bass equation, so that discrete forecasts of future sales can be obtained from the fitted model for stock management (Grasman & Kornelis, 2019).
Although various techniques have been developed for parameter estimation in the Bass diffusion model, these approaches may lead to different fitting results, which in turn affect the forecasting accuracy of the model. Improving the accuracy of parameter estimation therefore remains an open issue for ensuring the effectiveness and accuracy of the Bass diffusion model. In this study, an optimization-based parameter estimation (OPE) method for the Bass model is proposed based on system dynamics (SD) theory. First, the mathematical description of the Bass model is translated into an SD model, whose validity is assessed by model checks. Then, the OPE method is described by establishing a mathematical optimization model that can be solved iteratively. Finally, the performance of the OPE method is discussed through two examples.

Basic Form of Bass Model
To describe the time-dependent diffusion process of a new product, Bass proposed the following constant-coefficient differential equation (F. M. Bass, 1969):
$$\frac{dy(t)}{dt} = \left(p + \frac{q}{m}\,y(t)\right)\left(m - y(t)\right) \qquad (1)$$
where y(t) is the cumulative number of adopters (CNA) of a new product at time t, p > 0 is the coefficient of innovation, q ≥ 0 is the coefficient of imitation, and m > 0 is the total market potential, that is, the maximum CNA that can be reached. The function y(t) and its derivative therefore depend strongly on the coefficients p, q, and m, so y(t) can be rewritten as y = y(t, p, q, m).
Typically, with the initial condition y(t = 0) = 0, integrating Equation 1 gives the following analytical solution:
$$y(t) = m\,\frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p}\,e^{-(p+q)t}} \qquad (3)$$
Figure 1 shows the CNA function for various values of the controlling parameters p and q with m = 1. All curves (almost) approach the limit value m = 1, and p and q strongly influence the shape of the CNA curve: the larger p or q, the faster the curve converges.
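To reproduce curves like those in Figure 1, the analytical solution (3) and the right-hand side of Equation 1 can be coded directly. The sketch below is an illustrative Python version, not code from this study; the function names `bass_cumulative` and `bass_rate` are hypothetical, and the (p, q) pairs in the demo are arbitrary choices with m = 1.

```python
import math


def bass_cumulative(t: float, p: float, q: float, m: float) -> float:
    """Analytical CNA y(t) of Equation 3, assuming y(0) = 0 and p > 0."""
    decay = math.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_rate(y: float, p: float, q: float, m: float) -> float:
    """Right-hand side of Equation 1: dy/dt = (p + q * y / m) * (m - y)."""
    return (p + q * y / m) * (m - y)


if __name__ == "__main__":
    # Hypothetical (p, q) pairs with m = 1, in the spirit of Figure 1.
    for p, q in [(0.01, 0.3), (0.03, 0.3), (0.03, 0.6)]:
        values = [round(bass_cumulative(t, p, q, 1.0), 3) for t in (5, 10, 20)]
        print(f"p={p}, q={q}: y(5), y(10), y(20) = {values}")
```

Comparing a finite-difference derivative of `bass_cumulative` with `bass_rate` is a quick way to confirm that Equation 3 solves Equation 1.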

Establishment of the SD Model
To establish the SD model corresponding to the Bass differential equation, a flow diagram is generally required, from which the types of variables and the interrelations and connections between them can be clearly identified. Figure 2 shows the flow diagram of the Bass model built in the system dynamics software VENSIM, in which the CNA function y of Equation 1 is a state variable, the parameters p, q, and m are exogenous variables, and RT is the rate variable.
The SD model is then established by translating the flow diagram into SD equations, that is, by quantitatively describing the relationships between the variables with SD mathematical expressions. Because the Bass differential equation is relatively simple, the only SD expression to be defined is the relationship between the rate variable and the exogenous variables:
$$RT = \left(p + \frac{q}{m}\,y\right)(m - y)$$
Before running an SD simulation, values must be assigned to the parameters (p, q, m) of the Bass model. However, these parameters are unknown in advance in practice; in this study, they are estimated by the optimization method developed below.
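The stock-and-flow structure of Figure 2 amounts to integrating the stock y with the inflow RT. The following Python sketch assumes fixed-step Euler integration (Euler is VENSIM's default integration method) and is only an illustration, not the VENSIM model itself; `simulate_bass_sd`, `bass_exact`, and `time_step_sensitivity` are hypothetical names. The last function mimics a time integral step check by comparing a run with step dt against a run with step dt/2.

```python
import math


def simulate_bass_sd(p: float, q: float, m: float, t_end: float, dt: float) -> list[float]:
    """Fixed-step Euler integration of the stock y fed by the rate RT.

    Returns the stock value at t = 0, dt, 2*dt, ..., t_end.
    """
    y = 0.0                              # initial stock: y(0) = 0
    history = [y]
    for _ in range(round(t_end / dt)):
        rt = (p + q * y / m) * (m - y)   # rate equation: RT = (p + q*y/m)(m - y)
        y += dt * rt                     # stock update: y(t + dt) = y(t) + RT * dt
        history.append(y)
    return history


def bass_exact(t: float, p: float, q: float, m: float) -> float:
    """Analytical solution (Equation 3), used only as a cross-check."""
    decay = math.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)


def time_step_sensitivity(p: float, q: float, m: float, t_end: float, dt: float) -> float:
    """Largest change in y at the common time points when dt is halved."""
    coarse = simulate_bass_sd(p, q, m, t_end, dt)
    fine = simulate_bass_sd(p, q, m, t_end, dt / 2)
    return max(abs(c - fine[2 * k]) for k, c in enumerate(coarse))


if __name__ == "__main__":
    traj = simulate_bass_sd(0.001, 0.65, 1000.0, 10.0, 0.001)
    print("Euler y(10) =", traj[-1], " exact y(10) =", bass_exact(10.0, 0.001, 0.65, 1000.0))
    print("step sensitivity:", time_step_sensitivity(0.1, 0.1, 1000.0, 50.0, 0.125))
```

A small step sensitivity indicates that the chosen time step does not distort the simulated CNA curve.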

Model Check
For an SD model, model checking is necessary to demonstrate the validity of the model. However, there is no unique way to test SD models (Barlas, 1989, 1996). J. W. Forrester, the founder of SD theory, pointed out that qualitative checks of an SD model are also effective (Forrester & Senge, 1980). Here, a dimensional consistency check and a time integral step check are carried out for the established SD model. A qualitative test is then performed to verify the hypothesis of the Bass model, namely that the number of adopters y increases when p, q, or m increases, and vice versa.
The qualitative tests shown in Figures 3 to 5 are performed with the initial values p = 0.1, q = 0.1, and m = 1,000. In Figures 3 to 5, the red line is the curve computed with the initial parameters, and the blue line is the curve obtained after changing one parameter. Figure 3 shows that y increases when p increases from 0.1 to 0.2; Figure 4 shows that y also increases when q increases from 0.1 to 0.35; and Figure 5 shows that y decreases when m is reduced from 1,000 to 750. These numerical experiments show that the behavior of the SD model is consistent with the hypothesis of the model.
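The qualitative test can also be repeated outside VENSIM. The sketch below is a hypothetical Python version built on a fixed-step Euler loop for the rate equation; the horizon of 20 time units and the step of 0.125 are assumptions, since the simulation settings behind Figures 3 to 5 are not stated here. It checks that raising p or q never lowers y and that lowering m never raises y, at every simulated time point.

```python
def simulate_bass_sd(p: float, q: float, m: float,
                     t_end: float = 20.0, dt: float = 0.125) -> list[float]:
    """Euler run of the stock y with inflow RT; t_end and dt are assumed settings."""
    y, trajectory = 0.0, [0.0]
    for _ in range(round(t_end / dt)):
        y += dt * (p + q * y / m) * (m - y)
        trajectory.append(y)
    return trajectory


def responds_as_expected(base: list[float], changed: list[float], should_increase: bool) -> bool:
    """True if `changed` stays on the expected side of `base` and differs at the end."""
    pairs = list(zip(base, changed))
    if should_increase:
        return all(c >= b for b, c in pairs) and changed[-1] > base[-1]
    return all(c <= b for b, c in pairs) and changed[-1] < base[-1]


if __name__ == "__main__":
    base = simulate_bass_sd(0.1, 0.1, 1000.0)  # initial values used for Figures 3 to 5
    print("p: 0.1 -> 0.2   ", responds_as_expected(base, simulate_bass_sd(0.2, 0.1, 1000.0), True))
    print("q: 0.1 -> 0.35  ", responds_as_expected(base, simulate_bass_sd(0.1, 0.35, 1000.0), True))
    print("m: 1,000 -> 750 ", responds_as_expected(base, simulate_bass_sd(0.1, 0.1, 750.0), False))
```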

OPE Method
In general, the model parameters p, q, and m can be estimated from a set of time-dependent reference data containing CNA values at different time instants (F. M. Bass, 1969). In this study, p, q, and m are estimated by optimizing the SD model. To this end, the following optimization model is established:
$$\text{find } \{X\} = \{p, q, m\}^{T} \ \text{to minimize} \ F(\{X\}) = \sum_{i=1}^{n}\left[F_{oi}(\{X\}) - F_{ri}\right]^{2} \quad \text{subject to} \quad \{X\}_{l} \le \{X\} \le \{X\}_{u} \qquad (7)$$
where $\{X\}$ is the design variable array, whose elements are the Bass model parameters p, q, and m; $F(\{X\})$ is the objective function built from $F_{oi}(\{X\})$ and $F_{ri}$; $F_{oi}(\{X\}) = y(t_i, p, q, m)$ is the CNA function to be optimized, evaluated at time $t_i$; $F_{ri}$ is the reference value that $F_{oi}(\{X\})$ should reach; n is the number of discrete time points at which the CNA function is evaluated and a reference value $F_{ri}$ is available; and $\{X\}_l$ and $\{X\}_u$ are the lower and upper limits of the design variables, respectively.
In addition, solving the optimization model requires an initial guess of the design variable:
$$\{X\}^{*} = \{p^{*}, q^{*}, m^{*}\}^{T}$$
The aim of this optimization model is to bring the curve $F_{oi}$ as close as possible to the given reference curve $F_{ri}$ by optimizing the design variable within the specified constraints. Because this procedure relies on mathematical optimization theory, it is called the optimization-based parameter estimation (OPE) method. The OPE method not only determines the estimated values of p, q, and m but also provides the reliability and confidence intervals of the estimated parameters.
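To make the structure of the optimization model concrete, the following Python sketch evaluates an objective of the form (7), with the analytical solution (3) standing in for the SD simulation, and minimizes it within $\{X\}_l \le \{X\} \le \{X\}_u$. VENSIM's built-in optimizer is not reproduced: a simple bounded compass (coordinate pattern) search is swapped in as a derivative-free stand-in. The squared-error form of F, the integer time grid, the initial guess, and all function names are illustrative assumptions. The reference data are generated from the Example 1 parameters p = 0.001, q = 0.65, and m = 1,000, and the bounds are borrowed from Example 2 below.

```python
import math


def bass_cumulative(t: float, p: float, q: float, m: float) -> float:
    """CNA y(t) from the analytical solution (3); stands in for an SD simulation run."""
    decay = math.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)


def objective(x: list[float], times: list[float], reference: list[float]) -> float:
    """Assumed F(X): sum over i of [F_oi(X) - F_ri]^2, with F_oi(X) = y(t_i, p, q, m)."""
    p, q, m = x
    return sum((bass_cumulative(t, p, q, m) - r) ** 2 for t, r in zip(times, reference))


def bounded_compass_search(f, x0, lower, upper, step=0.1, tol=1e-8, max_evals=20000):
    """Derivative-free compass search in coordinates normalized to [0, 1].

    Each coordinate is probed at +/- step; bounds are enforced by clipping.
    The step is halved whenever no probe improves the objective.
    """
    span = [u - l for l, u in zip(lower, upper)]

    def to_x(z):
        return [min(u, max(l, l + zi * s)) for zi, l, u, s in zip(z, lower, upper, span)]

    z = [(xi - l) / s for xi, l, s in zip(x0, lower, span)]
    best, evals = f(to_x(z)), 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(z)):
            for sign in (1.0, -1.0):
                trial = list(z)
                trial[i] = min(1.0, max(0.0, z[i] + sign * step))
                if trial[i] == z[i]:
                    continue  # already at a bound in this direction
                value = f(to_x(trial))
                evals += 1
                if value < best:
                    z, best, improved = trial, value, True
                    break
        if not improved:
            step *= 0.5  # no better neighbour: refine the mesh
    return to_x(z), best


if __name__ == "__main__":
    times = [float(t) for t in range(21)]                                 # hypothetical time grid
    reference = [bass_cumulative(t, 0.001, 0.65, 1000.0) for t in times]  # Example 1 parameters
    lower, upper = [1e-4, 1e-4, 5.0], [1.0, 1.0, 40000.0]                 # bounds as in Example 2
    x_star = [0.01, 0.5, 800.0]                                           # hypothetical initial guess
    x_opt, f_opt = bounded_compass_search(
        lambda x: objective(x, times, reference), x_star, lower, upper)
    print("p, q, m =", [round(v, 5) for v in x_opt], " F =", f_opt)
```

Searching in coordinates normalized to [0, 1] keeps the step size comparable for p, which is of order 10^-3, and m, which is of order 10^3; clipping each trial point to [0, 1] enforces the bound constraints.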

Results and Analysis
Example 1
Reference data. To validate the accuracy of the OPE method for estimating the parameters p, q, and m of the Bass model, the reference function is obtained from the analytical solution (3) with p = 0.001, q = 0.65, and m = 1,000. Note that both p and q in the Bass model lie in the range (0, 1], and p is generally much smaller than q (F. M. Bass, 1969). Moreover, the parameter m, which represents the converged value of the CNA function (see Equation 5), is case-dependent and should be set according to the characteristics of the problem at hand in practice; here, m = 1,000 is chosen without any special purpose. The resulting reference data are listed in Table 1. The OPE method is then used to evaluate the optimal values of p, q, and m and the optimized CNA function, and its validity is demonstrated by comparing the optimized results with the analytical solution.
With the SD optimization, the optimal values of the parameters p, q, and m obtained by the OPE method are shown in Table 2. Figure 6 shows the curves y:aREF, y:aR1, and y:aR2 after the OPE method is applied. The curve y:aREF is the reference CNA curve, whose data are the exact CNA values y_i in Table 1; y:aR1 is the initial curve corresponding to the initial guess of p, q, and m; and y:aR2 is the optimized CNA curve obtained by the OPE method. The curves y:aR2 and y:aREF agree extremely well.
To measure the quality of the optimal results, the goodness of fit (R²) is defined as
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \qquad (11)$$
where $\hat{y}_i$ and $y_i$ denote the estimated and reference values of the CNA function, respectively, and $\bar{y}$ is the mean of the reference values. Here, the reference values $y_i$ are the exact values obtained from Equation 3, as listed in Table 1.
Using Equation 11, the goodness of fit of the optimal curve y:aR2 is 0.9999, which confirms that minimizing the objective function F defined by Equation 7 yields an excellent fit of the CNA function to the reference function.
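Equation 11 is straightforward to compute. The following Python function is a minimal sketch (the name `goodness_of_fit` is hypothetical) that returns R² for a list of estimated CNA values against reference values; it assumes the reference values are not all equal, since otherwise the denominator of Equation 11 vanishes.

```python
def goodness_of_fit(estimated: list[float], reference: list[float]) -> float:
    """R^2 of Equation 11: 1 - sum (yhat_i - y_i)^2 / sum (y_i - ybar)^2."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("expected two non-empty sequences of equal length")
    mean_ref = sum(reference) / len(reference)
    sse = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    sst = sum((r - mean_ref) ** 2 for r in reference)
    if sst == 0.0:
        raise ValueError("reference values are all equal; R^2 is undefined")
    return 1.0 - sse / sst


if __name__ == "__main__":
    print(goodness_of_fit([1.1, 1.9, 3.2, 3.8], [1.0, 2.0, 3.0, 4.0]))
```

For instance, reference values [1, 2, 3, 4] with estimates [1.1, 1.9, 3.2, 3.8] give a squared-error sum of 0.10 and a total sum of squares of 5.0, so R² = 0.98.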

Example 2
In the second example, a real case concerning the growth in the number of China Mobile users is considered, and the corresponding user diffusion model is solved to demonstrate the performance of the present OPE method.
China Mobile user data. Yang and Wu compiled the number of 2G users per year (NUPY) of China Mobile from 1991 to 2004 using statistical methods (Yang & Wu, 2005); these data are listed in Table 3 (unit: ten thousand users). The change in NUPY was described by the Bass diffusion model, whose parameters p, q, and m were estimated by Yang and Wu using the NLS method and the genetic algorithm (AL), respectively (Yang & Wu, 2005). Here, the results of the NLS and AL methods are compared with those of the present OPE method.
Parameter estimation. To solve the SD optimization model defined in Equation 7, the initial guess of the design variable is set to p* = 0.0002, q* = 0.65, and m* = 33,482, from which the initial curve of the function F_oi to be optimized is determined. Here, the initial value of m is set to 33,482, the maximum NUPY value in Table 3, because m is case-dependent and represents the converged value of NUPY in the Bass model. The real user data y_i in Table 3 form the reference curve F_ri. The lower and upper limits of the design variables are set as 0.0001 ≤ p ≤ 1, 0.0001 ≤ q ≤ 1, and 5 ≤ m ≤ 40,000.
Using the VENSIM software, the SD optimization model is solved, and the optimal values of the parameters are obtained as p = 0.0002, q = 0.72466, and m = 39,485.7. Moreover, the 95% confidence intervals of the estimated parameters are also computed. Figure 7 shows the initial curve y:bR1, the optimal curve y:bR2, and the reference curve y:bREF; the optimal results agree well with the reference (real) data.
Figure 6. Variations of the initial, reference, and optimal curves in Example 1.
Figure 7. Variations of the initial, reference, and optimal curves in Example 2 (horizontal axis: Time (Year), 1990 to 2004; curves y:bREF, y:bR1, y:bR2).
Comparison of parameter estimations. In this section, the estimates from the NLS method, the AL method, and the present OPE method are compared to demonstrate the advantage of the present method. Table 4 lists the estimated values of p, q, and m. As Table 4 shows, the OPE and AL methods give estimates of a similar magnitude, which appear more reasonable than those from the NLS method. Table 5 further lists the CNA values predicted by the three methods, showing that the OPE method produces better predictions than the NLS and AL methods, especially after 1993. These predictions are plotted in Figure 8, where the curve estimated by the OPE method lies closer to the actual data than those estimated by the NLS and AL methods. Among the three methods, the NLS method performs worst.
In addition, Table 6 lists the mean values of the user data predicted by the three methods and their goodness of fit (R²) to the actual data. The mean of the actual data is 8,094.4, and the mean from the OPE method is the closest to it among the three methods, with a relative error of only 0.14%. Correspondingly, the OPE method gives the highest R² value, as expected.
To further illustrate the effectiveness of the present OPE method, the number of 4G users per half year (NUPHY) of China Mobile, released by the Ministry of Industry and Information Technology of the People's Republic of China in 2019, is analyzed. The 4G user data from December 2014 to December 2018 are tabulated in Table 7. For convenience of plotting, the time points are numbered consecutively.
To solve the SD optimization model defined in Equation 7, an initial guess [p*, q*, m*] of the design variable is again assumed.
The SD optimization model of this problem is solved using the VENSIM software, and the optimal values of the parameters p, q, and m are given in Equation 15. The 95% confidence intervals of the parameters are also computed. Figure 9 shows the initial curve y:cR1, the optimal curve y:cR2, and the reference curve y:cREF; the optimal curve agrees well with the reference curve representing the real user data. Moreover, the R² of the optimal curve y:cR2 with respect to the real data, calculated from Equation 11, is 0.9986.
Finally, the solution procedure above shows that the SD-based OPE method does not depend on the exact solution of the problem at all: it can be implemented as long as the corresponding SD model can be established. The present OPE method can therefore be extended to more complex Bass-type problems involving more controlling parameters.

Conclusion and Future Work
Summary
Although the Bass model is attractive for its conciseness and effectiveness, parameter estimation remains a critical issue in its application because the three parameters of the Bass model are unknown in advance in practice. To address this, a novel OPE method is proposed for estimating the three controlling parameters and the variation of the CNA function of the Bass model. The Bass differential equation is first translated into an SD expression relating the rate variable to the exogenous variables, and the SD-based optimization model is then established by introducing the design variable, the objective function, and the constraints on the design variable. The SD-based optimization model is solved with the VENSIM software, and its effectiveness and accuracy are demonstrated by two case studies. The main conclusions of this work are as follows:
1. To the best of the author's knowledge, the present OPE method is the first attempt to solve the Bass model within the framework of system dynamics.
2. The present OPE model involves the SD equations of the problem, which can be easily implemented.
3. The present OPE method performs excellently in parameter estimation and function prediction, with higher accuracy than the NLS and AL methods.
4. The present OPE method provides an effective model analysis technique and can be extended to more Bass-type problems, such as forecasting mortality caused by COVID-19 (Gurumurthy & Mukherjee, 2020) and exploring the needs and experiences of educators (J. Bass et al., 2020).

Future Research
This study shows how the novel SD-based optimization model can be used to solve the Bass model of product diffusion, including parameter estimation and optimal prediction of the CNA function. However, this work has several limitations that should be addressed in future research. First, only the three-parameter Bass model has been studied, whereas more complex diffusion models with more controlling parameters, such as models of food safety information diffusion (Liang, 2021), may be encountered; the corresponding methodology should be developed following the idea of this study. Second, besides the Bass model, available diffusion models also include the logistic and Gompertz models, whose relative performance may vary with the study period (Liu et al., 2012). Further assessment of the forecasting performance of these diffusion models and their corresponding optimization models is therefore an interesting topic for future research.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is partially supported by the Science and Technology Development of Henan Province (No: 142102210563).
Figure 9. Variations of the initial, reference, and optimal curves for 4G user analysis in Example 2.

Ethics Statement
The author stated that no animal and human studies were included in this work.