Solving a Distribution-Free Multi-Period Newsvendor Problem With Advance Purchase Discount via an Online Ordering Solution

In this paper, we study a distribution-free multi-period newsvendor problem with advance purchase discount (APD). In addition to the regular-order placed at the beginning of each period, a decision-maker (DM) can also commit to an advance-order from the upstream supplier and receive discounts. The goal of the DM is to maximize total profits, and the DM only has access to past demand data. To solve this problem, we apply an online method based on the theory of prediction and learning with expert advice and propose an explicit online ordering solution that uses fixed-stock policies as expert advice. Using the properties of the gain function, we derive a theoretical result guaranteeing that, for any given advance-order quantity, the newsvendor’s cumulative gains achieved by the proposed online ordering solution converge to those of the best expert advice in hindsight for a sufficiently large horizon. In addition, we extend the problem to the discrete case and obtain the corresponding explicit strategy and performance guarantee. Finally, numerical studies illustrate the effectiveness of the proposed solution: the newsvendor’s total profits are comparable to those of the best expert advice. Sensitivity analysis also shows the robustness of the proposed solution.


Introduction
Inventory management is one of the classical operations management problems that has attracted wide attention from industry and academia. In the inventory problem, a decision-maker (DM) needs to minimize costs or maximize profits by choosing an ordering quantity. In real life, the sales of newspapers, electronic products, and blood product control are typical examples (Bravo-Moreno, 2019).
Since the classical newsvendor problem was pioneered by Arrow et al. (1951) and Morse et al. (1951), a considerable literature has developed on this topic (Khouja, 1999; Pedroza-Gutiérrez & Hernández, 2020; Yan et al., 2011; J. Zhang et al., 2021). The classical newsvendor problem assumes that the probability distribution of the demand is fully known. The optimal decision is then a critical quantile of the demand distribution, given by its inverse cumulative distribution function. In reality, however, the DM often does not know the demand distribution in advance. Thus, some research assumes that only the mean and standard deviation of the demand are known and uses the minimax approach, a common approach for modeling demand uncertainty in the literature, to study this problem. Scarf (1958) considers a distribution-free newsvendor problem and shows that an (s, S)-policy is optimal. Gallego and Moon (1993) give a new proof of the optimality of the (s, S)-policy proposed by Scarf and extend the analysis to the recourse case. Moon and Choi (1995) allow customers to balk when inventory is low; they relax the assumption of a known cumulative distribution function of the demand and merely assume that the first two moments of the distribution are known. Alfares and Elmorra (2005) add the shortage cost to the newsvendor problem. Khouja (2000) extends the single-period problem to the case where demand is price-dependent and multiple discounts are offered, and gives an algorithm for the optimal order quantity and discount settings, with the purpose of selling excess inventory. Sarkar et al. (2018) develop a distribution-free newsvendor model with consignment policy and retailer's loyalty reduction. When the family of the demand distribution is known, Bayesian updating is also a common approach in the literature. Related studies include Azoury (1985) and Scarf (1959).
When there is no assumption on the inherent demand distribution and the DM only has access to historical demands, many studies propose data-driven approaches to this problem (Gallego & Moon, 1993; Huh & Rusmevichientong, 2009; Levina et al., 2010; Li et al., 2017). Levi et al. (2007, 2015) apply the sample average approximation (SAA) to the newsvendor model and the multi-period inventory model. They use samples from the inherent demand to build an empirical distribution and establish uniform bounds on the number of samples needed to guarantee that the SAA is near-optimal. Also, with historical demand data, Bookbinder and Lordahl (1989) use the bootstrap method to set inventory re-order levels by estimating the fractile of the inherent demand distribution. Huh et al. (2011) use the well-known Kaplan-Meier estimator from statistics to study a data-driven inventory control problem with censored demands. They prove that the proposed policies almost surely converge to the optimal solutions. Other studies include Ban and Rudin (2019), B. Chen et al. (2019), Gan (2019), and Huh and Rusmevichientong (2009). Following this stream of research, this paper investigates a distribution-free multi-period newsvendor problem with advance purchase discount. In such a case, there is no statistical assumption on the inherent demand, and the DM only has access to the past demand data and gain feedback. By using a method of online prediction with expert advice from computer science, this study adjusts the regular-order strategy using the gain feedback from different experts' advice. In addition, the whole process does not need to estimate the specific distribution of the underlying demand, which is also the main difference between this study and the above studies. Thus, this study enriches research on data-driven newsvendor problems.
Many studies have addressed the inventory problem with advance purchase discount, and they mainly focus on sellers who serve end consumers within a supply chain. As Gan et al. (2019) summarize, there are many reasons for suppliers to offer such discounts, such as savings in operating costs (Gilbert & Ballou, 1999), soliciting information directly from the buyers, or shaping competition in the downstream market. Gilbert and Ballou (1999) study a supply chain consisting of a steel distributor and some customers and show that careful balancing of advance order time and price discounts can lead to lower costs for all channel members. Cachon (2004) studies a supply chain coordination problem involving advance purchase discounts and also considers the risk allocation among participants in the supply chain. Dong and Zhu (2007) consider the issue of inventory ownership in a supplier-retailer supply chain and find that Pareto improvements can be achieved when inventory ownership is transferred from individual to shared, and sometimes vice versa. Chintapalli et al. (2017) find that when the supplier's production cost is lower for advance orders, an advance purchase discount contract alone does not achieve supply chain coordination, but contracts with a pre-specified minimum order do. Cvsa and Gilbert (2002) and J. Y. Chen et al. (2017) show that an advance purchase discount from the supplier can shape the downstream competition and benefit participants other than retailers in a supply chain with one supplier and two retailers. Cho and Tang (2013) find that the retailer's advance selling is better than other strategies, such as regular selling and a mix of advance and regular selling. Tang and Girotra (2017) use real data to study an advance purchase discount contract considering the retailer's information acquisition cost and the wholesaler's limited information about the cost, and they find that an advance purchase discount contract can incentivize retailers to share demand information with dual-purchasing wholesalers. Gan et al. (2019) extend the research of Scarf (1958) and introduce an advance purchase discount into Scarf's model. They show that, for any given advance order size, an advance-order-dependent (s, S) policy is optimal.
In this paper, we apply the Weak Aggregating Algorithm (WAA) to this distribution-free multi-period newsvendor problem. The WAA is an online algorithm first proposed by Kalnishkan and Vyugin (2008), which improves on Vovk's (2001) Aggregating Algorithm (AA). Some existing research has applied the WAA to the multi-period newsvendor problem. Levina et al. (2010) first apply the WAA to the multi-period newsvendor problem and propose an explicit online ordering solution. In addition, they show a theoretical guarantee on cumulative profits. Y. Zhang et al. (2014) extend this problem to non-stationary demand and propose a competitive ordering policy. Y. Zhang and Yang (2016) consider a two-product multi-period stationary newsvendor problem. Y. Zhang et al. (2019a) extend the two-product multi-period stationary newsvendor problem to a non-stationary case with budget constraints. They show that their policy is competitive with the best expert advice. Y. Zhang et al. (2019b) study a discrete newsvendor problem with order-value-based free shipping. Based on the return loss function, they obtain online ordering strategies and show that the threshold of the order-value-based free shipping significantly affects the cumulative losses. Y. Zhang et al. (2020) extend the research of G. Zhang (2010) and study a multi-period newsvendor problem with quantity discounts.
Different from the above research, we consider the impact of the advance-order on the regular-order decision in the distribution-free multi-period newsvendor problem and find the optimal regular-order decision under different advance purchase contracts. Inspired by the WAA, we first obtain the explicit online ordering solution for this problem. Then, we derive a theoretical guarantee which ensures that, for any given advance-order quantity, our online ordering solution converges to the best expert advice for a sufficiently large horizon.
The remainder of this paper is organized as follows. The Weak Aggregating Algorithm is introduced in Section 2. In Section 3.1, we formulate our online ordering solution and the theoretical guarantee on its cumulative gains in the continuous case. Building on Section 3.1, we discuss the discrete case in Section 3.2. Numerical studies are carried out in Section 4. The paper concludes in Section 5.

Weak Aggregating Algorithm
The online ordering solution is obtained by applying the Weak Aggregating Algorithm to this distribution-free multi-period newsvendor problem with advance purchase discount. The Weak Aggregating Algorithm (WAA), proposed by Kalnishkan and Vyugin (2008), is an online prediction and learning method with expert advice. It makes decisions based on advice from a pool of experts and aims to compete with a benchmark set of "experts," who can be free agents or strategies. Given a set of experts who give decisions at the beginning of each period, the DM makes ordering decisions by merging these decisions in a certain way, then meets the demand and receives the feedback. The WAA is similar to the Aggregating Algorithm proposed by Vovk (2001) but uses a learning rate that is proportional to $1/\sqrt{n}$ in period $n$. In the WAA, an initial weight distribution is set on the expert set when the planning horizon starts. In each period, the weights are recomputed and assigned to each expert according to the feedback from the previous periods and the level of trust the DM (the newsvendor in this problem) has in each expert. We denote the set of experts by $Y$ and assume that $Y$ is a measurable space. The decisions made by the experts and the newsvendor are from a set $T$. The demand set is denoted by $D$.
The gain function $\pi$ in each period is defined on $T \times D$. In period $n$, given the newsvendor's decision $y_n \in T$ and the demand $d_n \in D$, the newsvendor's gain is $g_n = \pi(y_n, d_n)$. Given expert $u$'s decision $y_n^u$, that expert's gain is $g_n^u = \pi(y_n^u, d_n)$. The cumulative gains of the newsvendor and of expert $u$ in the first $n$ periods are $G_n = \sum_{i=1}^{n} g_i$ and $G_n^u = \sum_{i=1}^{n} g_i^u$, respectively. We let $q(du)$ denote the prior weights assigned to the experts. The weights are recomputed in every period $n$ and are represented by a probability measure $p_n(du)$. The WAA applied to the newsvendor problem proceeds as in the following pseudo-code:
- Initialize the cumulative gains: $G_0 = 0$ and $G_0^u = 0$ for all $u \in Y$;
- In each period $n = 1, 2, \ldots$:
1. The experts' weights $p_n(du)$ are recomputed from the prior $q(du)$ and the cumulative gains $G_{n-1}^u$;
2. The experts give their decisions $y_n^u$;
3. The newsvendor makes the decision $y_n$ by merging the experts' decisions according to the weights $p_n(du)$;
4. The demand $d_n$ arrives;
5. The cumulative gains are updated: $G_n = G_{n-1} + \pi(y_n, d_n)$ and $G_n^u = G_{n-1}^u + \pi(y_n^u, d_n)$.
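The loop above can be sketched in Python for a finite pool of experts. This is a minimal sketch, not the authors' implementation. It assumes that the weights take the exponential form $p_n \propto q \cdot \exp(G_{n-1}^u / \sqrt{n})$, which is the form suggested by the integral $\int_Y e^{G_N^u/\sqrt{N}} q(du)$ used in the appendix, and that the newsvendor's decision is the weighted average of the experts' decisions. The function names `waa_weights` and `run_waa` are hypothetical.

```python
import math


def waa_weights(prior, cum_gains, n):
    """Normalized expert weights for period n (n = 1, 2, ...).

    Assumed form: weight_k is proportional to prior[k] * exp(cum_gains[k] / sqrt(n)).
    """
    rate = 1.0 / math.sqrt(n)
    shift = max(cum_gains)  # subtracting the max avoids overflow; it cancels on normalization
    raw = [q * math.exp(rate * (g - shift)) for q, g in zip(prior, cum_gains)]
    total = sum(raw)
    return [w / total for w in raw]


def run_waa(experts, prior, demands, gain):
    """Run the five-step loop over a finite pool of experts.

    experts: list of functions n -> decision in period n
    prior:   list of positive prior weights summing to 1
    demands: realized demands d_1, ..., d_N
    gain:    function (decision, demand) -> one-period gain
    Returns (G_N, list of G_N^u, list of the newsvendor's decisions).
    """
    G = 0.0
    G_exp = [0.0] * len(experts)
    decisions = []
    for n, d in enumerate(demands, start=1):
        w = waa_weights(prior, G_exp, n)                  # step 1: recompute weights
        advice = [e(n) for e in experts]                  # step 2: experts decide
        y = sum(wk * yk for wk, yk in zip(w, advice))     # step 3: weighted average
        decisions.append(y)
        G += gain(y, d)                                   # steps 4-5: demand arrives, update gains
        G_exp = [g + gain(yk, d) for g, yk in zip(G_exp, advice)]
    return G, G_exp, decisions


if __name__ == "__main__":
    # Two fixed-stock experts (always order 0 or always order 10), constant demand 10.
    experts = [lambda n: 0.0, lambda n: 10.0]
    G, G_exp, ys = run_waa(experts, [0.5, 0.5], [10, 10, 10], lambda y, d: -abs(y - d))
    print(ys)
```

In this toy run the first decision equals the prior mean of the two experts, and later decisions move toward the expert with the larger cumulative gain.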

Analytic Results
In many industries, ordering in advance to obtain discounts is a widely used practice between suppliers and retailers. In this section, we incorporate the advance purchase order into the multi-period newsvendor problem and develop ordering solutions within the WAA framework described above. Before the start of the planning horizon, a sourcing contract for advance orders between the supplier and the retailer is confirmed, and a fixed advance order quantity is shipped to the retailer in each period. We assume that only one product is considered. Let $Z$ denote the advance order size for each period, $c_1$ the unit cost of an advance order, $c_2$ ($c_1 < c_2$) the unit cost of a regular order, $p$ ($p > c_2$) the selling price, and $B$ an upper bound on the newsvendor's total ordering quantity from the supplier in each period. Throughout this paper, rigorous proofs are provided in the appendix.

Online Ordering Solution for the Continuous Case
Based on the assumptions and notation above, given the advance order size $Z$, the regular order quantity $y$, and the demand $d$, the newsvendor's gain in one period is

$$\pi(Z, y, d) = p \min\{Z + y, d\} - c_1 Z - c_2 y. \qquad (3)$$

To obtain an explicit ordering decision $y$, we apply the WAA to stationary experts, who keep the regular order quantity at the same value throughout the planning horizon. Let $d_{(1)}, \ldots, d_{(n-1)}$ be the order statistics of the demand in the first $n-1$ periods. Each expert $u = y \in [0, B - Z]$ keeps the same value throughout the planning horizon $N$. With auxiliary notation introduced for convenience of presentation, applying the WAA to stationary expert advice with advance purchase discount gives the explicit online ordering solution (5) for the regular order in period $n$.
Proof. With the advance purchase discount factor and the procedure of the WAA, the order quantity in period $n$ can be written as an integral of the experts' decisions $y$ against the recomputed weights, which are built from the prior $q(dy)$. Based on the order statistics of the demand, the integrals in this expression can be evaluated explicitly. Hence, we get the explicit online ordering solution $y_n$. □
Based on the following lemma from Levina et al. (2010), the theoretical guarantee for the ordering solution (5) is obtained.
Lemma 3.1. Let the gain function $\pi$ take values in $[-L, 0]$. Then the WAA guarantees the bound (8) for all $N$.
Theorem 3.2. The online solution (5) for the multi-period newsvendor problem with advance purchase discount satisfies the guarantee (9) for all $N$.
By this theorem, the average performance of a newsvendor using the WAA is worse than that of any expert in $Y$ by at most a term of order $\ln(N)/\sqrt{N}$. Let $s$ ($s < c_1$) and $u$ be the unit salvage value and the unit shortage cost, respectively. When the shortage cost and salvage value are also considered, the ordering solution and the theoretical guarantee are obtained by replacing $p$ by $p + u - s$ and $c$ by $c - s$ in equations (5) and (9).
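The substitution of $p + u - s$ for $p$ and $c - s$ for $c$ can be checked numerically. The sketch below rests on assumptions that are not spelled out in the text: the one-period gain without shortage cost and salvage value is taken to be $p \min\{Z + y, d\} - c_1 Z - c_2 y$ (the form consistent with the bounds used in the appendix), leftover units earn $s$ each, unmet demand costs $u$ per unit, and "$c$" is read as both $c_1$ and $c_2$. Under these assumptions the full gain equals the substituted gain minus $u \cdot d$, a term that does not depend on the order and therefore does not change the comparison between experts. All function names are hypothetical.

```python
def gain(Z, y, d, p, c1, c2):
    """One-period gain without shortage cost or salvage value (assumed form)."""
    return p * min(Z + y, d) - c1 * Z - c2 * y


def gain_with_shortage_salvage(Z, y, d, p, c1, c2, u, s):
    """One-period gain with unit shortage cost u and unit salvage value s (assumed accounting)."""
    total_stock = Z + y
    sold = min(total_stock, d)
    leftover = total_stock - sold   # units salvaged at s each
    unmet = d - sold                # units of demand lost, costing u each
    return p * sold + s * leftover - u * unmet - c1 * Z - c2 * y


def gain_substituted(Z, y, d, p, c1, c2, u, s):
    """Gain after replacing p by p + u - s and each unit cost c by c - s."""
    return gain(Z, y, d, p + u - s, c1 - s, c2 - s)


if __name__ == "__main__":
    # Parameter values taken from the numerical studies section.
    p, c1, c2, u, s = 2.0, 0.8, 1.0, 0.2, 0.2
    Z, y, d = 10, 3, 15
    full = gain_with_shortage_salvage(Z, y, d, p, c1, c2, u, s)
    shifted = gain_substituted(Z, y, d, p, c1, c2, u, s) - u * d
    print(full, shifted)  # the two values agree up to floating-point rounding
```

Because the two gains differ only by a decision-independent term, exponential weights computed from either one coincide after normalization.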

Online Ordering Solution for the Discrete Case
The online ordering solution and theoretical guarantee obtained above assume that the product is infinitely divisible, that is, the demand and the total order (regular ordering quantity plus advance ordering quantity) in one period can take any value in $[0, B]$. This section considers a more realistic situation where the total ordering quantity and the demand in one period are integers in $[0, B]$. As in Section 3.1, we first let $d_{(1)}, \ldots, d_{(n-1)}$ be the order statistics of the demand in the first $n-1$ periods, with $d_{(t)} \in [0, B]$ for $t = 1, \ldots, n-1$. Considering the advance order $Z$, we let $\tilde{d}_{(t)} = \max\{d_{(t)} - Z, 0\}$ for $t = 1, \ldots, n-1$ and set $\tilde{d}_{(0)} = 0$ and $\tilde{d}_{(n)} = B$. Suppose that $\tilde{d}_{(k+1)} = \tilde{d}_{(k)} + m_k$, $k = 0, \ldots, n-1$, where the $m_k$ are integers. When there is no salvage value and shortage cost, the online ordering solution is given by (12), which follows from the procedure of the WAA with stationary expert advice. Based on Lemma 3.3 from Levina et al. (2010), the theoretical guarantee is obtained in Theorem 3.4. With the advance purchase discount factor, the regular ordering quantity for the discrete multi-period newsvendor problem in period $n$ is given by (12).
Proof. The expression follows from the decision-making process of the WAA applied to the stationary experts, in the same way as in the continuous case. □
Lemma 3.3. The WAA guarantees the corresponding bound for all $N$ and all $u \in Y$.
Theorem 3.4. The online solution (12) for the discrete multi-period newsvendor problem with advance purchase discount satisfies the corresponding guarantee for all $N$.
The proof of Theorem 3.4 follows easily from Lemma 3.3.

Numerical Studies
In this section, we carry out numerical studies to illustrate the competitive performance of our proposed online ordering solution. We set $B = 21$, $p = 2$, $c_1 = 0.8$, $c_2 = 1$, $u = 0.2$, $s = 0.2$, and $N \in \{30, 60, 90, 120, 150, 180\}$. Stationary demand in our study means that the demand in each period is a random number in $[0, B]$. We consider two demand distributions: uniform and normal. The uniform distribution is on the interval $[0, B]$; the normal distribution has a mean of 10 and a standard deviation in $\{2, 4, 6, 8\}$. For simplicity and ease of programming, we consider a discrete multi-period newsvendor problem in which we truncate each distribution to the interval $[0, B]$ and take only integer values. For convenience of presentation, the performance of the online ordering solution and that of the benchmark best expert solution without shortage cost and salvage value are referred to as POS and BPOS, respectively. Similarly, the performances with shortage cost and salvage value are referred to as POSC and BPOSC. In Section 4.1, we first find the optimal advance purchase size with and without shortage cost and salvage value. To show the performance of the online solutions POS and POSC, in Section 4.2 we compare them with the best expert advice. In Section 4.3, we perform a sensitivity analysis on different demand parameters and compare the corresponding POS and POSC with the BPOS and BPOSC.
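The demand streams described above could be generated as in the following sketch. The text states only that each distribution is truncated to $[0, B]$ and restricted to integers; the specific mechanism used here (rounding a normal draw to the nearest integer and redrawing when it falls outside $[0, B]$) is an assumption, as are the function names and the seed.

```python
import random


def uniform_demand(N, B, rng):
    """N integer demands drawn uniformly from {0, 1, ..., B}."""
    return [rng.randint(0, B) for _ in range(N)]


def truncated_normal_demand(N, B, mean, sd, rng):
    """N integer demands from a normal distribution truncated to [0, B].

    Assumed mechanism: round each draw to the nearest integer and
    redraw whenever the rounded value falls outside [0, B].
    """
    demands = []
    while len(demands) < N:
        d = round(rng.gauss(mean, sd))
        if 0 <= d <= B:
            demands.append(d)
    return demands


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary seed, for reproducibility only
    print(uniform_demand(10, 21, rng))
    for sd in (2, 4, 6, 8):
        print(sd, truncated_normal_demand(10, 21, 10, sd, rng))
```

Passing an explicit `random.Random` instance keeps every demand path reproducible, which matters when results are averaged over repeated runs.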

Finding the Optimal Z
In this section, we consider the two demand distributions mentioned above, both with and without shortage cost and salvage value. Figures 1 to 4 show the cumulative gains under different advance order sizes and planning horizons $N$ for the two demand distributions. From the figures, we find that there is indeed an optimal advance order size and that the cumulative gain $G_N$ is approximately concave in $Z$. As $N$ increases, $G_N$ approaches the cumulative gains of the best expert advice determined in hindsight; a more detailed comparison is given in Section 4.2.

Competitive Performance of POS and POSC Versus BPOS and BPOSC
To clearly show the performance of POS, POSC, BPOS, and BPOSC, we generate cumulative gains by averaging over 100 repetitions of each trial. The results are presented in Tables 1 and 2, where Ratio = POS/BPOS × 100%.
From the tables, we can see that almost all the average Ratios show that POS and POSC are competitive when the benchmark is the best fixed-stock policy under the different advance order sizes. Figure 5 shows the effect of the planning horizon length $N$ on the performance of POS and POSC where $Z = 0$. From the figure, we can see that the Ratios increase with $N$, which is consistent with the theoretical guarantee derived above: when $N$ is large enough, the performance of POS and POSC can be as good as the best expert advice. Figures 6 and 7 show the changes in regular order quantity and demand over time. When the shortage cost and salvage value are considered, POSC's order quantity is higher than POS's order quantity.
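A Ratio experiment of this kind can be sketched as follows. This is not the authors' code and it does not use their explicit solutions (5) or (12): it approximates the online solution by the weighted average over an integer grid of fixed-stock experts with a uniform prior and exponential weights $\exp(G_{n-1}^u/\sqrt{n})$, it assumes the gain $p \min\{Z + y, d\} - c_1 Z - c_2 y$ without shortage cost and salvage value, and it does not round the resulting order to an integer. The function names, the seed, and the choice to compute Ratio from the averaged POS and BPOS are likewise assumptions.

```python
import math
import random


def gain(Z, y, d, p, c1, c2):
    """One-period gain without shortage cost or salvage value (assumed form)."""
    return p * min(Z + y, d) - c1 * Z - c2 * y


def simulate_once(demands, Z, B, p, c1, c2):
    """Return (POS, BPOS) for one demand path.

    Experts are the fixed regular-order levels 0, 1, ..., B - Z.
    """
    experts = list(range(0, B - Z + 1))
    G_exp = [0.0] * len(experts)
    G = 0.0
    for n, d in enumerate(demands, start=1):
        rate = 1.0 / math.sqrt(n)
        shift = max(G_exp)  # numerical stability; cancels on normalization
        w = [math.exp(rate * (g - shift)) for g in G_exp]  # uniform prior cancels too
        y = sum(wk * yk for wk, yk in zip(w, experts)) / sum(w)
        G += gain(Z, y, d, p, c1, c2)
        G_exp = [g + gain(Z, yk, d, p, c1, c2) for g, yk in zip(G_exp, experts)]
    return G, max(G_exp)  # online gain, best fixed-stock expert in hindsight


def ratio(N, Z=0, B=21, p=2.0, c1=0.8, c2=1.0, runs=100, seed=0):
    """Ratio = (average POS) / (average BPOS) * 100 under uniform integer demand."""
    rng = random.Random(seed)
    pos_sum = 0.0
    bpos_sum = 0.0
    for _ in range(runs):
        demands = [rng.randint(0, B) for _ in range(N)]
        pos, bpos = simulate_once(demands, Z, B, p, c1, c2)
        pos_sum += pos
        bpos_sum += bpos
    return 100.0 * pos_sum / bpos_sum


if __name__ == "__main__":
    for N in (30, 60, 90, 120, 150, 180):
        print(N, round(ratio(N), 2))
```

The numbers this sketch prints depend on the seed and on the approximations listed above, so they are not expected to reproduce the entries of Tables 1 and 2.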

Sensitivity Analysis
In this subsection, we perform a sensitivity analysis first on the demand distribution and then on the ratio $u/s$. For a given advance order size $Z$, the changes in cumulative gains over time under different parameters are shown in Figures 8 and 9. From the figures, we can see that as the volatility of demand increases, the cumulative gains decrease. This is consistent with intuition, because greater uncertainty makes forecasting more difficult. From Table 3, we can see that the optimal fixed-stock policy shows similar results, and that our POS (POSC) needs more time to become as good as the optimal fixed-stock policy.
Next, we test how cumulative gains are affected by different ratios of $u$ to $s$ under the normal demand distribution. We let $Z \in \{0, 10\}$ and $u/s \in \{0, 0.5, 1, 1.5\}$; the demand distribution is consistent with the initial setting. From Figures 10 and 11, we can see that under normal demand distributions and different advance order sizes, our POSC's cumulative gains all increase with the ratio $u/s$, because, from the gain function (3), it is easy to see that the gain function $g$ increases with $u$.
From Table 4, we find that the cumulative gains of the best fixed-stock policies (BPOSC) all decrease with the ratio $u/s$. For the best fixed-stock policy, as the shortage cost increases, the dynamic order quantity shows better demand satisfaction and cost reduction.

Conclusions
In this paper, we study a distribution-free multi-period newsvendor problem with advance purchase discount, which arises widely in practice. We design an explicit online ordering solution for this problem using the Weak Aggregating Algorithm from computer science. Taking the best fixed-stock policy determined in hindsight as the benchmark, we prove that the proposed online solution theoretically guarantees cumulative gains that are competitive with the benchmark. More importantly, the results obtained in this study can provide a reference for industrial managers who need to order perishables continuously over a long time when the demand distribution is unknown. Finally, it would be interesting to extend this problem to the multi-product case and to integrate other practical factors into the problem in future research.

Appendix
Proof of Theorem 3.2. Under the setting above, $Y = [0, \bar{B}]$, where $\bar{B} = B - Z$ is finite. The online ordering solution (5) is obtained by applying the WAA to stationary experts, where $y_n^u = y \in [0, \bar{B}]$ keeps the same value throughout the planning horizon, the decision set $T$ is contained in $[0, \bar{B}]$, and the demand set $D$ is contained in $[0, B]$. According to the gain function (3), the largest gain in one period occurs when the advance ordering quantity plus the regular ordering quantity equals the demand $B$, which gives $(Z + \bar{B})p - c_1 Z - c_2 \bar{B}$; the worst gain occurs when the advance ordering quantity plus the regular ordering quantity equals $B$ but the demand is 0, which gives $-c_1 Z - c_2 \bar{B}$. Thus the gain function satisfies

$$-c_1 Z - c_2 \bar{B} \le \pi(Z, y, d) \le (Z + \bar{B})p - c_1 Z - c_2 \bar{B}.$$

We define the shifted gain function $\tilde{\pi}(Z, y, d) := \pi(Z, y, d) - [(Z + \bar{B})p - c_1 Z - c_2 \bar{B}]$ and get

$$-p(Z + \bar{B}) \le \tilde{\pi}(Z, y, d) \le 0.$$

Defining $L := p(Z + \bar{B})$, we can put this problem into the framework of Lemma 3.1. Let $\delta := \max(p - c_2, c_2) \le p$ be the least upper bound of the absolute value of the slope of the gain function as a function of $y$. According to Levina et al. (2010), we can bound the integral $\ln \int_Y e^{G_N^u/\sqrt{N}} q(du)$ in (8) from below by replacing the interval of integration $[0, \bar{B}]$ by the $\bar{B}/\sqrt{N}$-neighborhood of some given initial stock $y \in [0, \bar{B}]$. This neighborhood of $y$ is an interval of length at least $\bar{B}/\sqrt{N}$

Figure 1. Graphs of $G_N$ for the uniform demand distribution. The optimal advance ordering quantity is 12. For the left graph, $N = 30$; for the right graph, $N = 180$.

Figure 2. Graphs of $G_N$ for the uniform demand distribution, considering salvage value and shortage cost. The optimal advance ordering quantity is 14. For the left graph, $N = 30$; for the right graph, $N = 180$.

Figure 3. Graphs of $G_N$ for the normal demand distribution. The optimal advance ordering quantity is 10. For the left graph, $N = 30$; for the right graph, $N = 180$.

Figure 4. Graphs of $G_N$ for the normal demand distribution, considering salvage value and shortage cost. The optimal advance ordering quantity is 10. For the left graph, $N = 30$; for the right graph, $N = 180$.

Figure 6. Regular order quantity changes over time where $Z = 0$ and demand follows a uniform distribution.

Figure 9. Cumulative gains under the normal demand distribution when shortage cost and salvage value are considered, where SD (standard deviation) is in {2, 4, 6, 8} and $Z = 0$.

Figure 7. Regular order quantity changes over time where $Z = 0$ and demand follows a uniform distribution.

Figure 10. Changes in cumulative gains over time under different ratios $u/s$ and $Z = 0$.

Figure 11. Changes in cumulative gains over time under different ratios $u/s$ and $Z = 10$.

Table 2. Results for Uniform Distribution With Shortage Cost and Salvage Value Where Advance Order Size is 0, 4, 9, and 14 and N = 30.

Table 3. Cumulative Gains and Ratio Under Different Demand Distributions Where Z = 0.