The Perfect Inventory Competition
Businesses need accurate forecasts and adequate inventory policies to run their supply chains. But how much do they matter for reaching service level targets while reducing total inventory (and costs)? No formula will answer this question. In this article, we (SupChains) lay out the framework that we will use to assess the impact of forecasting accuracy and adequate inventory policies.
Forecasting Accuracy: Does it Matter?
When business leaders weigh investment decisions, they want to know the expected benefits and costs of a project. Improving the demand forecasting engine or changing the inventory policies is no exception.
This forecast-improvement project costs $100,000. How much extra forecasting accuracy can I expect from it? And how will this extra accuracy translate into higher service levels or lower inventory levels?
These are important questions: you do not want to invest in an initiative if you do not know how much you are going to get out of it. Unfortunately, there is no consensus on how much you can expect from accurate forecasts. Of course, you can expect lower inventory and higher service levels, but no simple formula will give you a straightforward, precise answer on how much.
Nevertheless, software vendors, institutions, and consultants have tried to answer this million-dollar question. Let’s mention some of the reported figures.
- Gartner stated in a 2017 report that for every 1% of forecast improvement, a consumer goods company could achieve a 2.7% reduction in finished goods inventory (days), a 3.2% reduction in transportation costs, and a 3.9% reduction in inventory obsolescence. We doubt that these impressive ratios could be applied to any business without a deep understanding of its inventory policies, supply networks, and current forecasting process and abilities.
- The Institute of Business Forecasting published a savings calculator on its website. Unfortunately, this simplistic tool doesn’t grasp any of the usual supply chain complexities (or even basic concepts such as safety stocks).
- Eric Wilson published on the IBF blog that “from our experience, a 15% forecast accuracy improvement will deliver a 3% or higher pre-tax improvement”.
- The McKinsey Global Institute reported that “improving forecasting accuracy by 10 to 20 percent translates into a potential 5 percent reduction in inventory costs and revenue increases of 2 to 3 percent”.
More recently, the forecasting journal Foresight published a special feature on the subject (Issue 68, Special Feature: Does Forecast Accuracy Even Matter?), where various articles discussed how forecasting accuracy impacts inventory costs and levels. Instead of using simplistic formulas or referencing their own experiences, they used simulations to assess the impact of forecasting.
We strongly agree with the objective (assessing how forecasting accuracy impacts inventory and service levels) and methodology (using inventory simulations) proposed by these authors. However, we disagree with some important details on how to run the simulations and tune the inventory policies. At SupChains, we think that good inventory planning requires both accurate forecasts and adequate inventory policies. One without the other wouldn’t drive much value.
Therefore, as supply chain data scientists, we decided to run our own set of experiments. The rest of this article explains our dual objective (assessing business value and model calibration) and experimental framework (model competitions using simulations).
Measuring the Quality of Models
Let’s first discuss why the usual methods we use to track the quality of forecasts and inventory policies aren’t good enough to measure their impact on business. Instead, we propose to run competitions using simulations.
Measuring the quality of a forecast is relatively easy. As explained in Demand Forecasting Best Practices, use value-weighted metrics to track both the accuracy and bias of your forecasts by comparing them to unconstrained demand. You can compare various forecasts and pick the best one.
To put it in a very straightforward way: you forecast the future demand of a product to be 100 units, and your colleague predicts 50. Actual demand is 120. You are more accurate. Unfortunately, accuracy metrics won’t tell you how much business value a forecast generated for your company.
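The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `value_weighted_scores` and the unit value of 1.0 are our assumptions, and scaling the errors by the total demand value is one common convention among several.

```python
def value_weighted_scores(forecasts, actuals, unit_values):
    """Return (absolute error, bias), both as a share of total demand value.

    unit_values holds a hypothetical value per unit (e.g., a price) used
    to weight products, so that expensive items count more.
    """
    rows = list(zip(forecasts, actuals, unit_values))
    total_value = sum(v * a for _, a, v in rows)
    abs_error = sum(v * abs(f - a) for f, a, v in rows)
    signed_error = sum(v * (f - a) for f, a, v in rows)  # negative = under-forecast
    return abs_error / total_value, signed_error / total_value


# The example from the text: actual demand is 120 units.
actuals = [120]
unit_values = [1.0]  # assumption: a single product worth 1 per unit

your_error, your_bias = value_weighted_scores([100], actuals, unit_values)
colleague_error, colleague_bias = value_weighted_scores([50], actuals, unit_values)

print(f"You:       error {your_error:.1%}, bias {your_bias:+.1%}")
print(f"Colleague: error {colleague_error:.1%}, bias {colleague_bias:+.1%}")
```

Both forecasts are too low (negative bias), but yours misses by about 17% of demand against about 58% for your colleague. Nothing in these numbers says what either forecast is worth in inventory or service terms.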
Assessing the quality of an inventory model is more complicated. Let’s imagine the following debate: you want to set the safety stock of a product to 50 units and trigger weekly replenishments; your colleague wants to go for 60 units of safety stock with bi-weekly replenishments. How do you know which policy is best?
One way to assess the quality of an inventory policy would be to run it for a few months while tracking inventory and service metrics. Unfortunately, this is (heavily) time-consuming, comes with unbearable risks (imagine testing a poor model), and you can’t test a policy in isolation from disruptive effects (what if a major viral outbreak happens during your testing period?).
The only way to settle this topic would be to use historical data to simulate both policies and track their impact in terms of service levels and inventory levels. [Technically, to be complete, our competition should track the total costs resulting from both inventory policies (such as transaction and purchasing costs). Due to the difficulties in assessing such costs in practice (as discussed later), we will limit the scope of the comparison to service and inventory levels.]
As we have seen, regular forecasting metrics or trying out different inventory policies are not suitable for assessing the actual business value of models. Instead, we will use simulations.
The Perfect End-to-End Inventory Competition
To assess the importance (and quality) of both forecast and inventory models, we will organize a matrix competition assessing different pairs of forecast/inventory models using a simulation engine.
Our simulation engine will go back in time and generate week-by-week supply orders based on the historical information available at the time (forecasts, inventory levels, historical forecasting accuracy) and the chosen inventory and forecasting models. A special effort will be required to assess historical unconstrained demand during shortage periods (we discuss different techniques in this article).
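To make the idea concrete, here is a minimal sketch of such a week-by-week simulation. It is not our actual engine. It assumes a single product, a weekly order-up-to policy, a fixed lead time, lost sales, and fully observable unconstrained demand; the names `simulate` and `moving_average` and the demand figures are hypothetical.

```python
def moving_average(history, window=4):
    """Benchmark forecast: average weekly demand over the last `window` weeks."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def simulate(demand, forecaster, safety_stock, lead_time=2, warmup=4):
    """Replay history week by week; return (fill rate, average on-hand inventory).

    Only the demand observed up to the current week is passed to the forecaster,
    so each order relies on information that was available at the time.
    An order placed at the end of week t arrives at the start of week t + lead_time.
    """
    history = list(demand[:warmup])  # weeks used only to initialise the forecast
    on_hand = lead_time * forecaster(history) + safety_stock
    pipeline = [0.0] * lead_time     # open orders, oldest first
    sold_total = demand_total = inventory_total = 0.0

    for d in demand[warmup:]:
        on_hand += pipeline.pop(0)   # receive the order due this week
        sold = min(on_hand, d)       # unmet demand is lost
        on_hand -= sold
        sold_total += sold
        demand_total += d
        inventory_total += on_hand
        history.append(d)
        # Order up to the forecast over the lead time plus the safety stock.
        target = lead_time * forecaster(history) + safety_stock
        order = max(0.0, target - on_hand - sum(pipeline))
        pipeline.append(order)

    weeks = len(demand) - warmup
    return sold_total / demand_total, inventory_total / weeks


# Hypothetical weekly demand for one product.
demand = [100, 120, 90, 110, 130, 80, 150, 95, 105, 140, 70, 125, 115, 160, 85, 100]

for ss in (0, 50, 100):
    fill, inventory = simulate(demand, moving_average, safety_stock=ss)
    print(f"safety stock {ss:>3}: fill rate {fill:.1%}, average inventory {inventory:.0f}")
```

Swapping `moving_average` for another forecaster, or changing how the target is computed, gives a different forecast/inventory pair to enter into the competition.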
Let’s review the main models we will throw into the competition:
- Forecasting Models. We will compare benchmarks (such as moving averages) against statistical and machine-learning models. The more advanced models will be based on SupChains’ forecasting engine (which often beats benchmarks by 20 to 35%, see our latest case studies here).
- Inventory Models. We will compare demand-driven and forecast-driven inventory models. Demand-driven models look at historical demand variation to assess how much safety stock is required (this approach is common in practice, even though it makes no sense), whereas forecast-driven models look at historical forecast errors to gauge safety stock requirements. Our comparison will also include inadequate models to replicate the most common mistakes practitioners tend to make.
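The difference between the two inventory approaches can be illustrated with the usual safety stock formula (z times sigma times the square root of the risk period). The sketch below is an assumption-laden example, not one of our competition models: the data is made up, the function name `safety_stock` is ours, and the formula relies on normality assumptions and a cycle service level rather than a fill rate.

```python
from math import sqrt
from statistics import NormalDist, stdev


def safety_stock(deviations, service_level=0.95, risk_weeks=2):
    """Textbook formula: z * sigma * sqrt(risk period), sigma taken from `deviations`."""
    z = NormalDist().inv_cdf(service_level)  # z-score for the target cycle service level
    return z * stdev(deviations) * sqrt(risk_weeks)


# Hypothetical seasonal product: demand swings a lot, but the forecast follows it well.
demand = [100, 140, 180, 220, 180, 140, 100, 140, 180, 220]
forecast = [105, 135, 185, 210, 185, 145, 95, 150, 175, 215]
errors = [f - d for f, d in zip(forecast, demand)]

demand_driven = safety_stock(demand)    # sigma = variation of demand itself
forecast_driven = safety_stock(errors)  # sigma = variation of forecast errors

print(f"Demand-driven safety stock:   {demand_driven:.0f} units")
print(f"Forecast-driven safety stock: {forecast_driven:.0f} units")
```

In this example, the demand-driven model asks for several times more safety stock because it treats predictable seasonality as uncertainty.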
To simplify our simulation, we will limit its scope by assuming common periodic replenishments for all products and disregarding all minimum order quantities or values.
The 4 Biggest Mistakes When Using the Safety Stock Formula
The usual safety stock formula Ss = z sigma sqrt(L) is based on multiple assumptions and shortcuts often not respected…
Now that the foundation for the competition is clear, we can discuss how we will evaluate the models along two dimensions: business value (how much service level you get for a given inventory level) and calibration (does the model deliver on its expected service level).
Objective #1: High Business Value (High Service Level, Low Inventory)
The main objective of our models is to achieve more (high service levels) with less (low inventory levels). Tracking service levels can be a source of confusion as there are many ways to measure service. We will focus on measuring fill rates.
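As a quick illustration of why the definition matters, the sketch below compares the fill rate with another common metric, the cycle service level (the share of periods without any shortage), on the same made-up data. The function names are ours.

```python
def fill_rate(demand, served):
    """Share of demanded units that were served from stock."""
    return sum(served) / sum(demand)


def cycle_service_level(demand, served):
    """Share of periods in which demand was served in full."""
    periods_without_shortage = sum(1 for d, s in zip(demand, served) if s >= d)
    return periods_without_shortage / len(demand)


# Hypothetical four weeks: a single shortage of 10 units in the last week.
demand = [100, 100, 100, 100]
served = [100, 100, 100, 90]

print(fill_rate(demand, served))            # 0.975
print(cycle_service_level(demand, served))  # 0.75
```

The same four weeks read as 97.5% or 75% service depending on the metric chosen.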
Service Level Definitions
Service level metrics often confuse planners. This is dangerous: using different definitions will result in different…
Alternatively, we could choose profit maximization (or cost minimization) as an objective. To do so, our simulation engine would need to include multiple cost variables (such as holding, transaction, and purchasing costs) and selling prices as inputs, and it would generate as an output an optimal inventory policy maximizing profits. (Service levels would then be a by-product.) Unfortunately, in practice, capturing all these costs is tremendously challenging for companies, if not downright impossible. Moreover, many supply chain leaders set explicit service level targets as part of their S&OP process or strategy. Planners then have to deliver on these targets with the minimum costs and inventory levels. We will therefore stick to service-level objectives.
Using our simulation framework, for each inventory-forecast model, we can draw the Pareto frontier: how much service level can you achieve for a given inventory level?
The Pareto frontier shows the trade-off that you can obtain between inventory levels (or, more generally, costs) and service levels. Using the same inventory-forecast model combination, you can always ask for higher (or lower) service levels, but it will result in higher (resp. lower) inventory levels.
As the efficient frontier changes depending on the forecasting model, we will be able to answer the question: “Does lower MAPE, MAE, RMSE, or Bias result in higher business value?”
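Extracting the frontier from a set of simulation results is straightforward. The sketch below assumes each simulation run is summarized as an (average inventory, fill rate) pair; the function name `pareto_frontier` and the numbers are hypothetical.

```python
def pareto_frontier(results):
    """Keep the runs that no other run beats on both inventory and fill rate.

    results: list of (average_inventory, fill_rate) tuples.
    A run is kept only if it reaches a higher fill rate than every run
    that uses the same or less inventory.
    """
    frontier = []
    best_fill = float("-inf")
    # Scan from the lowest inventory up; on ties, look at the best fill rate first.
    for inventory, fill in sorted(results, key=lambda r: (r[0], -r[1])):
        if fill > best_fill:
            frontier.append((inventory, fill))
            best_fill = fill
    return frontier


# Hypothetical results for one forecast/inventory pair at various settings.
runs = [(100, 0.90), (120, 0.95), (130, 0.94), (150, 0.98), (150, 0.97), (200, 0.98)]
print(pareto_frontier(runs))  # [(100, 0.9), (120, 0.95), (150, 0.98)]
```

Computing one frontier per forecasting model and comparing them shows whether the more accurate forecast actually delivers more service for the same inventory.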
Objective #2: Calibration
Imagine you set your inventory model to achieve a 98% fill rate. After a few months, you realize that you are achieving a service level of around 99.5%. (Let’s, for the sake of the discussion, assume that you let the model run alone without any human intervention.) Is this a good thing? We don’t think so. It might be a sign that the inventory model is poorly calibrated (or using wrong data inputs), resulting in inappropriate inventory requirements and supply orders.
When looking at service levels, you might be tempted to think that more is better. In other words, you can be glad to get 99.5% service when you asked for only 98%. Actually, this might not be a profitable situation as you didn’t get the extra service for free: you had to use extra inventory to get it. Most likely, you could have achieved 98% with a much lower inventory level. There ain’t no such thing as a free lunch.
Using our simulation engine, we will assess the ability of the models to deliver on their promises — we call this model calibration. Simply put, if you ask for a 95% fill rate, will you get a 95% fill rate?
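A calibration check boils down to comparing the fill rate you asked for with the one the simulation delivered. The sketch below is a simple illustration; the half-point tolerance, the function name `calibration_gap`, and the simulated results are all assumptions.

```python
def calibration_gap(target, achieved, tolerance=0.005):
    """Return (gap, verdict) for a target fill rate and the achieved one.

    tolerance is an arbitrary threshold (half a percentage point here)
    under which we consider the model calibrated.
    """
    gap = achieved - target
    if gap > tolerance:
        verdict = "over-delivering (likely excess inventory)"
    elif gap < -tolerance:
        verdict = "under-delivering (target missed)"
    else:
        verdict = "calibrated"
    return gap, verdict


# Hypothetical (target, achieved) fill rates from three simulation runs.
runs = [(0.95, 0.952), (0.98, 0.995), (0.99, 0.97)]

for target, achieved in runs:
    gap, verdict = calibration_gap(target, achieved)
    print(f"target {target:.1%}, achieved {achieved:.1%}, gap {gap * 100:+.1f} pts: {verdict}")
```

The second run mirrors the example above: asking for 98% and getting 99.5% is flagged as a calibration issue, not as a success.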
See the results we obtained for one of our clients:
What’s the Business Impact of 10% Extra Forecast Accuracy?
How much savings will we achieve if we improve forecasting accuracy by 10%?