Assessing Products’ Forecastability: Forecasting Benchmarks vs. COV
In this article, I will present you with 3 ways to assess the forecastability of your products (or any time series). Spoiler: industry benchmarks are the worst.
When discussing forecasting in workshops, I usually get the following question from my clients:
Is our current forecasting accuracy % good enough?
Imagine the following case: you are responsible for forecasting the demand of a portfolio of products, and you want to know whether your current accuracy is good or bad.
Here are 3 ways to do this, from worst to best.
Industry Benchmarks
Many companies want to compare themselves to their peers by buying industry benchmarks from data providers. However, I would not advise you to use industry benchmarks to assess your forecasting capabilities.
Here’s why:
- Industry benchmarks are expensive. Better, free solutions exist (as we will discuss in the following paragraphs).
- Many benchmark providers won’t compute forecast accuracy themselves. Instead, they will simply ask companies to fill in a cell in an Excel file. Who knows how each company measures forecast accuracy.
- You cannot be sure that your competitors measure forecasting accuracy with the same metrics as you (especially if you use value-weighted KPIs).
- Your competitors might be measuring forecasting accuracy at another aggregation level (for example, by country, whereas you track it per region).
- Different businesses follow different strategies with varying product portfolio sizes. For example, you might have 500 products in your catalog, whereas your main low-cost competitor only offers 20; the smaller portfolio will naturally be easier to forecast accurately. Nevertheless, your low-cost competitor achieving a good forecast accuracy doesn’t say anything about your own demand forecasting process.
Demand Coefficient of Variation (COV)
The demand coefficient of variation (COV) is computed as the demand standard deviation divided by its mean.
The coefficient of variation of a dataset represents the ratio between its standard deviation and its mean. It is often expressed as a percentage.

COV = σ / µ
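As a quick sketch (the product names and demand figures below are made up), computing COV per product with pandas looks like this:

```python
import pandas as pd

# Hypothetical demand history: one row per product per period.
df = pd.DataFrame({
    "product": ["A"] * 6 + ["B"] * 6,
    "demand":  [100, 105, 95, 102, 98, 100,   # flat demand -> low COV
                10, 60, 220, 15, 70, 230],    # erratic demand -> high COV
})

# COV = standard deviation / mean, per product.
cov = df.groupby("product")["demand"].agg(lambda d: d.std() / d.mean())
print(cov.round(2))
```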
Many practitioners (such as Lora Cecere) use it to measure a time series’ variability. But, unfortunately, as I will show you, historical demand variability does not always correlate with forecastability (actually, I think COV is irrelevant). In other words, measuring the COV won’t tell you whether your product is easily forecastable.
If you are lucky enough to deal with flat-demand products, COV is a good indicator of their forecastability.
If you work in a real supply chain, it is much more likely that your products display trends and seasonalities. In such cases, COV won’t give a good indication of forecastability. Look at the following two examples. They are rather easy to forecast, and yet their COV will look quite bad.
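As a minimal sketch with synthetic series of my own, a clean linear trend and a perfectly repeating seasonal pattern are both trivially forecastable, yet their COV is far from zero:

```python
import numpy as np

trend = np.arange(10, 130, 10, dtype=float)      # 10, 20, ..., 120: a clean linear trend
season = np.tile([20.0, 50.0, 150.0, 80.0], 3)   # a perfectly repeating seasonal pattern

for name, demand in [("trend", trend), ("season", season)]:
    cov = demand.std(ddof=1) / demand.mean()
    print(f"{name}: COV = {cov:.2f}")            # both well above 0, despite being easy to forecast
```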
You would face the same issue with promotion-driven products. For example, you could easily forecast that a specific product would get a 200% sales uplift during a promotion. Yet COV will flag this product as erratic based on its historical behavior.
Forecast Benchmarks
Instead of measuring an item’s COV to assess its forecastability, you should use forecast benchmarks. The idea is simple:
- Run a simple forecasting algorithm — such as a moving average — through historical periods and track its accuracy using your favorite metric (mine is weighted MAE + |Bias|).
- Compare the benchmark’s results against your model/process.
Using a naive forecast as a benchmark used to be considered a best practice. But naive forecasts are often too inaccurate, setting a bar that any other forecasting technique can easily clear. So instead, I like to try out a few (very) simple benchmarks — a few moving averages and a few seasonal moving averages — and keep the best one. This way, I am sure to use a benchmark that is not artificially bad.
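Here is a minimal sketch of that idea, assuming monthly demand and using MAE over the last year of history to pick the winner (the synthetic data, helper names, and candidate set are my own):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical monthly demand history: 4 years of a seasonal pattern plus noise.
pattern = np.tile([80, 90, 120, 150, 170, 160, 140, 130, 110, 100, 90, 85], 4)
demand = pattern + rng.normal(0, 10, size=pattern.size)

def moving_average(demand, window, t):
    """Forecast period t as the average of the `window` previous periods."""
    return demand[t - window:t].mean()

def seasonal_moving_average(demand, t, season_length=12, years=2):
    """Forecast period t as the average of the same period in the previous `years` years."""
    return demand[t - season_length * years:t:season_length].mean()

# Backtest each candidate benchmark over the last year and keep the best MAE.
test_periods = range(36, 48)
candidates = {
    "MA3":         lambda t: moving_average(demand, 3, t),
    "MA6":         lambda t: moving_average(demand, 6, t),
    "Seasonal MA": lambda t: seasonal_moving_average(demand, t),
}
results = {
    name: np.mean([abs(demand[t] - forecast(t)) for t in test_periods])
    for name, forecast in candidates.items()
}
best = min(results, key=results.get)
print(f"Benchmark to beat: {best} (MAE = {results[best]:.1f})")
```

Whichever candidate wins becomes the bar that your forecasting model or process has to beat.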
Q&A
What is a good forecasting benchmark?
A moving average is a good forecasting benchmark as it is straightforward to use and provides good results. Usually, averaging the last 3 to 6 periods will achieve the best results. However, if you have seasonal products, you should use seasonal moving averages instead.
Should I use a naive forecast as a benchmark?
Avoid using a naive forecast as a forecasting benchmark: it is too easy to beat. Achieving a better forecast accuracy than a naive forecast shouldn’t be considered satisfactory for a model.
Should I use or buy industry benchmarks?
Industry benchmarks are expensive and not always available. Moreover, you cannot ensure that your competitors measure forecasting accuracy the same way as you (especially if you use value-weighted KPIs). So instead, you should compare your forecasts against a forecasting benchmark (such as a moving average).
Is demand COV a good forecasting benchmark?
Because demand COV tags products with trend or seasonality as difficult to forecast, it should not be used to assess products’ forecastability. Instead, you can assess the forecastability of a product by measuring a benchmark’s forecast error (such as a moving average or a seasonal moving average).
— This is a bonus paragraph for forecast KPI lovers —
Including Forecasting Benchmark in Forecasting Metrics
Two forecasting metrics include a forecasting benchmark at their core:
- MASE (Mean Absolute Scaled Error), where the model’s MAE is divided by the naive historical MAE.
- RMSSE (Root Mean Squared Scaled Error), where the model’s RMSE is divided (scaled) by the naive historical RMSE.
RMSSE was used during the M5 forecasting competition, whereas MASE was used during M4 (and was initially proposed by Hyndman and Koehler in 2006).
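As a minimal sketch (using the non-seasonal, in-sample naive scaling; the data and function names are my own), MASE and RMSSE can be computed as follows:

```python
import numpy as np

def mase(actuals, forecasts, train):
    """MAE of the model, scaled by the in-sample MAE of a one-step naive forecast."""
    naive_mae = np.abs(np.diff(train)).mean()
    return np.abs(actuals - forecasts).mean() / naive_mae

def rmsse(actuals, forecasts, train):
    """RMSE of the model, scaled by the in-sample RMSE of a one-step naive forecast."""
    naive_rmse = np.sqrt((np.diff(train) ** 2).mean())
    return np.sqrt(((actuals - forecasts) ** 2).mean()) / naive_rmse

# Hypothetical example: 24 months of history, 6 months of test.
train = np.array([100, 110, 95, 105, 120, 115, 100, 98, 112, 108, 103, 99,
                  101, 113, 96, 104, 122, 118, 102, 97, 111, 109, 105, 100])
actuals   = np.array([103, 115, 98, 106, 119, 117])
forecasts = np.array([105, 110, 100, 105, 120, 115])

print(f"MASE  = {mase(actuals, forecasts, train):.2f}")
print(f"RMSSE = {rmsse(actuals, forecasts, train):.2f}")
```

A value below 1 means the model beats the in-sample naive forecast.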
Using RMSSE/MASE might make sense in some forecasting exercises. But they have two drawbacks for day-to-day business use.
- MASE and RMSSE are scaled metrics: as each product’s error is scaled by its naive error, the resulting metric is “scale-less,” usually ranging from 0.5 to 1. Therefore, to compute a metric across your portfolio, you will need to weight products by their turnover; otherwise, your biggest sellers will be considered as important as your long-tail products. The weighted versions of RMSSE and MASE are called WRMSSE and WMASE (more info in my article Forecast KPI: How to Assess the Accuracy of a Product Portfolio).
- Both metrics are difficult to compute, explain, and interpret (especially the weighted versions). Thus, they could be used as technical metrics but not as business KPIs.
Usually, comparing your overall forecast accuracy to the one achieved by a (good) forecasting benchmark will be more straightforward than computing WRMSSE/WMASE, and it will result in the same kind of insights (if not better, as you can use better benchmarks than a naive forecast).