Measuring forecast accuracy (or error) is not an easy task, as there is no one-size-fits-all indicator. Only experimentation will show you which Key Performance Indicator (KPI) is best for you. As you will see, each indicator avoids some pitfalls but is prone to others.
The first distinction we have to make is the difference between the precision of a forecast and its bias:
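To make that distinction concrete, here is a minimal sketch. It assumes the error is defined as forecast minus demand, uses the mean error as the bias, and uses the mean absolute error (MAE) as one possible measure of precision. The function names and the demand figures are made up for illustration.

```python
def bias(forecast, demand):
    """Mean error: positive means the forecast overshoots on average."""
    errors = [f - d for f, d in zip(forecast, demand)]
    return sum(errors) / len(errors)

def mae(forecast, demand):
    """Mean absolute error: average size of the error, ignoring its sign."""
    errors = [abs(f - d) for f, d in zip(forecast, demand)]
    return sum(errors) / len(errors)

demand = [10, 12, 8, 11]          # illustrative demand, not real data
unbiased = [12, 10, 10, 9]        # errors: +2, -2, +2, -2
overshooting = [12, 14, 10, 13]   # errors: +2, +2, +2, +2

print(bias(unbiased, demand), mae(unbiased, demand))          # 0.0 2.0
print(bias(overshooting, demand), mae(overshooting, demand))  # 2.0 2.0
```

Both forecasts miss by two units on average, so they are equally precise under this measure, but only the second one is biased.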
When it comes to demand forecasting, most supply chains rely on populating 18-month forecasts with monthly buckets. Should this be considered a best practice, or is it merely an overlooked default choice? I have seen countless supply chains forecasting demand at an irrelevant aggregation level, whether material, geographical, or temporal. In this article, I propose an original four-dimensional forecasting framework that will enable you to set up a tailor-made forecasting process for your supply chain. I like to use this framework to kick off any forecasting project.
An accurate forecast is not good enough.
You need a useful one.
I recently read yet another article showing you how to speed up the apply function in pandas. These articles will usually tell you to parallelize the apply function to make it 2 to 4 times faster.
Before I show you how to make it 600 times faster, let’s illustrate a use case using the vanilla apply().
Let’s imagine you have a pandas dataframe df and want to perform some operation on it.
I will use a dataframe with 1m rows and five columns (with integers ranging from 0 to 10; I am using a setup similar to this article)
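A setup along these lines could be sketched as follows. The column names, the random seed, and the row operation are assumptions chosen only for illustration, and the frame is kept at 10,000 rows so the sketch runs quickly; the text above uses 1 million.

```python
import numpy as np
import pandas as pd

def make_frame(n_rows, seed=0):
    """Build a frame of random integers from 0 to 10 with five columns."""
    rng = np.random.default_rng(seed)
    data = rng.integers(0, 11, size=(n_rows, 5))  # upper bound is exclusive
    return pd.DataFrame(data, columns=list("abcde"))

def row_operation(row):
    # Hypothetical operation, used only to have something to apply
    return row["a"] * row["b"] + row["c"]

df = make_frame(10_000)
# Vanilla apply: calls row_operation once per row
result = df.apply(row_operation, axis=1)
print(result.head())
```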
Let’s start with a few questions. Read them first before going through the article. By the end of your reading, you should be able to answer them. (The answers are provided at the end as well as a Python implementation)
Usual articles will perform the following case: create a list using a for loop versus a list comprehension. So let’s do it and time it.
import time

iterations = 100000000

# Build the list with a for loop
start = time.time()
mylist = []
for i in range(iterations):
    mylist.append(i+1)
end = time.time()
print(end - start)
>> 9.90 seconds

# Build the same list with a list comprehension
start = time.time()
mylist = [i+1 for i in range(iterations)]
end = time.time()
print(end - start)
>> 8.20 seconds
As we can see, the for loop is slower than the list comprehension (9.9 seconds vs. 8.2 seconds).
List comprehensions are faster than for loops to create lists.
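A single time.time() measurement can be noisy. As a rough alternative, the same comparison can be sketched with the standard timeit module, which repeats each run. The function names and the smaller list size are assumptions made to keep the sketch quick, and the timings will vary by machine.

```python
import timeit

def build_with_loop(n):
    out = []
    for i in range(n):
        out.append(i + 1)
    return out

def build_with_comprehension(n):
    return [i + 1 for i in range(n)]

n = 100_000  # much smaller than the 100 million iterations used above

# Take the best of three repeats, each running the function ten times
loop_time = min(timeit.repeat(lambda: build_with_loop(n), number=10, repeat=3))
comp_time = min(timeit.repeat(lambda: build_with_comprehension(n), number=10, repeat=3))

print(f"for loop:           {loop_time:.4f} s")
print(f"list comprehension: {comp_time:.4f} s")
```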
But, this is…
When discussing forecasting in workshops, I usually get the following question from my clients:
Is our current forecasting accuracy % good enough?
Imagine the following case: you are responsible for forecasting the demand of a portfolio of products, and you want to know if your current accuracy is good or bad.
Here are 3 ways to do this, from worst to best.
Many companies want to compare themselves to their peers by buying industry benchmarks from data providers. However, I would not advise you to use industry benchmarks to assess your forecasting capabilities.
As Data Scientists, we like to run many time-intensive experiments. Reducing the training time of our models means that we can conduct more experiments in the same amount of time. Moreover, we can also leverage this speed by creating bigger model ensembles, ultimately resulting in higher accuracy.
Chen and Guestrin (from the University of Washington) released XGBoost in 2016. They achieved significant speedups and increased predictive power compared to regular gradient boosting (see my book for a comparison, see scikit-learn for regular gradient boosting). This new model soon became data scientists' favorite on Kaggle.
Let’s run XGBoost ‘vanilla’ version…
ABC analysis is the wrong methodology used to answer the right questions.
Before jumping into the discussion of why ABC analysis should be avoided (and what to do instead), let’s take a minute to define ABC XYZ categorizations.
ABC Analysis is a simplistic, arbitrary technique to categorize items based on two thresholds along one dimension. Items are then segregated into three categories (A, B, and C). Group A contains the few most important items, whereas the trivial many are categorized as C.
Usually, ABC analysis is performed based on volume (as shown in the figure below):
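A volume-based split of this kind can be sketched as follows. The thresholds (80% and 95% of cumulative volume), the function name, and the item volumes are all assumptions for illustration; the text above only says that two thresholds are used.

```python
def abc_classify(volumes, a_cut=0.80, b_cut=0.95):
    """Assign A, B, or C to each item based on its cumulative share of volume.

    The 80% / 95% cut-offs are illustrative defaults, not a standard.
    """
    total = sum(volumes.values())
    # Rank items from highest to lowest volume
    ranked = sorted(volumes.items(), key=lambda kv: kv[1], reverse=True)
    classes = {}
    cumulative = 0
    for item, volume in ranked:
        cumulative += volume
        share = cumulative / total
        if share <= a_cut:
            classes[item] = "A"
        elif share <= b_cut:
            classes[item] = "B"
        else:
            classes[item] = "C"
    return classes

# Made-up volumes for six hypothetical items
volumes = {"P1": 500, "P2": 250, "P3": 120, "P4": 60, "P5": 40, "P6": 30}
print(abc_classify(volumes))
# {'P1': 'A', 'P2': 'A', 'P3': 'B', 'P4': 'B', 'P5': 'C', 'P6': 'C'}
```

Moving either threshold slightly changes which items land in each class, which illustrates how arbitrary the categorization is.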
This article is an extract from my book Data Science for Supply Chain Forecast.
The history of artificial neurons dates back to the 1940s, when Warren McCulloch (a neuroscientist) and Walter Pitts (a logician) modeled the biological working of an organic neuron in a first artificial neuron to show how simple units could replicate logical functions.
Inspired by Warren McCulloch’s and Walter Pitts’ publication, Frank Rosenblatt (a research psychologist working at Cornell Aeronautical Laboratory) worked in the 1950s on the Perceptron: a single layer of neurons able to classify pictures of a few hundred pixels. …
As a supply chain consultant, I often help my clients to create better inventory models. It is a difficult task — primarily because of data quality and misaligned forecasts. When launching an inventory optimization initiative, it is essential to understand where we start and where we want to go. This will allow you to build up the right expectations, understand what data is required and how much complexity to expect.