Demand for a product or service changes over time. A business cannot plan to improve its financial performance without reasonably accurate estimates of customer demand and future sales. Sales forecasting is the process of estimating demand for, or sales of, a particular product over a specific period of time. In this article, I will show how machine learning can be used to predict sales for a real-world business problem taken from Kaggle.
Rossmann operates over 3,000 drug stores in 7 European countries. Currently, Rossmann store managers are tasked with predicting their daily sales up to six weeks in advance. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. With thousands of individual managers predicting sales based on their unique circumstances, the accuracy of the results can vary considerably.
The goal is to forecast sales six weeks in advance based on the given data. In this project, we work with a time-series dataset to understand how the target variable relates to the other variables, and we design regression and deep-learning models that predict sales over consecutive days.
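Because the task is to predict six weeks ahead, one reasonable way to check a model is to hold out the last six weeks of the training period and validate on them. The following is only a minimal sketch of that idea, not the split used later in this project: the function name `split_last_weeks` and the toy rows are hypothetical, and the sales numbers are made up for illustration.

```python
from datetime import date, timedelta


def split_last_weeks(rows, weeks=6):
    """Split rows into (train, validation), holding out the final `weeks` weeks."""
    last_day = max(row["Date"] for row in rows)
    cutoff = last_day - timedelta(weeks=weeks)
    # Everything up to and including the cutoff is used for training,
    # everything after it is kept aside for validation.
    train = [row for row in rows if row["Date"] <= cutoff]
    valid = [row for row in rows if row["Date"] > cutoff]
    return train, valid


# Toy rows (hypothetical): one store, 70 consecutive days, made-up sales.
start = date(2015, 1, 1)
rows = [
    {"Store": 1, "Date": start + timedelta(days=i), "Sales": 5000 + 10 * i}
    for i in range(70)
]

train, valid = split_last_weeks(rows)
print(len(train), len(valid))
```

Splitting by date rather than at random keeps future information out of the training set, which matters for any time-series problem.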
The data can be downloaded from Kaggle.
It contains the following data fields:
Id — an Id that represents a (Store, Date) duple within the test set.
Store — a unique Id for each store.
Sales — the turnover for any given day (this is what you are predicting).
Customers — the number of customers on a given day.
Open — an indicator for whether the store was open: 0 = closed, 1 = open.
StateHoliday — indicates a state holiday. Normally all stores, with few exceptions, are closed on state holidays. Note that all schools are closed on public holidays and weekends. a = public holiday, b = Easter holiday, c = Christmas, 0 = None.
SchoolHoliday — indicates if the (Store, Date) was affected by the closure of public schools.
StoreType — differentiates between 4 different store models: a, b, c, d.
Assortment — describes an assortment level: a = basic, b = extra, c = extended.
CompetitionDistance — the distance in meters to the nearest competitor store.
CompetitionOpenSince[Month/Year] — gives the approximate year and month of the time the nearest competitor was opened.
Promo — indicates whether a store is running a promo on that day.
Promo2 — Promo2 is a continuing and consecutive promotion for some stores: 0 = store is not participating, 1 = store is participating.
Promo2Since[Year/Week] — describes the year and calendar week when the store started participating in Promo2.
PromoInterval — describes the consecutive intervals at which Promo2 is started, naming the months in which the promotion starts anew. E.g. “Feb, May, Aug, Nov” means each round starts in February, May, August, and November of any given year for that store.
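The three Promo2 fields above only become useful to a model once they are combined into a per-day feature. The sketch below shows one possible way to do that: it turns a PromoInterval string into month numbers and checks whether a given day falls in a month where a Promo2 round starts, after the store joined Promo2. The function names are hypothetical, the comparison with Promo2Since[Year/Week] assumes ISO calendar weeks, and the example values are made up.

```python
from datetime import date

MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_promo_interval(interval):
    """Turn a string such as 'Feb, May, Aug, Nov' into a set of month numbers."""
    if not interval:
        return set()
    months = set()
    for name in interval.split(","):
        # Keep the first three letters so that variants like 'Sept' also match.
        key = name.strip()[:3].title()
        months.add(MONTH_ABBR.index(key) + 1)
    return months


def is_promo2_start_month(day, promo2, since_year, since_week, interval):
    """True if `day` lies in a month in which a Promo2 round starts for the store."""
    if promo2 != 1:
        return False  # store does not participate in Promo2
    iso_year, iso_week, _ = day.isocalendar()
    if (iso_year, iso_week) < (since_year, since_week):
        return False  # store had not joined Promo2 yet (assumes ISO weeks)
    return day.month in parse_promo_interval(interval)


# Hypothetical store that joined Promo2 in week 10 of 2014.
print(is_promo2_start_month(date(2015, 5, 10), 1, 2014, 10, "Feb, May, Aug, Nov"))
print(is_promo2_start_month(date(2015, 6, 10), 1, 2014, 10, "Feb, May, Aug, Nov"))
```

The resulting boolean can be added as a column next to Promo, so the model sees both the daily promotion and the recurring one.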
Promoted stores have higher total sales and customers.
The train and test datasets have very similar distributions.
Most stores are closed on holidays, so the company's total sales decrease during holidays.
Most of the promoted stores have higher total sales per customer than the unpromoted stores.
Store type ‘a’ has the highest sales of all the store types.
In terms of total sales and customer size assortment type ‘b’ has the familiarity.
Aggregated sales surge in the last months of the year, and likewise in its last season.
Days before and after holidays show a surge in total sales.
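Findings like the ones above come from simple grouped aggregates. As an illustration, the sketch below computes sales per customer for promoted and unpromoted days using only the Open, Promo, Sales, and Customers fields. It is a minimal example under stated assumptions: the function name is hypothetical and the four rows are invented toy values, not figures from the Rossmann data.

```python
from collections import defaultdict


def sales_per_customer_by_promo(rows):
    """Return {promo_flag: total sales / total customers} over open days."""
    totals = defaultdict(lambda: {"Sales": 0, "Customers": 0})
    for row in rows:
        # Closed days carry no sales and would only distort the ratio.
        if row["Open"] == 0 or row["Customers"] == 0:
            continue
        totals[row["Promo"]]["Sales"] += row["Sales"]
        totals[row["Promo"]]["Customers"] += row["Customers"]
    return {promo: t["Sales"] / t["Customers"] for promo, t in totals.items()}


# Toy rows (made up for illustration only).
rows = [
    {"Open": 1, "Promo": 1, "Sales": 9000, "Customers": 900},
    {"Open": 1, "Promo": 1, "Sales": 6000, "Customers": 600},
    {"Open": 1, "Promo": 0, "Sales": 4000, "Customers": 500},
    {"Open": 0, "Promo": 0, "Sales": 0, "Customers": 0},
]

ratios = sales_per_customer_by_promo(rows)
print(ratios)
```

The same pattern, grouping by StoreType, Assortment, or month instead of Promo, would reproduce the other comparisons in the list. On the full dataset a dataframe library's group-by would be the more usual tool; plain Python is used here only to keep the example self-contained.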