Demand Forecasting for @tafelPH
This is my individual project for our Machine Learning class.
Summary
tafelPH is a Shopee-based online pen boutique that sells specialty pens, markers, and highlighters from popular Japanese brands such as Zebra, Uni, and Pilot. All items are sold on a pre-order basis, and customers usually wait at least a month to receive their orders. Because of this pre-order setup, almost 52% of placed orders were canceled.
One solution to this problem is to keep on-hand stock that is ready to ship to customers. However, the shop is owned and run by a self-supporting med student, so capital is limited. tafelPH also carries a wide variety of products (pens, highlighters, pencils, and markers in different colors, sizes, and types) from 12 different brands, adding up to 1,331 SKUs. Storing inventory for every SKU is not feasible given budget and space constraints. Instead, the top fast-moving SKUs (the items that are ordered most frequently) can be identified and their demand forecast for the next few weeks. This gives the shop an idea of the expected volume of transactions in the coming weeks, so it can plan its resources and acquire enough inventory to satisfy customer demand, at least for the top SKUs identified.
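As a minimal sketch of how fast-moving SKUs might be picked out, the snippet below totals the ordered quantity per SKU and keeps the largest ones. The column names `sku` and `quantity`, the helper `top_skus`, and the sample rows are all hypothetical, since the layout of the Shopee export is not described here. Ranking by total quantity is also an assumption; ranking by number of orders would be an equally reasonable reading of "frequently ordered".

```python
import pandas as pd

# Hypothetical order lines; the real Shopee export's columns are not given in the text.
orders = pd.DataFrame({
    "sku": ["A", "B", "A", "C", "A", "B", "D"],
    "quantity": [2, 1, 3, 1, 1, 4, 1],
})


def top_skus(orders: pd.DataFrame, n: int) -> pd.Series:
    """Return the n SKUs with the largest total ordered quantity."""
    totals = orders.groupby("sku")["quantity"].sum()
    return totals.nlargest(n)


top = top_skus(orders, n=2)
print(top)
```

Only the SKUs returned by a ranking like this would then be carried forward into the forecasting step.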
The shop’s daily order transactions were extracted from the Shopee database. For this study, I considered only transactions from June 2019 to July 2020, since sales only started to become stable in June 2019.
Forecasting can be done using ARIMA or regression models. For this project, I explored ARIMA, Decision Tree, Random Forest, and Gradient Boosting Regressor algorithms. Random forests and gradient boosting machines (GBM) are not typically used to forecast time series data. However, lagged values of the series can be used as features for modeling, and product features such as name, brand, category, and price can also be incorporated. These regressors can also forecast sales for multiple products at the same time, unlike ARIMA, which models only one product at a time.
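The lagged-feature idea can be sketched as follows, assuming weekly demand per SKU. The data is synthetic, and the SKU names, prices, column names, the `add_lags` helper, and the choice of three lags are illustrative assumptions rather than the project's actual setup. Only `q_lag2` follows the feature naming used in this summary.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic weekly demand for two hypothetical SKUs (not the shop's data).
rng = np.random.default_rng(0)
weeks = pd.date_range("2020-01-06", periods=30, freq="W-MON")
frames = []
for sku, price in [("pen_a", 85.0), ("marker_b", 120.0)]:
    frames.append(pd.DataFrame({
        "week": weeks,
        "sku": sku,
        "original_price": price,
        "quantity": rng.poisson(lam=5, size=len(weeks)),
    }))
weekly = pd.concat(frames, ignore_index=True)


def add_lags(df: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """Add q_lag1..q_lagN: each SKU's own demand from k weeks earlier."""
    out = df.sort_values(["sku", "week"]).copy()
    for k in range(1, n_lags + 1):
        out[f"q_lag{k}"] = out.groupby("sku")["quantity"].shift(k)
    # The first n_lags weeks of each SKU have no history, so drop them.
    return out.dropna().reset_index(drop=True)


data = add_lags(weekly, n_lags=3)
features = ["original_price", "q_lag1", "q_lag2", "q_lag3"]

# Chronological split: hold out the last four weeks instead of shuffling.
cutoff = weeks[-4]
train = data[data["week"] < cutoff]
test = data[data["week"] >= cutoff]

# One model is fit on all SKUs at once, using price as a product feature.
model = GradientBoostingRegressor(random_state=0)
model.fit(train[features], train["quantity"])
pred = model.predict(test[features])
```

Because the lags are computed within each SKU, a single regressor can serve every product, whereas an ARIMA model would be fit separately per series.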
Overall, Gradient Boosting Regressor is the best model based on RMSE. An overstock or understock of 4 items is acceptable to the shop owner. The model also identified Original Price and the demand from two weeks earlier (q_lag2) as the top contributors to its predictions. This buying pattern may be associated with paydays, which come every two weeks.
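For reference, here is a small sketch of how RMSE and feature contributions can be read off a fitted gradient boosting model in scikit-learn. The data is synthetic and deliberately constructed so that the third column drives the target. The feature names only mirror the ones mentioned above, and nothing here reproduces the project's actual results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200
feature_names = ["original_price", "q_lag1", "q_lag2"]
X = rng.uniform(0, 10, size=(n, 3))
# Synthetic target built to depend mostly on the third column (q_lag2).
y = 2.0 * X[:, 2] + rng.normal(0, 0.5, size=n)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# RMSE is in the same units as the target (items), which makes it easy
# to compare against a tolerance such as "off by a few items".
rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))

# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))
top_feature = max(importances, key=importances.get)
print(f"RMSE: {rmse:.2f}, top feature: {top_feature}")
```

Since RMSE is expressed in items, it can be compared directly with the level of overstock or understock the shop owner is willing to accept.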