National University of Singapore

Department of Industrial Systems Engineering & Management

B.Eng(ISE) Final Year Project (2021/2022)

Applying Machine Learning to Product Categorization for a Lubricant Manufacturing Company

Wei Yize

Abstract

Product categorization in the manufacturing industry is a long-standing problem. The traditional methods are ABC Analysis and Analytic Hierarchy Process (AHP) Analysis. Both involve substantial manual work and are prone to error. With the advancement of machine learning and deep learning techniques, it is possible to propose algorithms that are better in terms of accuracy, flexibility, and convenience. This thesis therefore investigates a collection of supervised and unsupervised machine learning models that can be applied to product categorization.

In the sample study, the models are evaluated using sales history data from a manufacturing company. The initial dataset contains information on individual transactions. After data cleaning, the relevant features for each product are extracted and regrouped. In general, a product categorization algorithm aims to assign objects to groups based on a certain number of features. Five common supervised learning algorithms are investigated in this thesis: K-Nearest Neighbors, Decision Tree, Support Vector Machine, Naïve Bayes, and Multi-layer Perceptron. Two unsupervised learning algorithms, Hierarchical Clustering and Principal Component Analysis, are also incorporated. For each method, the algorithm is explained and its important parameters are discussed. The performance of the algorithm is then evaluated based on prediction accuracy. According to the results, tree-based models are the most promising for product categorization. The binary-split logic of these models closely resembles how a human would reason about this particular problem. It is also recommended to reduce the dimensionality of the features in advance.

The contribution of this study is threefold. First, motivated by a real-world problem in a manufacturing company, the experiments demonstrate and validate the capabilities of machine learning techniques and their potential for future use. Second, product categorization is itself a topic of great value, and this study investigates multiple machine learning methods in a systematic way and compares them with the baseline model used in the industry. Third, the experiments have been conducted with detailed documentation to inspire future research in this field.