Wine Quality

Classification & Regression on Wine Quality

SHIVAN ANAND

05419011921

AI-DS B1

ABOUT THE PROJECT

This project applies a Random Forest Classifier and Logistic Regression to the Red Wine Quality dataset and compares how accurately the two algorithms predict the quality of red wine. Wine quality is an important factor in consumer preference and in the market value of a wine, so the aim is a machine learning model that predicts quality from a wine's physicochemical properties.

We use the Red Wine Quality dataset from the UCI Machine Learning Repository, which contains 1599 samples of red wine and 12 columns: 11 physicochemical features and the quality score. After exploratory data analysis and preprocessing, we train a Random Forest Classifier with sklearn and evaluate it on the testing set using several metrics. Compared with a baseline Logistic Regression model, the Random Forest Classifier achieves a higher accuracy, 0.925, and a higher f1-score.

We conclude that the Random Forest Classifier is a suitable model for predicting wine quality, and we discuss some limitations and implications of the analysis. Recommendations for future work include using more features, applying feature selection or dimensionality reduction, and exploring other machine learning models or ensemble methods.
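The comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data is a synthetic stand-in generated with `make_classification` (binary target, 1599 rows, 11 features), and the train/test split ratio, the scaling step for Logistic Regression, and all hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption): same row and feature
# counts as the red wine dataset, but with a generated binary target.
X, y = make_classification(
    n_samples=1599, n_features=11, n_informative=6, random_state=0
)

# Hold out 20% of the rows for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Logistic Regression usually benefits from standardized inputs.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
    print(name, scores[name])
```

The numbers printed here come from synthetic data and are not comparable to the 0.925 accuracy reported for the real dataset.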

ABOUT DATASET

We use the red wine quality dataset from the UCI Machine Learning Repository, which contains 1599 samples of red wine and 12 columns. Eleven of these are physicochemical features: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. The twelfth column is the target variable, the quality of the wine, which is a score between 0 and 10. We download the dataset from https://archive.ics.uci.edu/ml/datasets/wine+quality and load it into a pandas dataframe.
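As a rough sketch of the loading step, the snippet below reads data in the semicolon-separated layout used by the UCI `winequality-red.csv` file and separates the 11 features from the `quality` target. The helper name `load_wine` and the two-row in-memory sample are illustrative assumptions; in practice the function would be pointed at the downloaded file.

```python
import io

import pandas as pd

# Two illustrative rows in the semicolon-separated layout of the red wine
# file (a stand-in so the example runs without downloading anything).
SAMPLE_CSV = """\
"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
"""


def load_wine(source):
    """Read a semicolon-delimited wine file; return (features, target)."""
    df = pd.read_csv(source, sep=";")
    X = df.drop(columns="quality")  # the 11 physicochemical features
    y = df["quality"]               # the quality score
    return X, y


# `source` can be a file path or any file-like object.
X, y = load_wine(io.StringIO(SAMPLE_CSV))
print(X.shape)
print(list(X.columns))
```

Passing the path of the downloaded CSV instead of the `io.StringIO` object should yield the full 1599-row dataframe.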

RANDOM FOREST CLASSIFIER

Random Forest is a supervised ensemble learning method built on decision trees; it can be used for both classification and regression. The Random Forest Classifier is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control overfitting.
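To make the averaging concrete, the sketch below fits a small forest with sklearn and checks that its predicted class probabilities match the mean of the probabilities predicted by its individual trees. The data is synthetic and the forest size is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset (assumption: a stand-in for the wine features).
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6, random_state=0
)

# Each tree is trained on a bootstrap sub-sample of the rows.
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Collect per-tree class probabilities: shape (n_trees, n_samples, n_classes).
tree_probas = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Averaging over the trees should reproduce the forest's own probabilities.
mean_tree_proba = tree_probas.mean(axis=0)
forest_proba = forest.predict_proba(X)
print(np.allclose(forest_proba, mean_tree_proba))
```

Because each tree sees a different sub-sample, individual trees can disagree; averaging their outputs is what smooths out the variance of any single tree.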

REPORT LINK

https://drive.google.com/file/d/1dYZIkPe2HfO96F6GECaLI0EvKlS_Ur5B/view?usp=sharing

Questions?

Contact me to get more information on the project

Email Me
