SHIVAN ANAND
05419011921
AI-DS B1
This project compares the accuracy of two algorithms, Random Forest Classifier and Logistic Regression, in predicting the quality of red wine. Wine quality is an important factor that influences consumer preference and the market value of wine. In this report, we aim to build a machine learning model that predicts wine quality from its physicochemical properties. We use the Red Wine Quality dataset from the UCI Machine Learning Repository, which contains 1599 samples of red wine and 12 variables: 11 physicochemical features and a quality score. We perform exploratory data analysis and data preprocessing on the dataset, and then fit a Random Forest Classifier using sklearn. We evaluate the model on the testing set using several metrics and compare the results with a baseline Logistic Regression model. The Random Forest Classifier achieves a higher accuracy, 0.925, and a higher f1-score than the baseline. We conclude that the Random Forest Classifier is a suitable model for predicting wine quality, and discuss some limitations and implications of the analysis. We also provide recommendations for future work, such as using more features, applying feature selection or dimensionality reduction techniques, and exploring other machine learning models or ensemble methods.
We use the wine quality dataset from the UCI Machine Learning Repository, which contains 1599 samples of red wine and 12 variables. The 11 input features are fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. The target variable is the quality of the wine, which is a score between 0 and 10. We download the dataset from https://archive.ics.uci.edu/ml/datasets/wine+quality and load it into a pandas dataframe.
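The loading step can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name load_red_wine is hypothetical, and the two sample rows are made-up values that only mimic the layout of the UCI file (semicolon-separated, with a header row). In practice the path to the downloaded winequality-red.csv would be passed instead of the in-memory sample.

```python
import io

import pandas as pd

# The 11 physicochemical input columns and the target column,
# spelled as in the header of the UCI red wine CSV file.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def load_red_wine(source):
    """Read the CSV (path or file-like object) and split it into X and y.

    Hypothetical helper: the name and the column check are illustrative.
    """
    df = pd.read_csv(source, sep=";")  # the UCI file uses ';' as separator
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df[FEATURES], df[TARGET]


# Made-up rows in the file's layout, used only so the sketch runs offline.
SAMPLE = (
    ";".join(f'"{c}"' for c in FEATURES + [TARGET]) + "\n"
    "7.0;0.60;0.10;2.0;0.080;12;40;0.9970;3.40;0.60;9.5;5\n"
    "8.0;0.40;0.30;2.5;0.070;15;50;0.9960;3.30;0.70;10.5;6\n"
)

X, y = load_red_wine(io.StringIO(SAMPLE))
print(X.shape)      # rows x 11 feature columns
print(y.tolist())   # quality scores
```

With the real file, X would have 1599 rows, and a call such as X.describe() is a natural starting point for the exploratory analysis.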
Random Forest is a supervised machine learning algorithm built from decision trees and used for both classification and regression. The Random Forest Classifier is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting.
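The comparison between the two models can be sketched with sklearn as below. This is an illustration under stated assumptions, not the experiment reported here: the data is synthetic (generated with make_classification using 1599 samples and 11 features to mirror the dataset's shape), and the split ratio, n_estimators, random_state, and the scaling step for Logistic Regression are all choices made for the sketch. The scores it prints say nothing about the wine results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption: binary labels).
X, y = make_classification(
    n_samples=1599, n_features=11, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Scaling is added here because logistic regression is scale sensitive.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred, average="weighted"),
    }
    print(name, scores[name])

# The forest's class probabilities are the mean of its trees' probabilities.
forest = models["random_forest"]
tree_mean = np.mean(
    [tree.predict_proba(X_test) for tree in forest.estimators_], axis=0
)
matches_forest = np.allclose(tree_mean, forest.predict_proba(X_test))
print("forest output equals tree average:", matches_forest)
```

The final check makes the averaging step concrete: each fitted tree in estimators_ votes with class probabilities, and the forest reports their mean.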