Due in class by the last class of the semester (Apr 30, 2024), or submit it on the day of the final presentation. Grading:
1) 20% Description of the problem or dataset, including how you processed the data
2) 10% Description of the machine learning algorithm you pick.
3) 30% Description of the performance metric you use, how you split the data into training and test sets, and your results under the selected performance metric. Include a visualization of your results, if applicable.
4) 20% R code (and/or Python or other scripts that are used to process the data)
5) 20% Writing of the report (correctness, clarity, etc.)
This project is about machine learning. You will pick a data set suitable for a classification task and apply logistic regression (including multiple logistic regression) or a neural network. One source for data sets is the UC Irvine Machine Learning Repository. You will write a project report CLEARLY describing the following:
a) A description of the data set (please write in your own words), including where the data set is from, what the data is for, what the features and the response variable are, the number of instances, the number of features, and how you obtained the training and test sets.
b) A description of the method, including how you fit the model, what the model parameters and their fitted values are, and, if you are doing logistic regression, an interpretation of the model fitting results (same as you did for Project 1).
c) The performance metric used in your evaluation of the method, and the result you obtained. If the data set you chose has been used by other people in the literature, please mention their performance metrics and results.
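To illustrate the kind of split and evaluation that a) and c) ask you to describe, here is a minimal Python sketch using only the standard library. It is an example under stated assumptions, not a required approach. The toy labels are made up, the function names `stratified_split` and `binary_metrics` are hypothetical helpers, the 80/20 split is just one common choice, and a majority-class baseline stands in for your fitted logistic regression or neural network.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with similar class proportions in both sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up toy response variable: 70 negatives (0) and 30 positives (1).
labels = [0] * 70 + [1] * 30
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)

# Placeholder "model": always predict the majority class seen in training.
majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
y_true = [labels[i] for i in test_idx]
y_pred = [majority] * len(test_idx)

metrics = binary_metrics(y_true, y_pred)
print(len(train_idx), len(test_idx), metrics)
```

In this toy setup the baseline is correct on every negative test case but finds none of the positives, so its recall is zero. This is one reason the report should state which metric you chose and why, rather than reporting accuracy alone.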