Automatidata - Taxi Tip Prediction
Project Overview
I developed a machine learning model that predicts whether a taxi passenger will leave a generous tip, defined as a tip of 20% or more. The project covers data preprocessing, feature engineering, and model evaluation in Python, and the resulting model offers insight into customer tipping behavior in the taxi industry.
Objective
Build a predictive model for tip generosity from a dataset of taxi trip records.
Compare several machine learning algorithms and select the one with the best predictive performance.
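
As an illustration of the 20% threshold from the overview, the target label could be derived roughly as follows. This is a minimal sketch, not the project's actual code: the column names `total_amount` and `tip_amount` are assumptions, and so is the choice to measure the tip against the pre-tip total.

```python
import pandas as pd


def add_generous_label(df: pd.DataFrame, threshold: float = 0.20) -> pd.DataFrame:
    """Return a copy of df with tip_percent and a binary generous column.

    Assumes (hypothetically) that total_amount includes the tip, so the
    pre-tip total is total_amount - tip_amount.
    """
    out = df.copy()
    pre_tip_total = out["total_amount"] - out["tip_amount"]
    out["tip_percent"] = out["tip_amount"] / pre_tip_total
    # 1 = tip of 20% or more, 0 = anything below that
    out["generous"] = (out["tip_percent"] >= threshold).astype(int)
    return out


# Tiny made-up example: tips of 30%, about 33%, and 10% of the pre-tip total
trips = pd.DataFrame({
    "total_amount": [13.0, 24.0, 11.0],
    "tip_amount": [3.0, 6.0, 1.0],
})
labeled = add_generous_label(trips)
print(labeled[["tip_percent", "generous"]])
```

Rows with a pre-tip total of zero would need to be filtered out or handled separately before this step.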
Tools Used
Python
Pandas
NumPy
Scikit-learn
XGBoost
SQL
Jupyter Notebook
Methods
Conducted data cleaning and preprocessing to prepare the dataset for modeling.
Engineered features such as mean trip duration and distance to improve model performance.
Employed Random Forest and XGBoost classifiers for modeling and GridSearchCV for hyperparameter tuning.
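
The feature engineering and tuning steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's actual pipeline: the data is synthetic, the column names (`PULocationID`, `DOLocationID`, `trip_distance`, `duration_min`) and the parameter grid are hypothetical, and only the Random Forest half is shown.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the trip records (all values are made up)
rng = np.random.default_rng(0)
n = 200
trips = pd.DataFrame({
    "PULocationID": rng.integers(1, 5, size=n),
    "DOLocationID": rng.integers(1, 5, size=n),
    "trip_distance": rng.uniform(0.5, 10.0, size=n),
    "duration_min": rng.uniform(3.0, 45.0, size=n),
})
trips["generous"] = (trips["trip_distance"] + rng.normal(0, 2, size=n) > 5).astype(int)

# Engineered features: mean distance and duration per pickup/drop-off pair.
# In a real project these means should come from the training split only,
# to avoid leaking test information into the features.
route_means = trips.groupby(["PULocationID", "DOLocationID"])[
    ["trip_distance", "duration_min"]
].transform("mean")
trips["mean_distance"] = route_means["trip_distance"]
trips["mean_duration"] = route_means["duration_min"]

X = trips[["mean_distance", "mean_duration", "trip_distance"]]
y = trips["generous"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Small, hypothetical grid; F1 is the selection metric, as in the results
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 4))
```

The same `GridSearchCV` pattern applies to an XGBoost classifier, since `xgboost.XGBClassifier` follows the scikit-learn estimator interface.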
Results
Achieved an F1 score of 0.7136 with the Random Forest model and 0.6955 with the XGBoost model.
Identified key features influencing tipping behavior, including VendorID and predicted fare.
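
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The short example below computes it by hand and with scikit-learn on made-up labels; the numbers are illustrative only and unrelated to the scores reported above.

```python
from sklearn.metrics import f1_score

# Made-up labels: 1 = generous tipper, 0 = not generous
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Count true positives, false positives, and false negatives
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
manual_f1 = 2 * precision * recall / (precision + recall)

# scikit-learn should agree with the manual calculation
library_f1 = f1_score(y_true, y_pred)
print(manual_f1, library_f1)
```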
Recommendations
Pursue further feature engineering, such as features that capture fare rounding behavior.
Collect additional data, such as passengers' past tipping behavior, to improve model performance.
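
One possible reading of the fare rounding idea is sketched below. It is purely hypothetical: the column name `fare_amount` and both derived features are assumptions for illustration, and neither was part of the model described here.

```python
import numpy as np
import pandas as pd

# Made-up fares for illustration
fares = pd.DataFrame({"fare_amount": [9.5, 12.0, 17.25]})

# Hypothetical feature: distance from the fare up to the next whole dollar
fares["round_up_gap"] = np.ceil(fares["fare_amount"]) - fares["fare_amount"]

# Hypothetical feature: flag fares that are already a whole-dollar amount
fares["is_whole_dollar"] = (fares["round_up_gap"] == 0).astype(int)

print(fares)
```

Whether features like these carry any signal would need to be checked against held-out data.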
Conclusion
Built a predictive model for taxi tip generosity; the stronger of the two classifiers, Random Forest, reached an F1 score of 0.7136.
Demonstrated proficiency in machine learning techniques and data analysis.