My capstone project will predict Airbnb prices in Washington, D.C. Airbnb is an online marketplace where people can rent out their property for a fixed rate per day. The main issue for hosts is that Airbnb can’t really determine the best price for a given property, so it is up to the host to set it. Many factors can influence the price of a property, and it mainly depends on what most guests are looking for when they choose a place to stay. Machine learning can be used to find the important features across thousands of Airbnb listings and show what guests look for in a place to stay.
The data that I plan on using comes from insideairbnb.com, an independent website (not run by Airbnb) that publishes Airbnb listing data for many cities around the world. The most important data file is listings.csv, which provides each listing's ID, the number of guests it accommodates, the number of bathrooms, the daily rate, the location, etc. Using machine learning, I will be able to rank these features by importance in order to accomplish my goal.
For my project, I plan on using two models: hedonic regression and gradient boosting. Hedonic regression is well suited to real estate data because it models the price of a property as a function of its individual characteristics, and gradient boosting is a popular general-purpose model that people use due to its accuracy and speed. I plan on using these two models to predict future prices for listings based on feature importance and other factors like seasonality.
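As a rough illustration of how these two models could be set up side by side, here is a minimal sketch using scikit-learn. It is not the final pipeline: the data is synthetic, the three feature names and the pricing rule that generates the prices are invented for the example, and the hedonic model is written in one common form (a linear regression on the log of the price).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the listings data (assumption: invented values).
rng = np.random.default_rng(0)
n = 500
FEATURES = ["accommodates", "bedrooms", "bathrooms"]
accommodates = rng.integers(1, 9, size=n)
bedrooms = rng.integers(0, 5, size=n)
bathrooms = rng.integers(1, 4, size=n)
X = np.column_stack([accommodates, bedrooms, bathrooms]).astype(float)

# Invented pricing rule: each attribute shifts the log of the price.
log_price = (4.0 + 0.10 * accommodates + 0.15 * bedrooms
             + 0.08 * bathrooms + rng.normal(0.0, 0.2, size=n))
price = np.exp(log_price)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# Hedonic regression: regress log(price) on the property attributes,
# then convert predictions back to dollars.
hedonic = LinearRegression().fit(X_train, np.log(y_train))
hedonic_pred = np.exp(hedonic.predict(X_test))

# Gradient boosting: fit directly on the price.
gbr = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
gbr_pred = gbr.predict(X_test)

mae_hedonic = mean_absolute_error(y_test, hedonic_pred)
mae_gbr = mean_absolute_error(y_test, gbr_pred)
print(f"Hedonic regression MAE:  {mae_hedonic:.2f}")
print(f"Gradient boosting MAE:   {mae_gbr:.2f}")

# Feature importance from the boosted model (values sum to 1).
importances = dict(zip(FEATURES, gbr.feature_importances_))
for name, score in importances.items():
    print(f"{name}: {score:.3f}")
```

In the hedonic model, each coefficient can be read as the approximate percentage change in price for one more unit of that attribute, which makes it easy to interpret. The gradient boosting model gives importance scores instead of coefficients, so the two models complement each other.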
The second delivery involved looking at similar projects that other people have worked on and analyzing the initial findings from their exploratory data analysis. I found many interesting elements while looking at these projects, which can be seen in my Delivery 2 presentation.
The data set that I will be using has 106 columns and 9,153 rows. Most of the columns in the data set aren't going to contribute to predicting Airbnb prices, so I went forward and kept only the columns I believed would impact the price of a property.
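The column selection can be sketched with pandas as follows. This is only an illustration: the column names in KEEP_COLUMNS are assumptions about what listings.csv contains rather than my final list, the two sample rows are invented, and load_listings is a hypothetical helper. The sketch also assumes the price is stored as text such as "$1,250.00", so it is converted to a number.

```python
import io
import pandas as pd

# Tiny stand-in for listings.csv (assumption: invented rows and columns).
SAMPLE_CSV = io.StringIO(
    "id,name,host_url,accommodates,bathrooms,bedrooms,room_type,price\n"
    "1,Cozy room,http://example.com/a,2,1.0,1,Private room,$85.00\n"
    '2,Row house,http://example.com/b,6,2.5,3,Entire home/apt,"$1,250.00"\n'
)

# Columns assumed to affect price; everything else is dropped on load.
KEEP_COLUMNS = ["id", "accommodates", "bathrooms", "bedrooms",
                "room_type", "price"]


def load_listings(source, keep=KEEP_COLUMNS):
    """Read only the chosen columns and turn the price text into a float."""
    df = pd.read_csv(source, usecols=keep)
    # Strip the dollar sign and thousands separator before converting.
    df["price"] = (
        df["price"].str.replace(r"[$,]", "", regex=True).astype(float)
    )
    return df


listings = load_listings(SAMPLE_CSV)
print(listings.shape)
print(listings.dtypes)
```

With the real file, the same helper would be called as load_listings("listings.csv"). Passing usecols to read_csv means the unused columns are never loaded, which keeps the data frame small.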