This project was a two-week hackathon for my Data Science for Engineers class. My professor obtained data on Austin real estate closings from the previous six months and asked us to predict close prices and close-to-list ratios, given how hot the Austin real estate market is. I worked with three other mechanical engineering undergraduates on this project.
We started by reducing the dataset to single-family residences in the Austin area. Patterns in days on market (DOM) and close-to-list ratio over time let us visualize the market changes our professor had described.
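A minimal sketch of how this filtering and trend summary could be done, using only the Python standard library. The `Closing` record, its field names, and the `"Single Family"` label are hypothetical stand-ins, not the columns of our actual dataset. The close-to-list ratio is taken here as close price divided by list price.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Closing:
    # Hypothetical field names; the real dataset's columns may differ.
    property_type: str
    close_date: date
    list_price: float
    close_price: float
    days_on_market: int


def monthly_market_summary(closings, property_type="Single Family"):
    """Median close-to-list ratio and median DOM per (year, month)."""
    by_month = defaultdict(list)
    for c in closings:
        # Keep only the requested property type; skip rows with no usable list price.
        if c.property_type != property_type or c.list_price <= 0:
            continue
        key = (c.close_date.year, c.close_date.month)
        by_month[key].append((c.close_price / c.list_price, c.days_on_market))

    # Sort by (year, month) so the summary reads chronologically.
    return {
        month: {
            "median_close_to_list": median(ratio for ratio, _ in rows),
            "median_dom": median(dom for _, dom in rows),
        }
        for month, rows in sorted(by_month.items())
    }
```

In a summary like this, a ratio above 1.0 means homes closed above their list price, so a rising ratio together with a falling DOM is one way a heating market would show up.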
There were more than 250 features available in our dataset, and we narrowed these down to 20-30 features of interest to help predict the close price. We also engineered new features based on each property's distance from points of interest in the Austin area (major cities, businesses, and the metro) to see how they might influence the price of a home.
Above: an example of a heatmap for some of the numerical features to examine their correlation to the close price
Left: latitudes and longitudes of our properties (blue) and various points of interest (red, orange, purple) in the Austin area
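One way distance features like these could be computed from latitude and longitude is the haversine (great-circle) formula. This is a sketch under assumptions: the function names, the kilometer units, and the `dist_to_..._km` feature naming are illustrative, and the write-up does not record which distance formula we actually used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_features(lat, lon, points_of_interest):
    """Build one distance feature per point of interest.

    points_of_interest maps a (hypothetical) name to a (lat, lon) pair.
    """
    return {
        f"dist_to_{name}_km": haversine_km(lat, lon, poi_lat, poi_lon)
        for name, (poi_lat, poi_lon) in points_of_interest.items()
    }
```

Each property then gets one extra numeric column per point of interest, which a regression model can use alongside raw latitude and longitude.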
Finally, we ran various regression models using several combinations of these features and compared their performance. A random forest model produced the best results: with our final set of features, our close price predictions had a mean absolute error of about 16%. We found that square footage and location-based features (latitude, longitude, and distances to points of interest) are among the key drivers of home price. We compiled our results into a Jupyter notebook for our final presentations to our professor and client, which included a live demo of our model and were well received.
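An error quoted as a percentage, like the figure above, can be read as a mean absolute percentage error: the absolute error of each prediction divided by the true close price, averaged over all properties. The sketch below shows that calculation in plain Python. It is an assumption that this matches how our notebook computed the number, and the function name is illustrative.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)
```

Because each error is scaled by the true price, this metric weighs a miss on an inexpensive home the same as a proportionally equal miss on an expensive one, which makes models easier to compare across a wide price range.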