Python Project
Machine Learning Model for Predicting House Price
EXECUTIVE SUMMARY
Apex Realty is a web-based real-estate platform that provides instant price valuations for properties in the United States, based on property features that customers enter through the platform. As a startup, we plan to begin with regression models that predict property sale prices in Washington DC, and to extend the model to other states as the business expands.

The historical data comes from a Kaggle dataset named “KC_Housesales_Data”, which covers houses sold in Washington DC over a two-year period. We cleaned, pre-processed, and transformed the data before building the models, and carried out exploratory data analysis to identify trends, correlations, and patterns among the variables that can support better business decisions.

We built two predictive models: a histogram-based gradient boosting (Hist-Gradient Boosting) model, which is tree-based, and an Elastic Net linear regression model. Comparing the two, we concluded that the Hist-Gradient Boosting model was the more effective at predicting house sale prices, with an R² score of 0.90. The model could be improved in future by collecting more data and by incorporating supplementary features such as community income level, infrastructure availability, population density, cost of living, and the average sale price of neighboring houses.
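For readers unfamiliar with the R² score quoted in the summary, the following is a minimal sketch of how the metric is defined: one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the four sample prices are hypothetical and chosen only for illustration; they are not taken from the project data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after prediction
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual prices around their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical sale prices and model predictions (illustration only)
actual_prices = [300_000, 450_000, 520_000, 610_000]
predicted_prices = [310_000, 440_000, 500_000, 640_000]

score = r_squared(actual_prices, predicted_prices)
print(f"R^2 = {score:.3f}")  # prints R^2 = 0.971
```

A score of 1.0 means the predictions match the actual prices exactly, while a score of 0 means the model does no better than always predicting the mean price.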
Machine Learning Models Used: Hist-Gradient Boosting and Elastic Net
Tools used:
Power BI - Visualization
Python - Data Cleaning and model building
MySQL - Data Collection and handling
Tableau - Visualization
RStudio - Analytics and Testing the model
Heat Map - Houses for Sale in Washington DC