This dataset contains house sale prices for King County, which includes Seattle. It covers homes sold between May 2014 and May 2015. We use the scikit-learn (sklearn) libraries to model and predict house prices in a training and testing framework.
id: A notation for a house
date: Date house was sold
price: Price is prediction target
bedrooms: Number of bedrooms
bathrooms: Number of bathrooms
sqft_living: Square footage of the home
sqft_lot: Square footage of the lot
floors: Total floors (levels) in house
waterfront: House which has a view to a waterfront
view: Has been viewed
condition: How good the condition is overall
grade: Overall grade given to the housing unit, based on King County grading system
sqft_above: Square footage of house apart from basement
sqft_basement: Square footage of the basement
yr_built: Year built
yr_renovated: Year when house was renovated
zipcode: Zip code
lat: Latitude coordinate
long: Longitude coordinate
sqft_living15: Living room area in 2015 (implies some renovations). This might or might not have affected the lot size area
sqft_lot15: Lot size area in 2015 (implies some renovations)
In the Google Colab notebooks below we use three modelling techniques: (1) OLS, (2) Random Forest, and (3) XGBoost, the last adapted from a Boston house prices example. The OLS technique, R-squared, and error estimation are explained here. The difference between Random Forest and XGBoost is explained simply here.
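As a rough illustration of the training and testing framework, the sketch below fits OLS and a Random Forest with scikit-learn and reports R-squared and RMSE on a held-out test set. It is not the notebook code: the data is a small synthetic stand-in that borrows a few column names from the list above, the price rule is made up, and the split ratio and model settings are assumptions. XGBoost is left out so that the example only depends on scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the King County data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 500
houses = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, size=n),
    "bathrooms": rng.integers(1, 4, size=n),
    "sqft_living": rng.integers(500, 4000, size=n),
    "grade": rng.integers(3, 13, size=n),
})
# Hypothetical price rule plus noise, only so the models have something to fit.
houses["price"] = (
    150 * houses["sqft_living"]
    + 20000 * houses["grade"]
    + 10000 * houses["bathrooms"]
    + rng.normal(0, 25000, size=n)
)

# price is the prediction target; every other column is a feature.
X = houses.drop(columns="price")
y = houses["price"]

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "OLS": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)       # learn from the training rows only
    pred = model.predict(X_test)      # predict prices for unseen rows
    results[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }
    print(f"{name}: R^2 = {results[name]['r2']:.3f}, "
          f"RMSE = {results[name]['rmse']:,.0f}")
```

With the real data, the synthetic DataFrame would be replaced by the loaded sales file, and non-numeric columns such as date would need to be converted or dropped before fitting.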
Pivot tables in Excel provide a practical way to summarize a larger database or spreadsheet in a more compact table. pandas offers similar functionality through its pivot_table function. Below we explain the pandas pivot_table function and how to use it with the King County dataset.
Please check here.
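To give a feel for pivot_table before opening the notebook, here is a minimal sketch. The rows are a tiny hand-made sample that reuses column names from the list above, not real sales, and the choice of zipcode and bedrooms as the grouping columns is only an assumption for illustration.

```python
import pandas as pd

# Tiny hand-made sample using column names from the data dictionary
# (NOT real King County sales).
houses = pd.DataFrame({
    "zipcode": [98001, 98001, 98002, 98002, 98002, 98003],
    "bedrooms": [3, 4, 3, 3, 4, 2],
    "waterfront": [0, 0, 0, 1, 0, 0],
    "price": [300000, 420000, 250000, 900000, 310000, 200000],
})

# Mean price for each zipcode (rows) and bedroom count (columns).
# Combinations with no sales show up as NaN.
by_zip_bed = pd.pivot_table(
    houses,
    values="price",
    index="zipcode",
    columns="bedrooms",
    aggfunc="mean",
)

# Several aggregations at once: mean price and number of sales per zipcode.
summary = pd.pivot_table(
    houses,
    values="price",
    index="zipcode",
    aggfunc=["mean", "count"],
)

print(by_zip_bed)
print(summary)
```

Here index plays the role of Excel's row fields, columns the column fields, values the field being summarized, and aggfunc the summary function.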