Dataset: https://www.kaggle.com/deltacrot/property-sales
Software: R
Dataset Description:
Property sales data for the 2007-2019 period for one specific (unknown) region.
Variables include the date of sale, price, property type, number of bedrooms, and a 4-digit post code (used for reference only).
Methods Employed:
Clustering to identify the representative series
Comparison of ETS, ARIMA, and neural network models
Bagging
What are some possible patterns of property selling prices?
Is there any relationship between the selling price and the property type?
Is there any relationship between the selling price and the number of bedrooms?
Can we predict the price of a certain type of property using a time series model?
Split into 10 category series
Property type: unit / house
Number of bedrooms: 1, 2, 3, 4, 5
Exclude unit 4 and unit 5
Reason: too many missing values
Use the median price as the representative value
Reason: the series show no strong seasonality
Limitation: may lose information
Use data from 2008 to 2018
Reason: most series have values during this period
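The preparation steps above can be sketched as follows. This is a minimal illustration in Python (the analysis itself used R), not the original code. The sample rows, the monthly grouping, and the function name `monthly_median_series` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample rows: (sale date "YYYY-MM-DD", price, property type, bedrooms).
sales = [
    ("2008-01-15", 300000, "unit", 1),
    ("2008-01-20", 320000, "unit", 1),
    ("2008-01-28", 900000, "unit", 1),   # one expensive sale barely moves the median
    ("2008-02-03", 310000, "unit", 1),
    ("2008-01-11", 450000, "house", 3),
    ("2006-05-01", 280000, "unit", 1),   # outside 2008-2018, dropped
    ("2010-07-09", 500000, "unit", 4),   # excluded series
]

EXCLUDED = {("unit", 4), ("unit", 5)}

def monthly_median_series(rows, first_year=2008, last_year=2018):
    """Group sales by (type, bedrooms) and month, then take the median price."""
    buckets = defaultdict(list)
    for date, price, ptype, beds in rows:
        year = int(date[:4])
        if (ptype, beds) in EXCLUDED or not first_year <= year <= last_year:
            continue
        buckets[(ptype, beds, date[:7])].append(price)
    series = defaultdict(dict)
    for (ptype, beds, month), prices in buckets.items():
        series[(ptype, beds)][month] = median(prices)
    # Sort each series chronologically ("YYYY-MM" strings sort correctly).
    return {key: dict(sorted(months.items())) for key, months in series.items()}

series = monthly_median_series(sales)
print(series[("unit", 1)])
```

The median ignores how far the extreme sale sits from the rest, which is the information loss noted as a limitation.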
Time series outliers
Get the clean data
Transformation
Stabilize the variance
Ensure predictions are always positive
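A minimal sketch of such a transformation, assuming a natural log was used. The source does not name the transformation, but the level values near 12.6 to 13.0 reported later are consistent with log prices. The code is Python for illustration (the analysis used R), and the prices are hypothetical.

```python
import math

def to_log(prices):
    """Compress large prices so that their spread is more stable over time."""
    return [math.log(p) for p in prices]

def from_log(values):
    """Back-transform to the price scale; exp() is positive for any input."""
    return [math.exp(v) for v in values]

prices = [300000.0, 310000.0, 295000.0]   # hypothetical median prices
logged = to_log(prices)
restored = from_log(logged)

# Even an implausibly low forecast on the log scale maps to a positive price.
print(from_log([-5.0])[0] > 0)
```

Models are fitted and forecasts are made on the log scale, then converted back with `from_log`.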
Training set and test set
The training set covers 2008 to 2017; the final year (2018) is used as the test set.
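The split can be sketched as below, in Python for illustration (the analysis used R). The monthly frequency, the "YYYY-MM" keys, and the constant placeholder values are assumptions.

```python
def split_by_year(series, last_train_year=2017):
    """Split a {"YYYY-MM": value} series into training and test parts by year."""
    train = {k: v for k, v in series.items() if int(k[:4]) <= last_train_year}
    test = {k: v for k, v in series.items() if int(k[:4]) > last_train_year}
    return train, test

# Hypothetical monthly series covering 2008-2018 with a placeholder value.
series = {f"{year}-{month:02d}": 12.6
          for year in range(2008, 2019) for month in range(1, 13)}

train, test = split_by_year(series)
print(len(train), len(test))
```

Splitting by time rather than at random keeps the test set strictly after the training data, which is what a forecast has to face in practice.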
Unit 1 and unit 2 are both stable, with no strong trend.
House 1 and house 2 are both stable, with large variance.
House 3 and house 4 have a strong trend and small variance.
House 5 has a strong trend and large variance.
Unit 1 and house 2 have no trend or seasonality.
House 3 has a strong trend.
Unit 1 is the hardest series to predict.
Non-linear Test (Unit 1)
The p-value is 0.86, so there is no evidence of non-linearity and no need for a neural network here
Best ETS: ETS(A,N,N), consistent with the absence of trend and seasonality
Alpha: 0.1141
Initial level (l): 12.6458
Sigma: 0.1097
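To show what these parameters mean, here is a minimal Python sketch of ETS(A,N,N), which is simple exponential smoothing (the analysis itself used R). It assumes that "L" is the initial level, that sigma is the residual standard deviation, and that the data are log prices. The observations in `y` are hypothetical.

```python
import math

ALPHA = 0.1141    # smoothing parameter reported in the text
LEVEL0 = 12.6458  # "L" in the text, read here as the initial level (assumption)
SIGMA = 0.1097    # residual standard deviation reported in the text

def ses_final_level(y, alpha=ALPHA, level0=LEVEL0):
    """Update the level with each observation: l_t = l_{t-1} + alpha * (y_t - l_{t-1})."""
    level = level0
    for obs in y:
        level = level + alpha * (obs - level)
    return level

def ses_forecast(y, h, alpha=ALPHA, level0=LEVEL0, sigma=SIGMA, z=1.96):
    """Return (lower, point, upper) for horizons 1..h; the point forecast is flat."""
    level = ses_final_level(y, alpha, level0)
    out = []
    for step in range(1, h + 1):
        # ETS(A,N,N) forecast variance: sigma^2 * (1 + alpha^2 * (step - 1)).
        sd = sigma * math.sqrt(1 + alpha ** 2 * (step - 1))
        out.append((level - z * sd, level, level + z * sd))
    return out

y = [12.60, 12.70, 12.55, 12.65]  # hypothetical log prices
fc = ses_forecast(y, h=3)
for lo, point, hi in fc:
    # Back-transform to the price scale, assuming a natural-log transformation.
    print(round(math.exp(lo)), round(math.exp(point)), round(math.exp(hi)))
```

With a small alpha the level reacts slowly to new observations, which suits a stable series without trend.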
R selects ARIMA(1,0,2) as the best ARIMA model, but because the series shows no trend and no seasonality, ARIMA(1,0,1) was also fitted.
The ma2 term is not significant, and the sigma of ARIMA(1,0,1) is smaller than that of ARIMA(1,0,2).
Model Quality - Accuracy and Residuals
ARIMA(1,0,1) has the lowest MAPE and MASE on the test set, and the lowest variance.
ARIMA(1,0,2) has the smallest MAPE and MASE on the training set.
ARIMA(1,0,1) is chosen to represent units with 1 BR.
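For reference, the two accuracy measures can be computed as in the sketch below. It is written in Python for illustration (the analysis used R), and all numbers are hypothetical. MASE is scaled here by the in-sample error of a one-step naive forecast, which is the usual choice for non-seasonal data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, train):
    """Mean absolute error divided by the naive one-step MAE on the training set."""
    scale = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale

train = [100.0, 110.0, 105.0, 115.0]   # hypothetical training values
actual = [120.0, 110.0]                # hypothetical test values
forecast = [115.0, 115.0]              # hypothetical forecasts

print(mape(actual, forecast), mase(actual, forecast, train))
```

A MASE below 1 means the model beats the naive forecast on average. Comparing the scores on the test set rather than the training set guards against choosing a model that merely fits the past more closely.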
Forecast Interval
ARIMA(1,0,1) has a narrower interval than the ETS model
No clear pattern in the retrospective accuracy or rolling-window CV results
Keep ARIMA(1,0,1)
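A rolling-window check of this kind can be sketched as below, in Python for illustration (the analysis used R). The two forecasters are simple stand-ins, not the ETS or ARIMA models from the text, and the series is hypothetical.

```python
def rolling_origin_errors(y, forecaster, initial=4, h=1, window=None):
    """Absolute h-step errors as the forecast origin moves through the series.

    With window=None the training set expands; otherwise only the last
    `window` observations before the origin are used (a rolling window).
    """
    errors = []
    for end in range(initial, len(y) - h + 1):
        start = 0 if window is None else max(0, end - window)
        pred = forecaster(y[start:end], h)
        errors.append(abs(y[end + h - 1] - pred))
    return errors

def mean_forecast(history, h):
    """Stand-in model: forecast the historical mean."""
    return sum(history) / len(history)

def naive_forecast(history, h):
    """Stand-in model: forecast the last observed value."""
    return history[-1]

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical series
print(rolling_origin_errors(y, naive_forecast))
print(rolling_origin_errors(y, mean_forecast))
print(rolling_origin_errors(y, mean_forecast, window=2))
```

If the errors show no clear ordering between models across origins, the check gives no reason to overturn the choice made on the test set.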
Non-linear test (House 2): p-value 0.3278. No need for a neural network model here
Best ETS: ETS(A,N,N): Alpha: 0.0001, Initial level (l): 12.9521, Sigma: 0.1570
Best ARIMA: ARIMA(0,0,0): Mean: 12.9521, Sigma: 0.1570
Model Quality Check
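The ARIMA(0,0,0) and ETS(A,N,N) models give similar results. With alpha close to 0 the ETS level barely moves from its initial value of 12.9521, so both models forecast the same constant.
Either ARIMA(0,0,0) or ETS(A,N,N) is a good choice in this situation.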
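The near-equivalence of the two models can be checked numerically. This is a minimal Python sketch (the analysis used R) using the reported parameters. The observations are hypothetical, and "L" is read as the initial level.

```python
ALPHA = 0.0001    # ETS(A,N,N) smoothing parameter reported in the text
LEVEL0 = 12.9521  # ETS initial level as reported (assumed meaning of "L")
MEAN = 12.9521    # ARIMA(0,0,0) mean reported in the text

def ses_point_forecast(y, alpha=ALPHA, level0=LEVEL0):
    """Flat ETS(A,N,N) forecast: the level after the last observation."""
    level = level0
    for obs in y:
        level = level + alpha * (obs - level)
    return level

def mean_model_forecast(mean=MEAN):
    """ARIMA(0,0,0) with a mean: every forecast equals the mean."""
    return mean

y = [12.80, 13.10, 12.90, 13.00, 12.95]   # hypothetical log prices
gap = abs(ses_point_forecast(y) - mean_model_forecast())
print(gap)
```

The gap stays far below the reported sigma of 0.1570, so the two point forecasts are indistinguishable in practice.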
Non-linear test (House 3): p-value 0.4342
Using the whole dataset: ETS(A,N,N) (does not make sense for a series with a strong trend)
Reason: outlier at the beginning
Using data from 2009 to 2017: ETS(M,Ad,N) (makes sense), ARIMA(0,1,3)
Model Quality Check
ETS(M,Ad,N) gives better results than the ARIMA model
Bagging Model
Bootstrapping and bagging are used here.
The result is a cleaner model whose trend and pattern are similar to those of the original dataset.
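The idea behind bootstrapping and bagging a forecast can be sketched as below. This is a simplified Python illustration (the analysis used R) and not the procedure from the analysis. It assumes a linear trend, resamples the residuals with a moving block bootstrap, refits the trend on each resampled series, and averages the forecasts. The series, block length, and function names are hypothetical.

```python
import random

def linear_fit(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...; returns (intercept, slope)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y)) / sxx
    return y_mean - slope * x_mean, slope

def moving_block_bootstrap(resid, block, rng):
    """Resample residuals in contiguous blocks to preserve short-run dependence."""
    n = len(resid)
    out = []
    while len(out) < n:
        start = rng.randrange(0, n - block + 1)
        out.extend(resid[start:start + block])
    return out[:n]

def bagged_forecast(y, h, n_boot=50, block=4, seed=1):
    """Average the forecasts of models fitted to bootstrapped versions of y."""
    rng = random.Random(seed)
    intercept, slope = linear_fit(y)
    trend = [intercept + slope * i for i in range(len(y))]
    resid = [v - t for v, t in zip(y, trend)]
    forecasts = []
    for _ in range(n_boot):
        boot = [t + r for t, r in zip(trend, moving_block_bootstrap(resid, block, rng))]
        b0, b1 = linear_fit(boot)
        forecasts.append([b0 + b1 * (len(y) + step) for step in range(h)])
    return [sum(f[step] for f in forecasts) / n_boot for step in range(h)]

# Hypothetical trending log-price series with alternating noise.
noisy = [13.0 + 0.01 * i + 0.02 * (-1) ** i for i in range(24)]
print(bagged_forecast(noisy, h=2))
```

Averaging over many resampled series reduces the influence of any single noisy stretch of data, which is why the bagged fit tends to look smoother than a single model.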
Model Forecast
The ETS model has a narrower interval than the ARIMA model.
Comparing MAPE and MASE among all three models,
the bootstrap model has the lowest MAPE and MASE.
The bootstrap model is used here to represent the price of houses with 3 BR.
Conclusion:
Three groups and their representative models
Unit 1 and Unit 2 - ARIMA(1,0,1)
House 3, House 4 and House 5 - Bagging
House 1 and House 2 - ARIMA(0,0,0) or ETS(A,N,N)
Limitation and Future Work
Data Preparation
Only the median is used as the representative value
Need a better approach to data cleaning
Model
Only ETS and ARIMA models were used
Need to try other models
Data
More data about property prices
Data can include location
Analysis can include other factors in the future