Machine Learning, Building Vintage and Property Values
By Thies Lindenthal, CERF Fellow (The Department of Land Economy)
Sometimes, all you need is a bit of luck. Erik Johnson (University of Alabama) and I had explored a new way to integrate images from Google Street View as an additional input to automatic real estate valuation systems. While writing up the working paper[1], we were looking for relevant policy implications beyond the mundane goal of boosting price prediction accuracy. We struggled. But then the head of the UK’s Building Better, Building Beautiful Commission went on the record, claiming that Britain’s housing supply constraints will evaporate if only developers build “as our Georgian and Victorian forebearers built [...] All objections to new building would slip away in the sheer relief of the public”[2]. The research we had done enabled us to put this refreshing view to the test (and to add a policy dimension to the paper).
In a nutshell, our approach automates a process that those of us who have tried to find a place to rent or buy are surely familiar with: to learn more about a potentially interesting home, one looks it up on Google Street View, tries to infer additional information from the images of the building itself, and gets a feeling for the neighbourhood. Street-level images are a rich data source, answering many questions such as: How big are the property and garden? How old is it? Is the exterior well kept? Does the house have charm? Is its architecture pleasing to the subjective eye? And much more. The challenge is to automatically identify the correct building on Street View, take the best possible picture, and classify the property along several dimensions using computer vision (CV) and machine learning (ML) techniques.
Extracting images of individual buildings from Street View was a bigger challenge than expected. In the UK, Google’s address information is often only a relatively broad guess. Try finding, e.g., “84 Vinery Road, Cambridge, CB1 3DT” on Street View to experience the problem yourself. Using exact maps from the Ordnance Survey, we solve this more technical first step and collect front images of practically all residential homes in Cambridge.
In the ML application, we initially focus on training a classifier for the vintage of buildings. According to colleagues from the architecture department, local houses can be classified into seven broad eras. Georgian (c1714–1837) houses feature key characteristics such as sash windows, fanlights above doors, the use of stucco on facades, and often wrought-iron grilles and railings. In the Early Victorian era (c1837–c1870s), a growing taste for individualized embellishment led to the development of elaborate features such as carved barge boards or finials. The development of sheet glass made sash windows more affordable and, increasingly, wider. In the Late Victorian era (c1870s–1901), bay windows became more and more widespread, and increasingly substantial. Edwardian architecture (1901–1910) tends to be less ornate than Late Victorian architecture. The Interwar period (1918–1939) saw the cost of building construction fall, amidst a drive to provide better housing for the working classes. New housing types were favoured. The Postwar era (1950–1980) continued on this path, with an embrace of high-rise as well as low-rise housing. Facades vary greatly between brick, tiling, pebbledash and render. We take 1980 as the starting year of the Contemporary era. Revival buildings are contemporary buildings that try to emulate historical architecture. It should be self-evident that the sheer amount of detail and variation defies a simplistic classification approach.
We suggest a transfer learning approach in which the images are first translated into high-dimensional feature vectors using an existing CV model (Inception V3[3]). A softmax classifier is then trained to categorise the buildings into vintages based on these feature vectors. A true innovation of our approach is that we include information on neighbouring buildings in the classification, exploiting spatial dependency in building vintages.
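The classification step described above can be illustrated with a small sketch. It is not the code used in the paper: the 4-dimensional toy vectors stand in for the 2,048-dimensional Inception V3 features, the streets and labels are synthetic, and appending the mean feature vector of adjacent buildings is only one assumed way of bringing in neighbour information. All function names are hypothetical.

```python
import math
import random

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def add_neighbour_features(features, neighbours):
    """Append the mean feature vector of each building's neighbours to its own.

    Assumption for illustration only; a building without neighbours reuses
    its own vector.
    """
    augmented = []
    for own, idx in zip(features, neighbours):
        near = [features[j] for j in idx] or [own]
        mean = [sum(col) / len(near) for col in zip(*near)]
        augmented.append(list(own) + mean)
    return augmented

def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Fit a softmax (multinomial logistic) classifier by batch gradient descent."""
    dim = len(X[0])
    W = [[0.0] * (dim + 1) for _ in range(n_classes)]  # last weight is the bias
    for _ in range(epochs):
        grad = [[0.0] * (dim + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for d in range(dim + 1):
                    grad[k][d] += err * xb[d]
        for k in range(n_classes):
            for d in range(dim + 1):
                W[k][d] -= lr * grad[k][d] / len(X)
    return W

def predict(W, x):
    """Return the index of the class with the highest score."""
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return max(range(len(W)), key=lambda k: scores[k])

# Synthetic data: three toy streets of ten buildings, one vintage per street.
rng = random.Random(0)
centres = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
features, labels, neighbours = [], [], []
for vintage, centre in enumerate(centres):
    first = vintage * 10  # index of the first building on this street
    for pos in range(10):
        features.append([c + rng.gauss(0.0, 0.2) for c in centre])
        labels.append(vintage)
        # Neighbours are the adjacent buildings on the same street.
        neighbours.append([first + p for p in (pos - 1, pos + 1) if 0 <= p < 10])

X = add_neighbour_features(features, neighbours)
W = train_softmax(X, labels, n_classes=3)
accuracy = sum(predict(W, x) == t for x, t in zip(X, labels)) / len(labels)
print(f"in-sample accuracy on the toy streets: {accuracy:.2f}")
```

Because vintages cluster along streets in this toy set-up, the neighbour averages carry much the same signal as a building's own features, which is the kind of spatial dependency the approach exploits.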
Note: Feature vectors generated by Inception V3 have 2,048 dimensions, which favours an ML approach (in contrast to, e.g., multinomial logit regressions) in the classification step.
Two final-year architecture students classified a large sub-sample of approximately 25,000 images from our data set of Cambridge houses. This is a much larger sample than ultimately needed: in our case, fewer than 250 samples per category are enough for the gains in training accuracy from additional observations to be almost fully exhausted. We greatly exceed this number so that we can compare the out-of-sample convolutional neural network predictions to the ground truth assigned by the experts. This allows us to examine the power and size of the assignment tests. In addition, having both human and machine classifications for a large sample of the data allows for robustness checks on the machine comparisons. The accuracy of the automatic prediction is high (Table 1): a machine can tell different building vintages apart relatively reliably, and even Revival styles are detected. All this comes at modest cost: classifying the universe of buildings in Cambridge takes only seconds on a contemporary laptop.
Table 1: Confusion matrix – Predicted vintage vs. ground truth
Note: Recall is the share of buildings from a ground truth category that are predicted correctly (diagonal in mid panel), and Precision is the share of buildings predicted to belong to a category that are indeed from that category. The F1-score is the harmonic mean of Precision and Recall: F1-score = 2 × Recall × Precision / (Recall + Precision)
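As a minimal sketch of how these three measures follow from a confusion matrix, the snippet below computes them per category. The 2×2 matrix is a made-up toy example, not a result from Table 1, and the function name is hypothetical; rows are assumed to hold the ground truth and columns the predictions.

```python
def per_class_scores(confusion):
    """Return (precision, recall, f1) for each category.

    confusion[i][j] is the number of buildings whose ground truth is
    category i and whose predicted category is j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        true_pos = confusion[k][k]
        actual = sum(confusion[k])                          # row total: ground truth k
        predicted = sum(confusion[i][k] for i in range(n))  # column total: predicted k
        recall = true_pos / actual if actual else 0.0
        precision = true_pos / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores

# Toy example with two categories (illustrative numbers only).
toy = [[8, 2],
       [1, 9]]
scores = per_class_scores(toy)
for k, (precision, recall, f1) in enumerate(scores):
    print(f"category {k}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For the first toy category, 8 of 10 buildings are recovered (Recall 0.8) and 8 of 9 predictions are right (Precision 8/9), so the F1-score is 16/19, or roughly 0.84.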
Coming back to the claim made by Building Better, Building Beautiful that historic aesthetics are valued by the public: if that were true, buyers should prefer revival architecture over more contemporary designs. Also, buildings whose neighbours have a historic or revival appearance should command a price premium. However hard we look, we cannot find any evidence for such a preference in real transaction data. After controlling for a house’s location, size and quality, modern designs are as sought after as replicas of old styles. Not surprisingly, reviving the good old times will not solve the housing shortage.
We have to speed up the publication of our paper as much as we can, or we risk losing our policy relevance again: The chairman of the helpful government commission has been fired in the meantime – for reasons not related to our research, though.
[1] https://github.com/thies/paper-uk-vintages/blob/master/text/manuscript_assa.pdf
[2] Scruton, Roger. 2018. “The Fabric of the City.” Colin Amery Memorial Lecture. Policy Exchange. https://policyexchange.org.uk/wp-content/uploads/2018/11/The-Fabric-of-the-City.pdf
[3] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. “Rethinking the Inception Architecture for Computer Vision.” https://doi.org/10.1109/CVPR.2016.308