I wanted to build a program that visualizes real estate statistics by location. To do this, I wrote a program that gathers physical addresses from a real estate classifieds website. Next, I trimmed that data in Excel to make it more readable. Then I used the Google Maps API with Google Geocoding, implemented in VBA, to convert the physical addresses to longitude and latitude coordinates. Once I had a complete list of listings with coordinates, I downloaded a shapefile with official government boundaries to serve as a background map for the longitude and latitude data points. Finally, I created a program in R that plots the data points based on different metrics, such as average price per square meter.
The project started from the idea of creating a program that focuses on real estate statistics, similar to Zillow or Redfin. Since these services are already widely adopted within the US, I wanted to focus on a market that is slightly less developed. After traveling to Oslo, Norway, I decided to research how Norwegians browse the local residential real estate market, and I came across the website Finn.No. This website is similar to a Norwegian Craigslist and is also used for many of the real estate classifieds. The website is in Norwegian and I don't know any Norwegian, so I used the Google Translate add-on for Google Chrome to translate the pages to English. After browsing the website, I thought that the information could be displayed more clearly and that a lot of valuable information could be gathered from the data.
After I became more familiar with the website, I decided to develop a plan before moving forward. My main long-term goal was to get the average price per square foot (in USD) for a relatively large sample of publicly listed Norwegian properties. I figured that this was a good start and set out to learn how to pull the relevant data from the Finn.No website. I discovered that they did not have any helpful public APIs available, so I decided to get my data straight from the source via web scraping. Before this project I had not done much web scraping, so this was an entirely new experience. After reading into it, I learned that one of the most popular libraries is Beautiful Soup for Python. After learning the library with my limited knowledge of Python, I was able to identify the targeted HTML element by inspecting the page's code. Image 1 here shows the code that was scraped.
After the scraped data was downloaded and saved to a Microsoft Excel worksheet, I needed to make it more readable. After some Excel work using VBA macros, the data was much cleaner; image 2 shows the result. Next, I used some basic Excel text functions to make the data a little more organized. Then I used the code listed on my GitHub to connect to the Google Maps API, where I converted the Norwegian addresses into longitude and latitude coordinates using Google Geocoding.
Finally, I had a CSV file with the address, price (in NOK and USD), size (in SQM and SQft), longitude/latitude coordinates and the listing URL for reference. From here, I learned how to use R Studio to better analyze the data points. I was able to gather key information on the mean, median, mode, standard deviation, maximum value and minimum value for both $/SQft and overall price in USD.
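To illustrate how the NOK/USD and SQM/SQft columns in that CSV relate to the $/SQft figure, here is a minimal Python sketch of the arithmetic. The spreadsheet work itself was done in Excel; the function names, the sample listing and especially the exchange rate below are hypothetical placeholders chosen for illustration, not values from the project.

```python
# Minimal sketch of the unit conversions behind the $/SQft column.
# NOK_PER_USD is a hypothetical exchange rate for illustration only;
# the real rate changes daily.
SQFT_PER_SQM = 10.7639  # 1 square meter is about 10.7639 square feet
NOK_PER_USD = 10.0      # assumption, not the rate used in the project


def to_usd(price_nok, nok_per_usd=NOK_PER_USD):
    """Convert a price in Norwegian kroner to US dollars."""
    return price_nok / nok_per_usd


def to_sqft(size_sqm):
    """Convert a floor area in square meters to square feet."""
    return size_sqm * SQFT_PER_SQM


def usd_per_sqft(price_nok, size_sqm, nok_per_usd=NOK_PER_USD):
    """Price per square foot in USD for a single listing."""
    return to_usd(price_nok, nok_per_usd) / to_sqft(size_sqm)


# Hypothetical listing: 3,500,000 NOK for 80 square meters.
listing = {"price_nok": 3_500_000, "size_sqm": 80}
price_usd = to_usd(listing["price_nok"])
size_sqft = to_sqft(listing["size_sqm"])
rate = usd_per_sqft(listing["price_nok"], listing["size_sqm"])

print(f"Price: {price_usd:.0f} USD")
print(f"Size: {size_sqft:.1f} SQft")
print(f"USD/SQft: {rate:.2f}")
```

Keeping the exchange rate as a parameter makes it easy to recompute the USD columns when the rate changes.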
I also downloaded a shapefile of the country's geographical boundaries and used the longitude/latitude data points to overlay dots on the map of the country. See image 3 here for the result of running the code in R.
---------------USD/SQFT---------------
Mean: 368.9702
Median: 318.62
SD: 200.4875
MAX: 1223
MIN: 41.82
--------------PRICE (USD)-------------
Mean: 397073.6
Median: 329800
SD: 268381.4
MAX: 2764500
MIN: 9700
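The figures above came from R. As a rough Python equivalent, the sketch below computes the same kind of summary with the standard `statistics` module. The input values and the `summarize` helper are hypothetical and only show the shape of the calculation; they are not the project's data. Note that `statistics.stdev` is the sample standard deviation, which is also what R's `sd()` computes.

```python
import statistics

# Hypothetical USD/SQft values standing in for the real column.
usd_per_sqft_values = [250.0, 310.5, 318.6, 402.3, 575.0]


def summarize(values):
    """Return the summary statistics reported above for one column."""
    return {
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "SD": statistics.stdev(values),  # sample standard deviation (n - 1)
        "MAX": max(values),
        "MIN": min(values),
    }


summary = summarize(usd_per_sqft_values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```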
I have definitely learned that the price of buying property varies greatly from country to country. From my results, the mean price of the ~1,300 properties I analyzed was 397,073 USD, or just slightly under 400,000 USD. The median was 329,800 USD. According to Zillow.com, the median price of a home in the US is 266,800 USD. This means that, in my data, the median price of a home in Norway is roughly 1.24 times the US median (329,800 / 266,800 ≈ 1.24), or about 24% more expensive. These prices vary from day to day with exchange rates, but they are a good indicator of price levels in the Norwegian real estate market.