Here I will be adding a new data set to the Climate Change data environment. This data set contains temperature, precipitation and evaporation data mainly for the USA.
The data set is available from Kaggle and described here.
As a first step, I have loaded the data set into a staging table in my Postgres database. Below is a screenshot of this table. It has almost 156 million rows (one row for each day for each weather station for a period of 30 years.
This table is too big for my system, especially if multiple users would access it at once.
Therefore, I created a Materialized View called mv_USAdata_US_2020_2021, which only has the rows for the USA for the years 2020 and 2021. Below you can see the definition of this materialzed view in Postgres. The materialized view has about 16 million rows for two years of USA data.
The workflow below will take the yearly maximum for each weather station and write it to the USAdata_yearly_max table.
I have also created a database table with the USA weather station information from Kaggle. However, I would like to perform analysis by State, but the state information is missing from the stations input file.
I developed the workflow below to solve this issue. The GeoFile reader takes the cb_2018_us_state_5m.shp shapefile as input. I downloaded this file from https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html (under the 'States') category. This gives me all the USA states as you can see in the output of the Geospatial View node. Now I can read my original stations database table and using the Spatial Join node, obtain the State each weather station lies within. I write this infomation to a new database table, called usa_stations_with_state.
Now I have everything in place to perform analysis by State.
Below you can see an extract from the usa_stations_with_state table.
Note I have been a bit sloppy here, I should have used the Column Renamer node to rename the NAME column to state_name. Mixing uppercase and lowercase column names is not a good practice. I will do so in the next version ;)