On this topic I will be trying to add information from the Spanish weather service (AEMET) to my climate change environment.
The first workflow below will extract and add weather station information and load to the table aemet_stations.
If I execute a SELECT count (distinct province) FROM public.aemet_stations on this table, I get a result of 54. This is an issue, because officially there are only 50 provinces in Spain. This means we have 4 provinces too many.
Let`s try to spot them:
One obvious 'error' we can see straight at the beginning of our table : 'BALEARES' and 'ILLES BALEARS' are obviously the same Province. I will change all occurrences of 'ILLES BALEARS' to 'BALEARES'
Another one is that some stations have a value 'null' for province, Apparently, these rows have all columns 'null'. Why ???. For now I will just delete them, as they are of no use.
Another error seems to be that we have both "SANTA CRUZ DE TENERIFE" and "STA. CRUZ DE TENERIFE", which are obviously the same. I will update them to SANTA CRUZ DE TENERIFE.
It's OK now, we have 52 values , including 'Ceuta' and 'Melilla' which were not in the 'official' list of province
(I have downloaded the 'official' list of provinces in Spain as a csv file from this web address : https://datos.gob.es/en/catalogo/a09002970-provincias-de-espana
I have used this file to lookup the provinces and find the 'wrong' ones. Below is a picture of the workflow to do this (Aemet2 - TODO)
On the TODO next page, we will load some climate data from Aemet,