Data

DATA FRAME

The data frame was created in a Microsoft Excel spreadsheet. The table contains 107 rows, one for each sub-basin modelled by the SWAT hydrologic model, and 135 columns holding the variables estimated for each sub-basin. Table 1 shows a data subset and a detailed description of the data frame.

Data Types


  • Predictor Variables → The 40-year time frame (1980-2019) is defined as the categorical-ordinal predictor variable for the statistical analyses.

Land cover is defined as a numerical-continuous predictor variable for the analysis of the second research question, where a change in land cover is expected to generate a response in water availability.


  • Response Variables → The annual water yield, mean annual precipitation (MAP), and mean annual temperature (MAT) estimated for each of the 107 sub-basins from 1980 to 2019 are defined as numerical-continuous response variables.

Land Cover is used as a numerical-continuous response variable for the descriptive analysis presented at the end of this section.

Table 1. Data frame description

  • Column A: Categorical-Ordinal Variable : Sub-basin number ID from 1 to 107 given by the SWAT model.

  • Column B: Categorical-Nominal Variable : Basin name according to the Costa Rican official cartographic information. Four basins: Nicaragua Lake, Papagayo Gulf, Santa Elena Bay, and Tempisque River.

  • Columns C-AP: Numerical-Continuous Variables : Absolute Annual Water Yield in millimetres estimated by the SWAT model in each of the 107 sub-basins for 40 years (1980 - 2019).

  • Columns AQ-CD: Numerical-Continuous Variables : Absolute Mean Annual Precipitation (MAP) of each of the 107 sub-basins from 1980 to 2019, extracted from the 1 km² climate grid generated for the ACG drainage area from the ClimateSA software interpolation data.

  • Columns CE-DS: Numerical-Continuous Variables : Absolute Mean Annual Temperature (MAT) of each of the 107 sub-basins from 1980 to 2019, extracted from the 1 km² climate grid generated for the ACG drainage area from the ClimateSA software interpolation data.

  • Columns DT-EE: Numerical-Continuous Variables : Area in square kilometres of each land cover class mapped for the years 1979, 1997, and 2015. Four land cover classes were identified in each period: 1. Agricultural land (AGRL), 2. Deciduous Forest (FRSD), 3. Evergreen Forest (FRSE), and 4. Pasture/Hay (PAST).
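The wide layout in Table 1 (one row per sub-basin, one column per variable and year) is usually easier to analyse after reshaping it into one row per sub-basin and year. The following is a minimal sketch of that reshaping using only the Python standard library. The source gives spreadsheet letters only, so the column headers (`WYLD_1980`, `MAP_1980`, `MAT_1980`, and so on), the function name, and all values are assumptions made for illustration.

```python
# Hypothetical headers: the source only lists spreadsheet letters (C-AP, AQ-CD, ...),
# so names such as "WYLD_1980" are assumptions made for this sketch.
YEARS = range(1980, 2020)  # 40 years, 1980 to 2019 inclusive


def wide_to_long(rows, prefixes=("WYLD", "MAP", "MAT")):
    """Turn one-row-per-sub-basin records into one-row-per-sub-basin-and-year records."""
    long_rows = []
    for row in rows:
        for year in YEARS:
            record = {"subbasin": row["subbasin"], "basin": row["basin"], "year": year}
            for prefix in prefixes:
                record[prefix] = row[f"{prefix}_{year}"]
            long_rows.append(record)
    return long_rows


# Two made-up sub-basins with placeholder values (not data from the study).
demo = []
for sub_id, basin in [(1, "Nicaragua Lake"), (2, "Tempisque River")]:
    row = {"subbasin": sub_id, "basin": basin}
    for year in YEARS:
        row[f"WYLD_{year}"] = 1000.0 + sub_id  # water yield [mm]
        row[f"MAP_{year}"] = 2000.0 + sub_id   # precipitation [mm]
        row[f"MAT_{year}"] = 25.0              # temperature [°C]
    demo.append(row)

long_demo = wide_to_long(demo)
print(len(long_demo))  # 2 sub-basins x 40 years
```

With the full table, the same reshaping would give 107 × 40 = 4,280 records. The land cover columns are left out here because they cover three mapping years rather than the 40-year series.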

DATA EXPLORATION

Water Availability - Water Yield [mm]

Water yield was plotted against time for each of the 107 sub-basins of the ACG's drainage area over the 40 years between 1980 and 2019 to observe possible trends (Figure 10a). The lines are coloured according to the four main catchments shown in Figure 1. In most of the sub-basins, peaks occur every four years. However, this pattern does not hold in three periods (1980-1984, 1996-2000, and 2005-2009), when more than 20% of the sub-basins have very low values while the rest have very high values. The sub-basins with low values are located in the western and northwestern parts of the ACG, bordering the Pacific Ocean.


The annual distribution of the data is shown with box-and-whisker plots (Figure 10b), which allow visualizing the ranges of the estimates, the mean values, which vary between 369.3 mm and 2,180.7 mm per year, and the outliers, which are present in 30% of the years studied. High variability is visible mainly in the size of the interquartile ranges, which in turn capture the spatial variability of the data across the 107 sub-basins.
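The quantities read from a box-and-whisker plot can also be computed directly. The sketch below derives the quartiles, the interquartile range, Tukey's inner fences, and the points outside them for a single year. It assumes the "inclusive" quartile convention of Python's `statistics.quantiles`; plotting software may use a slightly different convention. The function name and sample values are illustrative and are not data from the study.

```python
import statistics


def box_stats(values):
    """Quartiles, IQR, Tukey inner fences, and the points lying outside the fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower_fence": lower, "upper_fence": upper, "outliers": outliers}


# Made-up water yield values [mm] for one year across nine sub-basins.
sample_year = [400.0, 500.0, 600.0, 700.0, 800.0, 900.0, 1000.0, 1100.0, 3000.0]
stats = box_stats(sample_year)
print(stats)
```

Applying the function to each year in turn gives the share of years with at least one outlier.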

Figure 10. Water Yield

Annual Water Yield (a) of 107 sub-basins within the ACG's drainage area. Black circles indicate sub-basins with some degree of decoupling from the observed trend, which are spatially highlighted in the map at the top left. The distribution of the annual data is depicted through box and whisker plots (b). IQR stands for interquartile range.

Mean Annual Precipitation (MAP) [mm]

Precipitation values follow a definite temporal pattern for all sub-basins, with peaks every three to four years (Figure 11a). The greatest variation is in the magnitude of rainfall, with the highest values in the sub-basins located in the Lake Nicaragua basin, followed by some of the sub-basins located in the Tempisque River basin. The lowest rainfall values occur in the sub-basins near the Pacific Ocean, in the Gulf of Papagayo and Santa Elena Bay regions. However, more than 70% of the sub-basins within the Tempisque River basin register values close to or below the average of the whole study area (~2,000 mm).


Overall, the distribution of the data (Figure 11b) follows a pattern similar to that of the water yield (Figure 10b), which may be a first indication of a correlation between water availability and rainfall. All years present a positively skewed distribution because the median is close to the first quartile, meaning that most of the rainfall estimates are less than the mean value of the whole study area. It is noteworthy that no outliers are found in any year.
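One simple way to quantify the visual similarity between the precipitation and water yield series is the Pearson correlation coefficient. The sketch below computes it from first principles for one pair of annual series. The function name and the sample values are made up for illustration; whether a different measure, such as a rank correlation, suits these data better is left open.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up annual series for one sub-basin (not study data).
precip_mm = [1800.0, 2100.0, 1700.0, 2400.0, 2000.0]
yield_mm = [700.0, 950.0, 640.0, 1200.0, 880.0]
r = pearson_r(precip_mm, yield_mm)
print(round(r, 3))
```

A coefficient close to 1 would support the visual impression, although it does not by itself establish that rainfall drives water yield.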

Figure 11. Mean Annual Precipitation (MAP)

Mean Annual Precipitation (a) of 107 sub-basins within the ACG's drainage area. The green square indicates sub-basins with a specific pattern of values below the mean, which are spatially highlighted in the map at the top left. The distribution of the annual data is depicted through box and whisker plots (b). IQR stands for interquartile range.

Mean Annual Temperature (MAT) [°C]

The same approach used for water yield and precipitation is applied to visualize the temporal pattern of temperature in the 107 sub-basins (Figure 12a) and the general distribution of the data (Figure 12b). The mean annual temperature varies between 21 and 27 °C. The sub-basins with the lowest mean annual values are located towards the east and northeast of the study region, while the areas close to the Pacific Ocean have the highest mean temperatures. The sub-basins of the Tempisque River basin that presented below-average rainfall values (Figure 11a) are also the ones that show the highest mean temperatures.


The annual distribution of the data (Figure 12b) follows a homogeneous pattern without much variation across all sub-basins. All years present a negatively skewed distribution because the median is close to the third quartile, meaning that most of the temperature estimates exceed the mean value of the whole study area (25.5 °C). It should be noted that all years show outliers beyond Tukey's inner fences (1.5 times the interquartile range from the quartiles).
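The skewness argument based on the position of the median within the box can be expressed with the quartile (Bowley) skewness coefficient, (Q3 + Q1 - 2·median) / (Q3 - Q1). It is positive when the median lies closer to the first quartile and negative when it lies closer to the third. The sketch below is illustrative only: the function name and samples are made up, and the source does not state that this coefficient was used.

```python
import statistics


def bowley_skewness(values):
    """Quartile skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1), bounded by -1 and 1."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if q3 == q1:
        raise ValueError("interquartile range is zero; coefficient undefined")
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Made-up samples: the median sits at Q1 in the first and at Q3 in the second.
right_skewed = [1.0, 2.0, 2.0, 3.0, 9.0]
left_skewed = [1.0, 7.0, 8.0, 8.0, 9.0]

print(bowley_skewness(right_skewed))  # positive: median close to the first quartile
print(bowley_skewness(left_skewed))   # negative: median close to the third quartile
```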

Figure 12. Mean Annual Temperature (MAT)

Mean Annual Temperature (a) of 107 sub-basins within the ACG's drainage area. The green square indicates sub-basins with a specific pattern of values above the mean, which are spatially highlighted in the map at the top right. The distribution of the annual data is depicted through box and whisker plots (b). IQR stands for interquartile range.

ACG Drainage Area - Water and Climate

Water yield, precipitation, and temperature datasets for the entire drainage area within the ACG are summarized by the four main watersheds through box-and-whisker plots (Figure 13).


The medians indicate that the water yield values (Figure 13a) differ across all the basins due to the different topographic features in the area. The Tempisque River basin has the highest average values, with no values below 800 mm per year, and Santa Elena Bay shows the lowest data dispersion.


Concerning rainfall, the medians show that the Nicaragua Lake basin differs the most from the other basins and is also where the greatest rainfall is recorded, with no estimate below 1,700 mm per year.


Temperature does not show marked differences across the four basins. Lake Nicaragua and the Tempisque River show the greatest dispersion of estimates.


For rainfall and temperature estimates, outliers are only present in the Tempisque River basin. Similarly, for these two variables, it is observed that the larger the basin area, the greater the dispersion of the data.
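The per-basin comparison above amounts to grouping the estimates by basin and summarizing each group. The sketch below computes the median and interquartile range per basin for one variable. The record layout, the function name, and the values are assumptions for illustration; only the basin names come from Table 1.

```python
import statistics
from collections import defaultdict


def summarize_by_basin(records, variable):
    """Number of estimates, median, and IQR of one variable for each basin."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["basin"]].append(rec[variable])
    summary = {}
    for basin, values in groups.items():
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[basin] = {"n": len(values), "median": median, "iqr": q3 - q1}
    return summary


# Made-up precipitation estimates [mm]; placeholders, not study data.
records = [
    {"basin": "Tempisque River", "MAP": v}
    for v in (1500.0, 1800.0, 2000.0, 2200.0, 2900.0)
] + [
    {"basin": "Santa Elena Bay", "MAP": v}
    for v in (1400.0, 1450.0, 1500.0, 1550.0, 1600.0)
]
summary = summarize_by_basin(records, "MAP")
print(summary)
```

Comparing the `iqr` values against basin area would be one way to check the observation that larger basins show greater dispersion.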

Figure 13. Water and Climate-Study Area

Box and whisker plots summarizing the distribution of the annual values of water yield (a), precipitation (b), and temperature (c) in the four main watersheds comprising the study area. IQR stands for interquartile range.

Land Cover

The area (km²) in 1979, 1997, and 2015 of the four vegetation covers examined in this project (Figure 9), namely Agricultural land (14a), Deciduous Forest (14b), Evergreen Forest (14c), and Pasture/Hay (14d), was plotted to observe preliminary patterns (Figure 14). Santa Elena Bay and Papagayo Gulf show the smallest land cover areas [km²] because they are the smallest watersheds. In these two basins, the two types of forest had slightly increased by 2015, while agriculture and pastures had decreased.


Forest recovery is also observed in the largest basins, Nicaragua Lake and Tempisque River. However, in the latter, the agricultural area has increased by more than 200 km² over 36 years. This could mean that even though the forest is recovering in this basin, native forest in other areas is being removed at a higher rate to open land for crops. Pasture and hay have decreased in all the basins, which can be associated with the decrease in beef production in the late 1970s [1].
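The changes described above are differences between the first and last land cover maps. The sketch below computes the absolute change [km²] and the relative change [%] per land cover class for a single basin. The function name and the areas are placeholders chosen for illustration and are not values from the study; only the class codes come from Table 1.

```python
def cover_change(areas_km2, start=1979, end=2015):
    """Absolute [km²] and relative [%] change in area per land cover class."""
    changes = {}
    for cover, by_year in areas_km2.items():
        delta = by_year[end] - by_year[start]
        # Relative change is undefined when the class was absent at the start.
        pct = 100.0 * delta / by_year[start] if by_year[start] else float("nan")
        changes[cover] = {"delta_km2": delta, "pct": pct}
    return changes


# Placeholder areas [km²] for one hypothetical basin (not values from the study).
demo_basin = {
    "AGRL": {1979: 100.0, 1997: 180.0, 2015: 300.0},
    "FRSD": {1979: 400.0, 1997: 420.0, 2015: 450.0},
    "FRSE": {1979: 500.0, 1997: 520.0, 2015: 550.0},
    "PAST": {1979: 800.0, 1997: 680.0, 2015: 500.0},
}
change = cover_change(demo_basin)
for cover, result in change.items():
    print(cover, result)
```

Reporting both measures helps because the basins differ greatly in size, so the same change in km² can represent very different proportions.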

Figure 14. Vegetation Land Covers temporal variation-Study Area

Changes in the area [km²] identified based on 1979, 1997, and 2015 vegetation land cover maps. Information is presented by the four watersheds comprising the drainage area within the Conservation Area of Guanacaste (ACG). The land cover categories are: Agricultural land (a), Deciduous Forest (b), Evergreen Forest (c), and Pasture/Hay (d).