Data

Variables

Sampling units

Sampling units varied across datasets. Environmental data were derived from satellite data for 5 km² reef areas (0.05 degree resolution; Andrello et al. 2022). Habitat quality data were averaged across replicate 50-meter transects (1-16 replicates per reef) for each reef (Darling et al. 2019). Percent coral cover was evaluated in situ at point intercepts about 5 cm apart along each transect. Reefs within 4 km of each other were clustered into sites, and the cluster centroid was used as each site's geospatial location. Other site-level variables in this dataset were extracted from satellite data using these site locations. The country in which each site is located was identified in order to assign nation-level metrics such as the human development index, which was considered a local stressor variable. Restoration design data were collected for individual projects that vary in scale and methodology (Boström-Einarsson et al. 2020; see Table 3).
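As a minimal sketch of the site clustering step, the following groups reefs whose pairwise distance is at most 4 km and reports each group's centroid. The linkage rule (single linkage), the haversine distance, and the function names `haversine_km` and `cluster_reefs` are assumptions made for illustration; the source datasets may have used a different clustering procedure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_reefs(reefs, max_km=4.0):
    """Group reefs into sites (single linkage, assumed) and return site centroids.

    Returns a list of (centroid_lat, centroid_lon, n_reefs) tuples.
    """
    parent = list(range(len(reefs)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of reefs that lie within max_km of each other
    for i in range(len(reefs)):
        for j in range(i + 1, len(reefs)):
            if haversine_km(reefs[i], reefs[j]) <= max_km:
                parent[find(i)] = find(j)

    groups = {}
    for i, reef in enumerate(reefs):
        groups.setdefault(find(i), []).append(reef)

    # Centroid as the mean latitude/longitude; adequate for clusters a few km
    # wide that do not straddle the antimeridian
    return [
        (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g), len(g))
        for g in groups.values()
    ]


# Hypothetical coordinates: two reefs about 2.2 km apart and one distant reef
sites = cluster_reefs([(0.0, 0.0), (0.0, 0.02), (1.0, 1.0)])
```

With single linkage, a chain of reefs each within 4 km of its neighbor forms one site even if its endpoints are farther apart; a stricter rule (for example, complete linkage) would cap the site diameter instead.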

Predictor variables

All variables are continuous except those marked with an asterisk (*), which are discrete, binary, or categorical. Only restoration method variables are binary or categorical.

Response variables

Discrete variables are marked with an asterisk (*).

Data Tables

Prior to analysis, I screened the raw datasets of environmental stressors (Andrello et al. 2022; Table 2), reef habitat quality (>2,000 reefs; Darling et al. 2019; Madin et al. 2019; Table 3), and coral restoration project design (Boström-Einarsson et al. 2020; Table 4) to identify missing values, outliers, and skew in the data. Several variables within the stressor and restoration datasets had missing values (e.g. Table 2). Because of these missing values and limited spatial overlap between some of the reefs, only 183 of the 243 restoration sites were analyzed in this study.

Table 2. Raw data table showing environmental stressor predictor variables at n = 54,596 reefs (6 of which are displayed) globally (Andrello et al. 2022). Variables include fishing pressure (calculated as market gravity, or market population/(hours of travel to reef)²), coastal development (coastal human population), industrial development (number of ports within 5 km), tourism (tourist visits driven by reefs), sediment (water pollution), nitrogen (water pollution), cumulative climate score, historical climate score, future climate score (projected), recent climate score, cyclone days or exposure (maximum number of days/year of exposure to cyclone), and connectivity to other reefs (km² reef area within 100 km of each site; considered reef habitat quality variable).

allreefs_head
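The fishing pressure definition in the Table 2 caption (market population divided by squared travel time) can be written out as a short sketch. The function name `market_gravity` and the example numbers are hypothetical and are not taken from the dataset.

```python
def market_gravity(population, travel_hours):
    """Market gravity: market population / (hours of travel to reef) squared."""
    if travel_hours <= 0:
        raise ValueError("travel_hours must be positive")
    return population / travel_hours ** 2


# Hypothetical market of 10,000 people: doubling travel time quarters the gravity
near = market_gravity(10_000, 2)  # 2500.0
far = market_gravity(10_000, 4)   # 625.0
```

Because travel time is squared, accessibility dominates the metric: a small nearby market can exert more pressure than a large distant one.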

Table 3. Raw data table showing reef habitat quality predictor variables for n = 2,584 reefs (6 of which are displayed) globally (Darling et al. 2019). Variables include total percent coral cover (% total cover), percent cover of competitive corals (% cover competitive), percent cover of stress-tolerant corals (% cover stresstolerant), percent cover of weedy corals (% cover weedy), population growth (change in population density 2000-2010; considered local stressor variable), maximum market gravity (maximum fishing pressure; considered local stressor variable), human development index (2015; considered local stressor variable), maximum DHW (degree heating weeks calculated as cumulative heat stress over 12 weeks; considered climate stressor variable), past maximum DHW (highest maximum over past 30 years; considered climate stressor variable), years since maximum DHW (years since past maximum; considered climate stressor variable), net primary production (30-year mean in C/m²/day), wave exposure (30-year mean of wave energy in kW/m), maximum temperature days (days/year; considered climate stressor variable), reef area (100 km), and depth (m).

d_head

Table 4. Raw data table showing restoration method predictor and restoration outcome response variables at n = 243 reefs (6 of which are displayed) globally (Boström-Einarsson et al. 2020). Variables include strategy (restoration strategy), monoculture (whether a single coral species was restored; yes/no), morphology (restored coral morphology), source of coral fragments (e.g. transplanted from nursery), post monitoring length (months of monitoring post restoration), temporal scale (length of restoration project in months), and % survival (mean percentage of replanted corals surviving). 

rest_head

Data exploration

To maximize potential correlations among these variables and the variation explained by principal coordinates, I transformed the dataset (n = 2,227) used in my clustering and ordination analyses. To first assess whether the data were normally distributed, I conducted Shapiro-Wilk normality tests on each variable. As all variables within the dataset were non-normally distributed (p < 0.05), I applied a square root, log, or inverse transformation to each (Figure 8).
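The choice among the three transformations can be illustrated with a small sketch. The text does not state how a transformation was selected for each variable, and this sketch does not run the Shapiro-Wilk test; as a stand-in criterion (an assumption), it picks the candidate with the smallest absolute sample skewness. The names `skewness`, `TRANSFORMS`, and `least_skewed_transform` are hypothetical.

```python
import math
import statistics


def skewness(values):
    """Moment skewness: the mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Candidate transformations named in the text; all assume strictly positive
# inputs, so zero values would need an offset that this sketch leaves out.
TRANSFORMS = {
    "sqrt": math.sqrt,
    "log": math.log,
    "inverse": lambda v: 1.0 / v,
}


def least_skewed_transform(values):
    """Return (name, transformed values) for the candidate with the smallest |skewness|.

    Selection by skewness is an assumed stand-in, not the method in the text.
    """
    candidates = {name: [f(v) for v in values] for name, f in TRANSFORMS.items()}
    best = min(candidates, key=lambda name: abs(skewness(candidates[name])))
    return best, candidates[best]


# Hypothetical right-skewed variable spanning several orders of magnitude
name, transformed = least_skewed_transform([1, 10, 100, 1000, 10000])
```

In practice the transformed values would be re-tested for normality; skewness alone does not establish it.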

Figure 8. Boxplot showing the distributions of untransformed (A) and transformed (D) data for each variable. Data were standardized to a range of 0 to 1 to display both distributions along the same axis. Variables that were not transformed are marked with asterisks (*). HDI refers to human development index.

Additionally, I removed extreme outliers and any variables with many missing values. To identify moderately to highly correlated (>0.5) variables, I calculated Spearman's rank correlation coefficient for each pair of variables in the transformed dataset. Among these pairs, I found 14 correlated variables (Figure 9): 

Following the exclusion of these variables, I standardized the data to a range of 0 to 1. This dataset also excluded categorical variables (e.g. habitat type), as Principal Coordinate Analysis is not compatible with categorical data types. However, I incorporated these variables into my CART and random forest analyses.
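Standardizing to a range of 0 to 1 is commonly done with min-max scaling, sketched below. Treating the standardization as min-max scaling is an assumption, as is mapping a constant variable to zeros; the function name `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; map it to 0 (assumed convention)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([2, 4, 6])  # [0.0, 0.5, 1.0]
```

Min-max scaling preserves the shape of each distribution, so it puts variables on a common scale without undoing the earlier transformations.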

Figure 9. Pairwise scatterplots of correlated variables, whereby the diagonal indicates each variable name and placement.
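The correlation screen behind Figure 9 can be sketched as follows: rank each variable (averaging tied ranks), take the Pearson correlation of the ranks, and flag pairs above the 0.5 threshold. Applying the threshold to the absolute value of the coefficient is an assumption, and the function names and example columns are hypothetical. The sketch also assumes no variable is constant, since a constant column has no defined correlation.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlated_pairs(columns, threshold=0.5):
    """Return (name_a, name_b, rho) for every pair with |rho| above the threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        rho = spearman(columns[a], columns[b])
        if abs(rho) > threshold:
            pairs.append((a, b, rho))
    return pairs


# Hypothetical columns: "b" rises monotonically with "a"; "c" does not
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 5, 9, 20],
    "c": [3, 5, 1, 4, 2],
}
flagged = correlated_pairs(columns)
```

Because the coefficient depends only on ranks, monotonic transformations such as the square root or log leave it unchanged, while the inverse transformation flips its sign.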