When looking at the wind patterns at the Kakubari station from 2016 to 2024, we found a remarkable consistency in direction. The wind almost always arrives from the East-Southeast, regardless of the year. However, the intensity tells a different story: there is a sharp contrast between seasons. During the dry months, wind speeds pick up significantly, averaging around 5.2 m/s, compared to a much calmer 3.0 m/s in the wet season. This seasonal surge in wind speed suggests stronger atmospheric mixing and likely stretches the tower’s "view" further upwind. Because the direction remains so steady, we can suggest that our measurements are consistently capturing the same stable sector of the forest (Figure 4).
Figure 4. Seasonal and interannual wind roses for the Kakubari station from 2016 to 2024. The panels illustrate a consistent prevailing wind direction from the East-Southeast across all years; however, the dry season consistently exhibits higher wind speeds (averaging five meters per second) compared to the wet season, indicating stronger atmospheric forcing and increased turbulence during the water-limited period.
To estimate the footprint observed by the tower, it was necessary to use the Planetary Boundary Layer Height variable, which plays a fundamental role in the of Kljun et al. (2015) model. This variable represents the physical height of the atmosphere, allowing us to define how far the influence of the flow reaches (vertical scaling), and how wide the lateral scattering is. The histogram of the planetary boundary layer height is shown in Figure 5.
Figure 5. Histogram of the planetary boundary layer height. The bimodal distribution in the boundary layer height correspond to values during the day and night.
Once we had the necessary information, the footprint model was estimated, revealing where the tower’s data are coming from. We observed that eighty percent of the measured fluxes typically originate within 600 meters upwind (Figure 6). Interestingly, this spatial reach isn't uniform; the distribution shows two distinct peaks. While the main signal comes from about 600 meters out, a secondary peak appears much closer (around 150 meters) during periods of intense turbulence. These findings are vital because they confirm that what we are measuring is the actual primary forest canopy of the protected area, rather than just local variations on the ground surface.
Figure 6. Frequency distribution of the 80% cumulative flux contribution distance (X80) estimated using the Kljun et al. analytical model. The multimodal distribution shows a primary peak around six hundred meters, confirming that the measured carbon and water exchanges represent the forest canopy at an ecosystem scale; the secondary peaks below two hundred meters correspond to periods of high atmospheric instability and enhanced vertical mixing.
Distribution and Seasonality of Variables
Before performing the multivariate analysis, the magnitude and dispersion of the variables to be used were analyzed using box plots (Figure 7). This step allows for the identification of the operational range of carbon and energy fluxes, as well as environmental and structural factors present at the site. Visual inspection of the medians and outliers reveals the natural variability of the tropical dry forest, justifying the application of filters and transformations to stabilize the variance and improve the performance of the statistical models.
Figure 7. General distribution and extreme values of parameters at the Kakubari station. The box plots show the central distribution and dispersion of energy fluxes and predictors. The presence of extreme values is attributed to the high seasonality of the site and the dynamics of gas exchange during transition periods, which are subsequently treated using Winsorization techniques to avoid biases in the PCA and CANCOR.
Statistical Filtering and Noise Reduction
The raw dataset was subjected to a multi-step filtering process, beginning with low friction velocity (u* < 0.1 m/s) that were removed to avoid bias under low-turbulence conditions, while the air temperature was used only when it was 295 K or above.
A logarithmic transform was applied to variables such as Gross Primary Productivity, Net Ecosystem Exchange, Latent Heat Flux, and both water use efficiency to compress the tails of the data (Figure 8). This reduced the existing skewness, linearizing their relationships. Additionally, the Winsorization technique (2-98% percentiles) was applied to address outliers without losing sample size. Initially, a three-sigma threshold was used, but it was eliminating too many rows given the variability present in the variables related to water use efficiency.
Figure 8. Distribution observed in variables before applying PCA and CANCOR analysis. Variables such as Vapor Pressure Deficit, Net Ecosystem Exchange, Enhanced Vegetation Index, and Normalized Difference Vegetation Index present bimodal distributions because this dataset include both seasons, showing the differences in magnitude between wet and dry season.
The spatial origin of the data was validated through the analytical footprint model, where the eighty percent cumulative flux distance (X80) was calculated to identify what the tower sees based on the wind speed/direction, friction velocity and planetary boundary layer height. The values for X80 were limited between 0 to 1000 m based on the buffer applied to spatial variables.
The main dataset was divided per season to apply two principal components analysis, where both distribution variables are shown in Figure 9. Transition phases were considered between seasons: April-May correspond to dry phase, while November-December was wet phase. Comparing the dry and wet phases of the forest reveals a bimodal shift, as seen in Gross Primary Productivity, Enhanced Vegetation Index, Net Ecosystem Exchange, and Vapor Pressure Deficit, all of which are directly influenced by water availability and temperature changes. A different pattern is observed in variables such as footprint extent, intrinsic water use efficiency, and latent heat flux, where these variables maintain similar ranges of values throughout both seasons. This latter finding benefits footprint extent, suggesting that the tower measures the same forest area during both the rainy and dry seasons.
Figure 9. Comparison of Seasonal Densities (Dry vs. Wet). Multipanel density plots showing the overlap and segregation of climatic, phenological, and flow variables. Yellow shaded areas represent the dry phase, while green areas correspond to the wet phase.
The General Perspective (Combined Dataset)
By analyzing the full dataset, we found that two primary forces explain nearly 71% of the forest's variability (Figure 10). The first component (PC1) reflected a gradient between atmospheric conditions (Vapor Pressure Deficit, Air temperature, and Photosynthetic Active Radiation) and variables associated with vegetation productivity and condition (Gross Primary Productivity, NDVI, EVI, and Latent Heat Flux), suggesting a clear differentiation between environmental drivers and ecosystem responses. Meanwhile, the second component (PC2) highlighted the variability associated with water use efficiency in the face of turbulent energy fluxes.
Figure 10. PCA obtained with both seasons. PC1 was primarily associated with positive loadings of Net Ecosystem Exchange (0.437), Air temperature (0.426), Photosynthetic Active Radiation (0.342), and Vapor Pressure Deficit (0.173). This may represent an atmospheric demand gradient that opposes conditions of higher productivity. PC2 was dominated by Intrinsic Water Use Efficiency and Water Use Efficiency (0.537 and 0.475 respectively), followed by Vapor Pressure Deficit (0.436) with a negative loading for Latent Heat Flux (-0.405). This axis reflects differences in water use efficiency, rather than direct changes in forest productivity.
Dry and Wet Seasons
During the dry season (Figure 11), the PCA showed a greater separation between atmospheric conditions (especially Vapor Pressure Deficit and Photosynthetic Active Radiation) and ecosystem flux responses (Gross Primary Productivity, Latent Heat Flux, and Intrinsic Water Use Efficiency), as a result of the water stress characteristic of this period. This pattern suggests that photosynthetic activity and energy fluxes are strongly limited by water availability and ambient temperature conditions. On the other hand, during the wet season (Figure 12), productivity variables (Gross Primary Productivity and Latent Heat Flux) were more strongly associated with greenness indices (NDVI, EVI), indicating a synchrony between forest recovery (foliar growth) and increased ecosystem fluxes under conditions of greater water availability. These results reflect the marked seasonality of the tropical dry forest, demonstrating that the dataset maintains internal consistency and is capable of capturing the physiological and environmental gradients of the ecosystem.
Figure 11. PCA during dry season. PC1 was dominated by Vapor Pressure Deficit (0.412), Intrinsic Water Use Efficiency and Water Use Efficiency (0.41 and 0.354 respectively) with a contribution of 0.358 by Air temperature. PC2 showed strong contributions from Net Ecosystem Exchange (0.484) and Photosynthetic Active Radiation (0.327). This PCA shows a gradient dominated by atmospheric dryness, associated with lower productivity and lower canopy activity, but with an increase in water use efficiency related to a conservative behavior of the ecosystem under water stress.
Figure 12. PCA during wet season. PC1 was dominated by strong positive loading of Intrinsic Water Use Efficiency and Water Use Efficiency (0.505 and 0.393 respectively), Vapor Pressure Deficit (0.476), Air temperature (0.301) and vegetation indices. This demonstrates that the variability is not dominated by atmospheric drought, but by differences in canopy state and photosynthetic activity, suggesting that productivity is limited primarily by biological conditions rather than environmental stress.
The PCA provides an overview of the covariance structure in the dataset, making it useful as a descriptive method. However, this type of analysis does not allow for a direct assessment of the relationships between the variables. For this reason, a canonical correlation analysis will be used to explicitly evaluate the relationships between the set of environmental variables and the ecosystem responses. This approach allows for the identification of dominant modes of covariance between predictors and responses, providing a more direct assessment of how environmental conditions are associated with flux variability, while still acknowledging that these relationships are correlative rather than causal.
Atmospheric Data Sources: While we improved our analysis by using hourly reanalysis data instead of constant assumptions, we recognize that these are still model-based estimates. Without direct, on-site measurements of the boundary layer height, there is always a small degree of uncertainty in exactly how the footprint is calculated.
The Complexity of the Canopy: Our study assumes that what the sensors record at the top of the tower perfectly reflects the exchange happening at the forest floor. However, in a dense forest like Kakubari, air can sometimes get "trapped" under the canopy during calm nights. This means some carbon or water exchange might be delayed or missed by the sensors.
Separating Soil and Air: We focused heavily on how the dryness of the air affects the forest, but we lacked continuous data on soil moisture at different depths. Because of this, it is difficult to say exactly where the water stress begins—whether the trees are reacting to dry soil, dry air, or, most likely, a complex combination of both.
Statistical Limitations: The analyses are based on correlative approaches using temporally autocorrelated variables. As a result, findings should be interpreted as patterns of association rather than evidence of causal relationships.