Methane (CH4) is a potent greenhouse gas, with a 100-year global warming potential 21 times that of CO2 in the IPCC Second Assessment Report (about 28 times in the Fifth Assessment Report), which makes monitoring it crucial for climate action. Recent data show a significant rise in CH4 emissions, primarily from agriculture, land degradation, and fossil fuels, highlighting the need for better predictive models and mitigation strategies.
A key challenge in methane monitoring is incomplete satellite data: cloud cover renders 40% of TROPOMI Sentinel-5P observations unusable. In addition, environmental factors such as land cover and soil temperature complicate source attribution.
This research addresses these gaps by integrating satellite, soil, and land use data with machine learning models to infer missing and cloud-obscured CH4 readings. Of the models compared, Random Forest is the most accurate, and Shapley analysis is used to reveal the key emission drivers. By bridging data gaps and improving source attribution, this work supports informed climate policy and sustainable land management.
Methane measurements from the TROPOMI Sentinel-5P satellite, MIDAS UK soil temperature records, and the NERC EDS 2021 land cover dataset are combined into a single dataset. Preprocessing, including geospatial matching, feature engineering, and normalization, ensures consistency across the sources and prepares the data for filling the gaps left by cloud cover. Feature selection based on Pearson correlation and Shapley analysis identifies the key drivers of methane emissions and improves model explainability. Several machine learning models, including Random Forest, Gradient Boosting, and neural networks, are then trained and tested.
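The preprocessing and feature-selection steps above can be sketched in a few functions. This is a minimal sketch under stated assumptions, not the study's pipeline: the text does not say how geospatial matching or normalization were performed, so matching each satellite pixel to the nearest soil temperature station by great-circle distance and min-max scaling are assumed here. The function names and the 0.1 correlation threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_station(lat: float, lon: float, stations: Dict[str, Tuple[float, float]]) -> str:
    """Assumed matching rule: pair a satellite pixel with its closest station ID."""
    return min(stations, key=lambda sid: haversine_km(lat, lon, *stations[sid]))

def min_max(values: Sequence[float]) -> List[float]:
    """Assumed normalization: rescale to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(
    features: Dict[str, Sequence[float]],
    target: Sequence[float],
    threshold: float = 0.1,  # hypothetical cut-off
) -> List[str]:
    """Keep features whose absolute correlation with the CH4 target meets the threshold."""
    return [name for name, col in features.items() if abs(pearson(col, target)) >= threshold]
```

For example, with the made-up coordinates `{"A": (51.5, -0.1), "B": (53.5, -2.2)}`, a pixel at (51.6, -0.2) is matched to station `"A"`. Filtering on the absolute correlation keeps strongly negative predictors as well as positive ones, but it only captures linear association, which is one reason to pair it with Shapley analysis.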
The predictive models are evaluated using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), and the Random Forest model achieves the lowest errors (RMSE = 29.48, MAE = 20.82). The trained model is then used to estimate missing CH4 values in the satellite observations, producing complete methane distribution maps.
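The two evaluation metrics and the gap-filling step can be written out directly: RMSE is the square root of the mean squared residual, and MAE is the mean absolute residual. In this sketch, `predict` is a hypothetical stand-in for the trained Random Forest that takes one row of features (for example soil temperature and land cover fractions) and returns a CH4 estimate, and missing or cloud-obscured observations are assumed to be encoded as `None`.

```python
import math
from typing import Callable, List, Optional, Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared residual."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: mean of the absolute residuals."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return sum(abs(r) for r in residuals) / len(residuals)

def fill_gaps(
    observed: Sequence[Optional[float]],
    features: Sequence[Sequence[float]],
    predict: Callable[[Sequence[float]], float],
) -> List[float]:
    """Keep valid CH4 retrievals and replace missing ones (None) with model predictions.

    `predict` is a hypothetical stand-in for the trained Random Forest.
    """
    if len(observed) != len(features):
        raise ValueError("observed and features must have the same length")
    return [obs if obs is not None else predict(row) for obs, row in zip(observed, features)]
```

With scikit-learn, a fitted `RandomForestRegressor` could be wrapped as `lambda row: model.predict([row])[0]`. Valid retrievals are kept unchanged in this sketch, so the model only fills the gaps rather than overwriting observed values.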
Additionally, Shapley analysis highlights the influence of factors such as land use, soil temperature, and seasonal variations on methane emissions. The study demonstrates that agriculture, particularly arable land and improved grassland, is a major contributor to methane levels. The findings provide actionable insights for policymakers, supporting data-driven interventions such as sustainable farming practices, improved manure management, and wetland restoration to mitigate methane emissions and advance climate action.
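Shapley values attribute a single prediction to its input features by averaging each feature's marginal contribution over all coalitions of the other features. The brute-force sketch below only illustrates that definition for a small model; it assumes that features absent from a coalition are replaced by a fixed baseline (for example a dataset mean), and the function name and toy models are hypothetical. In practice, tree ensembles such as Random Forest are usually explained with the SHAP library's `TreeExplainer`, which avoids enumerating every coalition.

```python
import itertools
import math
from typing import Callable, List, Sequence, Set

def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of predict at x; absent features take baseline values.

    Enumerates 2^(n-1) coalitions per feature, so it is only practical for a few features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition keep their observed value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a linear model the result reduces to each weight times the feature's deviation from the baseline, and for any model the values sum to `predict(x) - predict(baseline)`. This additivity is what allows a CH4 estimate to be split into contributions from land use, soil temperature, and season.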
This study demonstrates the effectiveness of machine learning in predicting atmospheric methane concentrations and bridging data gaps in satellite observations. By integrating TROPOMI Sentinel-5P methane data with soil temperature and land use datasets, the research identifies key drivers of methane emissions, with agricultural activities emerging as a dominant contributor. The Random Forest model achieves the highest accuracy, enabling the creation of complete methane distribution maps and providing valuable insights for sustainable land management. Shapley analysis further highlights the impact of seasonal variations and land use patterns on methane levels, reinforcing the need for targeted mitigation strategies. These findings support data-driven climate action, emphasizing the role of AI-powered solutions in environmental monitoring and policy development.