We explore how machine learning (ML) techniques can be applied to improve the understanding and prediction of methane emissions in cattle farming. Methane, a potent greenhouse gas, is a byproduct of enteric fermentation in cattle, and its reduction is essential in mitigating climate change. Our research aims to develop predictive models for methane emissions and to identify key environmental and biological factors influencing methane production, with the ultimate goal of offering actionable insights for reducing emissions in livestock farming.
We used data from the North Wyke Farm Platform (NWFP), a state-of-the-art research facility in the UK, which collects detailed environmental and biological data, as well as methane data from cattle through GreenFeed systems. We created a machine-learning-ready dataset by combining factors such as animal weight, breed, age, and environmental conditions (e.g., pasture type and weather data). These factors were matched to each methane record by interpolating them to the time at which the methane was measured. The resulting datasets express methane both as a rate (grams per day) and as an intensity (grams per kilogram of live-weight gain).
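The interpolation step can be illustrated with a minimal sketch. It assumes simple linear interpolation of one covariate (animal weight) between weigh-ins, clamped at the ends of the series; the function name `interpolate_at`, the field names, and all numbers are hypothetical and are not taken from the NWFP data or from our pipeline.

```python
from bisect import bisect_left


def interpolate_at(times, values, t):
    """Linearly interpolate a covariate at time t.

    `times` must be sorted in ascending order. Outside the observed
    range the nearest observation is returned (clamping).
    """
    if not times:
        raise ValueError("no covariate observations")
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical weigh-ins (day of trial, live weight in kg).
weigh_days = [0.0, 14.0, 28.0]
weights_kg = [400.0, 414.0, 421.0]

# Hypothetical methane records, each with its own timestamp.
methane_records = [
    {"day": 7.0, "ch4_g_per_day": 210.0},
    {"day": 21.0, "ch4_g_per_day": 230.0},
]

# Attach the interpolated weight to every methane record.
dataset = [
    {**rec, "weight_kg": interpolate_at(weigh_days, weights_kg, rec["day"])}
    for rec in methane_records
]
```

Each row of `dataset` then pairs one methane measurement with the covariate value estimated for the same moment, which is the shape a supervised learning model expects.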
To predict methane emissions, we built several machine learning models, including Gradient Boosting, Random Forest, and Neural Networks. Gradient Boosting was the best-performing algorithm, with a coefficient of determination (R²) of 0.383 and a root mean squared error (RMSE) of 51.8 g/day for methane rate. For methane intensity, it achieved an R² of 0.316 and an RMSE of 65.9 g/kg live-weight gain. We also applied explainable AI techniques, specifically Shapley analysis, to understand how different variables influence methane emissions. This allowed us to identify key predictors and their contributions to methane production, making the model more interpretable for practical applications.
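For readers unfamiliar with the two fit metrics quoted above, the following sketch shows how they are conventionally computed. It assumes the r2 figure denotes the coefficient of determination (one minus the ratio of residual to total sum of squares); the observed and predicted values are hypothetical and are unrelated to our results.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical methane rates in g/day (not taken from the study).
observed = [180.0, 220.0, 260.0, 300.0]
predicted = [200.0, 210.0, 250.0, 280.0]

fit_r2 = r_squared(observed, predicted)
fit_rmse = rmse(observed, predicted)
```

Under this definition an R² of 0.383 means the model accounts for roughly 38% of the variance in the measured methane rate, while the RMSE gives the typical size of a prediction error in g/day.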
Our findings revealed that cattle weight and age were the most significant predictors of methane emissions. Weight was particularly important for predicting total methane output per day, while age was more closely tied to methane intensity. Environmental factors, such as humidity and time of day, also influenced emissions, although their impact was less pronounced. We found that cattle raised in intensive feeding systems emitted more methane per day but had a lower methane intensity, that is, less methane per kilogram of live-weight gain.
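The idea behind attributing a prediction to individual predictors can be sketched with exact Shapley values on a toy model. This is an illustration only: it enumerates every feature coalition, which is feasible for three features but not for a full feature set, and it is not the tooling used in our analysis. The model `toy_model`, its coefficients, and the feature values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline instance.

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(f):
    """Hypothetical stand-in for a fitted model; coefficients are invented."""
    return (0.4 * f["weight_kg"]
            + 3.0 * f["age_months"]
            + 0.001 * f["weight_kg"] * f["humidity_pct"])


animal = {"weight_kg": 500.0, "age_months": 20.0, "humidity_pct": 80.0}
reference = {"weight_kg": 450.0, "age_months": 18.0, "humidity_pct": 70.0}

phi = shapley_values(toy_model, animal, reference)
```

The contributions in `phi` sum to the difference between the prediction for `animal` and the prediction for `reference`, which is the property that lets a per-feature ranking such as the one reported above be read as a decomposition of the model output.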
The GreenFeed system exhibits high measurement noise. While this can be partially mitigated by data cleaning and by the large sample size that GreenFeed affords, a significant amount of unexplained variability remains in the data. We underscore the continued need for improved data collection methods to enhance the accuracy of future studies.
Our research demonstrates the potential of machine learning models to offer valuable insights into methane mitigation strategies for cattle farming. By incorporating explainability into our models, we provide stakeholders with a deeper understanding of how different farm practices affect methane production. This research is a step forward in using advanced data-driven methods to tackle the pressing challenge of identifying interventions that reduce methane emissions in agriculture without compromising productivity.