Improving productivity while minimizing environmental impact is a critical challenge in sustainable agriculture. This project introduces a hybrid machine learning (ML) pipeline to predict milk yield and milk urea concentration in Holstein dairy cattle and to estimate ammonia emission and urinary urea nitrogen (UUN) excretion. The pipeline uses animal biological characteristics, feeding information and milk composition to predict milk yield and milk components, including milk urea concentration. We also developed a novel stacking method that combines multiple models to further improve prediction performance. The resulting gains in accuracy provide insights for managing cattle to improve milk production while reducing the environmental impact of livestock farming. Furthermore, the research integrates ML models with empirical statistical methods to estimate ammonia emission and UUN from ML-predicted milk urea nitrogen (MUN = milk urea concentration × 46%). The derived emissions closely align with established statistical benchmarks, demonstrating the pipeline’s reliability in estimating environmental impact. These findings allow farmers to make data-driven dietary adjustments, advancing sustainable dairy farming by simultaneously optimising productivity and reducing ecological footprints.
The proposed ML pipeline combines data analytics and pre-processing, feature selection, and model training and evaluation, providing a systematic approach to analysing livestock data and to estimating ammonia emission and UUN excretion.
Data pre-processing: Missing data and outliers are handled, and four different methods are applied to estimate the importance of each feature for the prediction problem. The overall feature-importance ranking for milk prediction is obtained by averaging each feature's rank across the four methods.
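The rank-averaging step can be sketched as follows. This is a minimal illustration, assuming each of the four methods returns one importance score per feature with higher meaning more important; the four methods are not named here, so the method labels and scores below are hypothetical placeholders, and the helper `average_rank` is an illustrative name rather than part of the study's code.

```python
import numpy as np


def average_rank(importance_by_method, feature_names):
    """Combine several feature-importance scores into one ranking by averaging ranks.

    importance_by_method maps a method name to a sequence of importance scores
    (higher = more important), one score per feature, in feature_names order.
    Returns (feature, mean_rank) pairs sorted from most to least important.
    """
    per_method_ranks = []
    for scores in importance_by_method.values():
        scores = np.asarray(scores, dtype=float)
        # Rank 1 = most important feature for this method (ties keep input order).
        order = np.argsort(-scores, kind="stable")
        ranks = np.empty(len(scores))
        ranks[order] = np.arange(1, len(scores) + 1)
        per_method_ranks.append(ranks)
    mean_rank = np.mean(per_method_ranks, axis=0)
    # A lower average rank means the feature is more important overall.
    idx = np.argsort(mean_rank, kind="stable")
    return [(feature_names[i], float(mean_rank[i])) for i in idx]


# Hypothetical scores from four unnamed importance methods (illustration only).
features = ["LactationM", "DM intake", "Body weight"]
scores = {
    "method_1": [0.50, 0.30, 0.10],
    "method_2": [0.20, 0.60, 0.05],
    "method_3": [0.40, 0.35, 0.01],
    "method_4": [0.30, 0.20, 0.10],
}
ranking = average_rank(scores, features)
print(ranking)
```

Averaging ranks rather than raw scores avoids having to rescale methods whose importance values live on different scales.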
Modelling and Interpretation: Baseline models and the proposed stacking models are trained, and their performance is then compared and evaluated. SHAP analysis is performed to explain the models' predictions.
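For readers unfamiliar with stacking, the sketch below shows the standard form of the technique using scikit-learn's `StackingRegressor`: base models are fitted, and a meta-learner is trained on their out-of-fold predictions. It is a generic illustration only, not the novel stacking method proposed in this work; the synthetic data, the choice of base models and the `RidgeCV` meta-learner are all assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the livestock feature table (NOT the study's data).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Level-0 (base) models: illustrative choices, not the study's baseline models.
base_models = [
    ("ridge", Ridge(alpha=1.0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
]

# The meta-learner is trained on 5-fold out-of-fold predictions of the base models.
stack = StackingRegressor(estimators=base_models, final_estimator=RidgeCV(), cv=5)
stack.fit(X_train, y_train)

# Compare each base model with the stacked ensemble on held-out data (R^2).
scores = {}
for name, model in base_models:
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
scores["stack"] = r2_score(y_test, stack.predict(X_test))
for name, r2 in scores.items():
    print(f"{name:>6}: R^2 = {r2:.3f}")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base models have already seen, keeps it from simply trusting whichever base model overfits most.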
Ammonia emission estimation: Previously established statistical equations are used to estimate ammonia emission from dairy cattle manure based on MUN (function f) and on UUN excretion (function g). UUN excretion can in turn be inferred from MUN using function h, so ammonia emission can be obtained either directly as f(MUN) or indirectly as g(h(MUN)). This provides a practical way to estimate ammonia emission and UUN excretion indirectly.
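The chain from predicted milk urea to emission estimates can be sketched as a composition of functions. Only the 46% urea-to-MUN conversion comes from this work; the published f, g and h equations are not reproduced here, so the placeholder functions below use made-up coefficients purely to make the sketch runnable, and all function names are hypothetical.

```python
from typing import Callable, Dict

UREA_N_FRACTION = 0.46  # MUN = milk urea concentration × 46%, as stated in this work


def milk_urea_to_mun(milk_urea: float) -> float:
    """Convert an (ML-predicted) milk urea concentration to MUN, in the same units."""
    return milk_urea * UREA_N_FRACTION


def estimate_from_milk_urea(
    milk_urea: float,
    f: Callable[[float], float],  # ammonia emission from MUN
    g: Callable[[float], float],  # ammonia emission from UUN excretion
    h: Callable[[float], float],  # UUN excretion from MUN
) -> Dict[str, float]:
    mun = milk_urea_to_mun(milk_urea)
    uun = h(mun)
    return {
        "MUN": mun,
        "UUN": uun,
        "NH3_from_MUN": f(mun),  # direct route: f(MUN)
        "NH3_from_UUN": g(uun),  # indirect route: g(h(MUN))
    }


# HYPOTHETICAL placeholder equations with made-up coefficients; replace them
# with the published f, g and h equations before any real use.
def f_placeholder(mun: float) -> float:
    return 2.0 * mun


def h_placeholder(mun: float) -> float:
    return 10.0 * mun


def g_placeholder(uun: float) -> float:
    return 0.2 * uun


result = estimate_from_milk_urea(25.0, f_placeholder, g_placeholder, h_placeholder)
print(result)
```

Passing f, g and h as parameters keeps the ML part of the pipeline independent of which empirical equations are plugged in.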
Figure 1. Hybrid ML pipeline for milk yield, milk urea prediction and ammonia emission estimation from dairy cows.
SHAP analysis is a widely used method for interpreting ML models, offering insights that farmers and stakeholders can apply directly. Swarm plots of SHAP values are used to visualize the importance of features for predicting milk yield and milk urea. These plots show both the magnitude and the direction (positive or negative) of each feature's contribution to model predictions. Each dot represents one instance, coloured by feature value (low: blue, high: red). Features are ranked on the vertical axis by mean absolute SHAP value, with higher values indicating greater influence.
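The vertical ordering in these swarm plots can be reproduced from a SHAP value matrix with a few lines of NumPy. This is a minimal sketch, assuming the SHAP values have already been computed (for example with the `shap` package, whose `shap.plots.beeswarm` orders features by mean absolute SHAP value by default) and arranged as rows = instances, columns = features; the matrix below is made up, and `rank_by_mean_abs_shap` is a hypothetical helper name.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value (the swarm-plot ordering).

    shap_values has shape (n_instances, n_features); each row holds one
    instance's per-feature contributions to that instance's prediction.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(-importance, kind="stable")
    return [(feature_names[i], float(importance[i])) for i in order]


# Toy SHAP matrix for three instances and three features (made-up numbers).
shap_matrix = np.array([
    [-2.0, 0.5, 0.1],
    [1.0, -0.5, 0.0],
    [-3.0, 1.0, -0.2],
])
features = ["LactationM", "DM intake", "Body weight"]
ranking = rank_by_mean_abs_shap(shap_matrix, features)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise appear unimportant because its contributions cancel.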
Large negative SHAP values for LactationM (lactation month) reflect the biological decline in milk production as lactation progresses, influenced by nutritional depletion and hormonal changes. Other key factors include “DM intake” (dry matter intake) and “Diet type”, underscoring the critical role of diet quantity and quality. “DMI/100kgBW” (dry matter intake normalized by body weight) also shows substantial relevance. Conversely, features such as “GestationM” (gestation month) and “Body weight” have minimal influence on milk yield.
Figure 2. SHAP analysis for feature influence in milk yield Stacking model.
“N intake” (nitrogen intake) ranks as the most positively influential feature, highlighting its critical role in milk urea levels. Features such as “DMI/100kgBW”, “DM intake” and “Body weight” exhibit notable negative influences, suggesting that maintaining nutrient balance and accounting for animal body size are vital for managing milk urea. By contrast, “Ash intake” and “Milk yield” are only weakly associated with milk urea concentration, as indicated by their low SHAP values.
Figure 3. SHAP analysis for feature influence in milk urea predictive models.
For ammonia emission: within the MUN range of 5–15 mg/dL (green shaded area), for which statistical equation f1 is empirically validated, ammonia values estimated from ML-predicted milk urea align well with the statistics-based model. Outside this range, the ML-based estimates show a slightly different linear association between MUN and ammonia. For urinary urea nitrogen excretion: estimated UUN values generally align with the statistical equations, with most deviations falling within the uncertainty range (orange).
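One way to quantify the comparison described above is to split records by the validated 5–15 mg/dL MUN range and compute the mean absolute difference between the two sets of estimates in each part. This is a sketch under assumptions: the choice of mean absolute difference as the agreement metric, the helper name `agreement_by_mun_range` and all numbers below are illustrative and are not results from this study.

```python
import numpy as np


def agreement_by_mun_range(mun, ml_based, stat_based, low=5.0, high=15.0):
    """Mean absolute difference between two sets of estimates, split by MUN range.

    mun: MUN value (mg/dL) for each record.
    ml_based: estimate derived from ML-predicted milk urea for each record.
    stat_based: statistics-based benchmark estimate for the same records.
    """
    mun = np.asarray(mun, dtype=float)
    diff = np.abs(np.asarray(ml_based, dtype=float) - np.asarray(stat_based, dtype=float))
    inside = (mun >= low) & (mun <= high)

    def mae(mask):
        # NaN when no records fall in this part of the range.
        return float(diff[mask].mean()) if mask.any() else float("nan")

    return {"inside": mae(inside), "outside": mae(~inside)}


# Made-up values for five records (illustration only, not study results).
mun = [4.0, 6.0, 10.0, 14.0, 18.0]
ml_nh3 = [10.0, 12.0, 20.0, 28.0, 40.0]
stat_nh3 = [12.0, 12.0, 21.0, 28.0, 36.0]
summary = agreement_by_mun_range(mun, ml_nh3, stat_nh3)
print(summary)
```

Reporting the two parts separately mirrors the figure: agreement inside the validated range says something about the ML predictions, whereas disagreement outside it may equally reflect the limits of the statistical equation.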
Improving productivity (e.g., milk yield) while reducing environmental impact is a key challenge in sustainable agriculture, and this project introduces an ML pipeline to address it. In this research, we developed and validated a hybrid ML pipeline to explain associations among complex variables in dairy farming, such as animal biological characteristics, feeding strategies, milk yield and milk component quality. To support actionable decision-making, we applied SHAP analysis to assess feature importance and the relationships between features and target outcomes, providing insights into the influence of complex, interrelated variables in dairy farming.
A novel contribution of this research is the integration of machine learning with empirical statistical models to estimate ammonia emission and urinary urea nitrogen (UUN), providing insights into the environmental impact of dairy farming. Ammonia and UUN are estimated from MUN values derived from ML-predicted milk urea, and the resulting estimates closely align with the statistical models. Milk urea therefore gives farmers an indicator of environmental impact, enabling informed dietary adjustments that promote more sustainable dairy management practices.