Suitability of different machine learning algorithms for the classification of the proportion of grassland-based forages at herd-level using mid-infrared spectral information from routine milk control

Bibliographic details
Published in: Journal of dairy science 2024-08
Main authors: Birkinshaw, A., Sutter, M., Nussbaum, M., Kreuzer, M., Reidy, B.
Format: Article
Language: English
Online access: Full text
Description
Abstract: As the call for an international standard for milk from grassland-based production systems continues to grow, so too do the monitoring and evaluation policies surrounding this topic. Individual stipulations by countries and milk producers to market their milk under their own grass-fed labels include a compulsory number of grazing days per year, ranging from 120 d for certain labels to 180 d for others, a specified amount of herbage in the diet, or a prescribed dietary proportion of grassland-based forages (GBF) fed and produced on farm. As these multifarious policy and label requirements are laborious and costly to monitor on farm, fast, economical proxies would be advantageous to verify, in the final product, the proportion of GBF consumed by the cows. With this in mind, we employed readily available mid-infrared spectral data (n = 1132 spectra) from routine milk controls to develop binary classification models for 4 main feed groups from a primarily forage-based diet: total GBF (≥50%, n = 955; ≥75%, n = 599; ≥85%, n = 356), pasture (≥20%, n = 451; ≥50%, n = 284; ≥70%, n = 152), fresh herbage (pasture + fresh herbage indoor feeding; ≥20%, n = 517; ≥50%, n = 325; ≥70%, n = 182), and whole plant corn (fresh + conserved; ≥10%, n = 646; ≥30%, n = 187), the latter as a negative control. We compared 4 machine learning methods to assess which statistical model performs best at discriminating these classes. Three of these models have not yet been tested for herd-level dietary proportion classification, and all 4 follow completely different approaches: least absolute shrinkage and selection operator (LASSO), partial least squares discriminant analysis (PLS-DA), random forest (RF), and support vector machines (SVM). Seasonality has been a missing element from previous dietary herbage proportion classification models. As grazing and fresh herbage indoor feeding are highly dependent on the season, we developed an indicator to incorporate seasonality in a consistent, unbiased manner into our models. We also tested 3 sets of covariates. The first set included only mid-infrared spectra-derived data; the second included mid-infrared spectra-derived data plus seasonality indices; and the third included mid-infrared spectra-derived data, seasonality indices, and additional herd-specific information (DIM, breed, and parity). Of the 4 machine learning algorithms tested for the binary classification of GBF proportion at herd level, LASSO and PLS-DA performed best according to evaluation
ISSN:0022-0302
1525-3198
DOI:10.3168/jds.2024-25090
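
The threshold-based class definitions listed in the abstract can be illustrated with a minimal Python sketch. Only the feed groups and percentage cutoffs come from the abstract; the function names, the dictionary layout, and the four herd records are hypothetical and serve purely as an illustration. This is not the authors' code, and it covers only how binary targets might be derived from dietary proportions, not the spectral models themselves.

```python
from typing import Dict, List

# Cutoffs (percent of the diet) as listed in the abstract.
THRESHOLDS: Dict[str, List[int]] = {
    "total_gbf": [50, 75, 85],
    "pasture": [20, 50, 70],
    "fresh_herbage": [20, 50, 70],
    "whole_plant_corn": [10, 30],  # negative control in the abstract
}


def binary_labels(proportions: List[float], threshold: float) -> List[int]:
    """Return 1 where the dietary proportion meets the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in proportions]


def label_table(diets: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Build one binary target per (feed group, cutoff) pair."""
    targets: Dict[str, List[int]] = {}
    for group, cutoffs in THRESHOLDS.items():
        for cutoff in cutoffs:
            targets[f"{group}_ge_{cutoff}"] = binary_labels(diets[group], cutoff)
    return targets


# Hypothetical herd-level diet records (percent); invented for illustration.
example_diets: Dict[str, List[float]] = {
    "total_gbf": [92.0, 60.0, 45.0, 78.0],
    "pasture": [55.0, 10.0, 0.0, 25.0],
    "fresh_herbage": [70.0, 15.0, 0.0, 40.0],
    "whole_plant_corn": [0.0, 25.0, 35.0, 12.0],
}

targets = label_table(example_diets)
for name, labels in targets.items():
    print(name, labels)
```

Each resulting target column would then be paired with the spectral covariates and passed to a separate binary classifier, one per feed group and cutoff.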