Machine Learning in Complex Organic Mixtures: Applying Domain Knowledge Allows for Meaningful Performance with Small Data Sets

Bibliographic details
Published in: Journal of the American Chemical Society, 2024-08, Vol. 146 (32), pp. 22563-22569
Main authors: Le, Katelyn; Radović, Jagoš R.; MacCallum, Justin L.; Larter, Stephen R.; Van Humbeck, Jeffrey F.
Format: Article
Language: English
Description
Abstract: The ability to quantify individual components of complex mixtures is a challenge found throughout the life and physical sciences. An improved capacity to generate large data sets, along with the uptake of machine-learning (ML)-based analysis tools, has allowed various “omics” disciplines to realize exceptional advances. Other areas of chemistry that deal with complex mixtures often do not leverage these advances. Environmental samples, for example, can be more difficult to access, and the resulting small data sets are less suited to unconstrained ML approaches. Herein, we present an approach to address this latter issue. Using a very small environmental data set (35 high-resolution mass spectra gathered from various solvent extractions of Canadian petroleum fractions), we show that the application of specific domain knowledge can lead to ML models with notable performance.
ISSN: 0002-7863, 1520-5126
DOI: 10.1021/jacs.4c06595