PyPads: Transparent Machine Learning Experiment Tracking

Bibliographic details
Published in: Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V., 2024-03, Vol. 24 (1), pp. 53-62
Main authors: Weißgerber, Thomas; Amor, Mehdi Ben; Fellicious, Christofer; Granitzer, Michael
Format: Article
Language: English
Description
Abstract: Despite algorithmic advancements in the field of machine learning, the need for better infrastructure to support machine learning development and research has become increasingly apparent. Machine learning experiments tend to be ad hoc, and their results are most often communicated in the form of a publication. Experimental details are often omitted because of size or time constraints, or simply because the technical setup or parametrization has become intractably complex. Even when code bases are accessible, they often disregard important properties of the environment and experimental setup, such as random generators or computing infrastructure. At the same time, tracking and communicating an often inherently exploratory scientific process takes considerable effort. We explored different avenues to tackle these issues from a data science engineering point of view. These efforts resulted in PyPads, a framework that provides infrastructure to extend experimental setups with logging, communication and analysis features in a mostly non-intrusive way. PyPads can be extended to different Python-based frameworks, using community-driven, descriptive metadata to harmonize library-specific logs in an ontology. We also emphasize similarities to practices in software engineering, which have turned out to be essential in practical applications.
ISSN: 1618-2162, 1610-1995
DOI: 10.1007/s13222-023-00459-w
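The abstract describes extending experimental setups with logging "in a mostly non-intrusive way" and stresses that properties such as random generators tend to go unrecorded. As a rough illustration of that general idea only, the Python sketch below wraps functions from outside the experiment code, so calls, arguments and results (including the call that seeds the random generator) are recorded while the experiment itself contains no logging statements. This is not PyPads' API: `log_calls`, `tracked`, `LOG`, `Model` and `run_experiment` are hypothetical names, and the in-memory list stands in for whatever storage a real tracker would use.

```python
import contextlib
import functools
import random
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical in-memory log. A real tracking framework would persist
# entries to a storage backend instead of a Python list.
LOG: List[Dict[str, Any]] = []


def log_calls(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so each call appends its name, arguments and result to LOG."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        LOG.append(
            {"function": func.__qualname__, "args": args,
             "kwargs": kwargs, "result": result}
        )
        return result

    return wrapper


@contextlib.contextmanager
def tracked(owner: Any, name: str) -> Iterator[None]:
    """Temporarily replace owner.<name> with a logging wrapper, then restore it."""
    original = getattr(owner, name)
    setattr(owner, name, log_calls(original))
    try:
        yield
    finally:
        setattr(owner, name, original)


# --- Hypothetical experiment code, written without any tracking calls ---
class Model:
    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate

    def fit(self, data: List[float]) -> float:
        # Toy "training": the learning rate times the mean of the data.
        return self.learning_rate * sum(data) / len(data)


def run_experiment(seed: int) -> float:
    random.seed(seed)
    data = [random.random() for _ in range(10)]
    return Model(learning_rate=0.5).fit(data)


if __name__ == "__main__":
    # Tracking is switched on from the outside; run_experiment is unchanged.
    with tracked(random, "seed"), tracked(Model, "fit"):
        score = run_experiment(seed=42)
    for entry in LOG:
        print(entry["function"], entry["result"])
```

Restoring the original attribute in the `finally` clause keeps each patch scoped to the `with` block, so runs outside it behave exactly as before.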