Labels in a haystack: Approaches beyond supervised learning in biomedical applications

Bibliographic details
Published in:	Patterns (New York, N.Y.), 2021-12, Vol.2 (12), p.100383-100383, Article 100383
Main authors:	Yakimovich, Artur; Beaugnon, Anaël; Huang, Yi; Ozkirimli, Elif
Format:	Article
Language:	English
Subjects:	active learning; data annotation; data labeling; data value; machine learning; Review; self-supervised learning; semi-supervised learning; zero-shot learning
Online access:	Full text

Description
Abstract:	Recent advances in biomedical machine learning demonstrate great potential for data-driven techniques in health care and biomedical research. However, this potential has thus far been hampered by both the scarcity of annotated data in the biomedical domain and the diversity of the domain's subfields. While unsupervised learning is capable of finding unknown patterns in the data by design, supervised learning requires human annotation to achieve the desired performance through training. With the latter performing vastly better than the former, the need for annotated datasets is high, but they are costly and laborious to obtain. This review explores a family of approaches existing between the supervised and the unsupervised problem setting. The goal of these algorithms is to make more efficient use of the available labeled data. The advantages and limitations of each approach are addressed and perspectives are provided. As machine learning models become more complex, requirements for large annotated datasets grow. Annotating data for machine learning applications is especially challenging in the biomedical domain as it requires domain expertise of highly trained specialists to perform the annotations. Several strategies to either increase efficiency of label utilization or improve the annotation process have been proposed by the machine learning community. In this review we explore these strategies, including semi-supervised learning, active learning, data augmentation, transfer learning, self-supervision, weak-supervision, and zero- or few-shot learning. We show successful examples of research that has applied these strategies to multi-modal biomedical data. We conclude that raising awareness of these strategies in the biomedical community may contribute to further adoption of machine learning techniques in this research field.
ISSN:	2666-3899
DOI:	10.1016/j.patter.2021.100383