Regularization of sequence data for machine learning

Bibliographic details
Main authors:	Bai, Bryan; Kremer, S. C.
Format:	Conference paper
Language:	English
Keywords:	Complexity theory, deep architecture, DNA, DNA barcoding, generalization, Kernel, Learning systems, Machine learning, neural network, non-monophyletic species, support vector machine, Support vector machines, Training

Description
Abstract:	We examine the problem of classifying biological sequences, and in particular the challenge of generalizing results to novel input data. We observe that the high dimensionality of sequence data representations results in an extremely sparsely populated input space. This motivates a need for regularization (a form of inductive bias), in order to achieve generalization. We discuss regularization in the context of regular neural networks, deep belief networks and support vector machines, and provide experimental results for these architectures. Our results support the importance of using an effective regularization method and identify which methods work well on a real-world dataset.
DOI:	10.1109/BIBMW.2011.6112350
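The abstract's argument that high-dimensional sequence representations leave the input space extremely sparse can be made concrete with a little arithmetic. The sketch below assumes a one-hot encoding (4 binary features per nucleotide). The abstract does not say which representation the paper uses. It also assumes a sequence length of 648 positions, roughly the standard COI DNA-barcode region, and a hypothetical training set of 10,000 sequences. Neither value is taken from the paper. The function names `one_hot` and `log10_occupancy` are illustrative only.

```python
import math

# Assumption: one-hot encoding over the DNA alphabet (not stated in the paper).
ALPHABET = "ACGT"


def one_hot(seq: str) -> list[int]:
    """One-hot encode a DNA sequence: 4 binary features per position."""
    vec: list[int] = []
    for base in seq:
        vec.extend(1 if base == b else 0 for b in ALPHABET)
    return vec


def log10_occupancy(seq_len: int, n_samples: int) -> float:
    """log10 of the largest fraction of all possible sequences that
    n_samples distinct training examples could cover.

    There are len(ALPHABET) ** seq_len distinct sequences of length seq_len,
    so the fraction is at most n_samples / 4**seq_len. Working in log10
    avoids overflowing a float.
    """
    return math.log10(n_samples) - seq_len * math.log10(len(ALPHABET))


if __name__ == "__main__":
    L, N = 648, 10_000  # hypothetical sequence length and training-set size
    print("feature dimension:", len(one_hot("A" * L)))  # 2592
    print("log10 occupancy:", round(log10_occupancy(L, N), 1))
```

Under these assumptions each input has 2,592 features, and even 10,000 distinct training sequences cover only about 10^-386 of the possible inputs. Almost every novel sequence therefore lies far from any training example, and a learner must rely on an inductive bias such as regularization to generalize.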