Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application

Abstract Motivation Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. Methods We introduce an unsupervi...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:International journal of medical informatics (Shannon, Ireland) Ireland), 2009-12, Vol.78 (12), p.e1-e6
Hauptverfasser: Ginter, Filip, Suominen, Hanna, Pyysalo, Sampo, Salakoski, Tapio
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Abstract Motivation Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. Methods We introduce an unsupervised method based on a combination of hidden Markov models and latent semantic analysis which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. Results The method is evaluated on intensive care nursing narratives and motivated by information needs in this domain. The method is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.
ISSN:1386-5056
1872-8243
DOI:10.1016/j.ijmedinf.2009.02.003