Semantic Analysis and Organization of Spoken Documents Based on Parameters Derived From Latent Topics

Bibliographic details
Published in: IEEE Transactions on Audio, Speech, and Language Processing, 2011-09, Vol. 19 (7), p. 1875-1889
Main authors: KONG, Sheng-Yi; LEE, Lin-Shan
Format: Article
Language: English
Description
Abstract: Spoken documents are audio signals and are thus not easily displayed on-screen and not easily scanned and browsed by the user. It is therefore highly desirable to automatically construct summaries, titles, latent topic trees and key term-based topic labels for these spoken documents to aid the user in browsing. We refer to this as semantic analysis and organization. Also, as network content is both copious and dynamic, with topics and domains changing every day, the approaches here must be primarily unsupervised. We propose a framework for unsupervised semantic analysis and organization of spoken documents and, for this purpose, introduce two measures derived from latent topic analysis: latent topic significance and latent topic entropy. We show that these can be integrated into an application system, with which the user can more easily navigate archives of spoken documents. Probabilistic latent semantic analysis is used as a typical example approach for unsupervised topic analysis in most experiments, although latent Dirichlet allocation is also used in some experiments to show that the proposed measures are equally applicable to different analysis approaches. All of the experiments were performed on Mandarin Chinese broadcast news.
ISSN: 1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI: 10.1109/TASL.2010.2102592