Genre Classification and Domain Transfer for Information Filtering

Bibliographic details
Main authors: Finn, Aidan; Kushmerick, Nicholas; Smyth, Barry
Format: Book chapter
Language: English
Description
Abstract: The World Wide Web is a vast repository of information, but the sheer volume makes it difficult to identify useful documents. We identify document genre as an important factor in retrieving useful documents and focus on the novel document genre dimension of subjectivity. We investigate three approaches to automatically classifying documents by genre: traditional bag-of-words techniques, part-of-speech statistics, and hand-crafted shallow linguistic features. We are particularly interested in domain transfer: how well the learned classifiers generalize from the training corpus to a new document corpus. Our experiments demonstrate that the part-of-speech approach is better than traditional bag-of-words techniques, particularly in the domain-transfer conditions.
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/3-540-45886-7_23
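The abstract contrasts bag-of-words features with part-of-speech statistics. The Python sketch below is purely illustrative and is not the authors' feature set: it uses two hypothetical, hand-tagged sentences and assumes Penn Treebank-style tags supplied directly rather than produced by a tagger. It shows one plausible intuition for the reported domain-transfer result: swapping a topical word changes the bag-of-words representation but leaves the part-of-speech distribution unchanged.

```python
from collections import Counter

# Hypothetical hand-tagged sentences (Penn Treebank-style tags). A real
# system would obtain tags from a part-of-speech tagger; these are
# illustrative only and are not taken from the paper's corpora.
FILM_REVIEW = [("I", "PRP"), ("think", "VBP"), ("this", "DT"),
               ("film", "NN"), ("is", "VBZ"), ("wonderful", "JJ"), (".", ".")]
SPORT_REVIEW = [("I", "PRP"), ("think", "VBP"), ("this", "DT"),
                ("match", "NN"), ("is", "VBZ"), ("wonderful", "JJ"), (".", ".")]


def relative_frequencies(items):
    """Map each distinct item to its share of all items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def bag_of_words(tagged):
    """Bag-of-words features: relative frequency of each lower-cased token."""
    return relative_frequencies(word.lower() for word, _ in tagged)


def pos_statistics(tagged):
    """Part-of-speech features: relative frequency of each tag."""
    return relative_frequencies(tag for _, tag in tagged)


if __name__ == "__main__":
    # Different topical words ("film" vs "match"), so the bag-of-words vectors differ...
    print(bag_of_words(FILM_REVIEW) == bag_of_words(SPORT_REVIEW))      # False
    # ...but the part-of-speech distributions are identical.
    print(pos_statistics(FILM_REVIEW) == pos_statistics(SPORT_REVIEW))  # True
```

In practice, vectors like these would be passed to a trained classifier; the choice of learner is left open in this sketch, and the third approach (hand-crafted shallow linguistic features) is not modeled.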