Summarization and categorization of text data in high-level data cleaning for information retrieval

Bibliographic details
Published in: Applied artificial intelligence, 2003-05, Vol. 17 (5-6), p. 461-474
Main authors: Saravanan, M.; Raj, P. C. Reghu; Raman, S.
Format: Article
Language: eng
Online access: Full text
Description
Abstract: In view of the exponential growth of online document corpora, even perfect retrieval will fetch too much material for a user to cope with. One way to reduce this problem is automatic domain-specific summarization tailored to user's needs, which is a kind of high-level data cleaning. This requires some method of discovering classes of similar items that may be grouped into predetermined domains. We explore whether there exists a synergic relation between systems for classification and those for summarization by way of composing those subsystems. In other words, we examine whether prior summarization will increase the performance of the classifier system and vice versa. In both cases, the answer is affirmative, as we show in this paper. We propose a text-mining framework in which these subsystems are treated as constituents of a knowledge discovery process for text corpora.
ISSN: 0883-9514, 1087-6545
DOI: 10.1080/713827177