Using heterogeneous linguistic knowledge in local coherence identification for information retrieval

Bibliographic details
Published in:	Journal of Information Science, 2000-01, Vol. 26 (5), pp. 313-328
Author:	Chan, Samuel W.K.
Format:	Article
Language:	English
Subjects:	Algorithms; Automatic text analysis; Cluster Based Retrieval; Cognitive aspects; Coherence; Discourse; Exact sciences and technology; Information and communication sciences; Information processing and retrieval; Information Retrieval; Information retrieval systems. Information and document management system; Information retrieval. Man machine relationship; Information science. Documentation; Linguistic analysis; Linguistic Theory; Linguistics; Natural Language Processing; Sciences and techniques of general use; Studies; System design and modelling; Text Handling; Textual Analysis

Description
Abstract:	This paper proposes a novel approach to automatic text segmentation that does not require full semantic understanding. To analyse the linguistic bonds in a text and determine the degree of coherence it exhibits, the wide diversity of textual relations is represented in a discourse network. A corpus of mutual linguistic knowledge that captures similarity of meaning and causal relations is encoded in this network, which is then subjected to a clustering algorithm. As a result, the segments of the text are grouped into clusters according to their textual similarity, and topic boundaries can be identified by observing where segments shift from one cluster to another. The experimental results show that combining these heterogeneous kinds of knowledge makes it possible to detect topic shifts. A comparison with a related method shows that the boundaries identified by the algorithm correspond closely to the actual topic boundaries. Given the growing recognition of the role of text structure in information retrieval from unpartitioned text, this approach provides a quantitative model and an efficient tool for text segmentation.
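The abstract does not spell out the algorithm, so the following is only a rough Python sketch of the general idea it describes: score pairs of segments with a combination of heterogeneous knowledge, cluster the resulting network, and report a topic boundary wherever consecutive segments fall into different clusters. Everything concrete here is an assumption and not the author's method: the hypothetical `RELATED` table stands in for the paper's knowledge of meaning similarity and causal relations, Jaccard word overlap stands in for lexical similarity, the weights and the `threshold` value are arbitrary, and single-link clustering with union-find is swapped in for whatever cluster algorithm the paper uses. The function names `tokens`, `combined_score`, `cluster`, and `boundaries` are likewise hypothetical.

```python
import re
from itertools import combinations

# HYPOTHETICAL knowledge table standing in for the paper's "similarity of
# meaning and causal relations"; each pair is treated as symmetric.
RELATED = {
    frozenset({"rain", "flood"}),      # causal-style link
    frozenset({"car", "automobile"}),  # synonym-style link
}


def tokens(text: str) -> set[str]:
    """Lower-cased set of alphabetic words in a segment."""
    return set(re.findall(r"[a-z]+", text.lower()))


def combined_score(a: set[str], b: set[str],
                   w_lex: float = 0.5, w_rel: float = 0.5) -> float:
    """Weighted mix of lexical overlap and table-based relations, each in [0, 1)."""
    if not a or not b:
        return 0.0
    lexical = len(a & b) / len(a | b)  # Jaccard overlap of the word sets
    # Count cross-segment word pairs found in the (assumed) relation table.
    links = sum(1 for x in a for y in b if frozenset({x, y}) in RELATED)
    relational = links / (links + 1)  # saturating: 1 link -> 0.5, 2 -> 0.67
    return w_lex * lexical + w_rel * relational


def cluster(segments: list[str], threshold: float = 0.2) -> list[int]:
    """Single-link clustering of the segment network via union-find (a stand-in)."""
    sets = [tokens(s) for s in segments]
    parent = list(range(len(segments)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of segments whose combined score reaches the threshold.
    for i, j in combinations(range(len(segments)), 2):
        if combined_score(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(segments))]


def boundaries(labels: list[int]) -> list[int]:
    """Indices where a segment belongs to a different cluster than its predecessor."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

For example, with the five segments "Heavy rain fell across the valley overnight.", "The rain caused a flood near the river.", "The flood closed the river road.", "A new car model was launched today." and "The automobile has an electric engine.", `cluster` returns `[0, 0, 0, 1, 1]` and `boundaries` returns `[3]`, i.e. a topic shift before the fourth segment. Note that segments 4 and 5 share no words at all; they are joined only through the assumed `car`/`automobile` relation, which mirrors the abstract's point that combining several kinds of knowledge helps detect topic shifts that word overlap alone would miss.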
ISSN:	0165-5515, 1741-6485
DOI:	10.1177/016555150002600504