Using heterogeneous linguistic knowledge in local coherence identification for information retrieval
Published in: | Journal of Information Science, 2000-01, Vol. 26 (5), pp. 313-328 |
---|---|
Format: | Article |
Language: | English |
Abstract: | This paper proposes a novel approach to automatic text segmentation that does not require full semantic understanding. To analyse the linguistic bonds in a text and determine the degree of coherence it exhibits, the wide diversity of textual relations is represented in a discourse network. A corpus of mutual linguistic knowledge, capturing both similarity of meaning and causal relations, is encoded in the discourse network, which is then subjected to a clustering algorithm. As a result, the segments of the text are grouped into clusters according to their textual similarity, and topic boundaries can be identified by observing where segments shift from one cluster to another. The experimental results show that combining the heterogeneous knowledge makes it possible to capture topic shifts. A comparison with a related method shows that the boundaries found by the algorithm correspond closely to the topic boundaries. Given the growing recognition of the role of text structure in information retrieval from unpartitioned text, this approach provides a quantitative model and an efficient tool for text segmentation. |
---|---|
ISSN: | 0165-5515 (print), 1741-6485 (online) |
DOI: | 10.1177/016555150002600504 |
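The abstract outlines a pipeline. Relations of two kinds (similarity of meaning and causal relations) between text segments are combined into one discourse network, the network is clustered, and topic boundaries are read off where consecutive segments change cluster. This record does not specify the knowledge sources, weighting, or cluster algorithm, so the following Python sketch is illustrative only. The relation scores are hypothetical inputs, the equal weights `w_sim` and `w_causal` and the `threshold` are assumptions, and a simple threshold-based single-link clustering (union-find) stands in for whatever algorithm the paper actually uses.

```python
# Illustrative sketch only: relation scores, weights, threshold, and the
# single-link clustering below are assumptions, not the paper's method.
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def combine_relations(n: int,
                      similarity: Dict[Pair, float],
                      causal: Dict[Pair, float],
                      w_sim: float = 1.0,
                      w_causal: float = 1.0) -> List[List[float]]:
    """Merge two kinds of segment relations into one symmetric weight matrix."""
    weights = [[0.0] * n for _ in range(n)]
    for relation, w in ((similarity, w_sim), (causal, w_causal)):
        for (i, j), score in relation.items():
            weights[i][j] += w * score
            weights[j][i] += w * score
    return weights


def cluster_segments(weights: List[List[float]], threshold: float) -> List[int]:
    """Single-link clustering: segments linked by weight >= threshold share a cluster."""
    n = len(weights)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if weights[i][j] >= threshold:
                parent[find(i)] = find(j)

    # Relabel clusters 0, 1, 2, ... in order of first appearance in the text.
    label_of_root: Dict[int, int] = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in label_of_root:
            label_of_root[root] = len(label_of_root)
        labels.append(label_of_root[root])
    return labels


def topic_boundaries(labels: List[int]) -> List[int]:
    """Return indices k where segment k starts a new topic (cluster differs from k-1)."""
    return [k for k in range(1, len(labels)) if labels[k] != labels[k - 1]]


if __name__ == "__main__":
    # Hypothetical relation scores for six consecutive segments.
    similarity = {(0, 1): 0.8, (2, 3): 0.7, (3, 4): 0.6}
    causal = {(1, 2): 0.2, (4, 5): 0.9}
    labels = cluster_segments(combine_relations(6, similarity, causal), threshold=0.5)
    print(labels)                    # [0, 0, 1, 1, 1, 1]
    print(topic_boundaries(labels))  # [2]
```

Here the causal link (4, 5) keeps segment 5 with the second topic, while the weak link (1, 2) falls below the threshold, so a single boundary appears before segment 2. Because single-link clustering can put non-adjacent segments in the same cluster, a text that returns to an earlier topic yields a boundary on each side of the digression: labels `[0, 1, 0]` give boundaries `[1, 2]`.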