Exploring term dependences in probabilistic information retrieval model

Most previous information retrieval (IR) models assume that terms of queries and documents are statistically independent from each another. However, this kind of conditional independence assumption is obviously and openly understood to be wrong, so we present a new method of incorporating term depen...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Information processing & management 2003-07, Vol.39 (4), p.505-519
Hauptverfasser: Cho, Bong-Hyun, Lee, Changki, Lee, Gary Geunbae
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Most previous information retrieval (IR) models assume that terms of queries and documents are statistically independent from each another. However, this kind of conditional independence assumption is obviously and openly understood to be wrong, so we present a new method of incorporating term dependence in probabilistic retrieval model by adapting Bahadur–Lazarsfeld expansion (BLE) to compensate the weakness of the assumption. In this paper, we describe a theoretic process to apply BLE to the general probabilistic models and the state-of-the-art 2-Poisson model. Through the experiments on two standard document collections, HANTEC2.0 in Korean and WT10g in English, we demonstrate that incorporation of term dependences using the BLE significantly contribute to the improvement of performance in at least two different language IR systems.
ISSN:0306-4573
1873-5371
DOI:10.1016/S0306-4573(02)00078-X