Multi-word terms selection for information retrieval
Published in: | Information discovery and delivery 2023-01, Vol.51 (1), p.74-87 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Purpose
A number of approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of them suffer from poor precision at low recall. The choice of indexing units has a great impact on the effectiveness of a search system. The authors go beyond simple-term indexing and propose a framework for filtering and indexing multi-word terms (MWT).
Design/methodology/approach
In this paper, the authors rank MWT in order to filter them, keeping the most effective ones for the indexing process. The proposed model filters MWT according to their ability to capture the document topic and to distinguish between different documents in the same collection. The authors rely on the hypothesis that the best MWT are those with the highest degree of association. The experiments are carried out on English and French data sets.
Findings
The results indicate that this approach improves precision at low recall and performs better than more advanced models based on term dependencies.
Originality/value
The study uses and tests different association measures to select the MWT that best describe the documents, with the aim of enhancing precision among the first retrieved documents. |
---|---|
ISSN: | 2398-6247 2398-6255 |
DOI: | 10.1108/IDD-12-2021-0142 |
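
The summary describes ranking MWT by their degree of association and keeping only the top-ranked ones for indexing. The record does not say which association measures the authors use, so the following is only a minimal sketch under stated assumptions: candidates are restricted to adjacent word pairs, the measure is pointwise mutual information (PMI), and the function names `rank_bigrams_by_pmi` and `select_mwt` are hypothetical. It does not cover the second criterion mentioned in the summary (distinguishing between documents of the same collection).

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(docs, min_count=2):
    """Rank adjacent word pairs by PMI (an assumed association measure)."""
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        # Pair each token with its right-hand neighbour.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for very low counts.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest association first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_mwt(docs, top_k=2, min_count=2):
    """Keep the top_k candidates as multi-word indexing units."""
    ranked = rank_bigrams_by_pmi(docs, min_count=min_count)
    return [" ".join(pair) for pair, _ in ranked[:top_k]]
```

On a toy collection such as `["information retrieval systems index documents", "information retrieval uses an index of the documents", "the index of the collection"]`, calling `select_mwt(docs, top_k=1)` would keep the pair whose words co-occur more often than their individual frequencies predict. Other association measures could be swapped in by changing only the scoring line.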