Improvement of building field association term dictionary using passage retrieval
Published in: | Information Processing & Management, November 2007, Vol. 43 (6), pp. 1793-1807 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Field Association (FA) terms are a limited set of discriminating terms that can specify document fields. The field of a document can be decided efficiently if the document contains many relevant FA terms. An earlier approach built an FA term dictionary using a WWW search engine, but that dictionary contained irrelevant FA terms because the approach extracted FA terms from whole documents. This paper proposes a new approach that extracts FA terms from passages (portions of a document's text) rather than from whole documents, which yields FA terms more accurately than the earlier approach. The proposed approach is evaluated on 38,372 articles from a large tagged corpus. The experimental results show that the new approach appends about 24% more relevant FA terms to the earlier FA term dictionary and deletes around 32% of its irrelevant FA terms. Moreover, the new approach achieves a precision of 98% and a recall of 94%. |
---|---|
ISSN: | 0306-4573, 1873-5371 |
DOI: | 10.1016/j.ipm.2006.12.006 |
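The abstract does not say how passages are delimited or how an FA term is scored, so the following Python sketch fills both gaps with its own assumptions, none of which come from the paper. Passages are fixed-size word windows (`size` is a hypothetical parameter). A passage counts toward its document's field only if it contains a hypothetical seed term for that field. A term is kept as an FA term when at least `threshold` of its occurrences fall in one field (`threshold` and `min_count` are also hypothetical). The toy data contrasts passage-level counting with whole-document counting: an off-topic sentence about a bank inside a sports article pulls "bank" below the concentration threshold for finance when the whole article is counted, but not when only on-topic passages are counted.

```python
from collections import Counter, defaultdict


def split_passages(text, size=50, step=None):
    """Split text into windows of `size` words (non-overlapping unless step < size).

    Hypothetical passage definition: the abstract does not say how passages are formed.
    """
    words = text.lower().split()
    step = step or size
    if len(words) <= size:
        return [words]
    return [words[i:i + size] for i in range(0, len(words) - size + step, step)]


def passage_term_counts(docs, seeds, size=50):
    """Count terms per field, using only passages that mention a seed term of that field.

    docs: list of (field, text) pairs; seeds: dict mapping field -> set of seed terms.
    The seed-term filter is an assumed stand-in for the paper's passage technique.
    """
    counts = defaultdict(Counter)
    for field, text in docs:
        field_seeds = seeds.get(field, set())
        for passage in split_passages(text, size):
            if field_seeds & set(passage):  # keep only on-topic passages
                counts[field].update(passage)
    return counts


def select_fa_terms(counts, threshold=0.8, min_count=2):
    """Keep terms whose occurrences are concentrated (>= threshold) in a single field."""
    totals = Counter()
    for field_counts in counts.values():
        totals.update(field_counts)
    return {
        field: {t for t, n in c.items()
                if totals[t] >= min_count and n / totals[t] >= threshold}
        for field, c in counts.items()
    }


def precision_recall(selected, relevant):
    """Set-based precision and recall of selected terms against a gold set."""
    tp = len(selected & relevant)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


# Toy corpus (made up for illustration): the third sports article ends off-topic.
docs = [
    ("sports", "the pitcher threw a fastball and the batter hit it"),
    ("sports", "the pitcher struck out the batter in the ninth inning"),
    ("sports", "the pitcher thanked the batter after the long game ended "
               "meanwhile the bank raised the rate on bonds"),
    ("finance", "the central bank raised the interest rate on bonds"),
    ("finance", "bond yields rose after the bank raised the rate"),
]
seeds = {"sports": {"pitcher"}, "finance": {"bank"}}

passage_fa = select_fa_terms(passage_term_counts(docs, seeds, size=10))
whole_fa = select_fa_terms(passage_term_counts(docs, seeds, size=1000))  # one passage per doc
print(sorted(passage_fa["finance"]))  # ['bank', 'raised', 'rate']
print(sorted(whole_fa["finance"]))    # []
```

The `precision_recall` helper uses the standard set-based definitions; the abstract does not state how the reported 98% precision and 94% recall were computed.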