Designing an efficient unigram keyword detector for documents using Relative Entropy
In this work we propose a statistical approach to identify unigram keywords for a document. We identify unigram keywords as features which effectively captures the importance of a word in a document and evaluates its potential to be a keyword. We make use of relative entropy, displacement and varian...
Gespeichert in:
Veröffentlicht in: | Multimedia tools and applications 2022-11, Vol.81 (26), p.37747-37761 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In this work we propose a statistical approach to identify unigram keywords for a document. We identify unigram keywords as features which effectively captures the importance of a word in a document and evaluates its potential to be a keyword. We make use of relative entropy, displacement and variance of terms in a document have been evaluated in the context of keyword identification. The proposed approach works on single documents without the requirement of any pre-training of the model. We also evaluate the effectiveness of our features against the gold standard of “term frequency” and compare the usefulness of the proposed feature set with term frequency. The results of our proposed method are presented and compared with existing algorithms. |
---|---|
ISSN: | 1380-7501 1573-7721 |
DOI: | 10.1007/s11042-022-12657-x |