Designing an efficient unigram keyword detector for documents using Relative Entropy

Bibliographic details
Published in:	Multimedia tools and applications 2022-11, Vol.81 (26), p.37747-37761
Main authors:	Rathi, R. N., Mustafi, A.
Format:	Article
Language:	English
Subjects:	Algorithms; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Documents; Entropy; Keywords; Multimedia; Multimedia Information Systems; Special Purpose and Application-Based Systems; Text analysis
Online access:	Full text

Description
Abstract:	In this work we propose a statistical approach to identify unigram keywords for a document. We identify features that effectively capture the importance of a word in a document and evaluate its potential to be a keyword. Relative entropy, displacement and variance of terms in a document are evaluated in the context of keyword identification. The proposed approach works on single documents and requires no pre-training of the model. We also evaluate the effectiveness of our features against the gold standard of “term frequency” and compare the usefulness of the proposed feature set with term frequency. The results of our proposed method are presented and compared with existing algorithms.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-022-12657-x
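
Illustrative sketch:	The abstract names relative entropy and variance of terms as single-document features alongside term frequency, but this record does not give the paper's definitions. The Python sketch below is therefore only one plausible reading, not the authors' method. It assumes that relative entropy means the Kullback-Leibler divergence between a term's occurrence histogram over equal-width document segments and a uniform distribution, and that variance means the variance of the term's normalized token positions. The function name `term_features` and the parameter `n_segments` are hypothetical, and "displacement" is left out because the record does not define it.

```python
import math
import re
from collections import defaultdict


def term_features(text, n_segments=4):
    """Return {term: (term_frequency, relative_entropy, position_variance)}.

    Assumed definitions (not taken from the paper):
      - relative_entropy: KL divergence, in bits, between the term's
        occurrence distribution over n_segments equal-width segments
        and the uniform distribution over those segments.
      - position_variance: population variance of the term's token
        positions, each divided by the document length.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)

    # Record every token index at which each term occurs.
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    features = {}
    for term, pos in positions.items():
        count = len(pos)

        # Histogram of the term's occurrences over the segments.
        seg_counts = [0] * n_segments
        for p in pos:
            seg_counts[p * n_segments // total] += 1

        # KL(P || U) with U uniform; empty segments contribute zero.
        relative_entropy = sum(
            (c / count) * math.log2((c / count) * n_segments)
            for c in seg_counts
            if c
        )

        # Spread of the term's occurrences across the document.
        normalized = [p / total for p in pos]
        mean = sum(normalized) / count
        variance = sum((x - mean) ** 2 for x in normalized) / count

        features[term] = (count / total, relative_entropy, variance)
    return features


if __name__ == "__main__":
    sample = "alpha beta alpha beta gamma delta epsilon zeta"
    for term, values in sorted(term_features(sample).items()):
        print(term, values)
```

Under these assumptions, a term spread evenly over all segments has a relative entropy of zero, while a term confined to a single segment reaches the maximum of log2(n_segments) bits. Like the approach described in the abstract, the sketch works on a single document and needs no training data.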