TripleRank: An unsupervised keyphrase extraction algorithm

Bibliographic details
Published in: Knowledge-based systems 2021-05, Vol.219, p.106846, Article 106846
Main authors: Li, Tuohang, Hu, Liang, Li, Hongtu, Sun, Chengyu, Li, Shuai, Chi, Ling
Format: Article
Language: eng
Description
Summary: Automatic keyphrase extraction algorithms aim to identify the words and phrases that carry the core information in a document. As online scholarly resources have become widespread in recent years, better keyphrase extraction techniques are required to improve search efficiency. We present two features, keyphrase semantic diversity and keyphrase coverage, to overcome limitations of existing methods for unsupervised keyphrase extraction. Keyphrase semantic diversity is the degree of semantic variety in the extraction result; it is introduced to avoid extracting synonymous phrases that contain the same high-scoring candidate. Keyphrase coverage refers to how well a candidate represents the other words in the document. We propose an unsupervised keyphrase extraction method called TripleRank, which evaluates three features: word position (a sensitive feature for academic documents) and the two new features mentioned above. The architecture of TripleRank comprises three sub-models, one scoring each feature, and a summing model. Although it involves multiple models, TripleRank has no typical iteration process; hence, its computational cost is relatively low. TripleRank outperformed four state-of-the-art baseline models on four academic datasets, which confirms the influence of keyphrase semantic diversity and keyphrase coverage and demonstrates the efficiency of our method.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2021.106846
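
Illustrative sketch (not the authors' implementation): the summary describes three sub-models that score word position, keyphrase coverage, and keyphrase semantic diversity, followed by a summing model, without an iterative ranking step. The record does not give the actual formulas, so everything below is an assumption made only to show the overall shape: position is scored as the reciprocal of first occurrence, coverage as the share of other vocabulary words seen within a small window, and diversity as a word-overlap (Jaccard) penalty against phrases already selected. All function names, the `window` parameter, and the `weights` tuple are hypothetical.

```python
def position_scores(tokens):
    """Assumed position feature: earlier first occurrence gives a higher score."""
    scores = {}
    for i, tok in enumerate(tokens):
        scores.setdefault(tok, 1.0 / (i + 1))
    return scores


def coverage_scores(tokens, window=2):
    """Assumed coverage proxy: share of the other vocabulary words that
    appear within `window` positions of the word somewhere in the text."""
    neighbours = {t: set() for t in tokens}
    for i, tok in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbours[tok].update(t for t in tokens[lo:hi] if t != tok)
    vocab = len(neighbours)
    if vocab <= 1:
        return {t: 0.0 for t in neighbours}
    return {t: len(n) / (vocab - 1) for t, n in neighbours.items()}


def diversity_penalty(phrase, selected):
    """Assumed diversity proxy: highest Jaccard word overlap between the
    phrase and any phrase that has already been selected."""
    words = set(phrase.split())
    overlaps = (
        len(words & set(s.split())) / len(words | set(s.split()))
        for s in selected
    )
    return max(overlaps, default=0.0)


def rank_phrases(tokens, candidates, top_k=3, weights=(1.0, 1.0, 1.0)):
    """Sum the three assumed feature scores and pick phrases greedily.
    Candidates are assumed to be non-empty, space-separated phrases."""
    w_pos, w_cov, w_div = weights
    pos = position_scores(tokens)
    cov = coverage_scores(tokens)

    def base(phrase):
        # Average the weighted position and coverage scores over the words.
        words = phrase.split()
        total = sum(w_pos * pos.get(w, 0.0) + w_cov * cov.get(w, 0.0) for w in words)
        return total / len(words)

    selected, remaining = [], list(candidates)
    while remaining and len(selected) < top_k:
        # Subtract the diversity penalty so near-duplicates are pushed down.
        best = max(remaining, key=lambda p: base(p) - w_div * diversity_penalty(p, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


tokens = ["graph", "ranking", "graph", "model", "topic"]
candidates = ["graph ranking", "graph", "topic"]
example = rank_phrases(tokens, candidates, top_k=2)
```

In this toy setup, raising the third weight makes the selection favour phrases that share fewer words with those already chosen, which mirrors the stated goal of avoiding phrases built around the same high-scoring candidate. The greedy loop is a single pass over the candidates per selection, not a graph-style iteration to convergence.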