An efficient domain-independent approach for supervised keyphrase extraction and ranking

We present a supervised learning approach for automatic extraction of keyphrases from single documents. Our solution uses simple-to-compute statistical and positional features of candidate phrases and does not rely on any external knowledge base or on pre-trained language models or word embeddings....

Bibliographic details
Main author: Ramaswamy, Sriraghavendra
Format: Article
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Ramaswamy, Sriraghavendra
description We present a supervised learning approach for the automatic extraction of keyphrases from single documents. Our solution uses simple-to-compute statistical and positional features of candidate phrases and does not rely on any external knowledge base, pre-trained language models, or word embeddings. The ranking component of our proposed solution is a fairly lightweight ensemble model. Evaluation on benchmark datasets shows that our approach achieves significantly higher accuracy than several state-of-the-art baseline models, including all of the deep learning-based unsupervised models it was compared with, and is also competitive with some supervised deep learning-based models. Despite the supervised nature of our solution, the fact that it does not rely on any corpus of "golden" keywords or any external knowledge corpus means that it retains the advantages of unsupervised solutions to a fair extent.
doi_str_mv 10.48550/arxiv.2404.07954
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2404_07954</recordid><sourcetype>Open Access Repository</sourcetype><recordtype>article</recordtype></control><display><type>article</type><title>An efficient domain-independent approach for supervised keyphrase extraction and ranking</title><source>arXiv.org</source><creator>Ramaswamy, Sriraghavendra</creator><identifier>DOI: 10.48550/arxiv.2404.07954</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Information Retrieval ; Computer Science - Learning</subject><creationdate>2024-03</creationdate><rights>http://creativecommons.org/licenses/by-sa/4.0</rights><oa>free_for_read</oa></display><addata><au>Ramaswamy, Sriraghavendra</au><genre>article</genre><date>2024-03-24</date><doi>10.48550/arxiv.2404.07954</doi></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2404.07954
ispartof
issn
language eng
recordid cdi_arxiv_primary_2404_07954
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Information Retrieval
Computer Science - Learning
title An efficient domain-independent approach for supervised keyphrase extraction and ranking
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T13%3A27%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20efficient%20domain-independent%20approach%20for%20supervised%20keyphrase%20extraction%20and%20ranking&rft.au=Ramaswamy,%20Sriraghavendra&rft.date=2024-03-24&rft_id=info:doi/10.48550/arxiv.2404.07954&rft_dat=%3Carxiv_GOX%3E2404_07954%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true