An efficient domain-independent approach for supervised keyphrase extraction and ranking

Bibliographic Details
Main Author:	Ramaswamy, Sriraghavendra
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language; Computer Science - Information Retrieval; Computer Science - Learning

creator	Ramaswamy, Sriraghavendra
description	We present a supervised learning approach for automatic extraction of keyphrases from single documents. Our solution uses simple-to-compute statistical and positional features of candidate phrases and does not rely on any external knowledge base, pre-trained language models, or word embeddings. The ranking component of our proposed solution is a fairly lightweight ensemble model. Evaluation on benchmark datasets shows that our approach achieves significantly higher accuracy than several state-of-the-art baseline models, including all of the deep learning-based unsupervised models we compared against, and is also competitive with some supervised deep learning-based models. Despite the supervised nature of our solution, the fact that it does not rely on any corpus of "golden" keywords or any external knowledge corpus means that our solution retains the advantages of unsupervised solutions to a fair extent.
doi_str_mv	10.48550/arxiv.2404.07954
format	Article
creationdate	2024-03-24
rights	http://creativecommons.org/licenses/by-sa/4.0
oa	free_for_read
linktorsrc	https://arxiv.org/abs/2404.07954
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2404.07954
language	eng
recordid	cdi_arxiv_primary_2404_07954
source	arXiv.org
subjects	Computer Science - Computation and Language; Computer Science - Information Retrieval; Computer Science - Learning
title	An efficient domain-independent approach for supervised keyphrase extraction and ranking
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T13%3A27%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20efficient%20domain-independent%20approach%20for%20supervised%20keyphrase%20extraction%20and%20ranking&rft.au=Ramaswamy,%20Sriraghavendra&rft.date=2024-03-24&rft_id=info:doi/10.48550/arxiv.2404.07954&rft_dat=%3Carxiv_GOX%3E2404_07954%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true