Multifeature Fusion Keyword Extraction Algorithm Based on TextRank

Bibliographic details
Published in:	IEEE Access, 2022, Vol. 10, pp. 71805-71813
Main authors:	Guo, Wenming; Wang, Zihao; Han, Fang
Format:	Article
Language:	English
Subjects:	Algorithms; BERT word vector; Bit error rate; Data mining; Dictionaries; Electronic mail; Feature extraction; Information retrieval; Iterative methods; K-Truss graph; Keyword extraction; Keywords; Semantics; Task analysis; TextRank; Trusses; Words (language)
Online access:	Full text

container_end_page	71813
container_issue
container_start_page	71805
container_title	IEEE access
container_volume	10
creator	Guo, Wenming; Wang, Zihao; Han, Fang
description	Keyword extraction is a preliminary step for many tasks, and its results directly affect search, recommendation, classification, and other downstream applications. Taking Chinese text as the research object, this study proposes BSKT, a multi-feature fusion keyword extraction algorithm that combines BERT semantics with the K-Truss graph. BSKT builds on the TextRank algorithm and fuses BERT semantic features, K-Truss features, and other features. First, BSKT obtains word vectors from a pretrained BERT model and computes the semantic difference between words, which is used to refine the iterative scoring on the TextRank word graph. Next, BSKT decomposes the TextRank word graph into its K-Truss graph and derives a truss-level feature for each word. Finally, BSKT combines each word's IDF with its truss-level feature to score the words and extract keywords. Experimental results show that BSKT outperforms SCTR, a recent keyword extraction algorithm, when extracting 1 to 10 keywords. In particular, when extracting three keywords from the Sensor dataset, BSKT improves F1 by 11.2%.
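The pipeline in the description can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every helper name (`cooccurrence_edges`, `cosine_distance`, `weighted_textrank`, `edge_truss_numbers`, `smoothed_idf`, `extract_keywords`) is hypothetical; precomputed vectors passed in by the caller stand in for BERT word vectors; and the edge weight `1 + cosine distance`, the word truss level taken as the maximum truss number of its incident edges, the smoothed IDF variant, and the final product of TextRank score, truss level, and IDF are all assumed choices, because the description does not give BSKT's exact formulas.

```python
import math
from collections import defaultdict


def cooccurrence_edges(tokens, window=3):
    """Undirected edges between distinct words that co-occur within `window` tokens."""
    edges = set()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                edges.add(frozenset((word, tokens[j])))
    return edges


def cosine_distance(a, b):
    """Assumed semantic difference: 1 - cosine similarity of two word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def weighted_textrank(edges, weight, d=0.85, max_iter=100, tol=1e-6):
    """Standard weighted TextRank iteration on an undirected word graph."""
    neighbors = defaultdict(dict)
    for e in edges:
        u, v = tuple(e)
        neighbors[u][v] = neighbors[v][u] = weight[e]
    out_sum = {u: sum(nbrs.values()) for u, nbrs in neighbors.items()}
    score = {u: 1.0 for u in neighbors}
    for _ in range(max_iter):
        new = {u: (1 - d) + d * sum(w / out_sum[v] * score[v] for v, w in nbrs.items())
               for u, nbrs in neighbors.items()}
        delta = max((abs(new[u] - score[u]) for u in score), default=0.0)
        score = new
        if delta < tol:
            break
    return score


def edge_truss_numbers(edges):
    """K-Truss decomposition by peeling: edge -> largest k whose k-truss contains it."""
    adj = defaultdict(set)
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    # Support of an edge = number of triangles it belongs to.
    support = {}
    for e in edges:
        u, v = tuple(e)
        support[e] = len(adj[u] & adj[v])
    remaining, truss, k = set(edges), {}, 2
    while remaining:
        peel = [e for e in remaining if support[e] <= k - 2]
        if not peel:
            k += 1  # all remaining edges lie in the k-truss; try the next level
            continue
        while peel:
            e = peel.pop()
            if e not in remaining:
                continue  # duplicate entry for an edge already peeled
            u, v = tuple(e)
            truss[e] = k
            remaining.discard(e)
            # Removing (u, v) destroys each triangle (u, v, w) among remaining edges.
            for w in adj[u] & adj[v]:
                for f in (frozenset((u, w)), frozenset((v, w))):
                    support[f] -= 1
                    if support[f] <= k - 2:
                        peel.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
    return truss


def smoothed_idf(word, corpus):
    """One common smoothed IDF variant: ln((1 + N) / (1 + df)) + 1."""
    df = sum(1 for doc in corpus if word in doc)
    return math.log((1 + len(corpus)) / (1 + df)) + 1.0


def extract_keywords(tokens, corpus, vectors, top_k=3, window=3):
    """Hypothetical BSKT-style score: TextRank score x truss level x IDF."""
    edges = cooccurrence_edges(tokens, window)
    # Assumed semantic weighting: a larger vector difference gives a heavier edge.
    weight = {}
    for e in edges:
        u, v = tuple(e)
        weight[e] = 1.0 + cosine_distance(vectors[u], vectors[v])
    rank = weighted_textrank(edges, weight)
    level = {}  # word truss level = max truss number over its incident edges
    for e, k in edge_truss_numbers(edges).items():
        for word in e:
            level[word] = max(level.get(word, 0), k)
    scores = {w: rank[w] * level[w] * smoothed_idf(w, corpus) for w in rank}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Calling `extract_keywords(tokens, corpus, vectors)` with a tokenized document, a list of token sets as the IDF corpus, and a word-to-vector mapping returns the top-k `(word, score)` pairs. The peeling loop gives each edge the largest k for which it survives in the k-truss (every remaining edge lies in at least k-2 triangles), so words in densely interlinked regions of the word graph receive higher truss levels.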
doi_str_mv	10.1109/ACCESS.2022.3188861
format	Article
publisher	IEEE, Piscataway
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
coden	IAECCG
page_count	9
peer_reviewed	yes
open_access	free to read
orcid	https://orcid.org/0000-0003-2336-1434
fulltext_url	https://ieeexplore.ieee.org/document/9815884
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2022, Vol.10, p.71805-71813
issn	2169-3536 (ISSN and eISSN)
language	eng
recordid	cdi_proquest_journals_2688692454
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals (Electronic Journals Library, freely accessible e-journals)