A multi-view method of scientific paper classification via heterogeneous graph embeddings

Scientific papers can be classified based on their contents or their citations. To improve performance on this task, we represent papers as nodes and integrate their contents and citations into a heterogeneous graph. It has two types of edges. One type represe...

Detailed description

Bibliographic details
Published in:	Scientometrics 2022-08, Vol.127 (8), p.4847-4872
Main authors:	Lv, Yiqin; Xie, Zheng; Zuo, Xiaojing; Song, Yiping
Format:	Article
Language:	eng
Subjects:	Classification; Computer Science; Decision trees; Graph theory; Information Storage and Retrieval; Library Science; Multilayer perceptrons; Nodes; Scientific papers; Vector spaces
Online access:	Full text

container_end_page	4872
container_issue	8
container_start_page	4847
container_title	Scientometrics
container_volume	127
creator	Lv, Yiqin; Xie, Zheng; Zuo, Xiaojing; Song, Yiping
description	Scientific papers can be classified based on their contents or their citations. To improve performance on this task, we represent papers as nodes and integrate their contents and citations into a heterogeneous graph with two types of edges. One type represents the semantic similarity between papers, derived from their titles and abstracts. The other represents the citation relationship between papers and the journals or conference proceedings of their references. We use a contrastive learning method to embed the nodes of the heterogeneous graph into a vector space, and then feed the paper node vectors into classifiers such as decision trees and multilayer perceptrons. We conduct experiments on three datasets of scientific papers: the Microsoft Academic Graph with 63,211 papers in 20 classes, the Proceedings of the National Academy of Sciences with 38,243 papers in 18 classes, and the American Physical Society with 443,845 papers in 5 classes. The experimental results on the multi-class task show that our multi-view method achieves classification accuracy of up to 98%, outperforming state-of-the-art methods.
doi_str_mv	10.1007/s11192-022-04419-1
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2700752681</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2700752681</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-75a52f085f95d2af660d9b2eeb3d39de82f541786d5cb243d71cee51fef43bec3</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-AU8Bz9VM0rTpcVn8ggUvevAU0mbSzbL9MGlX_Pd2reDNwzAMvM8M8xByDewWGMvvIgAUPGF8qjSFIoETsgCpVMJVBqdkwUCopADBzslFjDs2QYKpBXlf0WbcDz45ePykDQ7bztLO0Vh5bAfvfEV702Og1d7EeJzN4LuWHryhWxwwdDW22I2R1sH0W4pNidb6to6X5MyZfcSr374kbw_3r-unZPPy-LxebZJKQDEkuTSSO6akK6TlxmUZs0XJEUthRWFRcSdTyFVmZVXyVNgcKkQJDl0qSqzEktzMe_vQfYwYB73rxtBOJzXPpzclzxRMKT6nqtDFGNDpPvjGhC8NTB8V6lmhnhTqH4X6CIkZilO4rTH8rf6H-gbOgnV7</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2700752681</pqid></control><display><type>article</type><title>A multi-view method of scientific paper classification via heterogeneous graph embeddings</title><source>SpringerNature Journals</source><creator>Lv, Yiqin ; Xie, Zheng ; Zuo, Xiaojing ; Song, Yiping</creator><creatorcontrib>Lv, Yiqin ; Xie, Zheng ; Zuo, Xiaojing ; Song, Yiping</creatorcontrib><description>The classification task of scientific papers can be implemented based on contents or citations. In order to improve the performance on this task, we express papers as nodes and integrate scientific papers’ contents and citations into a heterogeneous graph. It has two types of edges. One type represents the semantic similarity between papers, derived from papers’ titles and abstracts. The other type represents the citation relationship between papers and the journals or proceedings of conferences of their references. We utilize a contrastive learning method to embed the nodes in the heterogeneous graph into a vector space. Then, we feed the paper node vectors into classifiers, such as the decision tree, multilayer perceptron, and so on. 
We conduct experiments on three datasets of scientific papers: the Microsoft Academic Graph with 63,211 scientific papers in 20 classes, the Proceedings of the National Academy of Sciences with 38,243 scientific papers in 18 classes, and the American Physical Society with 443,845 scientific papers in 5 classes. The experimental results on the multi-class task show that our multi-view method scores the classification accuracy up to 98%, outperforming state-of-the-arts.</description><identifier>ISSN: 0138-9130</identifier><identifier>EISSN: 1588-2861</identifier><identifier>DOI: 10.1007/s11192-022-04419-1</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Classification ; Computer Science ; Decision trees ; Graph theory ; Information Storage and Retrieval ; Library Science ; Multilayer perceptrons ; Nodes ; Scientific papers ; Vector spaces</subject><ispartof>Scientometrics, 2022-08, Vol.127 (8), p.4847-4872</ispartof><rights>Akadémiai Kiadó, Budapest, Hungary 2022</rights><rights>Akadémiai Kiadó, Budapest, Hungary 2022.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-75a52f085f95d2af660d9b2eeb3d39de82f541786d5cb243d71cee51fef43bec3</citedby><cites>FETCH-LOGICAL-c319t-75a52f085f95d2af660d9b2eeb3d39de82f541786d5cb243d71cee51fef43bec3</cites><orcidid>0000-0003-0391-8725</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11192-022-04419-1$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11192-022-04419-1$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Lv, Yiqin</creatorcontrib><creatorcontrib>Xie, 
Zheng</creatorcontrib><creatorcontrib>Zuo, Xiaojing</creatorcontrib><creatorcontrib>Song, Yiping</creatorcontrib><title>A multi-view method of scientific paper classification via heterogeneous graph embeddings</title><title>Scientometrics</title><addtitle>Scientometrics</addtitle><description>The classification task of scientific papers can be implemented based on contents or citations. In order to improve the performance on this task, we express papers as nodes and integrate scientific papers’ contents and citations into a heterogeneous graph. It has two types of edges. One type represents the semantic similarity between papers, derived from papers’ titles and abstracts. The other type represents the citation relationship between papers and the journals or proceedings of conferences of their references. We utilize a contrastive learning method to embed the nodes in the heterogeneous graph into a vector space. Then, we feed the paper node vectors into classifiers, such as the decision tree, multilayer perceptron, and so on. We conduct experiments on three datasets of scientific papers: the Microsoft Academic Graph with 63,211 scientific papers in 20 classes, the Proceedings of the National Academy of Sciences with 38,243 scientific papers in 18 classes, and the American Physical Society with 443,845 scientific papers in 5 classes. 
The experimental results on the multi-class task show that our multi-view method scores the classification accuracy up to 98%, outperforming state-of-the-arts.</description><subject>Classification</subject><subject>Computer Science</subject><subject>Decision trees</subject><subject>Graph theory</subject><subject>Information Storage and Retrieval</subject><subject>Library Science</subject><subject>Multilayer perceptrons</subject><subject>Nodes</subject><subject>Scientific papers</subject><subject>Vector spaces</subject><issn>0138-9130</issn><issn>1588-2861</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouK7-AU8Bz9VM0rTpcVn8ggUvevAU0mbSzbL9MGlX_Pd2reDNwzAMvM8M8xByDewWGMvvIgAUPGF8qjSFIoETsgCpVMJVBqdkwUCopADBzslFjDs2QYKpBXlf0WbcDz45ePykDQ7bztLO0Vh5bAfvfEV702Og1d7EeJzN4LuWHryhWxwwdDW22I2R1sH0W4pNidb6to6X5MyZfcSr374kbw_3r-unZPPy-LxebZJKQDEkuTSSO6akK6TlxmUZs0XJEUthRWFRcSdTyFVmZVXyVNgcKkQJDl0qSqzEktzMe_vQfYwYB73rxtBOJzXPpzclzxRMKT6nqtDFGNDpPvjGhC8NTB8V6lmhnhTqH4X6CIkZilO4rTH8rf6H-gbOgnV7</recordid><startdate>20220801</startdate><enddate>20220801</enddate><creator>Lv, Yiqin</creator><creator>Xie, Zheng</creator><creator>Zuo, Xiaojing</creator><creator>Song, Yiping</creator><general>Springer International Publishing</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope><orcidid>https://orcid.org/0000-0003-0391-8725</orcidid></search><sort><creationdate>20220801</creationdate><title>A multi-view method of scientific paper classification via heterogeneous graph embeddings</title><author>Lv, Yiqin ; Xie, Zheng ; Zuo, Xiaojing ; Song, 
Yiping</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-75a52f085f95d2af660d9b2eeb3d39de82f541786d5cb243d71cee51fef43bec3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Classification</topic><topic>Computer Science</topic><topic>Decision trees</topic><topic>Graph theory</topic><topic>Information Storage and Retrieval</topic><topic>Library Science</topic><topic>Multilayer perceptrons</topic><topic>Nodes</topic><topic>Scientific papers</topic><topic>Vector spaces</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lv, Yiqin</creatorcontrib><creatorcontrib>Xie, Zheng</creatorcontrib><creatorcontrib>Zuo, Xiaojing</creatorcontrib><creatorcontrib>Song, Yiping</creatorcontrib><collection>CrossRef</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><jtitle>Scientometrics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lv, Yiqin</au><au>Xie, Zheng</au><au>Zuo, Xiaojing</au><au>Song, Yiping</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A multi-view method of scientific paper classification via heterogeneous graph embeddings</atitle><jtitle>Scientometrics</jtitle><stitle>Scientometrics</stitle><date>2022-08-01</date><risdate>2022</risdate><volume>127</volume><issue>8</issue><spage>4847</spage><epage>4872</epage><pages>4847-4872</pages><issn>0138-9130</issn><eissn>1588-2861</eissn><abstract>The classification task of scientific papers can be implemented based on contents or citations. In order to improve the performance on this task, we express papers as nodes and integrate scientific papers’ contents and citations into a heterogeneous graph. It has two types of edges. 
One type represents the semantic similarity between papers, derived from papers’ titles and abstracts. The other type represents the citation relationship between papers and the journals or proceedings of conferences of their references. We utilize a contrastive learning method to embed the nodes in the heterogeneous graph into a vector space. Then, we feed the paper node vectors into classifiers, such as the decision tree, multilayer perceptron, and so on. We conduct experiments on three datasets of scientific papers: the Microsoft Academic Graph with 63,211 scientific papers in 20 classes, the Proceedings of the National Academy of Sciences with 38,243 scientific papers in 18 classes, and the American Physical Society with 443,845 scientific papers in 5 classes. The experimental results on the multi-class task show that our multi-view method scores the classification accuracy up to 98%, outperforming state-of-the-arts.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><doi>10.1007/s11192-022-04419-1</doi><tpages>26</tpages><orcidid>https://orcid.org/0000-0003-0391-8725</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0138-9130
ispartof	Scientometrics, 2022-08, Vol.127 (8), p.4847-4872
issn	0138-9130 1588-2861
language	eng
recordid	cdi_proquest_journals_2700752681
source	SpringerNature Journals
subjects	Classification; Computer Science; Decision trees; Graph theory; Information Storage and Retrieval; Library Science; Multilayer perceptrons; Nodes; Scientific papers; Vector spaces
title	A multi-view method of scientific paper classification via heterogeneous graph embeddings
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T02%3A53%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20multi-view%20method%20of%20scientific%20paper%20classification%20via%20heterogeneous%20graph%20embeddings&rft.jtitle=Scientometrics&rft.au=Lv,%20Yiqin&rft.date=2022-08-01&rft.volume=127&rft.issue=8&rft.spage=4847&rft.epage=4872&rft.pages=4847-4872&rft.issn=0138-9130&rft.eissn=1588-2861&rft_id=info:doi/10.1007/s11192-022-04419-1&rft_dat=%3Cproquest_cross%3E2700752681%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2700752681&rft_id=info:pmid/&rfr_iscdi=true
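
The description field above outlines a heterogeneous graph with two edge types: semantic-similarity edges between papers, and citation edges linking a paper to the venues of its references. The following is a minimal sketch of that graph-construction step only, not the authors' implementation. The bag-of-words cosine similarity, the 0.3 threshold, the function names and the toy records are all assumptions made for illustration; the record does not say how similarity is computed, and the contrastive embedding and classifier stages are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(papers, threshold=0.3):
    """Build a toy heterogeneous graph (hypothetical helper).

    papers maps a paper id to {"text": title plus abstract,
    "ref_venues": venues of the paper's references}.
    Returns (semantic_edges, venue_edges).
    """
    # Assumed similarity measure: cosine over lowercase token counts.
    bags = {pid: Counter(p["text"].lower().split()) for pid, p in papers.items()}

    # Edge type 1: paper-paper edges where text similarity meets the threshold.
    semantic_edges = set()
    for u, v in combinations(sorted(papers), 2):
        if cosine(bags[u], bags[v]) >= threshold:
            semantic_edges.add((u, v))

    # Edge type 2: paper-venue edges for every venue among a paper's references.
    venue_edges = {
        (pid, venue) for pid, p in papers.items() for venue in p["ref_venues"]
    }
    return semantic_edges, venue_edges


# Toy records, invented for illustration only.
papers = {
    "p1": {"text": "graph embedding for paper classification",
           "ref_venues": ["Scientometrics"]},
    "p2": {"text": "paper classification with graph embedding",
           "ref_venues": ["Scientometrics", "PNAS"]},
    "p3": {"text": "protein folding dynamics",
           "ref_venues": ["PNAS"]},
}
semantic_edges, venue_edges = build_graph(papers)
```

In a full pipeline, the node set (papers and venues) and both edge sets would be passed to an embedding method, and the resulting paper vectors to a classifier; those stages depend on details that this record does not provide.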