Computational linguistics literature and citations oriented citation linkage, classification and summarization

Scientific literature is currently the most important resource for scholars, and its citations give researchers a powerful latent way to analyze scientific trends, influences, and relationships among works and authors. This paper focuses on automatic citation analysis and summarizatio...

Bibliographic details
Published in:	International journal on digital libraries, 2018-09, Vol. 19 (2-3), p. 173-190
Main authors:	Li, Lei; Mao, Liyuan; Zhang, Yazhao; Chi, Junqi; Huang, Taiwen; Cong, Xiaoyue; Peng, Heng
Format:	Article
Language:	eng
Subjects:	Bibliometrics; Citation analysis; Clustering; Computation; Computer Science; Database Management; Dirichlet problem; Information retrieval; Information Systems and Communication Service; Linguistics; Natural language processing; Sentences; Support vector machines
Online access:	Full text
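The abstract of this article describes a citation-linkage step in which each citation is matched to text spans in the reference paper by content similarity. The record does not say which similarity measures the authors used, so the following is only a minimal sketch under stated assumptions: it swaps in plain TF-IDF cosine similarity, and the names `tokenize`, `tfidf_vectors`, `cosine` and `link_citation` are hypothetical helpers, not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF: always positive, so shared terms still contribute.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_citation(citation, reference_sentences, top_k=1):
    """Return indices of the top_k reference sentences most similar to the citation.

    Assumption: a "text span" is approximated by a single sentence.
    """
    token_lists = [tokenize(s) for s in reference_sentences]
    token_lists.append(tokenize(citation))
    vectors = tfidf_vectors(token_lists)
    citation_vec = vectors[-1]
    scored = [(cosine(citation_vec, vec), idx)
              for idx, vec in enumerate(vectors[:-1])]
    # Highest similarity first; ties are broken by sentence position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]
```

Calling `link_citation(citation_text, sentences, top_k=3)` returns up to three candidate sentence positions. A real system would likely combine several similarity signals, as the abstract's mention of "various computational methods" suggests.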

container_end_page	190
container_issue	2-3
container_start_page	173
container_title	International journal on digital libraries
container_volume	19
creator	Li, Lei; Mao, Liyuan; Zhang, Yazhao; Chi, Junqi; Huang, Taiwen; Cong, Xiaoyue; Peng, Heng
description	Scientific literature is currently the most important resource for scholars, and its citations give researchers a powerful latent way to analyze scientific trends, influences, and relationships among works and authors. This paper focuses on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks of the 2nd Computational Linguistics Scientific Document Summarization workshop, held at BIRNDL 2016 (the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarity, using several computational methods. The cited text span is then classified into five pre-defined facets (Hypothesis, Implication, Aim, Results and Method) based on lexicon and rule features, using a Support Vector Machine and a voting method. Finally, a summary of the reference paper of at most 250 words is generated from the cited text spans. The hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling; it provides knowledge about sentence clustering (subtopics) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features, using different weights and proportions, to evaluate the sentences in the reference paper. Our systems were ranked first and second in the evaluation results published by BIRNDL 2016, which supports the effectiveness of our methods.
doi_str_mv	10.1007/s00799-017-0219-5
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2088229280</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2088229280</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-638d9c3f8ee53a68de141eb6b1bd258c0a1b00fea81920d23621b90518d3459c3</originalsourceid><addsrcrecordid>eNp1kE9PwzAMxSMEEmPwAbhV4krBTpYuPaKJf9IkLnCO0jSdMrp2xOkBPj0pnbQTF9t6ej_LfoxdI9whwPKeUinLHHCZA8cylydshgvBcxQAp4dZAvJzdkG0BQBUuJyxbtXv9kM00fedabPWd5vBU_SW0hxdMHEILjNdnVk_uSjrg3dddEdpxD7Nxt1mtjVEvvF20keOht3OBP_zp1yys8a05K4Ofc4-nh7fVy_5-u35dfWwzq3AIuaFUHVpRaOck8IUqna4QFcVFVY1l8qCwQqgcUZhyaHmouBYlSBR1WIhEzlnN9Pefei_BkdRb_shpA9Jc1CK85IrSC6cXDb0RME1eh98OvZbI-gxVj3FqlOseoxVy8TwiaHk7TYuHDf_D_0COmN9Ag</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2088229280</pqid></control><display><type>article</type><title>Computational linguistics literature and citations oriented citation linkage, classification and summarization</title><source>SpringerLink Journals - AutoHoldings</source><creator>Li, Lei ; Mao, Liyuan ; Zhang, Yazhao ; Chi, Junqi ; Huang, Taiwen ; Cong, Xiaoyue ; Peng, Heng</creator><creatorcontrib>Li, Lei ; Mao, Liyuan ; Zhang, Yazhao ; Chi, Junqi ; Huang, Taiwen ; Cong, Xiaoyue ; Peng, Heng</creatorcontrib><description>Scientific literature is currently the most important resource for scholars, and their citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. This paper is focused on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks in the 2016 workshop of the 2nd Computational Linguistics Scientific Document Summarization at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). 
Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then the cited text span is classified to five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various features of lexicons and rules via Support Vector Machine and Voting Method. Finally, a summary of the reference paper from the cited text spans is generated within 250 words. hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling, which provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems have been ranked top one and top two according to the evaluation results published by BIRNDL 2016, which has verified the effectiveness of our methods.</description><identifier>ISSN: 1432-5012</identifier><identifier>EISSN: 1432-1300</identifier><identifier>DOI: 10.1007/s00799-017-0219-5</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Bibliometrics ; Citation analysis ; Clustering ; Computation ; Computer Science ; Database Management ; Dirichlet problem ; Information retrieval ; Information Systems and Communication Service ; Linguistics ; Natural language processing ; Sentences ; Support vector machines</subject><ispartof>International journal on digital libraries, 2018-09, Vol.19 (2-3), p.173-190</ispartof><rights>Springer-Verlag GmbH Germany 2017</rights><rights>International Journal on Digital Libraries is a copyright of Springer, (2017). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-638d9c3f8ee53a68de141eb6b1bd258c0a1b00fea81920d23621b90518d3459c3</citedby><cites>FETCH-LOGICAL-c316t-638d9c3f8ee53a68de141eb6b1bd258c0a1b00fea81920d23621b90518d3459c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00799-017-0219-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00799-017-0219-5$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27923,27924,41487,42556,51318</link.rule.ids></links><search><creatorcontrib>Li, Lei</creatorcontrib><creatorcontrib>Mao, Liyuan</creatorcontrib><creatorcontrib>Zhang, Yazhao</creatorcontrib><creatorcontrib>Chi, Junqi</creatorcontrib><creatorcontrib>Huang, Taiwen</creatorcontrib><creatorcontrib>Cong, Xiaoyue</creatorcontrib><creatorcontrib>Peng, Heng</creatorcontrib><title>Computational linguistics literature and citations oriented citation linkage, classification and summarization</title><title>International journal on digital libraries</title><addtitle>Int J Digit Libr</addtitle><description>Scientific literature is currently the most important resource for scholars, and their citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. This paper is focused on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks in the 2016 workshop of the 2nd Computational Linguistics Scientific Document Summarization at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). 
Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then the cited text span is classified to five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various features of lexicons and rules via Support Vector Machine and Voting Method. Finally, a summary of the reference paper from the cited text spans is generated within 250 words. hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling, which provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems have been ranked top one and top two according to the evaluation results published by BIRNDL 2016, which has verified the effectiveness of our methods.</description><subject>Bibliometrics</subject><subject>Citation analysis</subject><subject>Clustering</subject><subject>Computation</subject><subject>Computer Science</subject><subject>Database Management</subject><subject>Dirichlet problem</subject><subject>Information retrieval</subject><subject>Information Systems and Communication Service</subject><subject>Linguistics</subject><subject>Natural language processing</subject><subject>Sentences</subject><subject>Support vector 
machines</subject><issn>1432-5012</issn><issn>1432-1300</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNp1kE9PwzAMxSMEEmPwAbhV4krBTpYuPaKJf9IkLnCO0jSdMrp2xOkBPj0pnbQTF9t6ej_LfoxdI9whwPKeUinLHHCZA8cylydshgvBcxQAp4dZAvJzdkG0BQBUuJyxbtXv9kM00fedabPWd5vBU_SW0hxdMHEILjNdnVk_uSjrg3dddEdpxD7Nxt1mtjVEvvF20keOht3OBP_zp1yys8a05K4Ofc4-nh7fVy_5-u35dfWwzq3AIuaFUHVpRaOck8IUqna4QFcVFVY1l8qCwQqgcUZhyaHmouBYlSBR1WIhEzlnN9Pefei_BkdRb_shpA9Jc1CK85IrSC6cXDb0RME1eh98OvZbI-gxVj3FqlOseoxVy8TwiaHk7TYuHDf_D_0COmN9Ag</recordid><startdate>20180901</startdate><enddate>20180901</enddate><creator>Li, Lei</creator><creator>Mao, Liyuan</creator><creator>Zhang, Yazhao</creator><creator>Chi, Junqi</creator><creator>Huang, Taiwen</creator><creator>Cong, Xiaoyue</creator><creator>Peng, Heng</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>88I</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CNYFK</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M1O</scope><scope>M2O</scope><scope>M2P</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PADUT</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>20180901</creationdate><title>Computational linguistics literature and citations oriented citation linkage, classification and summarization</title><author>Li, Lei ; Mao, Liyuan ; Zhang, Yazhao ; Chi, Junqi ; Huang, Taiwen ; Cong, Xiaoyue ; Peng, Heng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-638d9c3f8ee53a68de141eb6b1bd258c0a1b00fea81920d23621b90518d3459c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Bibliometrics</topic><topic>Citation analysis</topic><topic>Clustering</topic><topic>Computation</topic><topic>Computer Science</topic><topic>Database Management</topic><topic>Dirichlet problem</topic><topic>Information retrieval</topic><topic>Information Systems and Communication Service</topic><topic>Linguistics</topic><topic>Natural language processing</topic><topic>Sentences</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Lei</creatorcontrib><creatorcontrib>Mao, 
Liyuan</creatorcontrib><creatorcontrib>Zhang, Yazhao</creatorcontrib><creatorcontrib>Chi, Junqi</creatorcontrib><creatorcontrib>Huang, Taiwen</creatorcontrib><creatorcontrib>Cong, Xiaoyue</creatorcontrib><creatorcontrib>Peng, Heng</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Library & Information Science Collection</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Library Science 
Database</collection><collection>Research Library</collection><collection>Science Database</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Research Library China</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>International journal on digital libraries</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Lei</au><au>Mao, Liyuan</au><au>Zhang, Yazhao</au><au>Chi, Junqi</au><au>Huang, Taiwen</au><au>Cong, Xiaoyue</au><au>Peng, Heng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Computational linguistics literature and citations oriented citation linkage, classification and summarization</atitle><jtitle>International journal on digital libraries</jtitle><stitle>Int J Digit Libr</stitle><date>2018-09-01</date><risdate>2018</risdate><volume>19</volume><issue>2-3</issue><spage>173</spage><epage>190</epage><pages>173-190</pages><issn>1432-5012</issn><eissn>1432-1300</eissn><abstract>Scientific literature is currently the most important resource for scholars, and their citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. 
This paper is focused on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks in the 2016 workshop of the 2nd Computational Linguistics Scientific Document Summarization at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then the cited text span is classified to five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various features of lexicons and rules via Support Vector Machine and Voting Method. Finally, a summary of the reference paper from the cited text spans is generated within 250 words. hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling, which provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems have been ranked top one and top two according to the evaluation results published by BIRNDL 2016, which has verified the effectiveness of our methods.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00799-017-0219-5</doi><tpages>18</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1432-5012
ispartof	International journal on digital libraries, 2018-09, Vol.19 (2-3), p.173-190
issn	1432-5012 (ISSN); 1432-1300 (EISSN)
language	eng
recordid	cdi_proquest_journals_2088229280
source	SpringerLink Journals - AutoHoldings
subjects	Bibliometrics; Citation analysis; Clustering; Computation; Computer Science; Database Management; Dirichlet problem; Information retrieval; Information Systems and Communication Service; Linguistics; Natural language processing; Sentences; Support vector machines
title	Computational linguistics literature and citations oriented citation linkage, classification and summarization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T08%3A43%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Computational%20linguistics%20literature%20and%20citations%20oriented%20citation%20linkage,%20classification%20and%20summarization&rft.jtitle=International%20journal%20on%20digital%20libraries&rft.au=Li,%20Lei&rft.date=2018-09-01&rft.volume=19&rft.issue=2-3&rft.spage=173&rft.epage=190&rft.pages=173-190&rft.issn=1432-5012&rft.eissn=1432-1300&rft_id=info:doi/10.1007/s00799-017-0219-5&rft_dat=%3Cproquest_cross%3E2088229280%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2088229280&rft_id=info:pmid/&rfr_iscdi=true
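The url field above is an OpenURL in key/encoded-value form (`url_ver=Z39.88-2004`), in which the `rft.*` parameters describe the cited article. As a minimal sketch, the referent metadata can be recovered with Python's standard `urllib.parse` module. The function name `openurl_fields` and the shortened `sample_url` (with the placeholder host `resolver.example.org`) are assumptions made for illustration; the parameter values are copied from this record.

```python
from urllib.parse import parse_qs, urlsplit


def openurl_fields(url):
    """Return the rft.* (referent) parameters of an OpenURL as a dict of lists.

    Keys are stripped of the "rft." prefix; percent-encoded values are decoded.
    """
    query = parse_qs(urlsplit(url).query)
    prefix = "rft."
    return {
        key[len(prefix):]: values
        for key, values in query.items()
        if key.startswith(prefix)
    }


# Shortened illustrative URL; the host is a placeholder, not the real resolver.
sample_url = (
    "https://resolver.example.org/openurl?url_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=International%20journal%20on%20digital%20libraries"
    "&rft.volume=19&rft.issue=2-3&rft.spage=173&rft.epage=190"
    "&rft.issn=1432-5012"
)

fields = openurl_fields(sample_url)
print(fields["jtitle"][0], fields["volume"][0], fields["issue"][0])
```

Values are returned as lists because a query string may repeat a key. Parameters such as `rft_id` and `rft_val_fmt` use an underscore rather than a dot and are deliberately left out by the prefix filter.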