Computational linguistics literature and citations oriented citation linkage, classification and summarization

Scientific literature is currently the most important resource for scholars, and their citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. This paper is focused on automatic citation analysis and summarizatio...

Bibliographic details
Published in: International journal on digital libraries 2018-09, Vol.19 (2-3), p.173-190
Main authors: Li, Lei, Mao, Liyuan, Zhang, Yazhao, Chi, Junqi, Huang, Taiwen, Cong, Xiaoyue, Peng, Heng
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 190
container_issue 2-3
container_start_page 173
container_title International journal on digital libraries
container_volume 19
creator Li, Lei
Mao, Liyuan
Zhang, Yazhao
Chi, Junqi
Huang, Taiwen
Cong, Xiaoyue
Peng, Heng
description Scientific literature is currently the most important resource for scholars, and their citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. This paper is focused on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks in the 2016 workshop of the 2nd Computational Linguistics Scientific Document Summarization at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then the cited text span is classified to five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various features of lexicons and rules via Support Vector Machine and Voting Method. Finally, a summary of the reference paper from the cited text spans is generated within 250 words. hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling, which provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems have been ranked top one and top two according to the evaluation results published by BIRNDL 2016, which has verified the effectiveness of our methods.
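The citation linkage step in the description above matches a citation to text spans in the reference paper by content similarity. The abstract does not name the similarity measures used, so the following is only a minimal sketch under an assumption: cosine similarity over raw term counts, with hypothetical helper names (`tokens`, `cosine`, `link_citation`) that do not come from the paper.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_citation(citation, reference_sentences, top_k=1):
    """Return indices of the reference sentences most similar to the citation.

    Assumption: similarity is plain bag-of-words cosine; the paper may use
    other or additional measures. Sentences with zero overlap are dropped.
    """
    cit_vec = Counter(tokens(citation))
    scored = [
        (cosine(cit_vec, Counter(tokens(sentence))), index)
        for index, sentence in enumerate(reference_sentences)
    ]
    # Highest score first; ties are broken by position in the reference paper.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored[:top_k] if score > 0]
```

Calling `link_citation(citation_text, sentences, top_k=2)` returns up to two sentence indices as the candidate cited text span. A real system would likely add stop-word removal and term weighting before scoring.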
doi_str_mv 10.1007/s00799-017-0219-5
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2088229280</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2088229280</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-638d9c3f8ee53a68de141eb6b1bd258c0a1b00fea81920d23621b90518d3459c3</originalsourceid><addsrcrecordid>eNp1kE9PwzAMxSMEEmPwAbhV4krBTpYuPaKJf9IkLnCO0jSdMrp2xOkBPj0pnbQTF9t6ej_LfoxdI9whwPKeUinLHHCZA8cylydshgvBcxQAp4dZAvJzdkG0BQBUuJyxbtXv9kM00fedabPWd5vBU_SW0hxdMHEILjNdnVk_uSjrg3dddEdpxD7Nxt1mtjVEvvF20keOht3OBP_zp1yys8a05K4Ofc4-nh7fVy_5-u35dfWwzq3AIuaFUHVpRaOck8IUqna4QFcVFVY1l8qCwQqgcUZhyaHmouBYlSBR1WIhEzlnN9Pefei_BkdRb_shpA9Jc1CK85IrSC6cXDb0RME1eh98OvZbI-gxVj3FqlOseoxVy8TwiaHk7TYuHDf_D_0COmN9Ag</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2088229280</pqid></control><display><type>article</type><title>Computational linguistics literature and citations oriented citation linkage, classification and summarization</title><source>SpringerLink Journals - AutoHoldings</source><creator>Li, Lei ; Mao, Liyuan ; Zhang, Yazhao ; Chi, Junqi ; Huang, Taiwen ; Cong, Xiaoyue ; Peng, Heng</creator><creatorcontrib>Li, Lei ; Mao, Liyuan ; Zhang, Yazhao ; Chi, Junqi ; Huang, Taiwen ; Cong, Xiaoyue ; Peng, Heng</creatorcontrib><description>Scientific literature is currently the most important resource for scholars, and their citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. This paper is focused on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks in the 2016 workshop of the 2nd Computational Linguistics Scientific Document Summarization at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). 
Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then the cited text span is classified to five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various features of lexicons and rules via Support Vector Machine and Voting Method. Finally, a summary of the reference paper from the cited text spans is generated within 250 words. hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling, which provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems have been ranked top one and top two according to the evaluation results published by BIRNDL 2016, which has verified the effectiveness of our methods.</description><identifier>ISSN: 1432-5012</identifier><identifier>EISSN: 1432-1300</identifier><identifier>DOI: 10.1007/s00799-017-0219-5</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Bibliometrics ; Citation analysis ; Clustering ; Computation ; Computer Science ; Database Management ; Dirichlet problem ; Information retrieval ; Information Systems and Communication Service ; Linguistics ; Natural language processing ; Sentences ; Support vector machines</subject><ispartof>International journal on digital libraries, 2018-09, Vol.19 (2-3), p.173-190</ispartof><rights>Springer-Verlag GmbH Germany 2017</rights><rights>International Journal on Digital Libraries is a copyright of Springer, (2017). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-638d9c3f8ee53a68de141eb6b1bd258c0a1b00fea81920d23621b90518d3459c3</citedby><cites>FETCH-LOGICAL-c316t-638d9c3f8ee53a68de141eb6b1bd258c0a1b00fea81920d23621b90518d3459c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00799-017-0219-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00799-017-0219-5$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27923,27924,41487,42556,51318</link.rule.ids></links><search><creatorcontrib>Li, Lei</creatorcontrib><creatorcontrib>Mao, Liyuan</creatorcontrib><creatorcontrib>Zhang, Yazhao</creatorcontrib><creatorcontrib>Chi, Junqi</creatorcontrib><creatorcontrib>Huang, Taiwen</creatorcontrib><creatorcontrib>Cong, Xiaoyue</creatorcontrib><creatorcontrib>Peng, Heng</creatorcontrib><title>Computational linguistics literature and citations oriented citation linkage, classification and summarization</title><title>International journal on digital libraries</title><addtitle>Int J Digit Libr</addtitle><description>Scientific literature is currently the most important resource for scholars, and their citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. This paper is focused on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks in the 2016 workshop of the 2nd Computational Linguistics Scientific Document Summarization at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). 
Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then the cited text span is classified to five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various features of lexicons and rules via Support Vector Machine and Voting Method. Finally, a summary of the reference paper from the cited text spans is generated within 250 words. hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling, which provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems have been ranked top one and top two according to the evaluation results published by BIRNDL 2016, which has verified the effectiveness of our methods.</description><subject>Bibliometrics</subject><subject>Citation analysis</subject><subject>Clustering</subject><subject>Computation</subject><subject>Computer Science</subject><subject>Database Management</subject><subject>Dirichlet problem</subject><subject>Information retrieval</subject><subject>Information Systems and Communication Service</subject><subject>Linguistics</subject><subject>Natural language processing</subject><subject>Sentences</subject><subject>Support vector 
machines</subject><issn>1432-5012</issn><issn>1432-1300</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNp1kE9PwzAMxSMEEmPwAbhV4krBTpYuPaKJf9IkLnCO0jSdMrp2xOkBPj0pnbQTF9t6ej_LfoxdI9whwPKeUinLHHCZA8cylydshgvBcxQAp4dZAvJzdkG0BQBUuJyxbtXv9kM00fedabPWd5vBU_SW0hxdMHEILjNdnVk_uSjrg3dddEdpxD7Nxt1mtjVEvvF20keOht3OBP_zp1yys8a05K4Ofc4-nh7fVy_5-u35dfWwzq3AIuaFUHVpRaOck8IUqna4QFcVFVY1l8qCwQqgcUZhyaHmouBYlSBR1WIhEzlnN9Pefei_BkdRb_shpA9Jc1CK85IrSC6cXDb0RME1eh98OvZbI-gxVj3FqlOseoxVy8TwiaHk7TYuHDf_D_0COmN9Ag</recordid><startdate>20180901</startdate><enddate>20180901</enddate><creator>Li, Lei</creator><creator>Mao, Liyuan</creator><creator>Zhang, Yazhao</creator><creator>Chi, Junqi</creator><creator>Huang, Taiwen</creator><creator>Cong, Xiaoyue</creator><creator>Peng, Heng</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>88I</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CNYFK</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M1O</scope><scope>M2O</scope><scope>M2P</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PADUT</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>20180901</creationdate><title>Computational linguistics literature and citations oriented citation linkage, classification and summarization</title><author>Li, Lei ; Mao, Liyuan ; Zhang, Yazhao ; Chi, Junqi ; Huang, Taiwen ; Cong, Xiaoyue ; Peng, Heng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-638d9c3f8ee53a68de141eb6b1bd258c0a1b00fea81920d23621b90518d3459c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Bibliometrics</topic><topic>Citation analysis</topic><topic>Clustering</topic><topic>Computation</topic><topic>Computer Science</topic><topic>Database Management</topic><topic>Dirichlet problem</topic><topic>Information retrieval</topic><topic>Information Systems and Communication Service</topic><topic>Linguistics</topic><topic>Natural language processing</topic><topic>Sentences</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Lei</creatorcontrib><creatorcontrib>Mao, 
Liyuan</creatorcontrib><creatorcontrib>Zhang, Yazhao</creatorcontrib><creatorcontrib>Chi, Junqi</creatorcontrib><creatorcontrib>Huang, Taiwen</creatorcontrib><creatorcontrib>Cong, Xiaoyue</creatorcontrib><creatorcontrib>Peng, Heng</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Library &amp; Information Science Collection</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Library Science 
Database</collection><collection>Research Library</collection><collection>Science Database</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Research Library China</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>International journal on digital libraries</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Lei</au><au>Mao, Liyuan</au><au>Zhang, Yazhao</au><au>Chi, Junqi</au><au>Huang, Taiwen</au><au>Cong, Xiaoyue</au><au>Peng, Heng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Computational linguistics literature and citations oriented citation linkage, classification and summarization</atitle><jtitle>International journal on digital libraries</jtitle><stitle>Int J Digit Libr</stitle><date>2018-09-01</date><risdate>2018</risdate><volume>19</volume><issue>2-3</issue><spage>173</spage><epage>190</epage><pages>173-190</pages><issn>1432-5012</issn><eissn>1432-1300</eissn><abstract>Scientific literature is currently the most important resource for scholars, and their citations have provided researchers with a powerful latent way to analyze scientific trends, influences and relationships of works and authors. 
This paper is focused on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks in the 2016 workshop of the 2nd Computational Linguistics Scientific Document Summarization at BIRNDL 2016 (The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then the cited text span is classified to five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various features of lexicons and rules via Support Vector Machine and Voting Method. Finally, a summary of the reference paper from the cited text spans is generated within 250 words. hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling, which provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems have been ranked top one and top two according to the evaluation results published by BIRNDL 2016, which has verified the effectiveness of our methods.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00799-017-0219-5</doi><tpages>18</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1432-5012
ispartof International journal on digital libraries, 2018-09, Vol.19 (2-3), p.173-190
issn 1432-5012
1432-1300
language eng
recordid cdi_proquest_journals_2088229280
source SpringerLink Journals - AutoHoldings
subjects Bibliometrics
Citation analysis
Clustering
Computation
Computer Science
Database Management
Dirichlet problem
Information retrieval
Information Systems and Communication Service
Linguistics
Natural language processing
Sentences
Support vector machines
title Computational linguistics literature and citations oriented citation linkage, classification and summarization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T08%3A43%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Computational%20linguistics%20literature%20and%20citations%20oriented%20citation%20linkage,%20classification%20and%20summarization&rft.jtitle=International%20journal%20on%20digital%20libraries&rft.au=Li,%20Lei&rft.date=2018-09-01&rft.volume=19&rft.issue=2-3&rft.spage=173&rft.epage=190&rft.pages=173-190&rft.issn=1432-5012&rft.eissn=1432-1300&rft_id=info:doi/10.1007/s00799-017-0219-5&rft_dat=%3Cproquest_cross%3E2088229280%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2088229280&rft_id=info:pmid/&rfr_iscdi=true