Predicting the impact of scientific concepts using full-text features
New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general pub...
Published in: | Journal of the Association for Information Science and Technology 2016-11, Vol.67 (11), p.2684-2696 |
---|---|
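Main authors: | McKeown, Kathy; Daume III, Hal; Chaturvedi, Snigdha; Paparrizos, John; Thadani, Kapil; Barrio, Pablo; Biran, Or; Bothe, Suvarna; Collins, Michael; Fleischmann, Kenneth R.; Gravano, Luis; Jha, Rahul; King, Ben; McInerney, Kevin; Moon, Taesun; Neelakantan, Arvind; O'Seaghdha, Diarmuid; Radev, Dragomir; Templeton, Clay; Teufel, Simone |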
Format: | Article |
Language: | English |
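Subjects: | Information retrieval; machine learning; Metadata; natural language processing; scientometrics; Sentences; Tasks; Texts |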
Online access: | Full text |
container_end_page | 2696 |
---|---|
container_issue | 11 |
container_start_page | 2684 |
container_title | Journal of the Association for Information Science and Technology |
container_volume | 67 |
creator | McKeown, Kathy; Daume III, Hal; Chaturvedi, Snigdha; Paparrizos, John; Thadani, Kapil; Barrio, Pablo; Biran, Or; Bothe, Suvarna; Collins, Michael; Fleischmann, Kenneth R.; Gravano, Luis; Jha, Rahul; King, Ben; McInerney, Kevin; Moon, Taesun; Neelakantan, Arvind; O'Seaghdha, Diarmuid; Radev, Dragomir; Templeton, Clay; Teufel, Simone |
description | New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time‐series analysis. The results from two large‐scale experiments with 3.8 million full‐text articles and 48 million metadata records support the conclusion that full‐text features are significantly more useful for prediction than metadata‐only features and that the most accurate predictions result from combining the metadata and full‐text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full‐text features. |
doi_str_mv | 10.1002/asi.23612 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1845809376</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1845809376</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4062-ca74f60576ca16bf44f770439e1107c71b45d58169b6d2045684026c6dedcc6c3</originalsourceid><addsrcrecordid>eNp1kE9PwjAYhxujiQQ5-A121MOg_7sdCSJiEE3EcGxK12p1bLPtonx7h6g3T-97eJ7f4QHgHMEhghCPVHBDTDjCR6CHCYEp4pQc__2EnYJBCK8QQgTzjGHUA9MHbwqno6uek_hiErdtlI5JbZOgnamis04nuq60aWJI2rDnbFuWaTSfMbFGxdabcAZOrCqDGfzcPni6nq4mN-nifjafjBepppDjVCtBLYdMcK0Q31hKrRCQktwgBIUWaENZwTLE8w0vMKSMZxRirnlhCq25Jn1wcdhtfP3emhDl1gVtylJVpm6DRBllGcyJ4B16eUC1r0PwxsrGu63yO4mg3NeSXS35XatjRwf2w5Vm9z8ox4_zXyM9GC50If4M5d8kF0QwuV7O5ORqtVzckju5Jl8w33m0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1845809376</pqid></control><display><type>article</type><title>Predicting the impact of scientific concepts using full-text features</title><source>Wiley Online Library Journals Frontfile Complete</source><source>Business Source Complete</source><creator>McKeown, Kathy ; Daume III, Hal ; Chaturvedi, Snigdha ; Paparrizos, John ; Thadani, Kapil ; Barrio, Pablo ; Biran, Or ; Bothe, Suvarna ; Collins, Michael ; Fleischmann, Kenneth R. ; Gravano, Luis ; Jha, Rahul ; King, Ben ; McInerney, Kevin ; Moon, Taesun ; Neelakantan, Arvind ; O'Seaghdha, Diarmuid ; Radev, Dragomir ; Templeton, Clay ; Teufel, Simone</creator><creatorcontrib>McKeown, Kathy ; Daume III, Hal ; Chaturvedi, Snigdha ; Paparrizos, John ; Thadani, Kapil ; Barrio, Pablo ; Biran, Or ; Bothe, Suvarna ; Collins, Michael ; Fleischmann, Kenneth R. 
; Gravano, Luis ; Jha, Rahul ; King, Ben ; McInerney, Kevin ; Moon, Taesun ; Neelakantan, Arvind ; O'Seaghdha, Diarmuid ; Radev, Dragomir ; Templeton, Clay ; Teufel, Simone</creatorcontrib><description>New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time‐series analysis. The results from two large‐scale experiments with 3.8 million full‐text articles and 48 million metadata records support the conclusion that full‐text features are significantly more useful for prediction than metadata‐only features and that the most accurate predictions result from combining the metadata and full‐text features. 
Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full‐text features.</description><identifier>ISSN: 2330-1635</identifier><identifier>EISSN: 2330-1643</identifier><identifier>DOI: 10.1002/asi.23612</identifier><language>eng</language><publisher>Blackwell Publishing Ltd</publisher><subject>Information retrieval ; machine learning ; Metadata ; natural language processing ; scientometrics ; Sentences ; Tasks ; Texts</subject><ispartof>Journal of the Association for Information Science and Technology, 2016-11, Vol.67 (11), p.2684-2696</ispartof><rights>2016 ASIS&T</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c4062-ca74f60576ca16bf44f770439e1107c71b45d58169b6d2045684026c6dedcc6c3</citedby><cites>FETCH-LOGICAL-c4062-ca74f60576ca16bf44f770439e1107c71b45d58169b6d2045684026c6dedcc6c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fasi.23612$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fasi.23612$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,777,781,1412,27905,27906,45555,45556</link.rule.ids></links><search><creatorcontrib>McKeown, Kathy</creatorcontrib><creatorcontrib>Daume III, Hal</creatorcontrib><creatorcontrib>Chaturvedi, Snigdha</creatorcontrib><creatorcontrib>Paparrizos, John</creatorcontrib><creatorcontrib>Thadani, Kapil</creatorcontrib><creatorcontrib>Barrio, Pablo</creatorcontrib><creatorcontrib>Biran, Or</creatorcontrib><creatorcontrib>Bothe, Suvarna</creatorcontrib><creatorcontrib>Collins, Michael</creatorcontrib><creatorcontrib>Fleischmann, Kenneth R.</creatorcontrib><creatorcontrib>Gravano, Luis</creatorcontrib><creatorcontrib>Jha, 
Rahul</creatorcontrib><creatorcontrib>King, Ben</creatorcontrib><creatorcontrib>McInerney, Kevin</creatorcontrib><creatorcontrib>Moon, Taesun</creatorcontrib><creatorcontrib>Neelakantan, Arvind</creatorcontrib><creatorcontrib>O'Seaghdha, Diarmuid</creatorcontrib><creatorcontrib>Radev, Dragomir</creatorcontrib><creatorcontrib>Templeton, Clay</creatorcontrib><creatorcontrib>Teufel, Simone</creatorcontrib><title>Predicting the impact of scientific concepts using full-text features</title><title>Journal of the Association for Information Science and Technology</title><addtitle>J Assn Inf Sci Tec</addtitle><description>New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time‐series analysis. The results from two large‐scale experiments with 3.8 million full‐text articles and 48 million metadata records support the conclusion that full‐text features are significantly more useful for prediction than metadata‐only features and that the most accurate predictions result from combining the metadata and full‐text features. 
Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full‐text features.</description><subject>Information retrieval</subject><subject>machine learning</subject><subject>Metadata</subject><subject>natural language processing</subject><subject>scientometrics</subject><subject>Sentences</subject><subject>Tasks</subject><subject>Texts</subject><issn>2330-1635</issn><issn>2330-1643</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp1kE9PwjAYhxujiQQ5-A121MOg_7sdCSJiEE3EcGxK12p1bLPtonx7h6g3T-97eJ7f4QHgHMEhghCPVHBDTDjCR6CHCYEp4pQc__2EnYJBCK8QQgTzjGHUA9MHbwqno6uek_hiErdtlI5JbZOgnamis04nuq60aWJI2rDnbFuWaTSfMbFGxdabcAZOrCqDGfzcPni6nq4mN-nifjafjBepppDjVCtBLYdMcK0Q31hKrRCQktwgBIUWaENZwTLE8w0vMKSMZxRirnlhCq25Jn1wcdhtfP3emhDl1gVtylJVpm6DRBllGcyJ4B16eUC1r0PwxsrGu63yO4mg3NeSXS35XatjRwf2w5Vm9z8ox4_zXyM9GC50If4M5d8kF0QwuV7O5ORqtVzckju5Jl8w33m0</recordid><startdate>201611</startdate><enddate>201611</enddate><creator>McKeown, Kathy</creator><creator>Daume III, Hal</creator><creator>Chaturvedi, Snigdha</creator><creator>Paparrizos, John</creator><creator>Thadani, Kapil</creator><creator>Barrio, Pablo</creator><creator>Biran, Or</creator><creator>Bothe, Suvarna</creator><creator>Collins, Michael</creator><creator>Fleischmann, Kenneth R.</creator><creator>Gravano, Luis</creator><creator>Jha, Rahul</creator><creator>King, Ben</creator><creator>McInerney, Kevin</creator><creator>Moon, Taesun</creator><creator>Neelakantan, Arvind</creator><creator>O'Seaghdha, Diarmuid</creator><creator>Radev, Dragomir</creator><creator>Templeton, Clay</creator><creator>Teufel, Simone</creator><general>Blackwell Publishing 
Ltd</general><scope>BSCLL</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201611</creationdate><title>Predicting the impact of scientific concepts using full-text features</title><author>McKeown, Kathy ; Daume III, Hal ; Chaturvedi, Snigdha ; Paparrizos, John ; Thadani, Kapil ; Barrio, Pablo ; Biran, Or ; Bothe, Suvarna ; Collins, Michael ; Fleischmann, Kenneth R. ; Gravano, Luis ; Jha, Rahul ; King, Ben ; McInerney, Kevin ; Moon, Taesun ; Neelakantan, Arvind ; O'Seaghdha, Diarmuid ; Radev, Dragomir ; Templeton, Clay ; Teufel, Simone</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4062-ca74f60576ca16bf44f770439e1107c71b45d58169b6d2045684026c6dedcc6c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Information retrieval</topic><topic>machine learning</topic><topic>Metadata</topic><topic>natural language processing</topic><topic>scientometrics</topic><topic>Sentences</topic><topic>Tasks</topic><topic>Texts</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>McKeown, Kathy</creatorcontrib><creatorcontrib>Daume III, Hal</creatorcontrib><creatorcontrib>Chaturvedi, Snigdha</creatorcontrib><creatorcontrib>Paparrizos, John</creatorcontrib><creatorcontrib>Thadani, Kapil</creatorcontrib><creatorcontrib>Barrio, Pablo</creatorcontrib><creatorcontrib>Biran, Or</creatorcontrib><creatorcontrib>Bothe, Suvarna</creatorcontrib><creatorcontrib>Collins, Michael</creatorcontrib><creatorcontrib>Fleischmann, Kenneth R.</creatorcontrib><creatorcontrib>Gravano, Luis</creatorcontrib><creatorcontrib>Jha, Rahul</creatorcontrib><creatorcontrib>King, Ben</creatorcontrib><creatorcontrib>McInerney, Kevin</creatorcontrib><creatorcontrib>Moon, Taesun</creatorcontrib><creatorcontrib>Neelakantan, 
Arvind</creatorcontrib><creatorcontrib>O'Seaghdha, Diarmuid</creatorcontrib><creatorcontrib>Radev, Dragomir</creatorcontrib><creatorcontrib>Templeton, Clay</creatorcontrib><creatorcontrib>Teufel, Simone</creatorcontrib><collection>Istex</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of the Association for Information Science and Technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>McKeown, Kathy</au><au>Daume III, Hal</au><au>Chaturvedi, Snigdha</au><au>Paparrizos, John</au><au>Thadani, Kapil</au><au>Barrio, Pablo</au><au>Biran, Or</au><au>Bothe, Suvarna</au><au>Collins, Michael</au><au>Fleischmann, Kenneth R.</au><au>Gravano, Luis</au><au>Jha, Rahul</au><au>King, Ben</au><au>McInerney, Kevin</au><au>Moon, Taesun</au><au>Neelakantan, Arvind</au><au>O'Seaghdha, Diarmuid</au><au>Radev, Dragomir</au><au>Templeton, Clay</au><au>Teufel, Simone</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predicting the impact of scientific concepts using full-text features</atitle><jtitle>Journal of the Association for Information Science and Technology</jtitle><addtitle>J Assn Inf Sci Tec</addtitle><date>2016-11</date><risdate>2016</risdate><volume>67</volume><issue>11</issue><spage>2684</spage><epage>2696</epage><pages>2684-2696</pages><issn>2330-1635</issn><eissn>2330-1643</eissn><abstract>New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. 
The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time‐series analysis. The results from two large‐scale experiments with 3.8 million full‐text articles and 48 million metadata records support the conclusion that full‐text features are significantly more useful for prediction than metadata‐only features and that the most accurate predictions result from combining the metadata and full‐text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full‐text features.</abstract><pub>Blackwell Publishing Ltd</pub><doi>10.1002/asi.23612</doi><tpages>13</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2330-1635 |
ispartof | Journal of the Association for Information Science and Technology, 2016-11, Vol.67 (11), p.2684-2696 |
issn | 2330-1635; 2330-1643 |
language | eng |
recordid | cdi_proquest_miscellaneous_1845809376 |
source | Wiley Online Library Journals Frontfile Complete; Business Source Complete |
subjects | Information retrieval; machine learning; Metadata; natural language processing; scientometrics; Sentences; Tasks; Texts |
title | Predicting the impact of scientific concepts using full-text features |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T05%3A06%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20the%20impact%20of%20scientific%20concepts%20using%20full-text%20features&rft.jtitle=Journal%20of%20the%20Association%20for%20Information%20Science%20and%20Technology&rft.au=McKeown,%20Kathy&rft.date=2016-11&rft.volume=67&rft.issue=11&rft.spage=2684&rft.epage=2696&rft.pages=2684-2696&rft.issn=2330-1635&rft.eissn=2330-1643&rft_id=info:doi/10.1002/asi.23612&rft_dat=%3Cproquest_cross%3E1845809376%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1845809376&rft_id=info:pmid/&rfr_iscdi=true |