It all starts with entities: A Salient entity topic model
Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information....
Published in: | Natural language engineering 2020-09, Vol.26 (5), p.531-549 |
---|---|
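Main authors: | Wu, Chuan; Kanoulas, Evangelos; de Rijke, Maarten |
Format: | Article |
Language: | eng |
Subjects: | Information sources; Parameter estimation; Qualitative analysis; Quantitative analysis; Salience; Topics; Variables |
Online access: | Full text |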
container_end_page | 549 |
---|---|
container_issue | 5 |
container_start_page | 531 |
container_title | Natural language engineering |
container_volume | 26 |
creator | Wu, Chuan ; Kanoulas, Evangelos ; de Rijke, Maarten |
description | Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, entities are usually treated equally, without considering whether they are salient or not. In this work, we propose a novel ETM, Salient Entity Topic Model, to take salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. Qualitative and quantitative analysis is performed on the proposed model. Application to entity salience detection demonstrates the effectiveness of our model compared to the state-of-the-art topic model baselines. |
doi_str_mv | 10.1017/S1351324919000585 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2431797939</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><cupid>10_1017_S1351324919000585</cupid><sourcerecordid>2431797939</sourcerecordid><originalsourceid>FETCH-LOGICAL-c360t-2174a9b399f18bc62ac9ff32eb0adc7aa01bb9e24488e34d1126cacc0686747a3</originalsourceid><addsrcrecordid>eNp1UE1LwzAYDqLgnP4AbwHP1bxJ2jTextA5GHiYnsvbNNWMdp1Jhuzfm9GBB_H0fjxf8BByC-weGKiHNYgcBJcaNGMsL_MzMgFZ6KwEYOdpT3B2xC_JVQibxJGg5IToZaTYdTRE9DHQbxc_qd1GF50Nj3RG19i5dI-_A43DzhnaD43trslFi12wN6c5Je_PT2_zl2z1uljOZ6vMiILFjKcY1LXQuoWyNgVHo9tWcFszbIxCZFDX2nIpy9IK2QDwwqAxrCgLJRWKKbkbfXd--NrbEKvNsPfbFFlxKUBppYVOLBhZxg8heNtWO-969IcKWHVsqPrTUNKIkwb72rvmw_5a_6_6AfJ-ZnE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2431797939</pqid></control><display><type>article</type><title>It all starts with entities: A Salient entity topic model</title><source>Cambridge University Press Journals Complete</source><creator>Wu, Chuan ; Kanoulas, Evangelos ; de Rijke, Maarten</creator><creatorcontrib>Wu, Chuan ; Kanoulas, Evangelos ; de Rijke, Maarten</creatorcontrib><description>Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, entities are usually treated equally, without considering whether they are salient or not. 
In this work, we propose a novel ETM, Salient Entity Topic Model, to take salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. Qualitative and quantitative analysis is performed on the proposed model. Application to entity salience detection demonstrates the effectiveness of our model compared to the state-of-the-art topic model baselines.</description><identifier>ISSN: 1351-3249</identifier><identifier>EISSN: 1469-8110</identifier><identifier>DOI: 10.1017/S1351324919000585</identifier><language>eng</language><publisher>Cambridge, UK: Cambridge University Press</publisher><subject>Information sources ; Parameter estimation ; Qualitative analysis ; Quantitative analysis ; Salience ; Topics ; Variables</subject><ispartof>Natural language engineering, 2020-09, Vol.26 (5), p.531-549</ispartof><rights>Cambridge University Press 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c360t-2174a9b399f18bc62ac9ff32eb0adc7aa01bb9e24488e34d1126cacc0686747a3</citedby><cites>FETCH-LOGICAL-c360t-2174a9b399f18bc62ac9ff32eb0adc7aa01bb9e24488e34d1126cacc0686747a3</cites><orcidid>0000-0002-8784-0808</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.cambridge.org/core/product/identifier/S1351324919000585/type/journal_article$$EHTML$$P50$$Gcambridge$$H</linktohtml><link.rule.ids>164,314,780,784,27924,27925,55628</link.rule.ids></links><search><creatorcontrib>Wu, Chuan</creatorcontrib><creatorcontrib>Kanoulas, Evangelos</creatorcontrib><creatorcontrib>de Rijke, Maarten</creatorcontrib><title>It all starts with entities: A Salient entity topic 
model</title><title>Natural language engineering</title><addtitle>Nat. Lang. Eng</addtitle><description>Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, entities are usually treated equally, without considering whether they are salient or not. In this work, we propose a novel ETM, Salient Entity Topic Model, to take salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. Qualitative and quantitative analysis is performed on the proposed model. 
Application to entity salience detection demonstrates the effectiveness of our model compared to the state-of-the-art topic model baselines.</description><subject>Information sources</subject><subject>Parameter estimation</subject><subject>Qualitative analysis</subject><subject>Quantitative analysis</subject><subject>Salience</subject><subject>Topics</subject><subject>Variables</subject><issn>1351-3249</issn><issn>1469-8110</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1UE1LwzAYDqLgnP4AbwHP1bxJ2jTextA5GHiYnsvbNNWMdp1Jhuzfm9GBB_H0fjxf8BByC-weGKiHNYgcBJcaNGMsL_MzMgFZ6KwEYOdpT3B2xC_JVQibxJGg5IToZaTYdTRE9DHQbxc_qd1GF50Nj3RG19i5dI-_A43DzhnaD43trslFi12wN6c5Je_PT2_zl2z1uljOZ6vMiILFjKcY1LXQuoWyNgVHo9tWcFszbIxCZFDX2nIpy9IK2QDwwqAxrCgLJRWKKbkbfXd--NrbEKvNsPfbFFlxKUBppYVOLBhZxg8heNtWO-969IcKWHVsqPrTUNKIkwb72rvmw_5a_6_6AfJ-ZnE</recordid><startdate>20200901</startdate><enddate>20200901</enddate><creator>Wu, Chuan</creator><creator>Kanoulas, Evangelos</creator><creator>de Rijke, Maarten</creator><general>Cambridge University 
Press</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7T9</scope><scope>7XB</scope><scope>88G</scope><scope>8AL</scope><scope>8FE</scope><scope>8FG</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CPGLG</scope><scope>CRLPW</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L6V</scope><scope>M0N</scope><scope>M2M</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PSYQQ</scope><scope>PTHSS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0002-8784-0808</orcidid></search><sort><creationdate>20200901</creationdate><title>It all starts with entities: A Salient entity topic model</title><author>Wu, Chuan ; Kanoulas, Evangelos ; de Rijke, Maarten</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c360t-2174a9b399f18bc62ac9ff32eb0adc7aa01bb9e24488e34d1126cacc0686747a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Information sources</topic><topic>Parameter estimation</topic><topic>Qualitative analysis</topic><topic>Quantitative analysis</topic><topic>Salience</topic><topic>Topics</topic><topic>Variables</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wu, Chuan</creatorcontrib><creatorcontrib>Kanoulas, Evangelos</creatorcontrib><creatorcontrib>de Rijke, Maarten</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>ProQuest Central (purchase pre-March 
2016)</collection><collection>Psychology Database (Alumni)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Linguistics Collection</collection><collection>Linguistics Database</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Engineering Collection</collection><collection>Computing Database</collection><collection>Psychology Database</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest One Psychology</collection><collection>Engineering 
Collection</collection><collection>ProQuest Central Basic</collection><jtitle>Natural language engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wu, Chuan</au><au>Kanoulas, Evangelos</au><au>de Rijke, Maarten</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>It all starts with entities: A Salient entity topic model</atitle><jtitle>Natural language engineering</jtitle><addtitle>Nat. Lang. Eng</addtitle><date>2020-09-01</date><risdate>2020</risdate><volume>26</volume><issue>5</issue><spage>531</spage><epage>549</epage><pages>531-549</pages><issn>1351-3249</issn><eissn>1469-8110</eissn><abstract>Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, entities are usually treated equally, without considering whether they are salient or not. In this work, we propose a novel ETM, Salient Entity Topic Model, to take salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. Qualitative and quantitative analysis is performed on the proposed model. 
Application to entity salience detection demonstrates the effectiveness of our model compared to the state-of-the-art topic model baselines.</abstract><cop>Cambridge, UK</cop><pub>Cambridge University Press</pub><doi>10.1017/S1351324919000585</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0002-8784-0808</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1351-3249 |
ispartof | Natural language engineering, 2020-09, Vol.26 (5), p.531-549 |
issn | 1351-3249 1469-8110 |
language | eng |
recordid | cdi_proquest_journals_2431797939 |
source | Cambridge University Press Journals Complete |
subjects | Information sources ; Parameter estimation ; Qualitative analysis ; Quantitative analysis ; Salience ; Topics ; Variables |
title | It all starts with entities: A Salient entity topic model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T21%3A29%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=It%20all%20starts%20with%20entities:%20A%20Salient%20entity%20topic%20model&rft.jtitle=Natural%20language%20engineering&rft.au=Wu,%20Chuan&rft.date=2020-09-01&rft.volume=26&rft.issue=5&rft.spage=531&rft.epage=549&rft.pages=531-549&rft.issn=1351-3249&rft.eissn=1469-8110&rft_id=info:doi/10.1017/S1351324919000585&rft_dat=%3Cproquest_cross%3E2431797939%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2431797939&rft_id=info:pmid/&rft_cupid=10_1017_S1351324919000585&rfr_iscdi=true |