It all starts with entities: A Salient entity topic model

Bibliographic details
Published in: Natural language engineering 2020-09, Vol.26 (5), p.531-549
Main authors: Wu, Chuan; Kanoulas, Evangelos; de Rijke, Maarten
Format: Article
Language: English
container_end_page 549
container_issue 5
container_start_page 531
container_title Natural language engineering
container_volume 26
creator Wu, Chuan
Kanoulas, Evangelos
de Rijke, Maarten
description Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, entities are usually treated equally, without considering whether they are salient or not. In this work, we propose a novel ETM, Salient Entity Topic Model, to take salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. Qualitative and quantitative analysis is performed on the proposed model. Application to entity salience detection demonstrates the effectiveness of our model compared to the state-of-the-art topic model baselines.
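The abstract describes salient entities acting as a second source of topics for generating words, alongside the document's own topic distribution. The toy Python sketch below illustrates only that general idea; it is not the paper's Salient Entity Topic Model. The per-word source switch, the mixing probability `lam`, the uniform choice among salient entities, and the names `generate_document` and `sample_index` are hypothetical assumptions, and the sketch omits priors, parameter estimation, and salience inference entirely.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_index(probs: Sequence[float], rng: random.Random) -> int:
    """Draw index i with probability proportional to probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(
    n_words: int,
    theta_doc: Sequence[float],                 # document topic distribution
    entity_topics: Dict[str, Sequence[float]],  # hypothetical: topic distribution per salient entity
    topic_words: List[Sequence[float]],         # per-topic distributions over the vocabulary
    vocab: Sequence[str],
    lam: float,                                 # hypothetical: P(word's topic comes from a salient entity)
    seed: int = 0,
) -> List[Tuple[str, int, str]]:
    """Toy generative process: each word records (topic source, topic id, word)."""
    rng = random.Random(seed)
    salient = list(entity_topics)
    doc: List[Tuple[str, int, str]] = []
    for _ in range(n_words):
        if salient and rng.random() < lam:
            # Topic drawn from a (uniformly chosen) salient entity's topic distribution.
            source = rng.choice(salient)
            z = sample_index(entity_topics[source], rng)
        else:
            # Topic drawn from the document's own topic distribution, as in LDA.
            source = "doc"
            z = sample_index(theta_doc, rng)
        # The word itself is always drawn from the chosen topic's word distribution.
        w = vocab[sample_index(topic_words[z], rng)]
        doc.append((source, z, w))
    return doc
```

With `lam=0` the sketch reduces to an LDA-style process in which every topic comes from `theta_doc`; raising `lam` shifts more words toward the topics of the document's salient entities.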
doi_str_mv 10.1017/S1351324919000585
format Article
publisher Cambridge University Press, Cambridge, UK
rights Cambridge University Press 2019
peer_reviewed true
oa free_for_read
tpages 19
orcidid https://orcid.org/0000-0002-8784-0808
linktohtml https://www.cambridge.org/core/product/identifier/S1351324919000585/type/journal_article
identifier ISSN: 1351-3249
ispartof Natural language engineering, 2020-09, Vol.26 (5), p.531-549
issn 1351-3249
1469-8110
language eng
recordid cdi_proquest_journals_2431797939
source Cambridge University Press Journals Complete
subjects Information sources
Parameter estimation
Qualitative analysis
Quantitative analysis
Salience
Topics
Variables
title It all starts with entities: A Salient entity topic model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T21%3A29%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=It%20all%20starts%20with%20entities:%20A%20Salient%20entity%20topic%20model&rft.jtitle=Natural%20language%20engineering&rft.au=Wu,%20Chuan&rft.date=2020-09-01&rft.volume=26&rft.issue=5&rft.spage=531&rft.epage=549&rft.pages=531-549&rft.issn=1351-3249&rft.eissn=1469-8110&rft_id=info:doi/10.1017/S1351324919000585&rft_dat=%3Cproquest_cross%3E2431797939%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2431797939&rft_id=info:pmid/&rft_cupid=10_1017_S1351324919000585&rfr_iscdi=true