InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling
Saved in:

Main authors: | Wu, Xiaobao; Dong, Xinshuai; Nguyen, Thong; Liu, Chaoqun; Pan, Liangming; Luu, Anh Tuan |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computation and Language |
Online access: | Order full text |
creator | Wu, Xiaobao; Dong, Xinshuai; Nguyen, Thong; Liu, Chaoqun; Pan, Liangming; Luu, Anh Tuan |
description | Cross-lingual topic models have been prevalent for cross-lingual text analysis by revealing aligned latent topics. However, most existing methods suffer from producing repetitive topics that hinder further analysis, and from performance decline caused by low-coverage dictionaries. In this paper, we propose Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment used in previous work, we propose a mutual information-based topic alignment method. It acts as a regularization that properly aligns topics and prevents degenerate topic representations of words, which mitigates the repetitive-topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment beyond the translations in a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines, producing more coherent, diverse, and well-aligned topics and showing better transferability on cross-lingual classification tasks. |
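The abstract's central technique, mutual-information-based topic alignment, can be made concrete with a small sketch. The following is an illustrative, hypothetical implementation only, not the paper's actual objective: it assumes an InfoNCE-style lower bound on the mutual information between topic-word representations of linked cross-lingual word pairs, and every name in it (`mi_alignment_loss`, `phi_src`, `phi_tgt`, `linked_pairs`, `temperature`) is invented for illustration.

```python
# Illustrative sketch (NOT the paper's exact loss): an InfoNCE-style
# contrastive lower bound on mutual information between the topic
# representations of dictionary-linked cross-lingual word pairs.
import torch
import torch.nn.functional as F

def mi_alignment_loss(phi_src, phi_tgt, linked_pairs, temperature=0.07):
    """phi_src: (V_src, K) topic-word representations, source language.
    phi_tgt: (V_tgt, K) topic-word representations, target language.
    linked_pairs: (N, 2) long tensor of (src_idx, tgt_idx) linked words.
    """
    # Gather and L2-normalize the representations of each linked pair.
    src = F.normalize(phi_src[linked_pairs[:, 0]], dim=-1)  # (N, K)
    tgt = F.normalize(phi_tgt[linked_pairs[:, 1]], dim=-1)  # (N, K)
    # Pairwise similarities; the true translation sits on the diagonal,
    # all other in-batch words serve as negatives.
    logits = src @ tgt.t() / temperature                    # (N, N)
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric InfoNCE: pulls aligned topic representations together,
    # while the negatives discourage degenerate (collapsed) solutions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Hypothetical usage with K = 50 topics:
# phi_en = torch.randn(5000, 50, requires_grad=True)  # English vocab
# phi_zh = torch.randn(4000, 50, requires_grad=True)  # Chinese vocab
# pairs = torch.randint(0, 4000, (256, 2))            # linked word indices
# mi_alignment_loss(phi_en, phi_zh, pairs).backward()
```

Used as a regularizer added to the topic model's training loss, a term of this shape would penalize translation pairs whose topic representations drift apart, which matches the abstract's claim that mutual-information-based alignment mitigates repetitive and degenerate topics.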
doi_str_mv | 10.48550/arxiv.2304.03544 |
format | Article |
identifier | DOI: 10.48550/arxiv.2304.03544 |
language | eng |
recordid | cdi_arxiv_primary_2304_03544 |
source | arXiv.org |
subjects | Computer Science - Computation and Language |
title | InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T04%3A23%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=InfoCTM:%20A%20Mutual%20Information%20Maximization%20Perspective%20of%20Cross-Lingual%20Topic%20Modeling&rft.au=Wu,%20Xiaobao&rft.date=2023-04-07&rft_id=info:doi/10.48550/arxiv.2304.03544&rft_dat=%3Carxiv_GOX%3E2304_03544%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |