Semantic-based topic representation using frequent semantic patterns

Topic modeling discovers the hidden topics in a document collection. Most of the existing topic models focus only on word usage and generate the topics based on the word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach to generate a...

Bibliographic details
Published in: Knowledge-based systems 2021-03, Vol.216, p.106808, Article 106808
Main authors: Kapugama Geeganage, Dakshi T., Xu, Yue, Li, Yuefeng
Format: Article
Language: English
Online access: Full text
container_end_page
container_issue
container_start_page 106808
container_title Knowledge-based systems
container_volume 216
creator Kapugama Geeganage, Dakshi T.
Xu, Yue
Li, Yuefeng
description Topic modeling discovers the hidden topics in a document collection. Most of the existing topic models focus only on word usage and generate the topics based on the word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach to generate a semantic pattern-based topic representation based on the meaning of the text to represent the topics in a document collection. The proposed approach considers both the semantics and co-occurrence of words to generate a set of frequent semantic patterns to represent each topic. The semantics are captured by matching the words in each topic with concepts in the Probase ontology. A set of frequent semantic patterns in each topic is generated based on the co-occurrence of the matched words to represent the topic. Hence, our approach differs from traditional topic models because of the meaningful frequent semantic patterns generated based on the ontology. The proposed topic representation was evaluated in terms of topic quality and information filtering performance against a set of state-of-the-art systems. Perplexity, coherence, and topic word distribution were examined in the topic quality evaluation. The generated frequent semantic patterns were used as features for the information filtering evaluation. Our topic representation outperformed in all the evaluations.
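The abstract describes two steps: matching topic words to ontology concepts, then mining frequent patterns from the co-occurrence of the matched words. The sketch below illustrates that general idea only and is not the authors' algorithm. The `CONCEPTS` dictionary is a hypothetical toy stand-in for a Probase lookup, the documents are invented, and the helper names `to_transaction` and `frequent_patterns` as well as the brute-force subset counting are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical word -> concept lookup standing in for Probase (toy data, not real Probase output).
CONCEPTS = {
    "apple": "company",
    "google": "company",
    "iphone": "device",
    "android": "device",
    "stock": "financial asset",
}


def to_transaction(words):
    """Map each matched word to its concept; words without a concept are dropped."""
    return frozenset(CONCEPTS[w] for w in words if w in CONCEPTS)


def frequent_patterns(transactions, min_support=2, max_size=3):
    """Count every concept subset up to max_size and keep those meeting min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)  # sort so each pattern has one canonical tuple form
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


# Invented documents assumed to belong to one topic.
docs = [
    ["apple", "iphone", "launch"],
    ["google", "android", "stock"],
    ["apple", "stock", "price"],
]
transactions = [to_transaction(d) for d in docs]
patterns = frequent_patterns(transactions, min_support=2)

for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

Enumerating all subsets is only practical for small concept sets; a real system would more likely use an Apriori or FP-growth style miner, and the support threshold here is arbitrary.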
doi_str_mv 10.1016/j.knosys.2021.106808
format Article
date 2021-03-15
publisher Amsterdam: Elsevier B.V
rights 2021 Elsevier B.V.
Copyright Elsevier Science Ltd. Mar 15, 2021
toplevel peer_reviewed
online_resources
fulltext fulltext
identifier ISSN: 0950-7051
ispartof Knowledge-based systems, 2021-03, Vol.216, p.106808, Article 106808
issn 0950-7051
1872-7409
language eng
recordid cdi_proquest_journals_2502902595
source ScienceDirect Journals (5 years ago - present)
subjects Concepts
Filtration
Knowledge representation
Ontology
Patterns
Quality assessment
Semantics
Topic representation
Words (language)
title Semantic-based topic representation using frequent semantic patterns
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T18%3A08%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semantic-based%20topic%20representation%20using%20frequent%20semantic%20patterns&rft.jtitle=Knowledge-based%20systems&rft.au=Kapugama%20Geeganage,%20Dakshi%20T.&rft.date=2021-03-15&rft.volume=216&rft.spage=106808&rft.pages=106808-&rft.artnum=106808&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2021.106808&rft_dat=%3Cproquest_cross%3E2502902595%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2502902595&rft_id=info:pmid/&rft_els_id=S095070512100071X&rfr_iscdi=true