Semantic-based topic representation using frequent semantic patterns
Topic modeling discovers the hidden topics in a document collection. Most of the existing topic models focus only on word usage and generate the topics based on the word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach to generate a...
Published in: | Knowledge-based systems 2021-03, Vol.216, p.106808, Article 106808 |
---|---|
Main authors: | Kapugama Geeganage, Dakshi T. ; Xu, Yue ; Li, Yuefeng |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 106808 |
container_title | Knowledge-based systems |
container_volume | 216 |
creator | Kapugama Geeganage, Dakshi T. ; Xu, Yue ; Li, Yuefeng |
description | Topic modeling discovers the hidden topics in a document collection. Most of the existing topic models focus only on word usage and generate the topics based on the word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach to generate a semantic pattern-based topic representation based on the meaning of the text to represent the topics in a document collection. The proposed approach considers both the semantics and co-occurrence of words to generate a set of frequent semantic patterns to represent each topic. The semantics are captured by matching the words in each topic with concepts in the Probase ontology. A set of frequent semantic patterns in each topic is generated based on the co-occurrence of the matched words to represent the topic. Hence, our approach differs from traditional topic models because of the meaningful frequent semantic patterns generated based on the ontology. The proposed topic representation was evaluated in terms of topic quality and information filtering performance against a set of state-of-the-art systems. Perplexity, coherence, and topic word distribution were examined in the topic quality evaluation. The generated frequent semantic patterns were used as features for the information filtering evaluation. Our topic representation outperformed the state-of-the-art systems in all the evaluations. |
doi_str_mv | 10.1016/j.knosys.2021.106808 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2502902595</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S095070512100071X</els_id><sourcerecordid>2502902595</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-4b9f128305f9a528e5c19bce50b75e03c7bebe1089251bbc7deccb4e61b5df163</originalsourceid><addsrcrecordid>eNp9kE9LxDAQxYMouK5-Aw8Fz10nadM2F0HWv7DgQT2HJJ1KqpvUJCv47c3SPXsaeLz3ZuZHyCWFFQXaXI-rT-fjb1wxYDRLTQfdEVnQrmVlW4M4JgsQHMoWOD0lZzGOAMAY7Rbk7hW3yiVrSq0i9kXykzVFwClgRJdUst4Vu2jdRzEE_N5lrYiHSDGplDC4eE5OBvUV8eIwl-T94f5t_VRuXh6f17eb0lRVncpai4GyrgI-CMVZh9xQoQ1y0C1HqEyrUSOFTjBOtTZtj8boGhuqeT_QplqSq7l3Cj6fEpMc_S64vFIyDkwA44JnVz27TPAxBhzkFOxWhV9JQe55yVHOvOSel5x55djNHMP8wY_FIKOx6Az2NqBJsvf2_4I_YT13Bw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2502902595</pqid></control><display><type>article</type><title>Semantic-based topic representation using frequent semantic patterns</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Kapugama Geeganage, Dakshi T. ; Xu, Yue ; Li, Yuefeng</creator><creatorcontrib>Kapugama Geeganage, Dakshi T. ; Xu, Yue ; Li, Yuefeng</creatorcontrib><description>Topic modeling discovers the hidden topics in a document collection. Most of the existing topic models focus only on word usage and generate the topics based on the word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach to generate a semantic pattern-based topic representation based on the meaning of the text to represent the topics in a document collection. The proposed approach considers both the semantics and co-occurrence of words to generate a set of frequent semantic patterns to represent each topic. The semantics are captured by matching the words in each topic with concepts in the Probase ontology. 
A set of frequent semantic patterns in each topic is generated based on the co-occurrence of the matched words to represent the topic. Hence, our approach differs from traditional topic models because of the meaningful frequent semantic patterns generated based on the ontology. The proposed topic representation was evaluated in terms of topic quality and information filtering performance against a set of state-of-the-art systems. Perplexity, coherence, and topic word distribution were examined in the topic quality evaluation. The generated frequent semantic patterns were used as features for the information filtering evaluation. Our topic representation outperformed in all the evaluations.</description><identifier>ISSN: 0950-7051</identifier><identifier>EISSN: 1872-7409</identifier><identifier>DOI: 10.1016/j.knosys.2021.106808</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Concepts ; Filtration ; Knowledge representation ; Ontology ; Patterns ; Quality assessment ; Semantics ; Topic representation ; Words (language)</subject><ispartof>Knowledge-based systems, 2021-03, Vol.216, p.106808, Article 106808</ispartof><rights>2021 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Mar 15, 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-4b9f128305f9a528e5c19bce50b75e03c7bebe1089251bbc7deccb4e61b5df163</citedby><cites>FETCH-LOGICAL-c334t-4b9f128305f9a528e5c19bce50b75e03c7bebe1089251bbc7deccb4e61b5df163</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.knosys.2021.106808$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,777,781,3537,27905,27906,45976</link.rule.ids></links><search><creatorcontrib>Kapugama Geeganage, Dakshi T.</creatorcontrib><creatorcontrib>Xu, Yue</creatorcontrib><creatorcontrib>Li, Yuefeng</creatorcontrib><title>Semantic-based topic representation using frequent semantic patterns</title><title>Knowledge-based systems</title><description>Topic modeling discovers the hidden topics in a document collection. Most of the existing topic models focus only on word usage and generate the topics based on the word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach to generate a semantic pattern-based topic representation based on the meaning of the text to represent the topics in a document collection. The proposed approach considers both the semantics and co-occurrence of words to generate a set of frequent semantic patterns to represent each topic. The semantics are captured by matching the words in each topic with concepts in the Probase ontology. A set of frequent semantic patterns in each topic is generated based on the co-occurrence of the matched words to represent the topic. Hence, our approach differs from traditional topic models because of the meaningful frequent semantic patterns generated based on the ontology. 
The proposed topic representation was evaluated in terms of topic quality and information filtering performance against a set of state-of-the-art systems. Perplexity, coherence, and topic word distribution were examined in the topic quality evaluation. The generated frequent semantic patterns were used as features for the information filtering evaluation. Our topic representation outperformed in all the evaluations.</description><subject>Concepts</subject><subject>Filtration</subject><subject>Knowledge representation</subject><subject>Ontology</subject><subject>Patterns</subject><subject>Quality assessment</subject><subject>Semantics</subject><subject>Topic representation</subject><subject>Words (language)</subject><issn>0950-7051</issn><issn>1872-7409</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LxDAQxYMouK5-Aw8Fz10nadM2F0HWv7DgQT2HJJ1KqpvUJCv47c3SPXsaeLz3ZuZHyCWFFQXaXI-rT-fjb1wxYDRLTQfdEVnQrmVlW4M4JgsQHMoWOD0lZzGOAMAY7Rbk7hW3yiVrSq0i9kXykzVFwClgRJdUst4Vu2jdRzEE_N5lrYiHSDGplDC4eE5OBvUV8eIwl-T94f5t_VRuXh6f17eb0lRVncpai4GyrgI-CMVZh9xQoQ1y0C1HqEyrUSOFTjBOtTZtj8boGhuqeT_QplqSq7l3Cj6fEpMc_S64vFIyDkwA44JnVz27TPAxBhzkFOxWhV9JQe55yVHOvOSel5x55djNHMP8wY_FIKOx6Az2NqBJsvf2_4I_YT13Bw</recordid><startdate>20210315</startdate><enddate>20210315</enddate><creator>Kapugama Geeganage, Dakshi T.</creator><creator>Xu, Yue</creator><creator>Li, Yuefeng</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20210315</creationdate><title>Semantic-based topic representation using frequent semantic patterns</title><author>Kapugama Geeganage, Dakshi T. 
; Xu, Yue ; Li, Yuefeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-4b9f128305f9a528e5c19bce50b75e03c7bebe1089251bbc7deccb4e61b5df163</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Concepts</topic><topic>Filtration</topic><topic>Knowledge representation</topic><topic>Ontology</topic><topic>Patterns</topic><topic>Quality assessment</topic><topic>Semantics</topic><topic>Topic representation</topic><topic>Words (language)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kapugama Geeganage, Dakshi T.</creatorcontrib><creatorcontrib>Xu, Yue</creatorcontrib><creatorcontrib>Li, Yuefeng</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Knowledge-based systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kapugama Geeganage, Dakshi T.</au><au>Xu, Yue</au><au>Li, Yuefeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Semantic-based topic representation using frequent semantic patterns</atitle><jtitle>Knowledge-based systems</jtitle><date>2021-03-15</date><risdate>2021</risdate><volume>216</volume><spage>106808</spage><pages>106808-</pages><artnum>106808</artnum><issn>0950-7051</issn><eissn>1872-7409</eissn><abstract>Topic modeling discovers the 
hidden topics in a document collection. Most of the existing topic models focus only on word usage and generate the topics based on the word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach to generate a semantic pattern-based topic representation based on the meaning of the text to represent the topics in a document collection. The proposed approach considers both the semantics and co-occurrence of words to generate a set of frequent semantic patterns to represent each topic. The semantics are captured by matching the words in each topic with concepts in the Probase ontology. A set of frequent semantic patterns in each topic is generated based on the co-occurrence of the matched words to represent the topic. Hence, our approach differs from traditional topic models because of the meaningful frequent semantic patterns generated based on the ontology. The proposed topic representation was evaluated in terms of topic quality and information filtering performance against a set of state-of-the-art systems. Perplexity, coherence, and topic word distribution were examined in the topic quality evaluation. The generated frequent semantic patterns were used as features for the information filtering evaluation. Our topic representation outperformed in all the evaluations.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.knosys.2021.106808</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0950-7051 |
ispartof | Knowledge-based systems, 2021-03, Vol.216, p.106808, Article 106808 |
issn | 0950-7051 ; 1872-7409 |
language | eng |
recordid | cdi_proquest_journals_2502902595 |
source | ScienceDirect Journals (5 years ago - present) |
subjects | Concepts ; Filtration ; Knowledge representation ; Ontology ; Patterns ; Quality assessment ; Semantics ; Topic representation ; Words (language) |
title | Semantic-based topic representation using frequent semantic patterns |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T18%3A08%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semantic-based%20topic%20representation%20using%20frequent%20semantic%20patterns&rft.jtitle=Knowledge-based%20systems&rft.au=Kapugama%20Geeganage,%20Dakshi%20T.&rft.date=2021-03-15&rft.volume=216&rft.spage=106808&rft.pages=106808-&rft.artnum=106808&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2021.106808&rft_dat=%3Cproquest_cross%3E2502902595%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2502902595&rft_id=info:pmid/&rft_els_id=S095070512100071X&rfr_iscdi=true |
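
The description field above outlines two steps: matching topic words to ontology concepts, then keeping the concept sets that co-occur frequently. The following is a minimal sketch of that general idea only, not the authors' implementation. The `TOY_CONCEPTS` dictionary is a hypothetical stand-in for the Probase ontology, the function name and its parameters are invented for illustration, and treating each document as one co-occurrence transaction with brute-force enumeration is an assumption, since the record does not say how the patterns are mined.

```python
from collections import Counter
from itertools import combinations

# Hypothetical word -> concept lookup standing in for an ontology such as Probase.
TOY_CONCEPTS = {
    "apple": "fruit",
    "banana": "fruit",
    "python": "programming language",
    "java": "programming language",
    "compiler": "software tool",
}


def frequent_semantic_patterns(documents, topic_words, concepts,
                               min_support=2, max_size=2):
    """Return {pattern: support} for concept sets found in >= min_support documents.

    Assumption: each document is one transaction, and a pattern is a sorted
    tuple of concepts whose matched words co-occur in that document.
    """
    topic_words = set(topic_words)
    counts = Counter()
    for tokens in documents:
        # Keep only topic words that have a concept, then map them to concepts.
        matched = {concepts[w] for w in tokens
                   if w in topic_words and w in concepts}
        # Brute-force enumeration of small concept sets (fine for a toy example).
        for size in range(1, max_size + 1):
            for pattern in combinations(sorted(matched), size):
                counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


docs = [
    ["python", "compiler", "apple"],
    ["java", "compiler"],
    ["banana", "apple"],
]
topic = ["python", "java", "compiler", "apple", "banana"]
patterns = frequent_semantic_patterns(docs, topic, TOY_CONCEPTS)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

In this toy data, "python" and "java" never appear in the same document, yet both map to the same concept, so the concept pair with "software tool" reaches the support threshold even though no single word pair does. That is the kind of effect a concept-level pattern is meant to capture; a real system would need a proper ontology lookup and a scalable pattern-mining algorithm.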