Towards building a high-quality microblog-specific Chinese sentiment lexicon

Due to the huge popularity of microblogging services, microblogs have become important sources of customer opinions. Sentiment analysis systems can provide useful knowledge to decision support systems and decision makers by aggregating and summarizing the opinions in massive microblogs automatically...

Bibliographic details
Published in: Decision Support Systems 2016-07, Vol.87, p.39-49
Main authors: Wu, Fangzhao; Huang, Yongfeng; Song, Yangqiu; Liu, Shixia
Format: Article
Language: eng
Online access: Full text
container_end_page 49
container_issue
container_start_page 39
container_title Decision Support Systems
container_volume 87
creator Wu, Fangzhao
Huang, Yongfeng
Song, Yangqiu
Liu, Shixia
description Due to the huge popularity of microblogging services, microblogs have become important sources of customer opinions. Sentiment analysis systems can provide useful knowledge to decision support systems and decision makers by aggregating and summarizing the opinions in massive microblogs automatically. The most important component of sentiment analysis systems is sentiment lexicon. However, the performance of traditional sentiment lexicons on microblog sentiment analysis is far from satisfactory, especially for Chinese. In this paper, we propose a data-driven approach to build a high-quality microblog-specific sentiment lexicon for Chinese microblog sentiment analysis system. The core of our method is a unified framework that incorporates three kinds of sentiment knowledge for sentiment lexicon construction, i.e., the word-sentiment knowledge extracted from microblogs with emoticons, the sentiment similarity knowledge extracted from words' associations among all the messages, and the prior sentiment knowledge extracted from existing sentiment lexicons. In addition, in order to improve the coverage of our sentiment lexicon, we propose an effective method to detect popular new words in microblogs, which considers not only words' distributions over texts, but also their distributions over users. The detected new words with strong sentiment are incorporated in our sentiment lexicon. We built a microblog-specific Chinese sentiment lexicon on a large microblog dataset with more than 17 million messages. Experimental results on two microblog sentiment datasets show that our microblog-specific sentiment lexicon can significantly improve the performance of microblog sentiment analysis.
• An effective and efficient method to detect the popular user-invented new words in Chinese microblogs.
• Three kinds of heterogeneous sentiment knowledge are extracted for building sentiment lexicon.
• A unified framework incorporating various kinds of sentiment knowledge for microblog-specific sentiment lexicon construction.
• Our microblog-specific sentiment lexicon outperforms existing sentiment lexicons.
doi_str_mv 10.1016/j.dss.2016.04.007
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1825543122</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167923616300604</els_id><sourcerecordid>4102321821</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-37e900d853ea66d5cfe0957aad4b1bad3b659ff488e6284c8d04de04cc02f68e3</originalsourceid><addsrcrecordid>eNp9kD1PwzAQhi0EEqXwA9gisbAk2PFnxIQqvqRKLGW2HPvSOkqTYidA_z2uysTAcnfSve_p3geha4ILgom4awsXY1GmscCswFieoBlRkuZcVvIUzdJC5lVJxTm6iLHFWFCpxAwtV8OXCS5m9eQ75_t1ZrKNX2_yj8l0ftxnW2_DUHfDOo87sL7xNltsfA8Rsgj96LepZB18ezv0l-isMV2Eq98-R-9Pj6vFS758e35dPCxzS7kacyqhwtgpTsEI4bhtAFdcGuNYTWrjaC141TRMKRClYlY5zBxgZi0uG6GAztHt8e4uDB8TxFFvfbTQdaaHYYqaqJJzRklZJunNH2k7TKFP32kiq0pJjhOlOSJHVcoaY4BG74LfmrDXBOsDX93qxFcf-GrMdOKbPPdHD6Sknx6CjtZDb8H5AHbUbvD_uH8AX5qDgw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1799875087</pqid></control><display><type>article</type><title>Towards building a high-quality microblog-specific Chinese sentiment lexicon</title><source>Access via ScienceDirect (Elsevier)</source><creator>Wu, Fangzhao ; Huang, Yongfeng ; Song, Yangqiu ; Liu, Shixia</creator><creatorcontrib>Wu, Fangzhao ; Huang, Yongfeng ; Song, Yangqiu ; Liu, Shixia</creatorcontrib><description>Due to the huge popularity of microblogging services, microblogs have become important sources of customer opinions. Sentiment analysis systems can provide useful knowledge to decision support systems and decision makers by aggregating and summarizing the opinions in massive microblogs automatically. The most important component of sentiment analysis systems is sentiment lexicon. However, the performance of traditional sentiment lexicons on microblog sentiment analysis is far from satisfactory, especially for Chinese. 
In this paper, we propose a data-driven approach to build a high-quality microblog-specific sentiment lexicon for Chinese microblog sentiment analysis system. The core of our method is a unified framework that incorporates three kinds of sentiment knowledge for sentiment lexicon construction, i.e., the word-sentiment knowledge extracted from microblogs with emoticons, the sentiment similarity knowledge extracted from words' associations among all the messages, and the prior sentiment knowledge extracted from existing sentiment lexicons. In addition, in order to improve the coverage of our sentiment lexicon, we propose an effective method to detect popular new words in microblogs, which considers not only words' distributions over texts, but also their distributions over users. The detected new words with strong sentiment are incorporated in our sentiment lexicon. We built a microblog-specific Chinese sentiment lexicon on a large microblog dataset with more than 17 million messages. Experimental results on two microblog sentiment datasets show that our microblog-specific sentiment lexicon can significantly improve the performance of microblog sentiment analysis. 
• An effective and efficient method to detect the popular user-invented new words in Chinese microblogs. • Three kinds of heterogeneous sentiment knowledge are extracted for building sentiment lexicon. • A unified framework incorporating various kinds of sentiment knowledge for microblog-specific sentiment lexicon construction. • Our microblog-specific sentiment lexicon outperforms existing sentiment lexicons.</description><identifier>ISSN: 0167-9236</identifier><identifier>EISSN: 1873-5797</identifier><identifier>DOI: 10.1016/j.dss.2016.04.007</identifier><identifier>CODEN: DSSYDK</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Construction ; Customer services ; Data mining ; Decision making ; Decision support systems ; Messages ; Microblog ; Sentiment analysis ; Sentiment lexicon ; Similarity ; Social networks ; Studies ; Texts</subject><ispartof>Decision Support Systems, 2016-07, Vol.87, p.39-49</ispartof><rights>2016</rights><rights>Copyright Elsevier Sequoia S.A. 
Jul 2016</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c358t-37e900d853ea66d5cfe0957aad4b1bad3b659ff488e6284c8d04de04cc02f68e3</citedby><cites>FETCH-LOGICAL-c358t-37e900d853ea66d5cfe0957aad4b1bad3b659ff488e6284c8d04de04cc02f68e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.dss.2016.04.007$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>315,781,785,3551,27926,27927,45997</link.rule.ids></links><search><creatorcontrib>Wu, Fangzhao</creatorcontrib><creatorcontrib>Huang, Yongfeng</creatorcontrib><creatorcontrib>Song, Yangqiu</creatorcontrib><creatorcontrib>Liu, Shixia</creatorcontrib><title>Towards building a high-quality microblog-specific Chinese sentiment lexicon</title><title>Decision Support Systems</title><description>Due to the huge popularity of microblogging services, microblogs have become important sources of customer opinions. Sentiment analysis systems can provide useful knowledge to decision support systems and decision makers by aggregating and summarizing the opinions in massive microblogs automatically. The most important component of sentiment analysis systems is sentiment lexicon. However, the performance of traditional sentiment lexicons on microblog sentiment analysis is far from satisfactory, especially for Chinese. In this paper, we propose a data-driven approach to build a high-quality microblog-specific sentiment lexicon for Chinese microblog sentiment analysis system. 
The core of our method is a unified framework that incorporates three kinds of sentiment knowledge for sentiment lexicon construction, i.e., the word-sentiment knowledge extracted from microblogs with emoticons, the sentiment similarity knowledge extracted from words' associations among all the messages, and the prior sentiment knowledge extracted from existing sentiment lexicons. In addition, in order to improve the coverage of our sentiment lexicon, we propose an effective method to detect popular new words in microblogs, which considers not only words' distributions over texts, but also their distributions over users. The detected new words with strong sentiment are incorporated in our sentiment lexicon. We built a microblog-specific Chinese sentiment lexicon on a large microblog dataset with more than 17 million messages. Experimental results on two microblog sentiment datasets show that our microblog-specific sentiment lexicon can significantly improve the performance of microblog sentiment analysis. 
• An effective and efficient method to detect the popular user-invented new words in Chinese microblogs. • Three kinds of heterogeneous sentiment knowledge are extracted for building sentiment lexicon. • A unified framework incorporating various kinds of sentiment knowledge for microblog-specific sentiment lexicon construction. • Our microblog-specific sentiment lexicon outperforms existing sentiment lexicons.</description><subject>Construction</subject><subject>Customer services</subject><subject>Data mining</subject><subject>Decision making</subject><subject>Decision support systems</subject><subject>Messages</subject><subject>Microblog</subject><subject>Sentiment analysis</subject><subject>Sentiment lexicon</subject><subject>Similarity</subject><subject>Social networks</subject><subject>Studies</subject><subject>Texts</subject><issn>0167-9236</issn><issn>1873-5797</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp9kD1PwzAQhi0EEqXwA9gisbAk2PFnxIQqvqRKLGW2HPvSOkqTYidA_z2uysTAcnfSve_p3geha4ILgom4awsXY1GmscCswFieoBlRkuZcVvIUzdJC5lVJxTm6iLHFWFCpxAwtV8OXCS5m9eQ75_t1ZrKNX2_yj8l0ftxnW2_DUHfDOo87sL7xNltsfA8Rsgj96LepZB18ezv0l-isMV2Eq98-R-9Pj6vFS758e35dPCxzS7kacyqhwtgpTsEI4bhtAFdcGuNYTWrjaC141TRMKRClYlY5zBxgZi0uG6GAztHt8e4uDB8TxFFvfbTQdaaHYYqaqJJzRklZJunNH2k7TKFP32kiq0pJjhOlOSJHVcoaY4BG74LfmrDXBOsDX93qxFcf-GrMdOKbPPdHD6Sknx6CjtZDb8H5AHbUbvD_uH8AX5qDgw</recordid><startdate>201607</startdate><enddate>201607</enddate><creator>Wu, Fangzhao</creator><creator>Huang, Yongfeng</creator><creator>Song, Yangqiu</creator><creator>Liu, Shixia</creator><general>Elsevier B.V</general><general>Elsevier Sequoia S.A</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201607</creationdate><title>Towards building a high-quality microblog-specific Chinese sentiment 
lexicon</title><author>Wu, Fangzhao ; Huang, Yongfeng ; Song, Yangqiu ; Liu, Shixia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-37e900d853ea66d5cfe0957aad4b1bad3b659ff488e6284c8d04de04cc02f68e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Construction</topic><topic>Customer services</topic><topic>Data mining</topic><topic>Decision making</topic><topic>Decision support systems</topic><topic>Messages</topic><topic>Microblog</topic><topic>Sentiment analysis</topic><topic>Sentiment lexicon</topic><topic>Similarity</topic><topic>Social networks</topic><topic>Studies</topic><topic>Texts</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wu, Fangzhao</creatorcontrib><creatorcontrib>Huang, Yongfeng</creatorcontrib><creatorcontrib>Song, Yangqiu</creatorcontrib><creatorcontrib>Liu, Shixia</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Decision Support Systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wu, Fangzhao</au><au>Huang, Yongfeng</au><au>Song, Yangqiu</au><au>Liu, Shixia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Towards building a high-quality microblog-specific Chinese sentiment lexicon</atitle><jtitle>Decision Support 
Systems</jtitle><date>2016-07</date><risdate>2016</risdate><volume>87</volume><spage>39</spage><epage>49</epage><pages>39-49</pages><issn>0167-9236</issn><eissn>1873-5797</eissn><coden>DSSYDK</coden><abstract>Due to the huge popularity of microblogging services, microblogs have become important sources of customer opinions. Sentiment analysis systems can provide useful knowledge to decision support systems and decision makers by aggregating and summarizing the opinions in massive microblogs automatically. The most important component of sentiment analysis systems is sentiment lexicon. However, the performance of traditional sentiment lexicons on microblog sentiment analysis is far from satisfactory, especially for Chinese. In this paper, we propose a data-driven approach to build a high-quality microblog-specific sentiment lexicon for Chinese microblog sentiment analysis system. The core of our method is a unified framework that incorporates three kinds of sentiment knowledge for sentiment lexicon construction, i.e., the word-sentiment knowledge extracted from microblogs with emoticons, the sentiment similarity knowledge extracted from words' associations among all the messages, and the prior sentiment knowledge extracted from existing sentiment lexicons. In addition, in order to improve the coverage of our sentiment lexicon, we propose an effective method to detect popular new words in microblogs, which considers not only words' distributions over texts, but also their distributions over users. The detected new words with strong sentiment are incorporated in our sentiment lexicon. We built a microblog-specific Chinese sentiment lexicon on a large microblog dataset with more than 17 million messages. Experimental results on two microblog sentiment datasets show that our microblog-specific sentiment lexicon can significantly improve the performance of microblog sentiment analysis. 
• An effective and efficient method to detect the popular user-invented new words in Chinese microblogs. • Three kinds of heterogeneous sentiment knowledge are extracted for building sentiment lexicon. • A unified framework incorporating various kinds of sentiment knowledge for microblog-specific sentiment lexicon construction. • Our microblog-specific sentiment lexicon outperforms existing sentiment lexicons.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.dss.2016.04.007</doi><tpages>11</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0167-9236
ispartof Decision Support Systems, 2016-07, Vol.87, p.39-49
issn 0167-9236
1873-5797
language eng
recordid cdi_proquest_miscellaneous_1825543122
source Access via ScienceDirect (Elsevier)
subjects Construction
Customer services
Data mining
Decision making
Decision support systems
Messages
Microblog
Sentiment analysis
Sentiment lexicon
Similarity
Social networks
Studies
Texts
title Towards building a high-quality microblog-specific Chinese sentiment lexicon
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T08%3A32%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Towards%20building%20a%20high-quality%20microblog-specific%20Chinese%20sentiment%20lexicon&rft.jtitle=Decision%20Support%20Systems&rft.au=Wu,%20Fangzhao&rft.date=2016-07&rft.volume=87&rft.spage=39&rft.epage=49&rft.pages=39-49&rft.issn=0167-9236&rft.eissn=1873-5797&rft.coden=DSSYDK&rft_id=info:doi/10.1016/j.dss.2016.04.007&rft_dat=%3Cproquest_cross%3E4102321821%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1799875087&rft_id=info:pmid/&rft_els_id=S0167923616300604&rfr_iscdi=true