On predicting the popularity of newly emerging hashtags in Twitter
Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose met...
Published in: | Journal of the American Society for Information Science and Technology, 2013-07, Vol.64 (7), p.1399-1410 |
---|---|
Main authors: | Ma, Zongyang; Sun, Aixin; Cong, Gao |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1410 |
---|---|
container_issue | 7 |
container_start_page | 1399 |
container_title | Journal of the American Society for Information Science and Technology |
container_volume | 64 |
creator | Ma, Zongyang; Sun, Aixin; Cong, Gao |
description | Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k‐nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore‐based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro‐F1 measure. We also observe that contextual features are more effective than content features. |
doi_str_mv | 10.1002/asi.22844 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_proquest_miscellaneous_1429844453</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1429844453</sourcerecordid><originalsourceid>FETCH-LOGICAL-i3244-b75034b16453d633e5f19a456c8f34ba229cc840bae6fba231b0ff410f0dc57d3</originalsourceid><addsrcrecordid>eNpdkEtPAjEUhRujiYgu_AeTGBM3A33OYwlEESWwEOOy6ZQWisPM2M4E599bHmFh7qL39n7npD0A3CPYQxDivnCmh3FC6QXoIEZwiJMUXp77BF-DG-c2ECLEEOyA4bwIKquWRtamWAX1WgVVWTW5sKZug1IHhdrlbaC2yq72wFq4dS1WLjBFsNiZulb2FlxpkTt1dzq74PPleTF6Dafz8WQ0mIaGYErDLGaQ0AxFlJFlRIhiGqWCskgm2t8LjFMpEwozoSLtR4IyqDVFUMOlZPGSdMHT0bey5U-jXM23xkmV56JQZeM4ojj1H_f2Hn34h27Kxhb-dRyRiO0rTT31eKKEkyLXVhTSOF5ZsxW25Tj2VkmEPdc_cjuTq_a8R5DvI-c-cn6InA8-JofGK8Kjwrha_Z4Vwn7zKCYx41-zMYfDWTrFizf-Tv4A5SKDOg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1365656599</pqid></control><display><type>article</type><title>On predicting the popularity of newly emerging hashtags in Twitter</title><source>EBSCOhost Business Source Complete</source><source>Access via Wiley Online Library</source><creator>Ma, Zongyang ; Sun, Aixin ; Cong, Gao</creator><creatorcontrib>Ma, Zongyang ; Sun, Aixin ; Cong, Gao</creatorcontrib><description>Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve bayes, k‐nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. 
We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore‐based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro‐F1 measure. We also observe that contextual features are more effective than content features.</description><identifier>ISSN: 1532-2882</identifier><identifier>ISSN: 2330-1635</identifier><identifier>EISSN: 1532-2890</identifier><identifier>EISSN: 2330-1643</identifier><identifier>DOI: 10.1002/asi.22844</identifier><language>eng</language><publisher>New York, NY: Blackwell Publishing Ltd</publisher><subject>Annotations ; Automatic classification ; Bayesian analysis ; Bibliometrics. Scientometrics ; Bibliometrics. Scientometrics. Evaluation ; Blogs ; Classification ; Classifiers ; content filtering ; Decision trees ; Exact sciences and technology ; Feature extraction ; Information and communication sciences ; Information dissemination ; Information science. Documentation ; Library and information science. General aspects ; Peer Acceptance ; Popularity ; Predictions ; Regression analysis ; Regression models ; Sciences and techniques of general use ; Social networks ; Studies ; Support vector machines ; Tagging ; Text messaging ; text mining ; Topics</subject><ispartof>Journal of the American Society for Information Science and Technology, 2013-07, Vol.64 (7), p.1399-1410</ispartof><rights>2013 ASIS&T</rights><rights>2015 INIST-CNRS</rights><rights>Copyright Wiley Periodicals Inc. 
Jul 2013</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fasi.22844$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fasi.22844$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1417,27924,27925,45574,45575</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=27453862$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Ma, Zongyang</creatorcontrib><creatorcontrib>Sun, Aixin</creatorcontrib><creatorcontrib>Cong, Gao</creatorcontrib><title>On predicting the popularity of newly emerging hashtags in Twitter</title><title>Journal of the American Society for Information Science and Technology</title><addtitle>J Am Soc Inf Sci Tec</addtitle><description>Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve bayes, k‐nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. 
We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore‐based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro‐F1 measure. We also observe that contextual features are more effective than content features.</description><subject>Annotations</subject><subject>Automatic classification</subject><subject>Bayesian analysis</subject><subject>Bibliometrics. Scientometrics</subject><subject>Bibliometrics. Scientometrics. Evaluation</subject><subject>Blogs</subject><subject>Classification</subject><subject>Classifiers</subject><subject>content filtering</subject><subject>Decision trees</subject><subject>Exact sciences and technology</subject><subject>Feature extraction</subject><subject>Information and communication sciences</subject><subject>Information dissemination</subject><subject>Information science. Documentation</subject><subject>Library and information science. 
General aspects</subject><subject>Peer Acceptance</subject><subject>Popularity</subject><subject>Predictions</subject><subject>Regression analysis</subject><subject>Regression models</subject><subject>Sciences and techniques of general use</subject><subject>Social networks</subject><subject>Studies</subject><subject>Support vector machines</subject><subject>Tagging</subject><subject>Text messaging</subject><subject>text mining</subject><subject>Topics</subject><issn>1532-2882</issn><issn>2330-1635</issn><issn>1532-2890</issn><issn>2330-1643</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNpdkEtPAjEUhRujiYgu_AeTGBM3A33OYwlEESWwEOOy6ZQWisPM2M4E599bHmFh7qL39n7npD0A3CPYQxDivnCmh3FC6QXoIEZwiJMUXp77BF-DG-c2ECLEEOyA4bwIKquWRtamWAX1WgVVWTW5sKZug1IHhdrlbaC2yq72wFq4dS1WLjBFsNiZulb2FlxpkTt1dzq74PPleTF6Dafz8WQ0mIaGYErDLGaQ0AxFlJFlRIhiGqWCskgm2t8LjFMpEwozoSLtR4IyqDVFUMOlZPGSdMHT0bey5U-jXM23xkmV56JQZeM4ojj1H_f2Hn34h27Kxhb-dRyRiO0rTT31eKKEkyLXVhTSOF5ZsxW25Tj2VkmEPdc_cjuTq_a8R5DvI-c-cn6InA8-JofGK8Kjwrha_Z4Vwn7zKCYx41-zMYfDWTrFizf-Tv4A5SKDOg</recordid><startdate>201307</startdate><enddate>201307</enddate><creator>Ma, Zongyang</creator><creator>Sun, Aixin</creator><creator>Cong, Gao</creator><general>Blackwell Publishing Ltd</general><general>Wiley</general><general>Wiley Periodicals Inc</general><scope>BSCLL</scope><scope>IQODW</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>8BP</scope></search><sort><creationdate>201307</creationdate><title>On predicting the popularity of newly emerging hashtags in Twitter</title><author>Ma, Zongyang ; Sun, Aixin ; Cong, 
Gao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i3244-b75034b16453d633e5f19a456c8f34ba229cc840bae6fba231b0ff410f0dc57d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Annotations</topic><topic>Automatic classification</topic><topic>Bayesian analysis</topic><topic>Bibliometrics. Scientometrics</topic><topic>Bibliometrics. Scientometrics. Evaluation</topic><topic>Blogs</topic><topic>Classification</topic><topic>Classifiers</topic><topic>content filtering</topic><topic>Decision trees</topic><topic>Exact sciences and technology</topic><topic>Feature extraction</topic><topic>Information and communication sciences</topic><topic>Information dissemination</topic><topic>Information science. Documentation</topic><topic>Library and information science. General aspects</topic><topic>Peer Acceptance</topic><topic>Popularity</topic><topic>Predictions</topic><topic>Regression analysis</topic><topic>Regression models</topic><topic>Sciences and techniques of general use</topic><topic>Social networks</topic><topic>Studies</topic><topic>Support vector machines</topic><topic>Tagging</topic><topic>Text messaging</topic><topic>text mining</topic><topic>Topics</topic><toplevel>online_resources</toplevel><creatorcontrib>Ma, Zongyang</creatorcontrib><creatorcontrib>Sun, Aixin</creatorcontrib><creatorcontrib>Cong, Gao</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Library & Information Sciences Abstracts (LISA) - CILIP Edition</collection><jtitle>Journal of the American Society for Information Science and Technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ma, Zongyang</au><au>Sun, Aixin</au><au>Cong, Gao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On predicting the popularity of newly emerging hashtags in Twitter</atitle><jtitle>Journal of the American Society for Information Science and Technology</jtitle><addtitle>J Am Soc Inf Sci Tec</addtitle><date>2013-07</date><risdate>2013</risdate><volume>64</volume><issue>7</issue><spage>1399</spage><epage>1410</epage><pages>1399-1410</pages><issn>1532-2882</issn><issn>2330-1635</issn><eissn>1532-2890</eissn><eissn>2330-1643</eissn><abstract>Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve bayes, k‐nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore‐based users. 
The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro‐F1 measure. We also observe that contextual features are more effective than content features.</abstract><cop>New York, NY</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1002/asi.22844</doi><tpages>12</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1532-2882 |
ispartof | Journal of the American Society for Information Science and Technology, 2013-07, Vol.64 (7), p.1399-1410 |
issn | 1532-2882; 2330-1635; 1532-2890; 2330-1643 |
language | eng |
recordid | cdi_proquest_miscellaneous_1429844453 |
source | EBSCOhost Business Source Complete; Access via Wiley Online Library |
subjects | Annotations; Automatic classification; Bayesian analysis; Bibliometrics. Scientometrics; Bibliometrics. Scientometrics. Evaluation; Blogs; Classification; Classifiers; content filtering; Decision trees; Exact sciences and technology; Feature extraction; Information and communication sciences; Information dissemination; Information science. Documentation; Library and information science. General aspects; Peer Acceptance; Popularity; Predictions; Regression analysis; Regression models; Sciences and techniques of general use; Social networks; Studies; Support vector machines; Tagging; Text messaging; text mining; Topics |
title | On predicting the popularity of newly emerging hashtags in Twitter |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T17%3A03%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20predicting%20the%20popularity%20of%20newly%20emerging%20hashtags%20in%20Twitter&rft.jtitle=Journal%20of%20the%20American%20Society%20for%20Information%20Science%20and%20Technology&rft.au=Ma,%20Zongyang&rft.date=2013-07&rft.volume=64&rft.issue=7&rft.spage=1399&rft.epage=1410&rft.pages=1399-1410&rft.issn=1532-2882&rft.eissn=1532-2890&rft_id=info:doi/10.1002/asi.22844&rft_dat=%3Cproquest_pasca%3E1429844453%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1365656599&rft_id=info:pmid/&rfr_iscdi=true |
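
The abstract in this record ranks the classifiers by the Micro‐F1 measure. As a rough illustration of how that measure is computed (not the authors' code), the sketch below pools true positives, false positives, and false negatives over all classes and then computes a single F1 score. The class names ("low", "mid", "high") and the example predictions are hypothetical; the record does not state how the article defines its popularity classes.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly predicted this class
            elif pred == label and truth != label:
                fp += 1  # predicted this class, but it was another
            elif pred != label and truth == label:
                fn += 1  # missed an instance of this class
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical popularity classes for six hashtags (illustrative only).
y_true = ["low", "low", "high", "mid", "high", "low"]
y_pred = ["low", "mid", "high", "mid", "low", "low"]

score = micro_f1(y_true, y_pred)
print(f"micro-F1 = {score:.4f}")
```

When every item carries exactly one label, as in this sketch, the pooled counts make micro‐F1 equal to plain accuracy (here 4 correct out of 6).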