Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets



Bibliographic details
Published in:	Journal of the Association for Information Science and Technology, 2014-07, Vol.65 (7), p.1416-1423
Main authors:	Arakawa, Yui; Kameda, Akihiro; Aizawa, Akiko; Suzuki, Takafumi
Format:	Article
Language:	eng
Subjects:	Automatic classification ; Bibliometrics. Scientometrics ; Bibliometrics. Scientometrics. Evaluation ; Blogs ; Classification ; Data mining ; Exact sciences and technology ; Feature extraction ; Forests ; Grammars ; Information and communication sciences ; Information communication ; Information science. Documentation ; Japan ; knowledge discovery ; Library and information science. General aspects ; Machine learning ; Sciences and techniques of general use ; Social networks ; Speech ; Texts ; Transmission ; web mining
Online access:	Full text

container_end_page	1423
container_issue	7
container_start_page	1416
container_title	Journal of the Association for Information Science and Technology
container_volume	65
creator	Arakawa, Yui; Kameda, Akihiro; Aizawa, Akiko; Suzuki, Takafumi
description	Recently, Twitter has received much attention, both from the general public and from researchers, as a new means of transmitting information. The number of retweets (RTs) and user types are two important items of analysis for understanding how information is transmitted on Twitter. To analyze this point, we conducted text classification and feature extraction experiments using random forests with conventional stylistic features and Twitter‐specific features. We first collected tweets from 40 accounts with large numbers of followers and created tweet texts from 28,756 tweets. We then conducted 15 types of classification experiments using various combinations of features, such as function words, speech terms, Twitter's descriptive grammar, and information roles, and examined the effect of each feature on classification performance. The results indicated that classification by user gave the best performance. Furthermore, certain features had a greater impact on classification than others. In the experiments that assessed RT quantity, information roles had an impact; in the user experiments, features including the honorific postpositional particle and auxiliary verbs such as “desu” and “masu” had an impact. This research clarifies which features are useful for categorizing tweets according to the number of RTs and user types.
doi_str_mv	10.1002/asi.23126
format	Article
fullrecord	<record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_proquest_miscellaneous_1559704361</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1558999000</sourcerecordid><originalsourceid>FETCH-LOGICAL-i3956-83bc2c805dea65a6d292e66b4489e4680c412527eb454a4126028df75d1822f03</originalsourceid><addsrcrecordid>eNqNkT9P5DAQxaMTJx0CivsGbpCuCfh_nHKF7gBpBcUuorQcZ4wM2WTxOFry7QksWl1JNW9mfm-KeUXxm9ELRim_dBgvuGBc_yiOuRC0ZFqKo4MW6ldxhvhMKWW0Noqz4-J10baxfyLrXcwZUolb8DFETwK4PCZAkgeCeeoi5v-nYUjEdw4xhunDn3cAGUkzkREhkTxtgbi-Jf24aeZ-CCTBnjktfgbXIZx91ZPi4d_f9dVNuby_vr1aLMsoaqVLIxrPvaGqBaeV0y2vOWjdSGlqkNpQLxlXvIJGKulmrSk3bahUywzngYqT4s_-7jYNryNgtpuIHrrO9TCMaJlSdUWl0Ow7qKnrev7bjJ5_oQ6960JyvY9otyluXJosN6rikpuZu9xzu9jBdNgzaj-SsnNS9jMpu1jdforZUe4d86fh7eBw6cXqSlTKPt5dW7lartjjmlkq3gGIZJbX</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1558999000</pqid></control><display><type>article</type><title>Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets</title><source>EBSCOhost Business Source Complete</source><source>Access via Wiley Online Library</source><creator>Arakawa, Yui ; Kameda, Akihiro ; Aizawa, Akiko ; Suzuki, Takafumi</creator><creatorcontrib>Arakawa, Yui ; Kameda, Akihiro ; Aizawa, Akiko ; Suzuki, Takafumi</creatorcontrib><description>Recently, Twitter has received much attention, both from the general public and researchers, as a new method of transmitting information. Among others, the number of retweets (RTs) and user types are the two important items of analysis for understanding the transmission of information on Twitter. To analyze this point, we applied text classification and feature extraction experiments using random forests machine learning with conventional stylistic and Twitter‐specific features. 
We first collected tweets from 40 accounts with a high number of followers and created tweet texts from 28,756 tweets. We then conducted 15 types of classification experiments using a variety of combinations of features such as function words, speech terms, Twitter's descriptive grammar, and information roles. We deliberately observed the effects of features for classification performance. The results indicated that class classification per user indicated the best performance. Furthermore, we observed that certain features had a greater impact on classification. In the case of the experiments that assessed the level of RT quantity, information roles had an impact. In the case of user experiments, important features, such as the honorific postpositional particle and auxiliary verbs, such as “desu” and “masu,” had an impact. This research clarifies the features that are useful for categorizing tweets according to the number of RTs and user types.</description><identifier>ISSN: 2330-1635</identifier><identifier>EISSN: 2330-1643</identifier><identifier>DOI: 10.1002/asi.23126</identifier><language>eng</language><publisher>Malden, MA: Blackwell Publishing Ltd</publisher><subject>Automatic classification ; Bibliometrics. Scientometrics ; Bibliometrics. Scientometrics. Evaluation ; Blogs ; Classification ; Data mining ; Exact sciences and technology ; Feature extraction ; Forests ; Grammars ; Information and communication sciences ; Information communication ; Information science. Documentation ; Japan ; knowledge discovery ; Library and information science. 
General aspects ; Machine learning ; Sciences and techniques of general use ; Social networks ; Speech ; Texts ; Transmission ; web mining</subject><ispartof>Journal of the Association for Information Science and Technology, 2014-07, Vol.65 (7), p.1416-1423</ispartof><rights>2014 ASIS&T</rights><rights>2015 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fasi.23126$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fasi.23126$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1417,27924,27925,45574,45575</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=28572428$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Arakawa, Yui</creatorcontrib><creatorcontrib>Kameda, Akihiro</creatorcontrib><creatorcontrib>Aizawa, Akiko</creatorcontrib><creatorcontrib>Suzuki, Takafumi</creatorcontrib><title>Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets</title><title>Journal of the Association for Information Science and Technology</title><addtitle>J Assn Inf Sci Tec</addtitle><description>Recently, Twitter has received much attention, both from the general public and researchers, as a new method of transmitting information. Among others, the number of retweets (RTs) and user types are the two important items of analysis for understanding the transmission of information on Twitter. To analyze this point, we applied text classification and feature extraction experiments using random forests machine learning with conventional stylistic and Twitter‐specific features. 
We first collected tweets from 40 accounts with a high number of followers and created tweet texts from 28,756 tweets. We then conducted 15 types of classification experiments using a variety of combinations of features such as function words, speech terms, Twitter's descriptive grammar, and information roles. We deliberately observed the effects of features for classification performance. The results indicated that class classification per user indicated the best performance. Furthermore, we observed that certain features had a greater impact on classification. In the case of the experiments that assessed the level of RT quantity, information roles had an impact. In the case of user experiments, important features, such as the honorific postpositional particle and auxiliary verbs, such as “desu” and “masu,” had an impact. This research clarifies the features that are useful for categorizing tweets according to the number of RTs and user types.</description><subject>Automatic classification</subject><subject>Bibliometrics. Scientometrics</subject><subject>Bibliometrics. Scientometrics. Evaluation</subject><subject>Blogs</subject><subject>Classification</subject><subject>Data mining</subject><subject>Exact sciences and technology</subject><subject>Feature extraction</subject><subject>Forests</subject><subject>Grammars</subject><subject>Information and communication sciences</subject><subject>Information communication</subject><subject>Information science. Documentation</subject><subject>Japan</subject><subject>knowledge discovery</subject><subject>Library and information science. 
General aspects</subject><subject>Machine learning</subject><subject>Sciences and techniques of general use</subject><subject>Social networks</subject><subject>Speech</subject><subject>Texts</subject><subject>Transmission</subject><subject>web mining</subject><issn>2330-1635</issn><issn>2330-1643</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNqNkT9P5DAQxaMTJx0CivsGbpCuCfh_nHKF7gBpBcUuorQcZ4wM2WTxOFry7QksWl1JNW9mfm-KeUXxm9ELRim_dBgvuGBc_yiOuRC0ZFqKo4MW6ldxhvhMKWW0Noqz4-J10baxfyLrXcwZUolb8DFETwK4PCZAkgeCeeoi5v-nYUjEdw4xhunDn3cAGUkzkREhkTxtgbi-Jf24aeZ-CCTBnjktfgbXIZx91ZPi4d_f9dVNuby_vr1aLMsoaqVLIxrPvaGqBaeV0y2vOWjdSGlqkNpQLxlXvIJGKulmrSk3bahUywzngYqT4s_-7jYNryNgtpuIHrrO9TCMaJlSdUWl0Ow7qKnrev7bjJ5_oQ6960JyvY9otyluXJosN6rikpuZu9xzu9jBdNgzaj-SsnNS9jMpu1jdforZUe4d86fh7eBw6cXqSlTKPt5dW7lartjjmlkq3gGIZJbX</recordid><startdate>201407</startdate><enddate>201407</enddate><creator>Arakawa, Yui</creator><creator>Kameda, Akihiro</creator><creator>Aizawa, Akiko</creator><creator>Suzuki, Takafumi</creator><general>Blackwell Publishing Ltd</general><general>Wiley</general><scope>BSCLL</scope><scope>IQODW</scope><scope>8BP</scope><scope>E3H</scope><scope>F2A</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201407</creationdate><title>Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets</title><author>Arakawa, Yui ; Kameda, Akihiro ; Aizawa, Akiko ; Suzuki, Takafumi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i3956-83bc2c805dea65a6d292e66b4489e4680c412527eb454a4126028df75d1822f03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Automatic classification</topic><topic>Bibliometrics. Scientometrics</topic><topic>Bibliometrics. 
Scientometrics. Evaluation</topic><topic>Blogs</topic><topic>Classification</topic><topic>Data mining</topic><topic>Exact sciences and technology</topic><topic>Feature extraction</topic><topic>Forests</topic><topic>Grammars</topic><topic>Information and communication sciences</topic><topic>Information communication</topic><topic>Information science. Documentation</topic><topic>Japan</topic><topic>knowledge discovery</topic><topic>Library and information science. General aspects</topic><topic>Machine learning</topic><topic>Sciences and techniques of general use</topic><topic>Social networks</topic><topic>Speech</topic><topic>Texts</topic><topic>Transmission</topic><topic>web mining</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Arakawa, Yui</creatorcontrib><creatorcontrib>Kameda, Akihiro</creatorcontrib><creatorcontrib>Aizawa, Akiko</creatorcontrib><creatorcontrib>Suzuki, Takafumi</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Library & Information Sciences Abstracts (LISA) - CILIP Edition</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of the Association for Information Science and Technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Arakawa, Yui</au><au>Kameda, Akihiro</au><au>Aizawa, Akiko</au><au>Suzuki, 
Takafumi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets</atitle><jtitle>Journal of the Association for Information Science and Technology</jtitle><addtitle>J Assn Inf Sci Tec</addtitle><date>2014-07</date><risdate>2014</risdate><volume>65</volume><issue>7</issue><spage>1416</spage><epage>1423</epage><pages>1416-1423</pages><issn>2330-1635</issn><eissn>2330-1643</eissn><abstract>Recently, Twitter has received much attention, both from the general public and researchers, as a new method of transmitting information. Among others, the number of retweets (RTs) and user types are the two important items of analysis for understanding the transmission of information on Twitter. To analyze this point, we applied text classification and feature extraction experiments using random forests machine learning with conventional stylistic and Twitter‐specific features. We first collected tweets from 40 accounts with a high number of followers and created tweet texts from 28,756 tweets. We then conducted 15 types of classification experiments using a variety of combinations of features such as function words, speech terms, Twitter's descriptive grammar, and information roles. We deliberately observed the effects of features for classification performance. The results indicated that class classification per user indicated the best performance. Furthermore, we observed that certain features had a greater impact on classification. In the case of the experiments that assessed the level of RT quantity, information roles had an impact. In the case of user experiments, important features, such as the honorific postpositional particle and auxiliary verbs, such as “desu” and “masu,” had an impact. 
This research clarifies the features that are useful for categorizing tweets according to the number of RTs and user types.</abstract><cop>Malden, MA</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1002/asi.23126</doi><tpages>8</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 2330-1635
ispartof	Journal of the Association for Information Science and Technology, 2014-07, Vol.65 (7), p.1416-1423
issn	2330-1635 2330-1643
language	eng
recordid	cdi_proquest_miscellaneous_1559704361
source	EBSCOhost Business Source Complete; Access via Wiley Online Library
subjects	Automatic classification ; Bibliometrics. Scientometrics ; Bibliometrics. Scientometrics. Evaluation ; Blogs ; Classification ; Data mining ; Exact sciences and technology ; Feature extraction ; Forests ; Grammars ; Information and communication sciences ; Information communication ; Information science. Documentation ; Japan ; knowledge discovery ; Library and information science. General aspects ; Machine learning ; Sciences and techniques of general use ; Social networks ; Speech ; Texts ; Transmission ; web mining
title	Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T10%3A17%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adding%20Twitter-specific%20features%20to%20stylistic%20features%20for%20classifying%20tweets%20by%20user%20type%20and%20number%20of%20retweets&rft.jtitle=Journal%20of%20the%20Association%20for%20Information%20Science%20and%20Technology&rft.au=Arakawa,%20Yui&rft.date=2014-07&rft.volume=65&rft.issue=7&rft.spage=1416&rft.epage=1423&rft.pages=1416-1423&rft.issn=2330-1635&rft.eissn=2330-1643&rft_id=info:doi/10.1002/asi.23126&rft_dat=%3Cproquest_pasca%3E1558999000%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1558999000&rft_id=info:pmid/&rfr_iscdi=true
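The description field outlines a pipeline in which each tweet becomes a vector of stylistic and Twitter-specific feature counts that is then classified with random forests. The Python sketch below illustrates only the feature-extraction step, under stated assumptions. The feature names, the regular expressions, and the helpers `extract_features` and `to_vector` are hypothetical, and the feature set is far smaller than the paper's (function words, speech terms, Twitter's descriptive grammar, information roles). The polite auxiliaries “desu” (です) and “masu” (ます) are counted by plain substring matching rather than morphological analysis, so the counts are only approximate.

```python
import re

# Hypothetical Twitter-specific features (illustrative only, not the paper's set).
TWITTER_PATTERNS = {
    "hashtag": re.compile(r"#\w+"),      # e.g. "#天気"; \w matches Unicode word characters
    "mention": re.compile(r"@\w+"),      # e.g. "@example_user"
    "url": re.compile(r"https?://\S+"),  # any http(s) link
    "rt_prefix": re.compile(r"^RT @"),   # manual-retweet prefix at the very start of the text
}

# Polite auxiliary verbs named in the abstract. Naive substring counting:
# without a morphological analyzer, "ますます" is counted as two "ます".
STYLISTIC_MARKERS = {"desu": "です", "masu": "ます"}

# Fixed feature order so every tweet maps to a vector of the same length.
FEATURE_NAMES = list(TWITTER_PATTERNS) + list(STYLISTIC_MARKERS)


def extract_features(tweet: str) -> dict[str, int]:
    """Return raw counts of each hypothetical feature in one tweet."""
    features = {name: len(p.findall(tweet)) for name, p in TWITTER_PATTERNS.items()}
    for name, marker in STYLISTIC_MARKERS.items():
        features[name] = tweet.count(marker)  # non-overlapping occurrences
    return features


def to_vector(tweet: str) -> list[int]:
    """Map a tweet to a count vector ordered by FEATURE_NAMES."""
    features = extract_features(tweet)
    return [features[name] for name in FEATURE_NAMES]


if __name__ == "__main__":
    sample = "RT @example_user: 明日は雨です。 #天気 https://t.co/abc"
    print(FEATURE_NAMES)      # ['hashtag', 'mention', 'url', 'rt_prefix', 'desu', 'masu']
    print(to_vector(sample))  # [1, 1, 1, 1, 1, 0]
```

Vectors like these could then be passed to an off-the-shelf random forest, for example scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute is one way to compare how much each feature contributes. That step is not shown here and is not necessarily how the authors measured feature impact.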