A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs

Bibliographic details
Main authors: Lex, Elisabeth; Juffinger, Andreas; Granitzer, Michael
Format: Conference proceeding
Language: English (eng)
container_start_page 10
container_end_page 14
creator Lex, Elisabeth
Juffinger, Andreas
Granitzer, Michael
description In the blogosphere, the amount of digital content is expanding, which poses new challenges for search engines. Due to the changing information need, automatic methods are needed to help blog search users filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. We also assess the emotionality facet in news-related blogs to enable users to identify people's feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best-performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.
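The record does not say which lexical or stylometric features the authors used. As a hedged illustration only, the sketch below contrasts the two feature families on a single post: a lexical bag-of-words vector (which words occur) versus a few commonly used stylometric statistics (how the text is written). The function names `lexical_features` and `stylometric_features` and the particular statistics chosen are hypothetical assumptions, not the paper's feature set; a classifier such as an SVM (listed among the record's subject terms) would then be trained on either vector.

```python
import re
from collections import Counter

# Hypothetical tokenizers: words are runs of letters/apostrophes,
# sentences are split on runs of ., ! or ?.
WORD_RE = re.compile(r"[A-Za-z']+")
SENT_RE = re.compile(r"[.!?]+")

def lexical_features(text: str) -> Counter:
    """Lexical view (assumed): lowercased bag-of-words term frequencies."""
    return Counter(w.lower() for w in WORD_RE.findall(text))

def stylometric_features(text: str) -> dict:
    """Stylometric view (assumed): topic-independent writing-style statistics."""
    words = WORD_RE.findall(text)
    sentences = [s for s in SENT_RE.split(text) if s.strip()]
    n_words, n_chars = len(words), len(text)
    return {
        # Mean characters per word.
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Mean words per sentence.
        "avg_sent_len": n_words / len(sentences) if sentences else 0.0,
        # Distinct words divided by total words (vocabulary richness).
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
        # Share of characters that are sentence or clause punctuation.
        "punct_rate": sum(c in ",;:!?." for c in text) / n_chars if n_chars else 0.0,
        # Exclamation marks per character, a crude emotionality cue.
        "exclaim_rate": text.count("!") / n_chars if n_chars else 0.0,
    }

if __name__ == "__main__":
    post = "Breaking news! The council voted today. Residents are furious!"
    print(lexical_features(post).most_common(3))
    print(stylometric_features(post))
```

The lexical vector grows with the vocabulary and captures topic words, while the stylometric vector has a fixed, small size regardless of topic; this is one plausible reason the two families can behave differently on a genre task.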
doi_str_mv 10.1109/DEXA.2010.24
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5591976</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5591976</ieee_id><sourcerecordid>5591976</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-425648085c1394c39366f7af05b5a152a5cc5df2d48e15e07383ebb84f49ab5f3</originalsourceid><addsrcrecordid>eNpdT8tOwzAQtHhIlNIbNy7-gRS_NrGPJbQFKRIHKsGtctI1MkriYgeJ_n3D48RedkY7M5ol5JqzOefM3N4vXxdzwUYq1AmZCFnoTBoOp-SSK6GUZsrIMzLhIEymuNYXZJbSOxtHAde5nJCPBS1Dt7fRp9DT4OjzcGhDh0P0DbX9jlb45Rvb0hXa4TNioi5E-oI1XWMfkZatTcm7UTL4MeDbsezCD_538j29a8NbuiLnzrYJZ397Sjar5aZ8yKqn9WO5qDJv2JApAfnYX0PDpVGNNDLPXWEdgxrs-I-FpoGdEzulkQOyQmqJda2VU8bW4OSU3PzGekTc7qPvbDxsAQw3RS6POGZbTQ</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Lex, Elisabeth ; Juffinger, Andreas ; Granitzer, Michael</creator><creatorcontrib>Lex, Elisabeth ; Juffinger, Andreas ; Granitzer, Michael</creatorcontrib><description>In the blogosphere, the amount of digital content is expanding and for search engines, new challenges have been imposed. Due to the changing information need, automatic methods are needed to support blog search users to filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. Also, we assess the emotionality facet in news related blogs to enable users to identify people's feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best performing combination for our tasks. 
Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.</description><identifier>ISSN: 1529-4188</identifier><identifier>ISBN: 1424480493</identifier><identifier>ISBN: 9781424480494</identifier><identifier>EISSN: 2378-3915</identifier><identifier>DOI: 10.1109/DEXA.2010.24</identifier><language>eng</language><publisher>IEEE</publisher><subject>Accuracy ; Blogs ; Classification algorithms ; Data Mining ; Document Classification ; Feature extraction ; Features ; Mutual information ; Support vector machines ; Training</subject><ispartof>2010 Workshops on Database and Expert Systems Applications, 2010, p.10-14</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5591976$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5591976$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lex, Elisabeth</creatorcontrib><creatorcontrib>Juffinger, Andreas</creatorcontrib><creatorcontrib>Granitzer, Michael</creatorcontrib><title>A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs</title><title>2010 Workshops on Database and Expert Systems Applications</title><addtitle>DEXA</addtitle><description>In the blogosphere, the amount of digital content is expanding and for search engines, new challenges have been imposed. Due to the changing information need, automatic methods are needed to support blog search users to filter information by different facets. 
In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. Also, we assess the emotionality facet in news related blogs to enable users to identify people's feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.</description><subject>Accuracy</subject><subject>Blogs</subject><subject>Classification algorithms</subject><subject>Data Mining</subject><subject>Document Classification</subject><subject>Feature extraction</subject><subject>Features</subject><subject>Mutual information</subject><subject>Support vector machines</subject><subject>Training</subject><issn>1529-4188</issn><issn>2378-3915</issn><isbn>1424480493</isbn><isbn>9781424480494</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpdT8tOwzAQtHhIlNIbNy7-gRS_NrGPJbQFKRIHKsGtctI1MkriYgeJ_n3D48RedkY7M5ol5JqzOefM3N4vXxdzwUYq1AmZCFnoTBoOp-SSK6GUZsrIMzLhIEymuNYXZJbSOxtHAde5nJCPBS1Dt7fRp9DT4OjzcGhDh0P0DbX9jlb45Rvb0hXa4TNioi5E-oI1XWMfkZatTcm7UTL4MeDbsezCD_538j29a8NbuiLnzrYJZ397Sjar5aZ8yKqn9WO5qDJv2JApAfnYX0PDpVGNNDLPXWEdgxrs-I-FpoGdEzulkQOyQmqJda2VU8bW4OSU3PzGekTc7qPvbDxsAQw3RS6POGZbTQ</recordid><startdate>201008</startdate><enddate>201008</enddate><creator>Lex, Elisabeth</creator><creator>Juffinger, Andreas</creator><creator>Granitzer, 
Michael</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201008</creationdate><title>A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs</title><author>Lex, Elisabeth ; Juffinger, Andreas ; Granitzer, Michael</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-425648085c1394c39366f7af05b5a152a5cc5df2d48e15e07383ebb84f49ab5f3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Accuracy</topic><topic>Blogs</topic><topic>Classification algorithms</topic><topic>Data Mining</topic><topic>Document Classification</topic><topic>Feature extraction</topic><topic>Features</topic><topic>Mutual information</topic><topic>Support vector machines</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Lex, Elisabeth</creatorcontrib><creatorcontrib>Juffinger, Andreas</creatorcontrib><creatorcontrib>Granitzer, Michael</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lex, Elisabeth</au><au>Juffinger, Andreas</au><au>Granitzer, Michael</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs</atitle><btitle>2010 Workshops on Database and Expert Systems 
Applications</btitle><stitle>DEXA</stitle><date>2010-08</date><risdate>2010</risdate><spage>10</spage><epage>14</epage><pages>10-14</pages><issn>1529-4188</issn><eissn>2378-3915</eissn><isbn>1424480493</isbn><isbn>9781424480494</isbn><abstract>In the blogosphere, the amount of digital content is expanding and for search engines, new challenges have been imposed. Due to the changing information need, automatic methods are needed to support blog search users to filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. Also, we assess the emotionality facet in news related blogs to enable users to identify people's feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.</abstract><pub>IEEE</pub><doi>10.1109/DEXA.2010.24</doi><tpages>5</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1529-4188
ispartof 2010 Workshops on Database and Expert Systems Applications, 2010, p.10-14
issn 1529-4188
2378-3915
language eng
recordid cdi_ieee_primary_5591976
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Accuracy
Blogs
Classification algorithms
Data Mining
Document Classification
Feature extraction
Features
Mutual information
Support vector machines
Training
title A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T03%3A14%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Comparison%20of%20Stylometric%20and%20Lexical%20Features%20for%20Web%20Genre%20Classification%20and%20Emotion%20Classification%20in%20Blogs&rft.btitle=2010%20Workshops%20on%20Database%20and%20Expert%20Systems%20Applications&rft.au=Lex,%20Elisabeth&rft.date=2010-08&rft.spage=10&rft.epage=14&rft.pages=10-14&rft.issn=1529-4188&rft.eissn=2378-3915&rft.isbn=1424480493&rft.isbn_list=9781424480494&rft_id=info:doi/10.1109/DEXA.2010.24&rft_dat=%3Cieee_6IE%3E5591976%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5591976&rfr_iscdi=true