Effects on Time and Quality of Short Text Clustering during Real-Time Presentations

Technologies for live presentations should account for users' ability to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines the effects on time and quality of text clustering algorithms applied to short, medium, and long size...

Bibliographic details
Published in: Revista IEEE América Latina 2021-08, Vol.19 (8), p.1391-1399
Main authors: Fuentealba, Diego; Lopez, Mario; Ponce, Hector
Format: Article
Language: eng
container_end_page 1399
container_issue 8
container_start_page 1391
container_title Revista IEEE América Latina
container_volume 19
creator Fuentealba, Diego
Lopez, Mario
Ponce, Hector
description Technologies for live presentations should account for users' ability to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines the effects on time and quality of text clustering algorithms applied to short, medium, and long texts, and examines whether short text clustering performs reasonably for live presentations. We ran several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The pipeline used a Snowball stemmer, TF-IDF, and K-means for clustering; the text types were Reuters, 20 Newsgroups, and an experimental data set for the long, medium, and short texts, respectively. The first result showed that text size had a large effect on the algorithms' execution time, with the shortest average time for the short texts and the longest for the long texts. The second result showed that the number of phrases in each text type significantly predicts execution time, whereas the number of clusters generated by K-means does not. Inertia and purity measures were used to test the quality of the generated clusters. Text size, number of phrases, and number of clusters predict inertia, with the lowest inertia for the short texts. Purity measures were similar to previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.
doi_str_mv 10.1109/TLA.2021.9475870
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2548990785</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9475870</ieee_id><sourcerecordid>2548990785</sourcerecordid><originalsourceid>FETCH-LOGICAL-c221t-ed539a3c8e89306280abc53743e39ea4e0bfaf083270ebb4b912e1238a5c0ade3</originalsourceid><addsrcrecordid>eNpNkN1LwzAUxYMoOKfvgi8Bn1tvknZNHseYHzDwY_U5pO2NdnTtTFJw_73dh-LTuXDPuZfzI-SaQcwYqLt8MY05cBarJEtlBidkxNJERqAUP_03n5ML71cAQk6kGJHl3Fosg6ddS_N6jdS0FX3tTVOHLe0sXX52LtAcvwOdNb0P6Or2g1b9Xt7QNNE-9eLQYxtMqLvWX5IzaxqPV0cdk_f7eT57jBbPD0-z6SIqOWchwioVyohSolQCJlyCKcpUZIlAodAkCIU1FqTgGWBRJIViHBkX0qQlmArFmNwe7m5c99WjD3rV9a4dXmo-1FUKMpkOLji4Std579DqjavXxm01A71Dpwd0eodOH9ENkZtDpEbEP_vv9gd7-mn6</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2548990785</pqid></control><display><type>article</type><title>Effects on Time and Quality of Short Text Clustering during Real-Time Presentations</title><source>IEEE Xplore</source><creator>Fuentealba, Diego ; Lopez, Mario ; Ponce, Hector</creator><creatorcontrib>Fuentealba, Diego ; Lopez, Mario ; Ponce, Hector</creatorcontrib><description>Technologies for live presentations should consider users' capabilities to manage large amounts of data in real-time, particularly, exchanges of short texts (e.g., phrases). This study examines the effects on time and quality of text clustering algorithms applied to short, medium, and long size texts, and examines whether short text clustering shows a reasonable performance for live presentations. We run several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). 
The algorithms used were snowball steamers, TF-IDF, and K-means for clustering; and the text types were Reuters, 20 NewsGroup and an experimental data set, for the long, medium, and short size texts, respectively. The first result showed that text size had a large effect on the algorithms execution time, with the shortest average time for the short texts and longer average time for the longest texts. The second result showed that the number of phrases in each text type significantly predicts execution time but not the number of clusters generated by K-means. Inertia and purity measures were used to test the quality of the clusters generated. Text size, number of phrases and number of clusters predict inertia; showing the lowest inertia for the short texts. Purity measures were like previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.</description><identifier>ISSN: 1548-0992</identifier><identifier>EISSN: 1548-0992</identifier><identifier>DOI: 10.1109/TLA.2021.9475870</identifier><language>eng</language><publisher>Los Alamitos: IEEE</publisher><subject>Algorithms ; Blogs ; Clustering ; Clustering algorithms ; IEEE transactions ; Inertia ; Interactivity ; K-Means ; Purity ; Real time ; Real-time systems ; Sentences ; Short Phrases ; Short Text ; Silicon compounds ; Social networking (online) ; Text Mining ; Texts ; TF-IDF ; Visualization</subject><ispartof>Revista IEEE América Latina, 2021-08, Vol.19 (8), p.1391-1399</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c221t-ed539a3c8e89306280abc53743e39ea4e0bfaf083270ebb4b912e1238a5c0ade3</citedby><orcidid>0000-0001-5284-0448 ; 0000-0003-0909-7702 ; 0000-0002-7984-3945</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9475870$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9475870$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Fuentealba, Diego</creatorcontrib><creatorcontrib>Lopez, Mario</creatorcontrib><creatorcontrib>Ponce, Hector</creatorcontrib><title>Effects on Time and Quality of Short Text Clustering during Real-Time Presentations</title><title>Revista IEEE América Latina</title><addtitle>T-LA</addtitle><description>Technologies for live presentations should consider users' capabilities to manage large amounts of data in real-time, particularly, exchanges of short texts (e.g., phrases). This study examines the effects on time and quality of text clustering algorithms applied to short, medium, and long size texts, and examines whether short text clustering shows a reasonable performance for live presentations. We run several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The algorithms used were snowball steamers, TF-IDF, and K-means for clustering; and the text types were Reuters, 20 NewsGroup and an experimental data set, for the long, medium, and short size texts, respectively. 
The first result showed that text size had a large effect on the algorithms execution time, with the shortest average time for the short texts and longer average time for the longest texts. The second result showed that the number of phrases in each text type significantly predicts execution time but not the number of clusters generated by K-means. Inertia and purity measures were used to test the quality of the clusters generated. Text size, number of phrases and number of clusters predict inertia; showing the lowest inertia for the short texts. Purity measures were like previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.</description><subject>Algorithms</subject><subject>Blogs</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>IEEE transactions</subject><subject>Inertia</subject><subject>Interactivity</subject><subject>K-Means</subject><subject>Purity</subject><subject>Real time</subject><subject>Real-time systems</subject><subject>Sentences</subject><subject>Short Phrases</subject><subject>Short Text</subject><subject>Silicon compounds</subject><subject>Social networking (online)</subject><subject>Text 
Mining</subject><subject>Texts</subject><subject>TF-IDF</subject><subject>Visualization</subject><issn>1548-0992</issn><issn>1548-0992</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkN1LwzAUxYMoOKfvgi8Bn1tvknZNHseYHzDwY_U5pO2NdnTtTFJw_73dh-LTuXDPuZfzI-SaQcwYqLt8MY05cBarJEtlBidkxNJERqAUP_03n5ML71cAQk6kGJHl3Fosg6ddS_N6jdS0FX3tTVOHLe0sXX52LtAcvwOdNb0P6Or2g1b9Xt7QNNE-9eLQYxtMqLvWX5IzaxqPV0cdk_f7eT57jBbPD0-z6SIqOWchwioVyohSolQCJlyCKcpUZIlAodAkCIU1FqTgGWBRJIViHBkX0qQlmArFmNwe7m5c99WjD3rV9a4dXmo-1FUKMpkOLji4Std579DqjavXxm01A71Dpwd0eodOH9ENkZtDpEbEP_vv9gd7-mn6</recordid><startdate>20210801</startdate><enddate>20210801</enddate><creator>Fuentealba, Diego</creator><creator>Lopez, Mario</creator><creator>Ponce, Hector</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-5284-0448</orcidid><orcidid>https://orcid.org/0000-0003-0909-7702</orcidid><orcidid>https://orcid.org/0000-0002-7984-3945</orcidid></search><sort><creationdate>20210801</creationdate><title>Effects on Time and Quality of Short Text Clustering during Real-Time Presentations</title><author>Fuentealba, Diego ; Lopez, Mario ; Ponce, Hector</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c221t-ed539a3c8e89306280abc53743e39ea4e0bfaf083270ebb4b912e1238a5c0ade3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Blogs</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>IEEE 
transactions</topic><topic>Inertia</topic><topic>Interactivity</topic><topic>K-Means</topic><topic>Purity</topic><topic>Real time</topic><topic>Real-time systems</topic><topic>Sentences</topic><topic>Short Phrases</topic><topic>Short Text</topic><topic>Silicon compounds</topic><topic>Social networking (online)</topic><topic>Text Mining</topic><topic>Texts</topic><topic>TF-IDF</topic><topic>Visualization</topic><toplevel>online_resources</toplevel><creatorcontrib>Fuentealba, Diego</creatorcontrib><creatorcontrib>Lopez, Mario</creatorcontrib><creatorcontrib>Ponce, Hector</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Revista IEEE América Latina</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fuentealba, Diego</au><au>Lopez, Mario</au><au>Ponce, Hector</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Effects on Time and Quality of Short Text Clustering during Real-Time Presentations</atitle><jtitle>Revista IEEE América Latina</jtitle><stitle>T-LA</stitle><date>2021-08-01</date><risdate>2021</risdate><volume>19</volume><issue>8</issue><spage>1391</spage><epage>1399</epage><pages>1391-1399</pages><issn>1548-0992</issn><eissn>1548-0992</eissn><abstract>Technologies for live 
presentations should consider users' capabilities to manage large amounts of data in real-time, particularly, exchanges of short texts (e.g., phrases). This study examines the effects on time and quality of text clustering algorithms applied to short, medium, and long size texts, and examines whether short text clustering shows a reasonable performance for live presentations. We run several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The algorithms used were snowball steamers, TF-IDF, and K-means for clustering; and the text types were Reuters, 20 NewsGroup and an experimental data set, for the long, medium, and short size texts, respectively. The first result showed that text size had a large effect on the algorithms execution time, with the shortest average time for the short texts and longer average time for the longest texts. The second result showed that the number of phrases in each text type significantly predicts execution time but not the number of clusters generated by K-means. Inertia and purity measures were used to test the quality of the clusters generated. Text size, number of phrases and number of clusters predict inertia; showing the lowest inertia for the short texts. Purity measures were like previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.</abstract><cop>Los Alamitos</cop><pub>IEEE</pub><doi>10.1109/TLA.2021.9475870</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0001-5284-0448</orcidid><orcidid>https://orcid.org/0000-0003-0909-7702</orcidid><orcidid>https://orcid.org/0000-0002-7984-3945</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1548-0992
ispartof Revista IEEE América Latina, 2021-08, Vol.19 (8), p.1391-1399
issn 1548-0992
1548-0992
language eng
recordid cdi_proquest_journals_2548990785
source IEEE Xplore
subjects Algorithms
Blogs
Clustering
Clustering algorithms
IEEE transactions
Inertia
Interactivity
K-Means
Purity
Real time
Real-time systems
Sentences
Short Phrases
Short Text
Silicon compounds
Social networking (online)
Text Mining
Texts
TF-IDF
Visualization
title Effects on Time and Quality of Short Text Clustering during Real-Time Presentations
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T10%3A24%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effects%20on%20Time%20and%20Quality%20of%20Short%20Text%20Clustering%20during%20Real-Time%20Presentations&rft.jtitle=Revista%20IEEE%20Am%C3%A9rica%20Latina&rft.au=Fuentealba,%20Diego&rft.date=2021-08-01&rft.volume=19&rft.issue=8&rft.spage=1391&rft.epage=1399&rft.pages=1391-1399&rft.issn=1548-0992&rft.eissn=1548-0992&rft_id=info:doi/10.1109/TLA.2021.9475870&rft_dat=%3Cproquest_RIE%3E2548990785%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2548990785&rft_id=info:pmid/&rft_ieee_id=9475870&rfr_iscdi=true
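
note The description above names TF-IDF weighting, K-means clustering, execution time, and two quality measures (inertia and purity). The following is a minimal, standard-library-only Python sketch of those pieces, not the authors' implementation. Assumptions: stemming is omitted (the study reports using a Snowball stemmer), tokens are split on whitespace, the IDF is smoothed, K-means starts from the first k phrases instead of a random initialization, and the four phrases, their topic labels, and all function names are hypothetical.

```python
import math
import time
from collections import Counter


def tfidf(docs):
    """Return one L2-normalized TF-IDF vector per document (no stemming)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1.
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a deterministic start (first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: sum of squared distances from each point to its assigned centroid.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def purity(labels, truth):
    """Fraction of items that belong to the majority true class of their cluster."""
    by_cluster = {}
    for lab, cls in zip(labels, truth):
        by_cluster.setdefault(lab, Counter())[cls] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


# Hypothetical short phrases with hand-assigned topics.
phrases = ["cats purr softly", "stocks fell sharply",
           "cats chase mice", "stocks rose today"]
truth = ["pets", "finance", "pets", "finance"]

start = time.perf_counter()
labels, inertia = kmeans(tfidf(phrases), k=2)
elapsed = time.perf_counter() - start

print(f"labels={labels} inertia={inertia:.3f} "
      f"purity={purity(labels, truth):.2f} time={elapsed * 1000:.2f} ms")
```

Lower inertia means tighter clusters, and purity close to 1 means each cluster is dominated by one true class. Timing the vectorize-and-cluster step with time.perf_counter, as done here, is one simple way to check whether a given number of phrases stays within a real-time budget.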