Text Classification Using Novel Term Weighting Scheme-Based Improved TF-IDF for Internet Media Reports
With the rapid development of the internet technology, a large amount of internet text data can be obtained. The text classification (TC) technology plays a very important role in processing massive text data, but the accuracy of classification is directly affected by the performance of term weighti...
Saved in:
Published in: | Mathematical problems in engineering 2021-03, Vol.2021, p.1-30 |
---|---|
Main authors: | Jiang, Zhiying; Gao, Bo; He, Yanlin; Han, Yongming; Doyle, Paul; Zhu, Qunxiong |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 30 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | Mathematical problems in engineering |
container_volume | 2021 |
creator | Jiang, Zhiying; Gao, Bo; He, Yanlin; Han, Yongming; Doyle, Paul; Zhu, Qunxiong |
description | With the rapid development of the internet technology, a large amount of internet text data can be obtained. The text classification (TC) technology plays a very important role in processing massive text data, but the accuracy of classification is directly affected by the performance of term weighting in TC. Due to the original design of information retrieval (IR), term frequency-inverse document frequency (TF-IDF) is not effective enough for TC, especially for processing text data with unbalanced distributions in internet media reports. Therefore, the variance between the DF value of a particular term and the average of all DFs DF¯, namely, the document frequency variance (ADF), is proposed to enhance the ability in processing text data with unbalanced distribution. Then, the normal TF-IDF is modified by the proposed ADF for processing unbalanced text collection in four different ways, namely, TF-IADF, TF-IADF+, TF-IADFnorm, and TF-IADF+norm. As a result, an effective model can be established for the TC task of internet media reports. A series of simulations have been carried out to evaluate the performance of the proposed methods. Compared with TF-IDF on state-of-the-art classification algorithms, the effectiveness and feasibility of the proposed methods are confirmed by simulation results. |
doi_str_mv | 10.1155/2021/6619088 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2501177244</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2501177244</sourcerecordid><originalsourceid>FETCH-LOGICAL-c337t-7255f4ad2e36e81b9451e85edaf1f59d5f95578779430ea5abd8df82b2d17c7b3</originalsourceid><addsrcrecordid>eNp90E1PwzAMBuAIgcQY3PgBkThCWZI2TXqEwaDSAAk6wa1KG2fLtLYjyfj493TazpxsWY9s60XonJJrSjkfMcLoKE1pRqQ8QAPK0zjiNBGHfU9YElEWfxyjE--XpJecygEyBfwEPF4p762xtQq2a_HM23aOn7svWOECXIPfwc4XYTt8qxfQQHSrPGicN2vXI42LSZTfTbDpHM7bAK6FgJ9AW4VfYd254E_RkVErD2f7OkSzyX0xfoymLw_5-GYa1XEsQiQY5yZRmkGcgqRVlnAKkoNWhhqeaW4yzoUUIktiAoqrSkttJKuYpqIWVTxEF7u9_WOfG_ChXHYb1_YnS8YJpUKwJOnV1U7VrvPegSnXzjbK_ZaUlNsky22S5T7Jnl_u-MK2Wn3b__UfFCNyPg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2501177244</pqid></control><display><type>article</type><title>Text Classification Using Novel Term Weighting Scheme-Based Improved TF-IDF for Internet Media Reports</title><source>Wiley Online Library Open Access</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Alma/SFX Local Collection</source><creator>Jiang, Zhiying ; Gao, Bo ; He, Yanlin ; Han, Yongming ; Doyle, Paul ; Zhu, Qunxiong</creator><contributor>Zeng, Nianyin</contributor><creatorcontrib>Jiang, Zhiying ; Gao, Bo ; He, Yanlin ; Han, Yongming ; Doyle, Paul ; Zhu, Qunxiong ; Zeng, Nianyin</creatorcontrib><description>With the rapid development of the internet technology, a large amount of internet text data can be obtained. The text classification (TC) technology plays a very important role in processing massive text data, but the accuracy of classification is directly affected by the performance of term weighting in TC. 
Due to the original design of information retrieval (IR), term frequency-inverse document frequency (TF-IDF) is not effective enough for TC, especially for processing text data with unbalanced distributions in internet media reports. Therefore, the variance between the DF value of a particular term and the average of all DFs DF¯, namely, the document frequency variance (ADF), is proposed to enhance the ability in processing text data with unbalanced distribution. Then, the normal TF-IDF is modified by the proposed ADF for processing unbalanced text collection in four different ways, namely, TF-IADF, TF-IADF+, TF-IADFnorm, and TF-IADF+norm. As a result, an effective model can be established for the TC task of internet media reports. A series of simulations have been carried out to evaluate the performance of the proposed methods. Compared with TF-IDF on state-of-the-art classification algorithms, the effectiveness and feasibility of the proposed methods are confirmed by simulation results.</description><identifier>ISSN: 1024-123X</identifier><identifier>EISSN: 1563-5147</identifier><identifier>DOI: 10.1155/2021/6619088</identifier><language>eng</language><publisher>New York: Hindawi</publisher><subject>Algorithms ; Classification ; Deep learning ; Information retrieval ; Internet ; Methods ; Neural networks ; Performance evaluation ; Semantic analysis ; Semantics ; Text categorization ; Weighting</subject><ispartof>Mathematical problems in engineering, 2021-03, Vol.2021, p.1-30</ispartof><rights>Copyright © 2021 Zhiying Jiang et al.</rights><rights>Copyright © 2021 Zhiying Jiang et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. 
https://creativecommons.org/licenses/by/4.0</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c337t-7255f4ad2e36e81b9451e85edaf1f59d5f95578779430ea5abd8df82b2d17c7b3</citedby><cites>FETCH-LOGICAL-c337t-7255f4ad2e36e81b9451e85edaf1f59d5f95578779430ea5abd8df82b2d17c7b3</cites><orcidid>0000-0002-2719-0516 ; 0000-0003-3877-7432 ; 0000-0002-4643-7043 ; 0000-0002-0037-6871 ; 0000-0003-3209-725X ; 0000-0001-8840-7056</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><contributor>Zeng, Nianyin</contributor><creatorcontrib>Jiang, Zhiying</creatorcontrib><creatorcontrib>Gao, Bo</creatorcontrib><creatorcontrib>He, Yanlin</creatorcontrib><creatorcontrib>Han, Yongming</creatorcontrib><creatorcontrib>Doyle, Paul</creatorcontrib><creatorcontrib>Zhu, Qunxiong</creatorcontrib><title>Text Classification Using Novel Term Weighting Scheme-Based Improved TF-IDF for Internet Media Reports</title><title>Mathematical problems in engineering</title><description>With the rapid development of the internet technology, a large amount of internet text data can be obtained. The text classification (TC) technology plays a very important role in processing massive text data, but the accuracy of classification is directly affected by the performance of term weighting in TC. Due to the original design of information retrieval (IR), term frequency-inverse document frequency (TF-IDF) is not effective enough for TC, especially for processing text data with unbalanced distributions in internet media reports. 
Therefore, the variance between the DF value of a particular term and the average of all DFs DF¯, namely, the document frequency variance (ADF), is proposed to enhance the ability in processing text data with unbalanced distribution. Then, the normal TF-IDF is modified by the proposed ADF for processing unbalanced text collection in four different ways, namely, TF-IADF, TF-IADF+, TF-IADFnorm, and TF-IADF+norm. As a result, an effective model can be established for the TC task of internet media reports. A series of simulations have been carried out to evaluate the performance of the proposed methods. Compared with TF-IDF on state-of-the-art classification algorithms, the effectiveness and feasibility of the proposed methods are confirmed by simulation results.</description><subject>Algorithms</subject><subject>Classification</subject><subject>Deep learning</subject><subject>Information retrieval</subject><subject>Internet</subject><subject>Methods</subject><subject>Neural networks</subject><subject>Performance evaluation</subject><subject>Semantic analysis</subject><subject>Semantics</subject><subject>Text categorization</subject><subject>Weighting</subject><issn>1024-123X</issn><issn>1563-5147</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>RHX</sourceid><sourceid>BENPR</sourceid><recordid>eNp90E1PwzAMBuAIgcQY3PgBkThCWZI2TXqEwaDSAAk6wa1KG2fLtLYjyfj493TazpxsWY9s60XonJJrSjkfMcLoKE1pRqQ8QAPK0zjiNBGHfU9YElEWfxyjE--XpJecygEyBfwEPF4p762xtQq2a_HM23aOn7svWOECXIPfwc4XYTt8qxfQQHSrPGicN2vXI42LSZTfTbDpHM7bAK6FgJ9AW4VfYd254E_RkVErD2f7OkSzyX0xfoymLw_5-GYa1XEsQiQY5yZRmkGcgqRVlnAKkoNWhhqeaW4yzoUUIktiAoqrSkttJKuYpqIWVTxEF7u9_WOfG_ChXHYb1_YnS8YJpUKwJOnV1U7VrvPegSnXzjbK_ZaUlNsky22S5T7Jnl_u-MK2Wn3b__UfFCNyPg</recordid><startdate>20210305</startdate><enddate>20210305</enddate><creator>Jiang, Zhiying</creator><creator>Gao, Bo</creator><creator>He, Yanlin</creator><creator>Han, 
Yongming</creator><creator>Doyle, Paul</creator><creator>Zhu, Qunxiong</creator><general>Hindawi</general><general>Hindawi Limited</general><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7TB</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CWDGH</scope><scope>DWQXO</scope><scope>FR3</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>KR7</scope><scope>L6V</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PHGZM</scope><scope>PHGZT</scope><scope>PIMPY</scope><scope>PKEHL</scope><scope>PQEST</scope><scope>PQGLB</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><orcidid>https://orcid.org/0000-0002-2719-0516</orcidid><orcidid>https://orcid.org/0000-0003-3877-7432</orcidid><orcidid>https://orcid.org/0000-0002-4643-7043</orcidid><orcidid>https://orcid.org/0000-0002-0037-6871</orcidid><orcidid>https://orcid.org/0000-0003-3209-725X</orcidid><orcidid>https://orcid.org/0000-0001-8840-7056</orcidid></search><sort><creationdate>20210305</creationdate><title>Text Classification Using Novel Term Weighting Scheme-Based Improved TF-IDF for Internet Media Reports</title><author>Jiang, Zhiying ; Gao, Bo ; He, Yanlin ; Han, Yongming ; Doyle, Paul ; Zhu, Qunxiong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c337t-7255f4ad2e36e81b9451e85edaf1f59d5f95578779430ea5abd8df82b2d17c7b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Classification</topic><topic>Deep learning</topic><topic>Information retrieval</topic><topic>Internet</topic><topic>Methods</topic><topic>Neural networks</topic><topic>Performance 
evaluation</topic><topic>Semantic analysis</topic><topic>Semantics</topic><topic>Text categorization</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jiang, Zhiying</creatorcontrib><creatorcontrib>Gao, Bo</creatorcontrib><creatorcontrib>He, Yanlin</creatorcontrib><creatorcontrib>Han, Yongming</creatorcontrib><creatorcontrib>Doyle, Paul</creatorcontrib><creatorcontrib>Zhu, Qunxiong</creatorcontrib><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open Access</collection><collection>CrossRef</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>Middle East & Africa Database</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Civil Engineering Abstracts</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace 
Collection</collection><collection>ProQuest Central (New)</collection><collection>ProQuest One Academic (New)</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Middle East (New)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Applied & Life Sciences</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>Mathematical problems in engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jiang, Zhiying</au><au>Gao, Bo</au><au>He, Yanlin</au><au>Han, Yongming</au><au>Doyle, Paul</au><au>Zhu, Qunxiong</au><au>Zeng, Nianyin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Text Classification Using Novel Term Weighting Scheme-Based Improved TF-IDF for Internet Media Reports</atitle><jtitle>Mathematical problems in engineering</jtitle><date>2021-03-05</date><risdate>2021</risdate><volume>2021</volume><spage>1</spage><epage>30</epage><pages>1-30</pages><issn>1024-123X</issn><eissn>1563-5147</eissn><abstract>With the rapid development of the internet technology, a large amount of internet text data can be obtained. The text classification (TC) technology plays a very important role in processing massive text data, but the accuracy of classification is directly affected by the performance of term weighting in TC. Due to the original design of information retrieval (IR), term frequency-inverse document frequency (TF-IDF) is not effective enough for TC, especially for processing text data with unbalanced distributions in internet media reports. 
Therefore, the variance between the DF value of a particular term and the average of all DFs DF¯, namely, the document frequency variance (ADF), is proposed to enhance the ability in processing text data with unbalanced distribution. Then, the normal TF-IDF is modified by the proposed ADF for processing unbalanced text collection in four different ways, namely, TF-IADF, TF-IADF+, TF-IADFnorm, and TF-IADF+norm. As a result, an effective model can be established for the TC task of internet media reports. A series of simulations have been carried out to evaluate the performance of the proposed methods. Compared with TF-IDF on state-of-the-art classification algorithms, the effectiveness and feasibility of the proposed methods are confirmed by simulation results.</abstract><cop>New York</cop><pub>Hindawi</pub><doi>10.1155/2021/6619088</doi><tpages>30</tpages><orcidid>https://orcid.org/0000-0002-2719-0516</orcidid><orcidid>https://orcid.org/0000-0003-3877-7432</orcidid><orcidid>https://orcid.org/0000-0002-4643-7043</orcidid><orcidid>https://orcid.org/0000-0002-0037-6871</orcidid><orcidid>https://orcid.org/0000-0003-3209-725X</orcidid><orcidid>https://orcid.org/0000-0001-8840-7056</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1024-123X |
ispartof | Mathematical problems in engineering, 2021-03, Vol.2021, p.1-30 |
issn | 1024-123X; 1563-5147 |
language | eng |
recordid | cdi_proquest_journals_2501177244 |
source | Wiley Online Library Open Access; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Alma/SFX Local Collection |
subjects | Algorithms; Classification; Deep learning; Information retrieval; Internet; Methods; Neural networks; Performance evaluation; Semantic analysis; Semantics; Text categorization; Weighting |
title | Text Classification Using Novel Term Weighting Scheme-Based Improved TF-IDF for Internet Media Reports |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-21T20%3A31%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Text%20Classification%20Using%20Novel%20Term%20Weighting%20Scheme-Based%20Improved%20TF-IDF%20for%20Internet%20Media%20Reports&rft.jtitle=Mathematical%20problems%20in%20engineering&rft.au=Jiang,%20Zhiying&rft.date=2021-03-05&rft.volume=2021&rft.spage=1&rft.epage=30&rft.pages=1-30&rft.issn=1024-123X&rft.eissn=1563-5147&rft_id=info:doi/10.1155/2021/6619088&rft_dat=%3Cproquest_cross%3E2501177244%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2501177244&rft_id=info:pmid/&rfr_iscdi=true |
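
The description field above contrasts standard TF-IDF with a weighting based on how far a term's document frequency (DF) lies from the average DF. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions: `tf_idf` is the textbook TF-IDF baseline (raw term count times log(N/DF)), and `df_deviation` is a hypothetical helper that returns the plain difference DF(t) minus the mean DF. It is not the authors' ADF or TF-IADF definition, and the sample `docs` are invented for illustration.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts a term at most once
    return df


def tf_idf(docs):
    """Textbook TF-IDF baseline: raw term count times log(N / DF)."""
    n = len(docs)
    df = document_frequencies(docs)
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def df_deviation(docs):
    """Hypothetical helper: DF(t) minus the mean DF over all terms.

    Assumption for illustration only; the catalog record does not state
    how the paper turns this quantity into its ADF-based weights.
    """
    df = document_frequencies(docs)
    mean_df = sum(df.values()) / len(df)
    return {t: df[t] - mean_df for t in df}


# Invented toy corpus: three tokenized "media reports".
docs = [
    ["stock", "market", "news"],
    ["market", "report"],
    ["sports", "news", "report", "news"],
]

baseline = tf_idf(docs)
deviation = df_deviation(docs)
```

Terms that occur in more documents than average get a positive deviation, rarer terms a negative one; how that signal is combined with TF in the four proposed variants has to be taken from the article itself.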