A new online field feature selection algorithm based on streaming data

Bibliographic details
Published in: Journal of ambient intelligence and humanized computing 2024-02, Vol.15 (2), p.1365-1377
Main authors: Zhang, Zhenjiang; Song, Fuxing; Zhang, Peng; Chao, Han-Chieh; Zhao, Yingsi
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1377
container_issue 2
container_start_page 1365
container_title Journal of ambient intelligence and humanized computing
container_volume 15
creator Zhang, Zhenjiang
Song, Fuxing
Zhang, Peng
Chao, Han-Chieh
Zhao, Yingsi
description The rapid development of Internet technology has produced massive amounts of network text data, so classifying this data efficiently has important theoretical significance and practical value. To obtain accurate classification results, the process is divided into two parts. For text representation, this paper proposes an online field feature selection (OFFS) algorithm based on streaming data, which addresses the low efficiency and high memory consumption of traditional feature selection algorithms. By improving the vector space model, the new algorithm can select features from the data in real time and quickly generate text vectors. For classifier design, an OFFS-BP neural network text classifier that combines a BP (back-propagation) neural network with the OFFS algorithm is designed. It is suited to distributed parallel computing, reduces training time, and balances computational efficiency against classification accuracy. Finally, the OFFS-BP neural network classifier is implemented on the Spark platform. The experimental results show that the OFFS-BP neural network classifier is better suited to big data environments, with less computation time and higher classification efficiency.
doi_str_mv 10.1007/s12652-018-0959-0
format Article
pub Springer Berlin Heidelberg (Berlin/Heidelberg)
rights Springer-Verlag GmbH Germany, part of Springer Nature 2018
stitle J Ambient Intell Human Comput
tpages 13
toplevel peer_reviewed
orcidid https://orcid.org/0000-0003-0217-3012
linktopdf https://link.springer.com/content/pdf/10.1007/s12652-018-0959-0
linktohtml https://link.springer.com/10.1007/s12652-018-0959-0
collection CrossRef
ProQuest Computer Science Collection
fulltext fulltext
identifier ISSN: 1868-5137
ispartof Journal of ambient intelligence and humanized computing, 2024-02, Vol.15 (2), p.1365-1377
issn 1868-5137
1868-5145
language eng
recordid cdi_proquest_journals_2933288193
source Springer Nature - Complete Springer Journals
subjects Accuracy
Algorithms
Artificial Intelligence
Back propagation networks
Batch processing
Big Data
Classification
Classifiers
Computational efficiency
Computational Intelligence
Computing time
Deep learning
Design
Efficiency
Engineering
Feature selection
Natural language
Neural networks
Original Research
Robotics and Automation
Semantics
Text categorization
User Interfaces and Human Computer Interaction
Vector spaces
title A new online field feature selection algorithm based on streaming data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T18%3A22%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20new%20online%20field%20feature%20selection%20algorithm%20based%20on%20streaming%20data&rft.jtitle=Journal%20of%20ambient%20intelligence%20and%20humanized%20computing&rft.au=Zhang,%20Zhenjiang&rft.date=2024-02-01&rft.volume=15&rft.issue=2&rft.spage=1365&rft.epage=1377&rft.pages=1365-1377&rft.issn=1868-5137&rft.eissn=1868-5145&rft_id=info:doi/10.1007/s12652-018-0959-0&rft_dat=%3Cproquest_cross%3E2933288193%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2933288193&rft_id=info:pmid/&rfr_iscdi=true