A new online field feature selection algorithm based on streaming data
The rapid development of Internet technology has produced massive volumes of network text data, so classifying this text efficiently is of both theoretical and practical importance. To obtain accurate classification results, the process has been divided into two...
Published in: | Journal of ambient intelligence and humanized computing 2024-02, Vol.15 (2), p.1365-1377 |
---|---|
Main authors: | Zhang, Zhenjiang ; Song, Fuxing ; Zhang, Peng ; Chao, Han-Chieh ; Zhao, Yingsi |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1377 |
---|---|
container_issue | 2 |
container_start_page | 1365 |
container_title | Journal of ambient intelligence and humanized computing |
container_volume | 15 |
creator | Zhang, Zhenjiang ; Song, Fuxing ; Zhang, Peng ; Chao, Han-Chieh ; Zhao, Yingsi |
description | The rapid development of Internet technology has produced massive volumes of network text data, so classifying this text efficiently is of both theoretical and practical importance. To obtain accurate classification results, the process is divided into two parts. For text representation, the paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which addresses the low efficiency and high memory consumption of traditional feature selection algorithms. By improving on the vector space model, the new algorithm can select features of the data in real time and quickly generate text vectors. For classifier design, an OFFS-BP neural network text classifier is built on a BP neural network and the OFFS algorithm. It is suited to distributed parallel computing, reduces training time, and balances computational efficiency against classification accuracy. Finally, the OFFS-BP neural network classifier is implemented on the Spark platform. The experimental results show that the OFFS-BP neural network classifier is better suited to big data environments, with less computation time and higher classification efficiency. |
doi_str_mv | 10.1007/s12652-018-0959-0 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2933288193</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2933288193</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2310-2a93adb4f93f7edeffc12a435edde4fb8da82bd5fe02768416f76a5670b27a643</originalsourceid><addsrcrecordid>eNp1kE1LAzEQhoMoWGp_gLeA52gm2Y_ssRSrhYIXPYfZzaRu2WZrskX8925Z0ZNzeefwfsDD2C3Ie5CyfEigilwJCUbIKq-EvGAzMIUROWT55e-vy2u2SGkvx9OVBoAZWy95oE_eh64NxH1LneOecDhF4ok6aoa2Dxy7XR_b4f3Aa0zkRjtPQyQ8tGHHHQ54w648dokWPzpnb-vH19Wz2L48bVbLrWiUBikUVhpdnflK-5Iced-Awkzn5BxlvjYOjapd7kmqsjAZFL4sMC9KWasSi0zP2d3Ue4z9x4nSYPf9KYZx0qpKa2UMjDJnMLma2KcUydtjbA8YvyxIeyZmJ2J2JGbPxKwcM2rKpNEbdhT_mv8PfQPrwG29</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2933288193</pqid></control><display><type>article</type><title>A new online field feature selection algorithm based on streaming data</title><source>Springer Nature - Complete Springer Journals</source><creator>Zhang, Zhenjiang ; Song, Fuxing ; Zhang, Peng ; Chao, Han-Chieh ; Zhao, Yingsi</creator><creatorcontrib>Zhang, Zhenjiang ; Song, Fuxing ; Zhang, Peng ; Chao, Han-Chieh ; Zhao, Yingsi</creatorcontrib><description>The rapid development of Internet technology derived out a massive network text data. Therefore, how to classify the massive text data efficiently has important theoretical significance and application value. In order to acquire accurate classification results, the process has been divided into two parts. In terms of text representation, this paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which solves the problems of low efficiency and memory consumption of traditional feature selection algorithms. With improvements in the vector space model, the new algorithm can select the real-time feature of the data and quickly generate text vector. 
In the aspect of classifier design, an OFFS-BP neural network text classifier based on BP neutral network and OFFS algorithm is designed. It adapts to the distributed parallel computing, reduces the training time and balances the computation efficiency and classification accuracy. Finally based on the Spark platform, the OFFS-BP neural network classifier is implemented. The experimental results show that the OFFS-BP neural network classifier is more suitable for big data environment with less computation time and higher classification efficiency.</description><identifier>ISSN: 1868-5137</identifier><identifier>EISSN: 1868-5145</identifier><identifier>DOI: 10.1007/s12652-018-0959-0</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Accuracy ; Algorithms ; Artificial Intelligence ; Back propagation networks ; Batch processing ; Big Data ; Classification ; Classifiers ; Computational efficiency ; Computational Intelligence ; Computing time ; Deep learning ; Design ; Efficiency ; Engineering ; Feature selection ; Natural language ; Neural networks ; Original Research ; Robotics and Automation ; Semantics ; Text categorization ; User Interfaces and Human Computer Interaction ; Vector spaces</subject><ispartof>Journal of ambient intelligence and humanized computing, 2024-02, Vol.15 (2), p.1365-1377</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2018</rights><rights>Springer-Verlag GmbH Germany, part of Springer Nature 
2018.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c2310-2a93adb4f93f7edeffc12a435edde4fb8da82bd5fe02768416f76a5670b27a643</citedby><cites>FETCH-LOGICAL-c2310-2a93adb4f93f7edeffc12a435edde4fb8da82bd5fe02768416f76a5670b27a643</cites><orcidid>0000-0003-0217-3012</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s12652-018-0959-0$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s12652-018-0959-0$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,41467,42536,51297</link.rule.ids></links><search><creatorcontrib>Zhang, Zhenjiang</creatorcontrib><creatorcontrib>Song, Fuxing</creatorcontrib><creatorcontrib>Zhang, Peng</creatorcontrib><creatorcontrib>Chao, Han-Chieh</creatorcontrib><creatorcontrib>Zhao, Yingsi</creatorcontrib><title>A new online field feature selection algorithm based on streaming data</title><title>Journal of ambient intelligence and humanized computing</title><addtitle>J Ambient Intell Human Comput</addtitle><description>The rapid development of Internet technology derived out a massive network text data. Therefore, how to classify the massive text data efficiently has important theoretical significance and application value. In order to acquire accurate classification results, the process has been divided into two parts. In terms of text representation, this paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which solves the problems of low efficiency and memory consumption of traditional feature selection algorithms. With improvements in the vector space model, the new algorithm can select the real-time feature of the data and quickly generate text vector. 
In the aspect of classifier design, an OFFS-BP neural network text classifier based on BP neutral network and OFFS algorithm is designed. It adapts to the distributed parallel computing, reduces the training time and balances the computation efficiency and classification accuracy. Finally based on the Spark platform, the OFFS-BP neural network classifier is implemented. The experimental results show that the OFFS-BP neural network classifier is more suitable for big data environment with less computation time and higher classification efficiency.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Back propagation networks</subject><subject>Batch processing</subject><subject>Big Data</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Computational efficiency</subject><subject>Computational Intelligence</subject><subject>Computing time</subject><subject>Deep learning</subject><subject>Design</subject><subject>Efficiency</subject><subject>Engineering</subject><subject>Feature selection</subject><subject>Natural language</subject><subject>Neural networks</subject><subject>Original Research</subject><subject>Robotics and Automation</subject><subject>Semantics</subject><subject>Text categorization</subject><subject>User Interfaces and Human Computer Interaction</subject><subject>Vector 
spaces</subject><issn>1868-5137</issn><issn>1868-5145</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp1kE1LAzEQhoMoWGp_gLeA52gm2Y_ssRSrhYIXPYfZzaRu2WZrskX8925Z0ZNzeefwfsDD2C3Ie5CyfEigilwJCUbIKq-EvGAzMIUROWT55e-vy2u2SGkvx9OVBoAZWy95oE_eh64NxH1LneOecDhF4ok6aoa2Dxy7XR_b4f3Aa0zkRjtPQyQ8tGHHHQ54w648dokWPzpnb-vH19Wz2L48bVbLrWiUBikUVhpdnflK-5Iced-Awkzn5BxlvjYOjapd7kmqsjAZFL4sMC9KWasSi0zP2d3Ue4z9x4nSYPf9KYZx0qpKa2UMjDJnMLma2KcUydtjbA8YvyxIeyZmJ2J2JGbPxKwcM2rKpNEbdhT_mv8PfQPrwG29</recordid><startdate>20240201</startdate><enddate>20240201</enddate><creator>Zhang, Zhenjiang</creator><creator>Song, Fuxing</creator><creator>Zhang, Peng</creator><creator>Chao, Han-Chieh</creator><creator>Zhao, Yingsi</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><orcidid>https://orcid.org/0000-0003-0217-3012</orcidid></search><sort><creationdate>20240201</creationdate><title>A new online field feature selection algorithm based on streaming data</title><author>Zhang, Zhenjiang ; Song, Fuxing ; Zhang, Peng ; Chao, Han-Chieh ; Zhao, Yingsi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2310-2a93adb4f93f7edeffc12a435edde4fb8da82bd5fe02768416f76a5670b27a643</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Back propagation networks</topic><topic>Batch processing</topic><topic>Big Data</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Computational efficiency</topic><topic>Computational Intelligence</topic><topic>Computing time</topic><topic>Deep learning</topic><topic>Design</topic><topic>Efficiency</topic><topic>Engineering</topic><topic>Feature selection</topic><topic>Natural 
language</topic><topic>Neural networks</topic><topic>Original Research</topic><topic>Robotics and Automation</topic><topic>Semantics</topic><topic>Text categorization</topic><topic>User Interfaces and Human Computer Interaction</topic><topic>Vector spaces</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Zhenjiang</creatorcontrib><creatorcontrib>Song, Fuxing</creatorcontrib><creatorcontrib>Zhang, Peng</creatorcontrib><creatorcontrib>Chao, Han-Chieh</creatorcontrib><creatorcontrib>Zhao, Yingsi</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><jtitle>Journal of ambient intelligence and humanized computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Zhenjiang</au><au>Song, Fuxing</au><au>Zhang, Peng</au><au>Chao, Han-Chieh</au><au>Zhao, Yingsi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A new online field feature selection algorithm based on streaming data</atitle><jtitle>Journal of ambient intelligence and humanized computing</jtitle><stitle>J Ambient Intell Human Comput</stitle><date>2024-02-01</date><risdate>2024</risdate><volume>15</volume><issue>2</issue><spage>1365</spage><epage>1377</epage><pages>1365-1377</pages><issn>1868-5137</issn><eissn>1868-5145</eissn><abstract>The rapid development of Internet technology derived out a massive network text data. Therefore, how to classify the massive text data efficiently has important theoretical significance and application value. In order to acquire accurate classification results, the process has been divided into two parts. In terms of text representation, this paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which solves the problems of low efficiency and memory consumption of traditional feature selection algorithms. 
With improvements in the vector space model, the new algorithm can select the real-time feature of the data and quickly generate text vector. In the aspect of classifier design, an OFFS-BP neural network text classifier based on BP neutral network and OFFS algorithm is designed. It adapts to the distributed parallel computing, reduces the training time and balances the computation efficiency and classification accuracy. Finally based on the Spark platform, the OFFS-BP neural network classifier is implemented. The experimental results show that the OFFS-BP neural network classifier is more suitable for big data environment with less computation time and higher classification efficiency.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s12652-018-0959-0</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0003-0217-3012</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1868-5137 |
ispartof | Journal of ambient intelligence and humanized computing, 2024-02, Vol.15 (2), p.1365-1377 |
issn | 1868-5137 ; 1868-5145 |
language | eng |
recordid | cdi_proquest_journals_2933288193 |
source | Springer Nature - Complete Springer Journals |
subjects | Accuracy ; Algorithms ; Artificial Intelligence ; Back propagation networks ; Batch processing ; Big Data ; Classification ; Classifiers ; Computational efficiency ; Computational Intelligence ; Computing time ; Deep learning ; Design ; Efficiency ; Engineering ; Feature selection ; Natural language ; Neural networks ; Original Research ; Robotics and Automation ; Semantics ; Text categorization ; User Interfaces and Human Computer Interaction ; Vector spaces |
title | A new online field feature selection algorithm based on streaming data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T18%3A22%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20new%20online%20field%20feature%20selection%20algorithm%20based%20on%20streaming%20data&rft.jtitle=Journal%20of%20ambient%20intelligence%20and%20humanized%20computing&rft.au=Zhang,%20Zhenjiang&rft.date=2024-02-01&rft.volume=15&rft.issue=2&rft.spage=1365&rft.epage=1377&rft.pages=1365-1377&rft.issn=1868-5137&rft.eissn=1868-5145&rft_id=info:doi/10.1007/s12652-018-0959-0&rft_dat=%3Cproquest_cross%3E2933288193%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2933288193&rft_id=info:pmid/&rfr_iscdi=true |