Hadoop neural network for parallel and distributed feature selection
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm.
Saved in:
Published in: | Neural networks 2016-06, Vol.78, p.24-35 |
---|---|
Main authors: | Hodge, Victoria J.; O’Keefe, Simon; Austin, Jim |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 35 |
---|---|
container_issue | |
container_start_page | 24 |
container_title | Neural networks |
container_volume | 78 |
creator | Hodge, Victoria J.; O’Keefe, Simon; Austin, Jim |
description | In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. |
doi_str_mv | 10.1016/j.neunet.2015.08.011 |
format | Article |
publisher | Elsevier Ltd |
pmid | 26403824 |
rights | 2015 The Authors. Published by Elsevier Ltd. All rights reserved. |
orcid | 0000-0002-2469-0224 |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 0893-6080 |
ispartof | Neural networks, 2016-06, Vol.78, p.24-35 |
issn | 0893-6080; 1879-2782 |
language | eng |
recordid | cdi_proquest_miscellaneous_1825481626 |
source | MEDLINE; Elsevier ScienceDirect Journals Complete |
subjects | Algorithms; Associative memory; Binary neural network; Commonality; Databases, Factual - statistics & numerical data; Distributed; Distributed processing; Feature selection; Flexibility; Hadoop; MapReduce; Neural networks; Neural Networks (Computer); Parallel; Representations; Selectors; Statistics as Topic - methods |
title | Hadoop neural network for parallel and distributed feature selection |
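The abstract's central efficiency claim is that the common aspects of several feature selectors are computed once and then shared by every selector evaluated in parallel. The following is a minimal, hypothetical sketch of that pattern (plain Python with thread pools, not the authors' binary associative-memory network or their Hadoop YARN framework; the function names and the two toy scoring criteria are assumptions for illustration only):

```python
# Hypothetical sketch of "compute the common aspect once, reuse it across
# selectors run in parallel"; NOT the paper's actual implementation.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import math

def shared_counts(rows, labels):
    """Common aspect: per-(feature, label) co-occurrence counts,
    computed exactly once and reused by every selector below."""
    counts = Counter()
    for row, label in zip(rows, labels):
        for feature in row:
            counts[(feature, label)] += 1
    return counts

# Two toy selector criteria that both consume the shared counts.
def frequency_score(counts, feature):
    """How often the feature occurs overall."""
    return sum(n for (f, _), n in counts.items() if f == feature)

def label_skew_score(counts, feature):
    """Entropy of the feature's label distribution (0 = occurs with one label only)."""
    per_label = [n for (f, _), n in counts.items() if f == feature]
    total = sum(per_label)
    return -sum(n / total * math.log2(n / total) for n in per_label)

rows = [("colour", "shape"), ("colour",), ("shape", "size")]
labels = ["A", "B", "A"]
counts = shared_counts(rows, labels)          # the shared representation
features = sorted({f for f, _ in counts})

with ThreadPoolExecutor() as pool:            # selectors scored in parallel
    freq = dict(zip(features, pool.map(
        lambda f: frequency_score(counts, f), features)))
    skew = dict(zip(features, pool.map(
        lambda f: label_skew_score(counts, f), features)))
```

In the paper's setting the shared representation is the binary correlation-matrix memory and the parallelism comes from Hadoop subtasks rather than threads, but the division of labour is analogous: one pass builds the common structure, and each selector's remaining work is an independent, parallelisable scoring step.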