Hadoop neural network for parallel and distributed feature selection

In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop.
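The record does not include any implementation detail of the "associative memory (binary) neural network" beyond its name, but binary associative memories of this kind are commonly realized as correlation matrix memories with Willshaw-style thresholding. The following is a minimal illustrative sketch under that assumption; the class name, pattern sizes, and patterns are our own, not from the paper.

```python
import numpy as np

class BinaryCMM:
    """Sketch of a binary correlation matrix memory (CMM).

    Assumed model, for illustration only: binary weights, Hebbian
    OR-storage, and threshold-at-k (Willshaw) recall.
    """

    def __init__(self, n_in, n_out):
        self.W = np.zeros((n_out, n_in), dtype=np.uint8)

    def train(self, x, y):
        # Store the pair by OR-ing the outer product of the binary
        # output and input vectors into the binary weight matrix.
        self.W |= np.outer(y, x).astype(np.uint8)

    def recall(self, x, k):
        # Sum activations, then threshold at k, the number of set
        # bits in the input pattern.
        s = self.W @ x
        return (s >= k).astype(np.uint8)

cmm = BinaryCMM(n_in=8, n_out=4)
x = np.array([1, 0, 1, 0, 0, 1, 0, 0], dtype=np.uint8)  # 3 bits set
y = np.array([0, 1, 0, 1], dtype=np.uint8)
cmm.train(x, y)
print(cmm.recall(x, k=3))  # → [0 1 0 1]
```

Because storage and recall are bitwise operations on rows of a binary matrix, the matrix can be partitioned row-wise across workers, which is what makes this family of networks amenable to parallel and distributed processing.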

Bibliographic details
Published in: Neural networks, 2016-06, Vol. 78, p. 24-35
Authors: Hodge, Victoria J.; O’Keefe, Simon; Austin, Jim
Format: Article
Language: English
Online access: Full text
description In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop.
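The decomposition the abstract describes — each feature's score is an independent subtask, and per-feature statistics ("common aspects") are computed once and shared across selectors — can be sketched as a toy example. This is not the paper's implementation: a thread pool stands in for Hadoop YARN, and the two scoring functions are invented stand-ins for the five feature selectors discussed in the paper.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-ins for feature selectors: each maps a
# feature column to a relevance score.
def variance_score(col):
    return float(np.var(col))

def range_score(col):
    return float(col.max() - col.min())

def score_feature(args):
    col, scorers = args
    # One subtask per feature: the column is read once and every
    # selector's score is derived from it -- the "process the
    # common aspects once" idea from the abstract.
    return [f(col) for f in scorers]

def select_top_k(X, scorers, k):
    # Each column is an independent subtask, so the subtasks can
    # run in parallel (here a local pool, in the paper Hadoop).
    cols = [(X[:, j], scorers) for j in range(X.shape[1])]
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(score_feature, cols))
    # Rank features by the first selector's score, for example.
    order = sorted(range(len(scores)), key=lambda j: -scores[j][0])
    return order[:k]

X = np.array([[0.0, 0.0, 10.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 10.0],
              [0.0, 1.0, 0.0]])
print(select_top_k(X, [variance_score, range_score], k=2))  # → [2, 1]
```

Because every selector's scores come back from the same pass over the data, running multiple selectors simultaneously and comparing their rankings costs little more than running one.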
doi 10.1016/j.neunet.2015.08.011
format Article
publisher United States: Elsevier Ltd
rights Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
pmid 26403824
orcidid https://orcid.org/0000-0002-2469-0224
issn 0893-6080
eissn 1879-2782
language eng
source MEDLINE; Elsevier ScienceDirect Journals Complete
subjects Algorithms
Associative memory
Binary neural network
Commonality
Databases, Factual - statistics & numerical data
Distributed
Distributed processing
Feature selection
Flexibility
Hadoop
MapReduce
Neural networks
Neural Networks (Computer)
Parallel
Representations
Selectors
Statistics as Topic - methods
title Hadoop neural network for parallel and distributed feature selection