Deep learning parallel computing and evaluation for embedded system clustering architecture processor

In the era of intelligence, the processing of a large amount of information and various intelligent applications need to rely on embedded devices. This trend has made machine learning algorithms play an increasingly important role. High-performance embedded computing is an effective means to solve t...

Bibliographic details
Published in: Design automation for embedded systems 2020-09, Vol.24 (3), p.145-159
Main author: Zu, Yue
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 159
container_issue 3
container_start_page 145
container_title Design automation for embedded systems
container_volume 24
creator Zu, Yue
description In the era of intelligence, processing large amounts of information and running various intelligent applications increasingly rely on embedded devices, and this trend has given machine learning algorithms an increasingly important role. High-performance embedded computing is an effective way to address the limited computing power of embedded devices. New intelligent embedded applications based on machine learning demand more computation than traditional embedded systems can deliver, so this paper studies parallel optimization and implementation techniques for convolutional neural networks on the Parallella platform. A parallel optimization strategy for convolutional neural networks on the clustering architecture processor of a heterogeneous multi-core system is given. A high-performance implementation of a convolutional neural network on the Parallella platform is then studied, and a working convolutional neural network system is implemented. A set of performance evaluation methods for embedded parallel processors is proposed. From the application perspective of the S698P, the eCos operating system is selected as the platform. Single-core and multi-core modes are compared on the GRSIM simulator, and a parallel performance evaluation is given. Experiments show that the efficiency of deep learning tasks improves significantly compared with traditional parallel methods.
doi_str_mv 10.1007/s10617-020-09235-5
format Article
fullrecord <record><control><sourceid>proquest_webof</sourceid><recordid>TN_cdi_proquest_journals_2450310777</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2450310777</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-df0a1cfb3744f7e2139e3facc458bd18dc7537ffd7b91041c62e8dea649737783</originalsourceid><addsrcrecordid>eNqNkE1r3DAQhkVoINtN_kBOghyLk5FlWfaxbJsPWMilPQtZGiUOtuVKckv-fbTr0txKTjMMzzsjPYRcMrhmAPImMqiZLKCEAtqSi0KckA0TsiwaaOAT2eRpWwjRiDPyOcYXAGglqzYEvyHOdEAdpn56orMOehhwoMaP85IOIz1Zir_1sOjU-4k6HyiOHVqLlsbXmHCkZlhyDUc6mOc-oUlLQDoHbzBGH87JqdNDxIu_dUt-3n7_sbsv9o93D7uv-8Jw1qbCOtDMuI7LqnISS8Zb5E4bU4mms6yxRgounbOyaxlUzNQlNhZ1XbWSS9nwLbla9-bLvxaMSb34JUz5pCorAZyBlDJT5UqZ4GMM6NQc-lGHV8VAHXSqVafKOtVRpxI59GUN_cHOu2h6nAz-C2afoi6rlte5A5bp5uP0rk9Htzu_TClH-RqN88Eohvc__Od5b21Mmv8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2450310777</pqid></control><display><type>article</type><title>Deep learning parallel computing and evaluation for embedded system clustering architecture processor</title><source>SpringerLink Journals</source><creator>Zu, Yue</creator><creatorcontrib>Zu, Yue</creatorcontrib><description>In the era of intelligence, the processing of a large amount of information and various intelligent applications need to rely on embedded devices. This trend has made machine learning algorithms play an increasingly important role. High-performance embedded computing is an effective means to solve the lack of computing power of embedded devices. Aiming at the problem that the calculation amount of new intelligent embedded applications based on machine learning technology is higher, the computing power of traditional embedded systems is difficult to meet their needs, this paper studies the parallel optimization and implementation techniques of convolutional neural networks in Parallella platform. 
The parallel optimization strategy of convolutional neural network on the clustering architecture processor of heterogeneous multi-core system is given. Then the high-performance implementation of convolutional neural network on Parallella platform is studied, and the function of convolutional neural network system is implemented. A set of performance evaluation methods for embedded parallel processors is proposed. From the application point of S698P, the eCos operating system is selected as the platform. The single-core mode and multi-core mode are compared on the simulator GRSIM, and the parallel performance evaluation is given. Experiments have shown that the efficiency of deep learning tasks is significantly improved compared to traditional parallel methods.</description><identifier>ISSN: 0929-5585</identifier><identifier>EISSN: 1572-8080</identifier><identifier>DOI: 10.1007/s10617-020-09235-5</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Artificial neural networks ; CAE) and Design ; Circuits and Systems ; Clustering ; Cognitive tasks ; Computer architecture ; Computer Science ; Computer Science, Hardware &amp; Architecture ; Computer Science, Software Engineering ; Computer-Aided Engineering (CAD ; Deep learning ; Electronic devices ; Embedded systems ; Engineering ; Machine learning ; Microprocessors ; Neural networks ; Optimization ; Performance evaluation ; Science &amp; Technology ; Special Purpose and Application-Based Systems ; Technology</subject><ispartof>Design automation for embedded systems, 2020-09, Vol.24 (3), p.145-159</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><rights>Springer Science+Business Media, LLC, part of Springer Nature 
2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>2</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000562493600001</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-c319t-df0a1cfb3744f7e2139e3facc458bd18dc7537ffd7b91041c62e8dea649737783</citedby><cites>FETCH-LOGICAL-c319t-df0a1cfb3744f7e2139e3facc458bd18dc7537ffd7b91041c62e8dea649737783</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10617-020-09235-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10617-020-09235-5$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Zu, Yue</creatorcontrib><title>Deep learning parallel computing and evaluation for embedded system clustering architecture processor</title><title>Design automation for embedded systems</title><addtitle>Des Autom Embed Syst</addtitle><addtitle>DES AUTOM EMBED SYST</addtitle><description>In the era of intelligence, the processing of a large amount of information and various intelligent applications need to rely on embedded devices. This trend has made machine learning algorithms play an increasingly important role. High-performance embedded computing is an effective means to solve the lack of computing power of embedded devices. Aiming at the problem that the calculation amount of new intelligent embedded applications based on machine learning technology is higher, the computing power of traditional embedded systems is difficult to meet their needs, this paper studies the parallel optimization and implementation techniques of convolutional neural networks in Parallella platform. 
The parallel optimization strategy of convolutional neural network on the clustering architecture processor of heterogeneous multi-core system is given. Then the high-performance implementation of convolutional neural network on Parallella platform is studied, and the function of convolutional neural network system is implemented. A set of performance evaluation methods for embedded parallel processors is proposed. From the application point of S698P, the eCos operating system is selected as the platform. The single-core mode and multi-core mode are compared on the simulator GRSIM, and the parallel performance evaluation is given. Experiments have shown that the efficiency of deep learning tasks is significantly improved compared to traditional parallel methods.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>CAE) and Design</subject><subject>Circuits and Systems</subject><subject>Clustering</subject><subject>Cognitive tasks</subject><subject>Computer architecture</subject><subject>Computer Science</subject><subject>Computer Science, Hardware &amp; Architecture</subject><subject>Computer Science, Software Engineering</subject><subject>Computer-Aided Engineering (CAD</subject><subject>Deep learning</subject><subject>Electronic devices</subject><subject>Embedded systems</subject><subject>Engineering</subject><subject>Machine learning</subject><subject>Microprocessors</subject><subject>Neural networks</subject><subject>Optimization</subject><subject>Performance evaluation</subject><subject>Science &amp; Technology</subject><subject>Special Purpose and Application-Based 
Systems</subject><subject>Technology</subject><issn>0929-5585</issn><issn>1572-8080</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>AOWDO</sourceid><recordid>eNqNkE1r3DAQhkVoINtN_kBOghyLk5FlWfaxbJsPWMilPQtZGiUOtuVKckv-fbTr0txKTjMMzzsjPYRcMrhmAPImMqiZLKCEAtqSi0KckA0TsiwaaOAT2eRpWwjRiDPyOcYXAGglqzYEvyHOdEAdpn56orMOehhwoMaP85IOIz1Zir_1sOjU-4k6HyiOHVqLlsbXmHCkZlhyDUc6mOc-oUlLQDoHbzBGH87JqdNDxIu_dUt-3n7_sbsv9o93D7uv-8Jw1qbCOtDMuI7LqnISS8Zb5E4bU4mms6yxRgounbOyaxlUzNQlNhZ1XbWSS9nwLbla9-bLvxaMSb34JUz5pCorAZyBlDJT5UqZ4GMM6NQc-lGHV8VAHXSqVafKOtVRpxI59GUN_cHOu2h6nAz-C2afoi6rlte5A5bp5uP0rk9Htzu_TClH-RqN88Eohvc__Od5b21Mmv8</recordid><startdate>20200901</startdate><enddate>20200901</enddate><creator>Zu, Yue</creator><general>Springer US</general><general>Springer Nature</general><general>Springer Nature B.V</general><scope>AOWDO</scope><scope>BLEPL</scope><scope>DTL</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20200901</creationdate><title>Deep learning parallel computing and evaluation for embedded system clustering architecture processor</title><author>Zu, Yue</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-df0a1cfb3744f7e2139e3facc458bd18dc7537ffd7b91041c62e8dea649737783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>CAE) and Design</topic><topic>Circuits and Systems</topic><topic>Clustering</topic><topic>Cognitive tasks</topic><topic>Computer architecture</topic><topic>Computer Science</topic><topic>Computer Science, Hardware &amp; Architecture</topic><topic>Computer Science, Software Engineering</topic><topic>Computer-Aided Engineering (CAD</topic><topic>Deep learning</topic><topic>Electronic devices</topic><topic>Embedded 
systems</topic><topic>Engineering</topic><topic>Machine learning</topic><topic>Microprocessors</topic><topic>Neural networks</topic><topic>Optimization</topic><topic>Performance evaluation</topic><topic>Science &amp; Technology</topic><topic>Special Purpose and Application-Based Systems</topic><topic>Technology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zu, Yue</creatorcontrib><collection>Web of Science - Science Citation Index Expanded - 2020</collection><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>CrossRef</collection><jtitle>Design automation for embedded systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zu, Yue</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep learning parallel computing and evaluation for embedded system clustering architecture processor</atitle><jtitle>Design automation for embedded systems</jtitle><stitle>Des Autom Embed Syst</stitle><stitle>DES AUTOM EMBED SYST</stitle><date>2020-09-01</date><risdate>2020</risdate><volume>24</volume><issue>3</issue><spage>145</spage><epage>159</epage><pages>145-159</pages><issn>0929-5585</issn><eissn>1572-8080</eissn><abstract>In the era of intelligence, the processing of a large amount of information and various intelligent applications need to rely on embedded devices. This trend has made machine learning algorithms play an increasingly important role. High-performance embedded computing is an effective means to solve the lack of computing power of embedded devices. 
Aiming at the problem that the calculation amount of new intelligent embedded applications based on machine learning technology is higher, the computing power of traditional embedded systems is difficult to meet their needs, this paper studies the parallel optimization and implementation techniques of convolutional neural networks in Parallella platform. The parallel optimization strategy of convolutional neural network on the clustering architecture processor of heterogeneous multi-core system is given. Then the high-performance implementation of convolutional neural network on Parallella platform is studied, and the function of convolutional neural network system is implemented. A set of performance evaluation methods for embedded parallel processors is proposed. From the application point of S698P, the eCos operating system is selected as the platform. The single-core mode and multi-core mode are compared on the simulator GRSIM, and the parallel performance evaluation is given. Experiments have shown that the efficiency of deep learning tasks is significantly improved compared to traditional parallel methods.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10617-020-09235-5</doi><tpages>15</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0929-5585
ispartof Design automation for embedded systems, 2020-09, Vol.24 (3), p.145-159
issn 0929-5585
1572-8080
language eng
recordid cdi_proquest_journals_2450310777
source SpringerLink Journals
subjects Algorithms
Artificial neural networks
Circuits and Systems
Clustering
Cognitive tasks
Computer architecture
Computer Science
Computer Science, Hardware & Architecture
Computer Science, Software Engineering
Computer-Aided Engineering (CAD, CAE) and Design
Deep learning
Electronic devices
Embedded systems
Engineering
Machine learning
Microprocessors
Neural networks
Optimization
Performance evaluation
Science & Technology
Special Purpose and Application-Based Systems
Technology
title Deep learning parallel computing and evaluation for embedded system clustering architecture processor
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T03%3A39%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20learning%20parallel%20computing%20and%20evaluation%20for%20embedded%20system%20clustering%20architecture%20processor&rft.jtitle=Design%20automation%20for%20embedded%20systems&rft.au=Zu,%20Yue&rft.date=2020-09-01&rft.volume=24&rft.issue=3&rft.spage=145&rft.epage=159&rft.pages=145-159&rft.issn=0929-5585&rft.eissn=1572-8080&rft_id=info:doi/10.1007/s10617-020-09235-5&rft_dat=%3Cproquest_webof%3E2450310777%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2450310777&rft_id=info:pmid/&rfr_iscdi=true