Deep learning parallel computing and evaluation for embedded system clustering architecture processor
In the era of intelligence, the processing of a large amount of information and various intelligent applications need to rely on embedded devices. This trend has made machine learning algorithms play an increasingly important role. High-performance embedded computing is an effective means to solve t...
Published in: | Design automation for embedded systems 2020-09, Vol.24 (3), p.145-159 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 159 |
---|---|
container_issue | 3 |
container_start_page | 145 |
container_title | Design automation for embedded systems |
container_volume | 24 |
creator | Zu, Yue |
description | In the era of intelligence, processing large amounts of information and running various intelligent applications increasingly rely on embedded devices, and this trend has given machine learning algorithms an increasingly important role. High-performance embedded computing is an effective means of addressing the limited computing power of embedded devices. New intelligent embedded applications based on machine learning require more computation than traditional embedded systems can readily provide. To address this problem, this paper studies parallel optimization and implementation techniques for convolutional neural networks on the Parallella platform. A parallel optimization strategy for convolutional neural networks on the clustering architecture processor of a heterogeneous multi-core system is given. The high-performance implementation of a convolutional neural network on the Parallella platform is then studied, and a functioning convolutional neural network system is implemented. A set of performance evaluation methods for embedded parallel processors is proposed. From the application standpoint of S698P, the eCos operating system is selected as the platform. Single-core mode and multi-core mode are compared on the GRSIM simulator, and a parallel performance evaluation is given. Experiments show that the efficiency of deep learning tasks is significantly improved compared with traditional parallel methods. |
doi_str_mv | 10.1007/s10617-020-09235-5 |
format | Article |
fullrecord | Record ID: TN_cdi_proquest_journals_2450310777 (source: proquest_webof; source format: XML; source type: Aggregation Database). Publisher: Springer US, New York. ISSN: 0929-5585. EISSN: 1572-8080. Publication date: 2020-09-01. Pages: 145-159 (15 pages). Abbreviated journal title: Des Autom Embed Syst. Peer reviewed. Web of Science cited references: 2. Rights: Springer Science+Business Media, LLC, part of Springer Nature 2020. Collections: Web of Science - Science Citation Index Expanded - 2020; Web of Science Core Collection; Science Citation Index Expanded; CrossRef. Full text (HTML): https://link.springer.com/10.1007/s10617-020-09235-5. Full text (PDF): https://link.springer.com/content/pdf/10.1007/s10617-020-09235-5 |
fulltext | fulltext |
identifier | ISSN: 0929-5585 |
ispartof | Design automation for embedded systems, 2020-09, Vol.24 (3), p.145-159 |
issn | 0929-5585 1572-8080 |
language | eng |
recordid | cdi_proquest_journals_2450310777 |
source | SpringerLink Journals |
subjects | Algorithms ; Artificial neural networks ; Circuits and Systems ; Clustering ; Cognitive tasks ; Computer architecture ; Computer Science ; Computer Science, Hardware & Architecture ; Computer Science, Software Engineering ; Computer-Aided Engineering (CAD, CAE) and Design ; Deep learning ; Electronic devices ; Embedded systems ; Engineering ; Machine learning ; Microprocessors ; Neural networks ; Optimization ; Performance evaluation ; Science & Technology ; Special Purpose and Application-Based Systems ; Technology |
title | Deep learning parallel computing and evaluation for embedded system clustering architecture processor |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T03%3A39%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20learning%20parallel%20computing%20and%20evaluation%20for%20embedded%20system%20clustering%20architecture%20processor&rft.jtitle=Design%20automation%20for%20embedded%20systems&rft.au=Zu,%20Yue&rft.date=2020-09-01&rft.volume=24&rft.issue=3&rft.spage=145&rft.epage=159&rft.pages=145-159&rft.issn=0929-5585&rft.eissn=1572-8080&rft_id=info:doi/10.1007/s10617-020-09235-5&rft_dat=%3Cproquest_webof%3E2450310777%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2450310777&rft_id=info:pmid/&rfr_iscdi=true |