Performance portability in a real world application: PHAST applied to Caffe
This work applies the PHAST Library, a hardware-agnostic programming library, to a real-world application, the Caffe framework. The original implementation of Caffe consists of two different versions of the source code: one to run on CPU platforms and another one to run on the GPU si...
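Published in: | The international journal of high performance computing applications 2022-05, Vol.36 (3), p.419-439 |
---|---|
Main authors: | Martínez, Pablo Antonio; Peccerillo, Biagio; Bartolini, Sandro; García, José M; Bernabé, Gregorio |
Format: | Article |
Language: | eng |
Subjects: | Central processing units; CPUs; Graphics processing units; Libraries; Portability; Software development; Source code |
Online access: | Full text |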
container_end_page | 439 |
---|---|
container_issue | 3 |
container_start_page | 419 |
container_title | The international journal of high performance computing applications |
container_volume | 36 |
creator | Martínez, Pablo Antonio; Peccerillo, Biagio; Bartolini, Sandro; García, José M; Bernabé, Gregorio |
description | This work applies the PHAST Library, a hardware-agnostic programming library, to a real-world application, the Caffe framework. The original implementation of Caffe consists of two different versions of the source code: one to run on CPU platforms and another to run on the GPU side. With PHAST, we aim to develop a single-source implementation capable of running efficiently on both CPU and GPU. In this paper, we start by analyzing the performance of a basic Caffe implementation that uses PHAST. Then, we detail possible performance improvements. We find that the overall performance is dominated by a few ‘heavy’ layers. In refining the inefficient parts of this version, we follow two different approaches: improvements to the Caffe source code and improvements to the PHAST Library itself, which ultimately translate into improved performance in the PHAST version of Caffe. We demonstrate that our PHAST implementation achieves performance portability on CPUs and GPUs. With a single source, the PHAST version of Caffe provides the same or even better performance than the original version of Caffe built from two different codebases. For the MNIST database, the PHAST implementation takes an amount of time equivalent to that of native code on CPU and GPU. Furthermore, PHAST achieves speedups of 51% and 49% with the CIFAR-10 database against native code on CPU and GPU, respectively. These results provide a new horizon for software development in the upcoming heterogeneous computing era. |
doi_str_mv | 10.1177/10943420221077107 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2665044519</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_10943420221077107</sage_id><sourcerecordid>2665044519</sourcerecordid><originalsourceid>FETCH-LOGICAL-c242t-464cb4a82d5d0facadd96472fb25a86c2cd207c111bd0547105df6c1c0ad4a213</originalsourceid><addsrcrecordid>eNp1UEtLAzEQDqJgrf4AbwHPWzNpHq23UtSKBQvW8zKbh2zZbtZki_jvTVnBg3gYZpj5HsxHyDWwCYDWt8DmYio44xyY1rlOyAi0gILPhDrNc74XR8A5uUhpxxhTYipH5Hnjog9xj61xtAuxx6pu6v6L1i1FGh029DPExlLsuqY22NehvaOb1eJ1O6ycpX2gS_TeXZIzj01yVz99TN4e7rfLVbF-eXxaLtaF4YL3hVDCVAJn3ErLPBq0dq6E5r7iEmfKcGM50wYAKsukyM9I65UBw9AK5DAdk5tBt4vh4-BSX-7CIbbZsuRKSSaEhHlGwYAyMaQUnS-7WO8xfpXAymNm5Z_MMmcycBK-u1_V_wnfdYBqeQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2665044519</pqid></control><display><type>article</type><title>Performance portability in a real world application: PHAST applied to Caffe</title><source>Access via SAGE</source><source>Alma/SFX Local Collection</source><creator>Martínez, Pablo Antonio ; Peccerillo, Biagio ; Bartolini, Sandro ; García, José M ; Bernabé, Gregorio</creator><creatorcontrib>Martínez, Pablo Antonio ; Peccerillo, Biagio ; Bartolini, Sandro ; García, José M ; Bernabé, Gregorio</creatorcontrib><description>This work covers the PHAST Library’s employment, a hardware-agnostic programming library, to a real-world application like the Caffe framework. The original implementation of Caffe consists of two different versions of the source code: one to run on CPU platforms and another one to run on the GPU side. With PHAST, we aim to develop a single-source code implementation capable of running efficiently on CPU and GPU. In this paper, we start by carrying out a basic Caffe implementation performance analysis using PHAST. Then, we detail possible performance upgrades. 
We find that the overall performance is dominated by few ‘heavy’ layers. In refining the inefficient parts of this version, we find two different approaches: improvements to the Caffe source code and improvements to the PHAST Library itself, which ultimately translates into improved performance in the PHAST version of Caffe. We demonstrate that our PHAST implementation achieves performance portability on CPUs and GPUs. With a single source, the PHAST version of Caffe provides the same or even better performance than the original version of Caffe built from two different codebases. For the MNIST database, the PHAST implementation takes an equivalent amount of time as native code in CPU and GPU. Furthermore, PHAST achieves a speedup of 51% and a 49% with the CIFAR-10 database against native code in CPU and GPU, respectively. These results provide a new horizon for software development in the upcoming heterogeneous computing era.</description><identifier>ISSN: 1094-3420</identifier><identifier>EISSN: 1741-2846</identifier><identifier>DOI: 10.1177/10943420221077107</identifier><language>eng</language><publisher>London, England: SAGE Publications</publisher><subject>Central processing units ; CPUs ; Graphics processing units ; Libraries ; Portability ; Software development ; Source code</subject><ispartof>The international journal of high performance computing applications, 2022-05, Vol.36 (3), p.419-439</ispartof><rights>The Author(s) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c242t-464cb4a82d5d0facadd96472fb25a86c2cd207c111bd0547105df6c1c0ad4a213</citedby><cites>FETCH-LOGICAL-c242t-464cb4a82d5d0facadd96472fb25a86c2cd207c111bd0547105df6c1c0ad4a213</cites><orcidid>0000-0002-4998-0092 ; 0000-0002-6388-2835 ; 0000-0002-7265-3508 ; 
0000-0002-4391-2451</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://journals.sagepub.com/doi/pdf/10.1177/10943420221077107$$EPDF$$P50$$Gsage$$H</linktopdf><linktohtml>$$Uhttps://journals.sagepub.com/doi/10.1177/10943420221077107$$EHTML$$P50$$Gsage$$H</linktohtml><link.rule.ids>314,780,784,21819,27924,27925,43621,43622</link.rule.ids></links><search><creatorcontrib>Martínez, Pablo Antonio</creatorcontrib><creatorcontrib>Peccerillo, Biagio</creatorcontrib><creatorcontrib>Bartolini, Sandro</creatorcontrib><creatorcontrib>García, José M</creatorcontrib><creatorcontrib>Bernabé, Gregorio</creatorcontrib><title>Performance portability in a real world application: PHAST applied to Caffe</title><title>The international journal of high performance computing applications</title><description>This work covers the PHAST Library’s employment, a hardware-agnostic programming library, to a real-world application like the Caffe framework. The original implementation of Caffe consists of two different versions of the source code: one to run on CPU platforms and another one to run on the GPU side. With PHAST, we aim to develop a single-source code implementation capable of running efficiently on CPU and GPU. In this paper, we start by carrying out a basic Caffe implementation performance analysis using PHAST. Then, we detail possible performance upgrades. We find that the overall performance is dominated by few ‘heavy’ layers. In refining the inefficient parts of this version, we find two different approaches: improvements to the Caffe source code and improvements to the PHAST Library itself, which ultimately translates into improved performance in the PHAST version of Caffe. We demonstrate that our PHAST implementation achieves performance portability on CPUs and GPUs. 
With a single source, the PHAST version of Caffe provides the same or even better performance than the original version of Caffe built from two different codebases. For the MNIST database, the PHAST implementation takes an equivalent amount of time as native code in CPU and GPU. Furthermore, PHAST achieves a speedup of 51% and a 49% with the CIFAR-10 database against native code in CPU and GPU, respectively. These results provide a new horizon for software development in the upcoming heterogeneous computing era.</description><subject>Central processing units</subject><subject>CPUs</subject><subject>Graphics processing units</subject><subject>Libraries</subject><subject>Portability</subject><subject>Software development</subject><subject>Source code</subject><issn>1094-3420</issn><issn>1741-2846</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp1UEtLAzEQDqJgrf4AbwHPWzNpHq23UtSKBQvW8zKbh2zZbtZki_jvTVnBg3gYZpj5HsxHyDWwCYDWt8DmYio44xyY1rlOyAi0gILPhDrNc74XR8A5uUhpxxhTYipH5Hnjog9xj61xtAuxx6pu6v6L1i1FGh029DPExlLsuqY22NehvaOb1eJ1O6ycpX2gS_TeXZIzj01yVz99TN4e7rfLVbF-eXxaLtaF4YL3hVDCVAJn3ErLPBq0dq6E5r7iEmfKcGM50wYAKsukyM9I65UBw9AK5DAdk5tBt4vh4-BSX-7CIbbZsuRKSSaEhHlGwYAyMaQUnS-7WO8xfpXAymNm5Z_MMmcycBK-u1_V_wnfdYBqeQ</recordid><startdate>202205</startdate><enddate>202205</enddate><creator>Martínez, Pablo Antonio</creator><creator>Peccerillo, Biagio</creator><creator>Bartolini, Sandro</creator><creator>García, José M</creator><creator>Bernabé, Gregorio</creator><general>SAGE Publications</general><general>SAGE PUBLICATIONS, 
INC</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-4998-0092</orcidid><orcidid>https://orcid.org/0000-0002-6388-2835</orcidid><orcidid>https://orcid.org/0000-0002-7265-3508</orcidid><orcidid>https://orcid.org/0000-0002-4391-2451</orcidid></search><sort><creationdate>202205</creationdate><title>Performance portability in a real world application: PHAST applied to Caffe</title><author>Martínez, Pablo Antonio ; Peccerillo, Biagio ; Bartolini, Sandro ; García, José M ; Bernabé, Gregorio</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c242t-464cb4a82d5d0facadd96472fb25a86c2cd207c111bd0547105df6c1c0ad4a213</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Central processing units</topic><topic>CPUs</topic><topic>Graphics processing units</topic><topic>Libraries</topic><topic>Portability</topic><topic>Software development</topic><topic>Source code</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Martínez, Pablo Antonio</creatorcontrib><creatorcontrib>Peccerillo, Biagio</creatorcontrib><creatorcontrib>Bartolini, Sandro</creatorcontrib><creatorcontrib>García, José M</creatorcontrib><creatorcontrib>Bernabé, Gregorio</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>The international journal of high performance computing 
applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Martínez, Pablo Antonio</au><au>Peccerillo, Biagio</au><au>Bartolini, Sandro</au><au>García, José M</au><au>Bernabé, Gregorio</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Performance portability in a real world application: PHAST applied to Caffe</atitle><jtitle>The international journal of high performance computing applications</jtitle><date>2022-05</date><risdate>2022</risdate><volume>36</volume><issue>3</issue><spage>419</spage><epage>439</epage><pages>419-439</pages><issn>1094-3420</issn><eissn>1741-2846</eissn><abstract>This work covers the PHAST Library’s employment, a hardware-agnostic programming library, to a real-world application like the Caffe framework. The original implementation of Caffe consists of two different versions of the source code: one to run on CPU platforms and another one to run on the GPU side. With PHAST, we aim to develop a single-source code implementation capable of running efficiently on CPU and GPU. In this paper, we start by carrying out a basic Caffe implementation performance analysis using PHAST. Then, we detail possible performance upgrades. We find that the overall performance is dominated by few ‘heavy’ layers. In refining the inefficient parts of this version, we find two different approaches: improvements to the Caffe source code and improvements to the PHAST Library itself, which ultimately translates into improved performance in the PHAST version of Caffe. We demonstrate that our PHAST implementation achieves performance portability on CPUs and GPUs. With a single source, the PHAST version of Caffe provides the same or even better performance than the original version of Caffe built from two different codebases. For the MNIST database, the PHAST implementation takes an equivalent amount of time as native code in CPU and GPU. 
Furthermore, PHAST achieves a speedup of 51% and a 49% with the CIFAR-10 database against native code in CPU and GPU, respectively. These results provide a new horizon for software development in the upcoming heterogeneous computing era.</abstract><cop>London, England</cop><pub>SAGE Publications</pub><doi>10.1177/10943420221077107</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0002-4998-0092</orcidid><orcidid>https://orcid.org/0000-0002-6388-2835</orcidid><orcidid>https://orcid.org/0000-0002-7265-3508</orcidid><orcidid>https://orcid.org/0000-0002-4391-2451</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1094-3420 |
ispartof | The international journal of high performance computing applications, 2022-05, Vol.36 (3), p.419-439 |
issn | 1094-3420; 1741-2846 |
language | eng |
recordid | cdi_proquest_journals_2665044519 |
source | Access via SAGE; Alma/SFX Local Collection |
subjects | Central processing units; CPUs; Graphics processing units; Libraries; Portability; Software development; Source code |
title | Performance portability in a real world application: PHAST applied to Caffe |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T21%3A13%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Performance%20portability%20in%20a%20real%20world%20application:%20PHAST%20applied%20to%20Caffe&rft.jtitle=The%20international%20journal%20of%20high%20performance%20computing%20applications&rft.au=Mart%C3%ADnez,%20Pablo%20Antonio&rft.date=2022-05&rft.volume=36&rft.issue=3&rft.spage=419&rft.epage=439&rft.pages=419-439&rft.issn=1094-3420&rft.eissn=1741-2846&rft_id=info:doi/10.1177/10943420221077107&rft_dat=%3Cproquest_cross%3E2665044519%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2665044519&rft_id=info:pmid/&rft_sage_id=10.1177_10943420221077107&rfr_iscdi=true |