Evaluating vector data type usage in OpenCL kernels
Published in: | Concurrency and computation 2015-12, Vol.27 (17), p.4586-4602 |
---|---|
Main authors: | Fang, Jianbin; Varbanescu, Ana Lucia; Liao, Xiangke; Sips, Henk |
Format: | Article |
Language: | English |
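Subjects: | benchmarking; Computation; Kernels; Mathematical analysis; openCL; Platforms; Portability; Processors; Scalars; vector data types; vectorization; Vectors (mathematics) |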
Online access: | Full text |
container_end_page | 4602 |
---|---|
container_issue | 17 |
container_start_page | 4586 |
container_title | Concurrency and computation |
container_volume | 27 |
creator | Fang, Jianbin; Varbanescu, Ana Lucia; Liao, Xiangke; Sips, Henk |
description | Summary: Open Computing Language (OpenCL) is an open, functionally portable programming model for a wide range of highly parallel processors. To give users access to the underlying platforms, OpenCL explicitly supports features such as local memory and vector data types (VDTs). However, these are often low‐level, hardware‐specific features, which can be detrimental to performance on different platforms. In this paper, we focus on VDTs and investigate their usage systematically. First, we propose two approaches (inter‐vdt and intra‐vdt) for using VDTs in OpenCL kernels, and show how to translate scalar OpenCL kernels into vectorized ones. Using the vectorized code, we evaluate the performance effects of VDTs with two types of benchmarks: micro‐benchmarks and macro‐benchmarks. With micro‐benchmarks, we study the execution model of VDTs and the role of the compiler‐aided vectorizer on five devices. With macro‐benchmarks, we explore the changes in memory access patterns before and after using VDTs, and the resulting performance impact. Not only does our evaluation provide insights into how OpenCL's VDTs are mapped onto different processors, but it also indicates that using such data types changes both computation and memory accesses. Based on the lessons learned, we discuss how to deal with performance portability in the presence of VDTs. Copyright © 2014 John Wiley & Sons, Ltd. |
doi_str_mv | 10.1002/cpe.3424 |
format | Article |
publisher | Blackwell Publishing Ltd |
date | 2015-12-10 |
abbreviated_title | Concurrency Computat.: Pract. Exper |
page_count | 17 |
review_status | peer reviewed |
fulltext | fulltext |
identifier | ISSN: 1532-0626 |
ispartof | Concurrency and computation, 2015-12, Vol.27 (17), p.4586-4602 |
issn | 1532-0626 (print); 1532-0634 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_1770345815 |
source | Wiley Online Library Journals Frontfile Complete |
subjects | benchmarking; Computation; Kernels; Mathematical analysis; openCL; Platforms; Portability; Processors; Scalars; vector data types; vectorization; Vectors (mathematics) |
title | Evaluating vector data type usage in OpenCL kernels |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T04%3A33%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20vector%20data%20type%20usage%20in%20OpenCL%20kernels&rft.jtitle=Concurrency%20and%20computation&rft.au=Fang,%20Jianbin&rft.date=2015-12-10&rft.volume=27&rft.issue=17&rft.spage=4586&rft.epage=4602&rft.pages=4586-4602&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.3424&rft_dat=%3Cproquest_cross%3E1770345815%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1770345815&rft_id=info:pmid/&rfr_iscdi=true |
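The record's abstract mentions translating scalar OpenCL kernels into vectorized ones. As a rough illustration only, the C sketch below emulates that general idea on the host: a scalar "kernel" in which each work-item adds one element, and a vectorized one in which each work-item adds four consecutive elements through a 4-wide float vector. It assumes GCC or Clang, whose `vector_size` attribute stands in for OpenCL's built-in `float4`; all function names (`vadd_scalar_item`, `vadd_vec4_item`, `run_scalar`, `run_vec4`) are hypothetical, and the loops over `gid` merely imitate an NDRange launch. It is not the authors' inter-vdt or intra-vdt scheme, which the record does not describe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for OpenCL's float4: a GCC/Clang vector extension (16 bytes = 4 floats),
   assumed here for illustration; it is NOT the OpenCL built-in type. */
typedef float float4 __attribute__((vector_size(16)));

/* Scalar "kernel" body: one work-item computes one element. */
static void vadd_scalar_item(size_t gid, const float *a, const float *b, float *c) {
    c[gid] = a[gid] + b[gid];
}

/* Vectorized "kernel" body: one work-item computes four consecutive elements.
   memcpy plays the role of OpenCL's vload4/vstore4 and tolerates unaligned data. */
static void vadd_vec4_item(size_t gid, const float *a, const float *b, float *c) {
    float4 va, vb, vc;
    memcpy(&va, a + 4 * gid, sizeof va);  /* like vload4(gid, a) */
    memcpy(&vb, b + 4 * gid, sizeof vb);  /* like vload4(gid, b) */
    vc = va + vb;                         /* element-wise add across 4 lanes */
    memcpy(c + 4 * gid, &vc, sizeof vc);  /* like vstore4(vc, gid, c) */
}

/* Host-side emulation of an NDRange: the scalar version runs n work-items. */
void run_scalar(size_t n, const float *a, const float *b, float *c) {
    for (size_t gid = 0; gid < n; ++gid)
        vadd_scalar_item(gid, a, b, c);
}

/* The vectorized version runs n / 4 work-items; n must be a multiple of 4. */
void run_vec4(size_t n, const float *a, const float *b, float *c) {
    assert(n % 4 == 0);
    for (size_t gid = 0; gid < n / 4; ++gid)
        vadd_vec4_item(gid, a, b, c);
}
```

For the same 8-element inputs, `run_scalar(8, a, b, c)` and `run_vec4(8, a, b, c)` should produce identical results. The vector version launches a quarter as many emulated work-items and reads and writes four floats per access, one simple example of how introducing VDTs changes both the computation and the memory accesses of each work-item.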