Evaluating vector data type usage in OpenCL kernels

Summary Open Computing Language (OpenCL) is an open, functionally portable programming model for a large range of highly parallel processors. To provide users with access to the underlying platforms, OpenCL has explicit support for features such as local memory and vector data types (VDTs). However,...
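As a rough illustration of what a vector data type changes in a kernel, the sketch below emulates a four-wide float vector in plain C and compares a scalar kernel with a vectorized one. It is not taken from the article: float4_t, load4, store4, and the saxpy kernels are hypothetical names, the host loops merely stand in for an OpenCL launch, and letting one work-item handle four adjacent elements is only one possible way to vectorize a scalar kernel, assumed here for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a four-wide vector type such as OpenCL's float4. */
typedef struct { float x, y, z, w; } float4_t;

/* Hypothetical helper: read four adjacent floats starting at p[4 * i]. */
static float4_t load4(const float *p, size_t i) {
    float4_t v = { p[4 * i], p[4 * i + 1], p[4 * i + 2], p[4 * i + 3] };
    return v;
}

/* Hypothetical helper: write four adjacent floats starting at p[4 * i]. */
static void store4(float4_t v, float *p, size_t i) {
    p[4 * i]     = v.x;
    p[4 * i + 1] = v.y;
    p[4 * i + 2] = v.z;
    p[4 * i + 3] = v.w;
}

/* Scalar kernel body: work-item gid updates one element. */
static void saxpy_scalar(size_t gid, float a, const float *x, float *y) {
    y[gid] = a * x[gid] + y[gid];
}

/* Vectorized kernel body: work-item gid updates four adjacent elements. */
static void saxpy_vec4(size_t gid, float a, const float *x, float *y) {
    float4_t xv = load4(x, gid);
    float4_t yv = load4(y, gid);
    yv.x = a * xv.x + yv.x;
    yv.y = a * xv.y + yv.y;
    yv.z = a * xv.z + yv.z;
    yv.w = a * xv.w + yv.w;
    store4(yv, y, gid);
}

/* Host loop standing in for a launch with n work-items. */
void run_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t gid = 0; gid < n; ++gid)
        saxpy_scalar(gid, a, x, y);
}

/* Host loop standing in for a launch with n / 4 work-items.
   Assumption: n is a multiple of 4, so no tail handling is needed. */
void run_vec4(size_t n, float a, const float *x, float *y) {
    for (size_t gid = 0; gid < n / 4; ++gid)
        saxpy_vec4(gid, a, x, y);
}
```

Both variants compute the same result; what changes is the number of work-items and the width of each memory access, which are the kinds of computation and memory-access effects the summary refers to.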

Bibliographic details
Published in: Concurrency and computation 2015-12, Vol.27 (17), p.4586-4602
Main authors: Fang, Jianbin; Varbanescu, Ana Lucia; Liao, Xiangke; Sips, Henk
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 4602
container_issue 17
container_start_page 4586
container_title Concurrency and computation
container_volume 27
creator Fang, Jianbin
Varbanescu, Ana Lucia
Liao, Xiangke
Sips, Henk
description Summary Open Computing Language (OpenCL) is an open, functionally portable programming model for a large range of highly parallel processors. To provide users with access to the underlying platforms, OpenCL has explicit support for features such as local memory and vector data types (VDTs). However, these are often low‐level, hardware‐specific features, which can be detrimental to performance on different platforms. In this paper, we focus on VDTs and investigate their usage in a systematic way. First, we propose two different approaches (inter‐vdt and intra‐vdt) to use VDTs in OpenCL kernels, and show how to translate scalar OpenCL kernels to vectorized ones. After obtaining vectorized code, we evaluate the performance effects of using VDTs with two types of benchmarks: micro‐benchmarks and macro‐benchmarks. With micro‐benchmarks, we study the execution model of VDTs and the role of the compiler‐aided vectorizer on five devices. With macro‐benchmarks, we explore the changes in memory access patterns before and after using VDTs, and the resulting performance impact. Not only does our evaluation provide insights into how OpenCL's VDTs are mapped onto different processors, but it also indicates that using such data types introduces changes in both computation and memory accesses. Based on the lessons learned, we discuss how to deal with performance portability in the presence of VDTs. Copyright © 2014 John Wiley & Sons, Ltd.
doi_str_mv 10.1002/cpe.3424
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1770345815</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1770345815</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4724-538777c17cbdbda050483eb7bdb9c0258d1cd96740b1b6290bdf19c9b0e9c7b03</originalsourceid><addsrcrecordid>eNp10E1Lw0AQBuBFFKxV8Cfk6CV19iPZ5CihtkppRRSPy-5mWmLTJO4mrf33plQKHjzNDDy8MC8htxRGFIDd2wZHXDBxRgY04iyEmIvz087iS3Ll_ScApcDpgPDxVpedbotqFWzRtrULct3qoN03GHRerzAoqmDRYJXNgjW6Ckt_TS6WuvR48zuH5P1x_JZNw9li8pQ9zEIrJBNhxBMppaXSmtzkGiIQCUcj-yu1wKIkpzZPYynAUBOzFEy-pKlNDWBqpQE-JHfH3MbVXx36Vm0Kb7EsdYV15xWVEriIkv63E7Wu9t7hUjWu2Gi3VxTUoRfV96IOvfQ0PNJdUeL-X6eyl_FfX_gWv09eu7WKJZeR-phP1FQ88yyev6qE_wCETHFT</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1770345815</pqid></control><display><type>article</type><title>Evaluating vector data type usage in OpenCL kernels</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Fang, Jianbin ; Varbanescu, Ana Lucia ; Liao, Xiangke ; Sips, Henk</creator><creatorcontrib>Fang, Jianbin ; Varbanescu, Ana Lucia ; Liao, Xiangke ; Sips, Henk</creatorcontrib><description>Summary Open Computing Language (OpenCL) is an open, functionally portable programming model for a large range of highly parallel processors. To provide users with access to the underlying platforms, OpenCL has explicit support for features such as local memory and vector data types (VDTs). However, these are often low‐level, hardware‐specific features, which can be detrimental to performance on different platforms. In this paper, we focus on VDTs and investigate their usage in a systematic way. First, we propose two different approaches (inter‐vdt and intra‐vdt) to use VDTs in OpenCL kernels, and show how to translate scalar OpenCL kernels to vectorized ones. 
After obtaining vectorized code, we evaluate the performance effects of using VDTs with two types of benchmarks: micro‐benchmarks and macro‐benchmarks. With micro‐benchmarks, we study the execution model of VDTs and the role of the compiler‐aided vectorizer on five devices. With macro‐benchmarks, we explore the changes of memory access patterns before and after using VDTs, and the resulting performance impact. Not only our evaluation provides insights into how OpenCL's VDTs are mapped on different processors, but it also indicates that using such data types introduces changes in both computation and memory accesses. Based on the lessons learned, we discuss how to deal with performance portability in the presence of VDTs. Copyright © 2014 John Wiley &amp; Sons, Ltd.</description><identifier>ISSN: 1532-0626</identifier><identifier>EISSN: 1532-0634</identifier><identifier>DOI: 10.1002/cpe.3424</identifier><language>eng</language><publisher>Blackwell Publishing Ltd</publisher><subject>benchmarking ; Computation ; Kernels ; Mathematical analysis ; openCL ; Platforms ; Portability ; Processors ; Scalars ; vector data types ; vectorization ; Vectors (mathematics)</subject><ispartof>Concurrency and computation, 2015-12, Vol.27 (17), p.4586-4602</ispartof><rights>Copyright © 2014 John Wiley &amp; Sons, 
Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c4724-538777c17cbdbda050483eb7bdb9c0258d1cd96740b1b6290bdf19c9b0e9c7b03</citedby><cites>FETCH-LOGICAL-c4724-538777c17cbdbda050483eb7bdb9c0258d1cd96740b1b6290bdf19c9b0e9c7b03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fcpe.3424$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fcpe.3424$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Fang, Jianbin</creatorcontrib><creatorcontrib>Varbanescu, Ana Lucia</creatorcontrib><creatorcontrib>Liao, Xiangke</creatorcontrib><creatorcontrib>Sips, Henk</creatorcontrib><title>Evaluating vector data type usage in OpenCL kernels</title><title>Concurrency and computation</title><addtitle>Concurrency Computat.: Pract. Exper</addtitle><description>Summary Open Computing Language (OpenCL) is an open, functionally portable programming model for a large range of highly parallel processors. To provide users with access to the underlying platforms, OpenCL has explicit support for features such as local memory and vector data types (VDTs). However, these are often low‐level, hardware‐specific features, which can be detrimental to performance on different platforms. In this paper, we focus on VDTs and investigate their usage in a systematic way. First, we propose two different approaches (inter‐vdt and intra‐vdt) to use VDTs in OpenCL kernels, and show how to translate scalar OpenCL kernels to vectorized ones. After obtaining vectorized code, we evaluate the performance effects of using VDTs with two types of benchmarks: micro‐benchmarks and macro‐benchmarks. 
With micro‐benchmarks, we study the execution model of VDTs and the role of the compiler‐aided vectorizer on five devices. With macro‐benchmarks, we explore the changes of memory access patterns before and after using VDTs, and the resulting performance impact. Not only our evaluation provides insights into how OpenCL's VDTs are mapped on different processors, but it also indicates that using such data types introduces changes in both computation and memory accesses. Based on the lessons learned, we discuss how to deal with performance portability in the presence of VDTs. Copyright © 2014 John Wiley &amp; Sons, Ltd.</description><subject>benchmarking</subject><subject>Computation</subject><subject>Kernels</subject><subject>Mathematical analysis</subject><subject>openCL</subject><subject>Platforms</subject><subject>Portability</subject><subject>Processors</subject><subject>Scalars</subject><subject>vector data types</subject><subject>vectorization</subject><subject>Vectors (mathematics)</subject><issn>1532-0626</issn><issn>1532-0634</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNp10E1Lw0AQBuBFFKxV8Cfk6CV19iPZ5CihtkppRRSPy-5mWmLTJO4mrf33plQKHjzNDDy8MC8htxRGFIDd2wZHXDBxRgY04iyEmIvz087iS3Ll_ScApcDpgPDxVpedbotqFWzRtrULct3qoN03GHRerzAoqmDRYJXNgjW6Ckt_TS6WuvR48zuH5P1x_JZNw9li8pQ9zEIrJBNhxBMppaXSmtzkGiIQCUcj-yu1wKIkpzZPYynAUBOzFEy-pKlNDWBqpQE-JHfH3MbVXx36Vm0Kb7EsdYV15xWVEriIkv63E7Wu9t7hUjWu2Gi3VxTUoRfV96IOvfQ0PNJdUeL-X6eyl_FfX_gWv09eu7WKJZeR-phP1FQ88yyev6qE_wCETHFT</recordid><startdate>20151210</startdate><enddate>20151210</enddate><creator>Fang, Jianbin</creator><creator>Varbanescu, Ana Lucia</creator><creator>Liao, Xiangke</creator><creator>Sips, Henk</creator><general>Blackwell Publishing 
Ltd</general><scope>BSCLL</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20151210</creationdate><title>Evaluating vector data type usage in OpenCL kernels</title><author>Fang, Jianbin ; Varbanescu, Ana Lucia ; Liao, Xiangke ; Sips, Henk</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4724-538777c17cbdbda050483eb7bdb9c0258d1cd96740b1b6290bdf19c9b0e9c7b03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>benchmarking</topic><topic>Computation</topic><topic>Kernels</topic><topic>Mathematical analysis</topic><topic>openCL</topic><topic>Platforms</topic><topic>Portability</topic><topic>Processors</topic><topic>Scalars</topic><topic>vector data types</topic><topic>vectorization</topic><topic>Vectors (mathematics)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fang, Jianbin</creatorcontrib><creatorcontrib>Varbanescu, Ana Lucia</creatorcontrib><creatorcontrib>Liao, Xiangke</creatorcontrib><creatorcontrib>Sips, Henk</creatorcontrib><collection>Istex</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Concurrency and computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Fang, Jianbin</au><au>Varbanescu, Ana Lucia</au><au>Liao, Xiangke</au><au>Sips, 
Henk</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evaluating vector data type usage in OpenCL kernels</atitle><jtitle>Concurrency and computation</jtitle><addtitle>Concurrency Computat.: Pract. Exper</addtitle><date>2015-12-10</date><risdate>2015</risdate><volume>27</volume><issue>17</issue><spage>4586</spage><epage>4602</epage><pages>4586-4602</pages><issn>1532-0626</issn><eissn>1532-0634</eissn><abstract>Summary Open Computing Language (OpenCL) is an open, functionally portable programming model for a large range of highly parallel processors. To provide users with access to the underlying platforms, OpenCL has explicit support for features such as local memory and vector data types (VDTs). However, these are often low‐level, hardware‐specific features, which can be detrimental to performance on different platforms. In this paper, we focus on VDTs and investigate their usage in a systematic way. First, we propose two different approaches (inter‐vdt and intra‐vdt) to use VDTs in OpenCL kernels, and show how to translate scalar OpenCL kernels to vectorized ones. After obtaining vectorized code, we evaluate the performance effects of using VDTs with two types of benchmarks: micro‐benchmarks and macro‐benchmarks. With micro‐benchmarks, we study the execution model of VDTs and the role of the compiler‐aided vectorizer on five devices. With macro‐benchmarks, we explore the changes of memory access patterns before and after using VDTs, and the resulting performance impact. Not only our evaluation provides insights into how OpenCL's VDTs are mapped on different processors, but it also indicates that using such data types introduces changes in both computation and memory accesses. Based on the lessons learned, we discuss how to deal with performance portability in the presence of VDTs. Copyright © 2014 John Wiley &amp; Sons, Ltd.</abstract><pub>Blackwell Publishing Ltd</pub><doi>10.1002/cpe.3424</doi><tpages>17</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1532-0626
ispartof Concurrency and computation, 2015-12, Vol.27 (17), p.4586-4602
issn 1532-0626
1532-0634
language eng
recordid cdi_proquest_miscellaneous_1770345815
source Wiley Online Library Journals Frontfile Complete
subjects benchmarking
Computation
Kernels
Mathematical analysis
openCL
Platforms
Portability
Processors
Scalars
vector data types
vectorization
Vectors (mathematics)
title Evaluating vector data type usage in OpenCL kernels
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T04%3A33%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20vector%20data%20type%20usage%20in%20OpenCL%20kernels&rft.jtitle=Concurrency%20and%20computation&rft.au=Fang,%20Jianbin&rft.date=2015-12-10&rft.volume=27&rft.issue=17&rft.spage=4586&rft.epage=4602&rft.pages=4586-4602&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.3424&rft_dat=%3Cproquest_cross%3E1770345815%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1770345815&rft_id=info:pmid/&rfr_iscdi=true