Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit

Owing to the many cores in its architecture, the graphics processing unit (GPU) offers promise for parallel execution of the particle filter. A stage of the particle filter that is particularly challenging to parallelize is resampling. There are parallel resampling algorithms in the literature such as Metro...

Bibliographic details
Published in: Journal of signal processing systems, 2018-03, Vol. 90 (3), p. 433-447
Main authors: Dülger, Özcan; Oğuztüzün, Halit; Demirekler, Mübeccel
Format: Article
Language: eng
container_end_page 447
container_issue 3
container_start_page 433
container_title Journal of signal processing systems
container_volume 90
creator Dülger, Özcan
Oğuztüzün, Halit
Demirekler, Mübeccel
description Owing to the many cores in its architecture, the graphics processing unit (GPU) offers promise for parallel execution of the particle filter. A stage of the particle filter that is particularly challenging to parallelize is resampling. There are parallel resampling algorithms in the literature such as Metropolis resampling, which does not require a collective operation such as a cumulative sum over weights and does not suffer from numerical instability. However, with a large number of particles, Metropolis resampling becomes slow. This is because of the non-coalesced access problem on the global memory of the GPU. In this article, we offer solutions for this problem of Metropolis resampling. We introduce two implementation techniques, named Metropolis-C1 and Metropolis-C2, and compare them with the original Metropolis resampling on an NVIDIA Tesla K40 board. In the first scenario, where these two techniques achieve their fastest execution times, Metropolis-C1 is faster than the others but yields the worst results in quality, whereas Metropolis-C2 is closer to Metropolis resampling in quality. In the second scenario, where all three algorithms yield similar quality, Metropolis-C1 and Metropolis-C2 get slower but are still faster than the original Metropolis resampling.
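The description above can be made concrete with a minimal sequential sketch of Metropolis resampling as it is commonly presented in the parallel resampling literature. It is not taken from this record, and it does not show Metropolis-C1 or Metropolis-C2, which the record does not describe. The names `metropolis_resample`, `num_iters`, and `rng` are hypothetical and chosen only for illustration.

```python
import random


def metropolis_resample(weights, num_iters, rng=None):
    """Return one ancestor index per particle (illustrative sketch).

    Only weight ratios are used, so the weights need not be normalized
    and no cumulative sum over the weights is required.
    """
    rng = rng or random.Random()
    n = len(weights)
    ancestors = []
    for i in range(n):  # on a GPU, each i would typically be one thread
        k = i  # the chain starts at the particle's own index
        for _ in range(num_iters):
            j = rng.randrange(n)  # random candidate index
            u = rng.random()
            # Accept with probability min(1, weights[j] / weights[k]).
            # Reading weights[j] at a random index is the kind of
            # scattered global-memory access that is hard to coalesce.
            if u * weights[k] <= weights[j]:
                k = j
        ancestors.append(k)
    return ancestors
```

In this sketch, neighbouring values of `i` read `weights[j]` at unrelated positions, which is one way to picture the non-coalesced access the description refers to. The number of iterations `num_iters` trades execution time against resampling quality.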
doi_str_mv 10.1007/s11265-017-1254-6
format Article
publisher New York: Springer US
addtitle J Sign Process Syst
date 2018-03-01
tpages 15
rights Springer Science+Business Media New York 2017
Copyright Springer Science & Business Media 2018
orcidid https://orcid.org/0000-0001-9928-0441
https://orcid.org/0000-0001-7525-1064
toplevel peer_reviewed
online_resources
fulltext fulltext
identifier ISSN: 1939-8018
ispartof Journal of signal processing systems, 2018-03, Vol.90 (3), p.433-447
issn 1939-8018
1939-8115
language eng
recordid cdi_proquest_journals_2009642733
source SpringerLink Journals
subjects Algorithms
Circuits and Systems
Coalescing
Computer Imaging
Electrical Engineering
Engineering
Graphics boards
Graphics processing units
Image Processing and Computer Vision
Parallel processing
Pattern Recognition
Pattern Recognition and Graphics
Resampling
Signal, Image and Speech Processing
Stability
Vision
title Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T05%3A15%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Memory%20Coalescing%20Implementation%20of%20Metropolis%20Resampling%20on%20Graphics%20Processing%20Unit&rft.jtitle=Journal%20of%20signal%20processing%20systems&rft.au=D%C3%BClger,%20%C3%96zcan&rft.date=2018-03-01&rft.volume=90&rft.issue=3&rft.spage=433&rft.epage=447&rft.pages=433-447&rft.issn=1939-8018&rft.eissn=1939-8115&rft_id=info:doi/10.1007/s11265-017-1254-6&rft_dat=%3Cproquest_cross%3E2009642733%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2009642733&rft_id=info:pmid/&rfr_iscdi=true