Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit
Published in: | Journal of signal processing systems 2018-03, Vol.90 (3), p.433-447 |
---|---|
Main authors: | Dülger, Özcan; Oğuztüzün, Halit; Demirekler, Mübeccel |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 447 |
---|---|
container_issue | 3 |
container_start_page | 433 |
container_title | Journal of signal processing systems |
container_volume | 90 |
creator | Dülger, Özcan; Oğuztüzün, Halit; Demirekler, Mübeccel |
description | Owing to the many cores in its architecture, the graphics processing unit (GPU) offers promise for parallel execution of the particle filter. A stage of the particle filter that is particularly challenging to parallelize is resampling. There are parallel resampling algorithms in the literature, such as Metropolis resampling, which does not require a collective operation such as a cumulative sum over the weights and does not suffer from numerical instability. However, with a large number of particles, Metropolis resampling becomes slow because of non-coalesced accesses to the GPU's global memory. In this article, we offer solutions to this problem of Metropolis resampling. We introduce two implementation techniques, named Metropolis-C1 and Metropolis-C2, and compare them with the original Metropolis resampling on an NVIDIA Tesla K40 board. In the first scenario, where these two techniques achieve their fastest execution times, Metropolis-C1 is faster than the others but yields the worst results in quality, while Metropolis-C2 is closer to Metropolis resampling in quality. In the second scenario, where all three algorithms yield similar quality, Metropolis-C1 and Metropolis-C2 get slower but are still faster than the original Metropolis resampling. |
doi_str_mv | 10.1007/s11265-017-1254-6 |
format | Article |
publisher | New York: Springer US; Springer Nature B.V |
rights | Springer Science+Business Media New York 2017; Copyright Springer Science & Business Media 2018 |
abbreviated_title | J Sign Process Syst |
date | 2018-03-01 |
total_pages | 15 |
peer_reviewed | yes |
orcid | https://orcid.org/0000-0001-9928-0441; https://orcid.org/0000-0001-7525-1064 |
link_pdf | https://link.springer.com/content/pdf/10.1007/s11265-017-1254-6 |
link_html | https://link.springer.com/10.1007/s11265-017-1254-6 |
fulltext | fulltext |
identifier | ISSN: 1939-8018 |
ispartof | Journal of signal processing systems, 2018-03, Vol.90 (3), p.433-447 |
issn | 1939-8018 (print); 1939-8115 (electronic) |
language | eng |
recordid | cdi_proquest_journals_2009642733 |
source | SpringerLink Journals |
subjects | Algorithms; Circuits and Systems; Coalescing; Computer Imaging; Electrical Engineering; Engineering; Graphics boards; Graphics processing units; Image Processing and Computer Vision; Parallel processing; Pattern Recognition; Pattern Recognition and Graphics; Resampling; Signal,Image and Speech Processing; Stability; Vision |
title | Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T05%3A15%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Memory%20Coalescing%20Implementation%20of%20Metropolis%20Resampling%20on%20Graphics%20Processing%20Unit&rft.jtitle=Journal%20of%20signal%20processing%20systems&rft.au=D%C3%BClger,%20%C3%96zcan&rft.date=2018-03-01&rft.volume=90&rft.issue=3&rft.spage=433&rft.epage=447&rft.pages=433-447&rft.issn=1939-8018&rft.eissn=1939-8115&rft_id=info:doi/10.1007/s11265-017-1254-6&rft_dat=%3Cproquest_cross%3E2009642733%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2009642733&rft_id=info:pmid/&rfr_iscdi=true |
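The description field states that Metropolis resampling needs no collective operation such as a cumulative sum over the weights. A minimal sequential sketch makes this concrete. It assumes the commonly described formulation of the Metropolis resampler: each output slot runs a short Metropolis chain over particle indices and accepts a uniformly drawn proposal j with probability min(1, w_j / w_k). The function name `metropolis_resample` and the iteration count `num_iters` are illustrative choices, and this is not the authors' Metropolis-C1 or Metropolis-C2 code, which the record does not describe. The outer loop corresponds to what a GPU would run with one thread per particle; the read of `weights[j]` at a random index is the kind of scattered global-memory access the description identifies as non-coalesced.

```python
import random
from typing import List, Sequence


def metropolis_resample(weights: Sequence[float], num_iters: int,
                        rng: random.Random) -> List[int]:
    """Return one ancestor index per particle (illustrative sketch).

    Only pairwise weight comparisons are used, so no prefix sum over the
    weights is required and the weights need not be normalized.
    """
    n = len(weights)
    ancestors: List[int] = []
    for i in range(n):                 # on a GPU: one thread per i
        k = i                          # each chain starts at its own slot
        for _ in range(num_iters):
            u = rng.random()           # uniform in [0, 1)
            j = rng.randrange(n)       # uniform proposal; weights[j] sits at an
                                       # unpredictable address (scattered read)
            # Accept with probability min(1, w_j / w_k), written without a
            # division so that a zero weight w_k needs no special case.
            if u * weights[k] < weights[j]:
                k = j
        ancestors.append(k)
    return ancestors
```

For example, `metropolis_resample([0.1, 0.2, 0.7], num_iters=20, rng=random.Random(0))` returns three ancestor indices that favor index 2. A larger `num_iters` brings the result closer to exact multinomial resampling at the cost of more (scattered) weight reads per particle.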