Fast GPU 3D diffeomorphic image registration
3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision, Gauss–Newton–Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architec...
Saved in:
Published in: | Journal of parallel and distributed computing 2021-03, Vol.149 (C), p.149-162 |
---|---|
Main authors: | Brunn, Malte ; Himthani, Naveen ; Biros, George ; Mehl, Miriam ; Mang, Andreas |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 162 |
---|---|
container_issue | C |
container_start_page | 149 |
container_title | Journal of parallel and distributed computing |
container_volume | 149 |
creator | Brunn, Malte ; Himthani, Naveen ; Biros, George ; Mehl, Miriam ; Mang, Andreas |
description | 3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision, Gauss–Newton–Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architectures. Despite the importance of image registration, only a few implementations of large deformation diffeomorphic registration packages support GPUs. Our contributions are new algorithms to significantly reduce the run time of the two main computational kernels in CLAIRE: calculation of derivatives and scattered-data interpolation. We deploy (i) highly-optimized, mixed-precision GPU-kernels for the evaluation of scattered-data interpolation, (ii) replace Fast-Fourier-Transform (FFT)-based first-order derivatives with optimized 8th-order finite differences, and (iii) compare with state-of-the-art CPU and GPU implementations. As a highlight, we demonstrate that we can register 256³ clinical images in less than 6 s on a single NVIDIA Tesla V100. This amounts to over 20× speed-up over the current version of CLAIRE and over 30× speed-up over existing GPU implementations.
• The LDDMM software CLAIRE is ported to GPU. • Compute intensive kernels are optimized. • A mixed-precision approach with Fast-Fourier-Transforms and finite differences is used. • Hardware acceleration is used for linear and cubic interpolations. • Clinical images can be registered in less than 6 seconds. |
doi_str_mv | 10.1016/j.jpdc.2020.11.006 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7769216</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S074373152030407X</els_id><sourcerecordid>2474467774</sourcerecordid><originalsourceid>FETCH-LOGICAL-c482t-d92b516cf1fb79bd87ce737048d8e7528e5730c40e19ed7bf21cb6036dad549f3</originalsourceid><addsrcrecordid>eNp9kUFv1DAQhS0EotvCH-CAIk49kO2M7diOhJBQaUulSnCgZyuxJ7te7caLna3Ev6-jLRVcOFkaf_PmzTzG3iEsEVBdbJabvXdLDrwUcAmgXrAFQqtqMNK8ZAvQUtRaYHPCTnPeACA22rxmJ0IIA1q1C_bxustTdfPjvhJfKx-GgeIupv06uCrsuhVViVYhT6mbQhzfsFdDt8309uk9Y_fXVz8vv9V3329uL7_c1U4aPtW-5X2Dyg049LrtvdGOtNAgjTekG26o0QKcBMKWvO4Hjq5XIJTvfCPbQZyxz0fd_aHfkXc0FgNbu0_FUvptYxfsvz9jWNtVfLC67MRRFYEPR4GYp2CzCxO5tYvjSG6yaBqQQhbo_GlKir8OlCe7C9nRdtuNFA_ZcqmlVFrrGeVH1KWYc6Lh2QuCnbOwGztnYecsLKItWZSm939v8dzy5_gF-HQEqNzyIVCandLoyIc0G_Ux_E__EcVrmc4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2474467774</pqid></control><display><type>article</type><title>Fast GPU 3D diffeomorphic image registration</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Brunn, Malte ; Himthani, Naveen ; Biros, George ; Mehl, Miriam ; Mang, Andreas</creator><creatorcontrib>Brunn, Malte ; Himthani, Naveen ; Biros, George ; Mehl, Miriam ; Mang, Andreas ; Univ. of Texas, Austin, TX (United States) ; Duke Univ., Durham, NC (United States)</creatorcontrib><description>3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision, Gauss–Newton–Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architectures. Despite the importance of image registration, only a few implementations of large deformation diffeomorphic registration packages support GPUs. 
Our contributions are new algorithms to significantly reduce the run time of the two main computational kernels in CLAIRE: calculation of derivatives and scattered-data interpolation. We deploy (i) highly-optimized, mixed-precision GPU-kernels for the evaluation of scattered-data interpolation, (ii) replace Fast-Fourier-Transform (FFT)-based first-order derivatives with optimized 8th-order finite differences, and (iii) compare with state-of-the-art CPU and GPU implementations. As a highlight, we demonstrate that we can register 256³ clinical images in less than 6 s on a single NVIDIA Tesla V100. This amounts to over 20× speed-up over the current version of CLAIRE and over 30× speed-up over existing GPU implementations.
•The LDDMM software CLAIRE is ported to GPU.•Compute intensive kernels are optimized.•A mixed-precision approach with Fast-Fourier-Transforms and finite differences is used.•Hardware acceleration is used for linear and cubic interpolations.•Clinical images can be registered in less than 6 seconds.</description><identifier>ISSN: 0743-7315</identifier><identifier>EISSN: 1096-0848</identifier><identifier>DOI: 10.1016/j.jpdc.2020.11.006</identifier><identifier>PMID: 33380769</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Computer Science ; Diffeomorphic image registration ; Gauss–Newton–Krylov method ; GPU computing ; MATHEMATICS AND COMPUTING ; Mixed-precision solver ; Parallel optimization</subject><ispartof>Journal of parallel and distributed computing, 2021-03, Vol.149 (C), p.149-162</ispartof><rights>2020 Elsevier Inc.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c482t-d92b516cf1fb79bd87ce737048d8e7528e5730c40e19ed7bf21cb6036dad549f3</citedby><cites>FETCH-LOGICAL-c482t-d92b516cf1fb79bd87ce737048d8e7528e5730c40e19ed7bf21cb6036dad549f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.jpdc.2020.11.006$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>230,314,780,784,885,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33380769$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/servlets/purl/1850434$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Brunn, Malte</creatorcontrib><creatorcontrib>Himthani, Naveen</creatorcontrib><creatorcontrib>Biros, George</creatorcontrib><creatorcontrib>Mehl, Miriam</creatorcontrib><creatorcontrib>Mang, 
Andreas</creatorcontrib><creatorcontrib>Univ. of Texas, Austin, TX (United States)</creatorcontrib><creatorcontrib>Duke Univ., Durham, NC (United States)</creatorcontrib><title>Fast GPU 3D diffeomorphic image registration</title><title>Journal of parallel and distributed computing</title><addtitle>J Parallel Distrib Comput</addtitle><description>3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision, Gauss–Newton–Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architectures. Despite the importance of image registration, only a few implementations of large deformation diffeomorphic registration packages support GPUs. Our contributions are new algorithms to significantly reduce the run time of the two main computational kernels in CLAIRE: calculation of derivatives and scattered-data interpolation. We deploy (i) highly-optimized, mixed-precision GPU-kernels for the evaluation of scattered-data interpolation, (ii) replace Fast-Fourier-Transform (FFT)-based first-order derivatives with optimized 8th-order finite differences, and (iii) compare with state-of-the-art CPU and GPU implementations. As a highlight, we demonstrate that we can register 256³ clinical images in less than 6 s on a single NVIDIA Tesla V100. This amounts to over 20× speed-up over the current version of CLAIRE and over 30× speed-up over existing GPU implementations.
•The LDDMM software CLAIRE is ported to GPU.•Compute intensive kernels are optimized.•A mixed-precision approach with Fast-Fourier-Transforms and finite differences is used.•Hardware acceleration is used for linear and cubic interpolations.•Clinical images can be registered in less than 6 seconds.</description><subject>Computer Science</subject><subject>Diffeomorphic image registration</subject><subject>Gauss–Newton–Krylov method</subject><subject>GPU computing</subject><subject>MATHEMATICS AND COMPUTING</subject><subject>Mixed-precision solver</subject><subject>Parallel optimization</subject><issn>0743-7315</issn><issn>1096-0848</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kUFv1DAQhS0EotvCH-CAIk49kO2M7diOhJBQaUulSnCgZyuxJ7te7caLna3Ev6-jLRVcOFkaf_PmzTzG3iEsEVBdbJabvXdLDrwUcAmgXrAFQqtqMNK8ZAvQUtRaYHPCTnPeACA22rxmJ0IIA1q1C_bxustTdfPjvhJfKx-GgeIupv06uCrsuhVViVYhT6mbQhzfsFdDt8309uk9Y_fXVz8vv9V3329uL7_c1U4aPtW-5X2Dyg049LrtvdGOtNAgjTekG26o0QKcBMKWvO4Hjq5XIJTvfCPbQZyxz0fd_aHfkXc0FgNbu0_FUvptYxfsvz9jWNtVfLC67MRRFYEPR4GYp2CzCxO5tYvjSG6yaBqQQhbo_GlKir8OlCe7C9nRdtuNFA_ZcqmlVFrrGeVH1KWYc6Lh2QuCnbOwGztnYecsLKItWZSm939v8dzy5_gF-HQEqNzyIVCandLoyIc0G_Ux_E__EcVrmc4</recordid><startdate>20210301</startdate><enddate>20210301</enddate><creator>Brunn, Malte</creator><creator>Himthani, Naveen</creator><creator>Biros, George</creator><creator>Mehl, Miriam</creator><creator>Mang, Andreas</creator><general>Elsevier Inc</general><general>Elsevier</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>OIOZB</scope><scope>OTOTI</scope><scope>5PM</scope></search><sort><creationdate>20210301</creationdate><title>Fast GPU 3D diffeomorphic image registration</title><author>Brunn, Malte ; Himthani, Naveen ; Biros, George ; Mehl, Miriam ; Mang, 
Andreas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c482t-d92b516cf1fb79bd87ce737048d8e7528e5730c40e19ed7bf21cb6036dad549f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer Science</topic><topic>Diffeomorphic image registration</topic><topic>Gauss–Newton–Krylov method</topic><topic>GPU computing</topic><topic>MATHEMATICS AND COMPUTING</topic><topic>Mixed-precision solver</topic><topic>Parallel optimization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Brunn, Malte</creatorcontrib><creatorcontrib>Himthani, Naveen</creatorcontrib><creatorcontrib>Biros, George</creatorcontrib><creatorcontrib>Mehl, Miriam</creatorcontrib><creatorcontrib>Mang, Andreas</creatorcontrib><creatorcontrib>Univ. of Texas, Austin, TX (United States)</creatorcontrib><creatorcontrib>Duke Univ., Durham, NC (United States)</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of parallel and distributed computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Brunn, Malte</au><au>Himthani, Naveen</au><au>Biros, George</au><au>Mehl, Miriam</au><au>Mang, Andreas</au><aucorp>Univ. 
of Texas, Austin, TX (United States)</aucorp><aucorp>Duke Univ., Durham, NC (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Fast GPU 3D diffeomorphic image registration</atitle><jtitle>Journal of parallel and distributed computing</jtitle><addtitle>J Parallel Distrib Comput</addtitle><date>2021-03-01</date><risdate>2021</risdate><volume>149</volume><issue>C</issue><spage>149</spage><epage>162</epage><pages>149-162</pages><issn>0743-7315</issn><eissn>1096-0848</eissn><abstract>3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision, Gauss–Newton–Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architectures. Despite the importance of image registration, only a few implementations of large deformation diffeomorphic registration packages support GPUs. Our contributions are new algorithms to significantly reduce the run time of the two main computational kernels in CLAIRE: calculation of derivatives and scattered-data interpolation. We deploy (i) highly-optimized, mixed-precision GPU-kernels for the evaluation of scattered-data interpolation, (ii) replace Fast-Fourier-Transform (FFT)-based first-order derivatives with optimized 8th-order finite differences, and (iii) compare with state-of-the-art CPU and GPU implementations. As a highlight, we demonstrate that we can register 256³ clinical images in less than 6 s on a single NVIDIA Tesla V100. This amounts to over 20× speed-up over the current version of CLAIRE and over 30× speed-up over existing GPU implementations.
•The LDDMM software CLAIRE is ported to GPU.•Compute intensive kernels are optimized.•A mixed-precision approach with Fast-Fourier-Transforms and finite differences is used.•Hardware acceleration is used for linear and cubic interpolations.•Clinical images can be registered in less than 6 seconds.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>33380769</pmid><doi>10.1016/j.jpdc.2020.11.006</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0743-7315 |
ispartof | Journal of parallel and distributed computing, 2021-03, Vol.149 (C), p.149-162 |
issn | 0743-7315 ; 1096-0848 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7769216 |
source | Elsevier ScienceDirect Journals Complete |
subjects | Computer Science ; Diffeomorphic image registration ; Gauss–Newton–Krylov method ; GPU computing ; MATHEMATICS AND COMPUTING ; Mixed-precision solver ; Parallel optimization |
title | Fast GPU 3D diffeomorphic image registration |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T06%3A59%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fast%20GPU%203D%20diffeomorphic%20image%20registration&rft.jtitle=Journal%20of%20parallel%20and%20distributed%20computing&rft.au=Brunn,%20Malte&rft.aucorp=Univ.%20of%20Texas,%20Austin,%20TX%20(United%20States)&rft.date=2021-03-01&rft.volume=149&rft.issue=C&rft.spage=149&rft.epage=162&rft.pages=149-162&rft.issn=0743-7315&rft.eissn=1096-0848&rft_id=info:doi/10.1016/j.jpdc.2020.11.006&rft_dat=%3Cproquest_pubme%3E2474467774%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2474467774&rft_id=info:pmid/33380769&rft_els_id=S074373152030407X&rfr_iscdi=true |
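
Illustrative note on the abstract: item (ii) of the description mentions replacing FFT-based first-order derivatives with 8th-order finite differences. The sketch below shows what an 8th-order central difference looks like in the simplest setting, a 1D periodic grid in plain Python. It is not code from CLAIRE and says nothing about how the authors' GPU kernels are organized; the names fd8_periodic and max_error_for_sine are hypothetical, and the sine test function is an assumption chosen only because its derivative is known exactly. The stencil weights are the standard ones for a 9-point central difference.

```python
import math

# Standard 8th-order central-difference weights for offsets 1..4.
# The stencil is antisymmetric: the weight at offset -k is -COEFFS[k - 1].
COEFFS = (4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0, -1.0 / 280.0)


def fd8_periodic(values, spacing):
    """Approximate the first derivative of periodic, equispaced samples.

    Hypothetical helper for illustration only (1D, pure Python).
    """
    n = len(values)
    if n < 9:
        raise ValueError("need at least 9 samples for a 9-point stencil")
    result = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(COEFFS, start=1):
            # Periodic wrap-around via modular indexing.
            acc += weight * (values[(i + k) % n] - values[(i - k) % n])
        result.append(acc / spacing)
    return result


def max_error_for_sine(n):
    """Largest pointwise error when differentiating sin(x) on [0, 2*pi)."""
    h = 2.0 * math.pi / n
    samples = [math.sin(i * h) for i in range(n)]
    approx = fd8_periodic(samples, h)
    return max(abs(a - math.cos(i * h)) for i, a in enumerate(approx))
```

Calling max_error_for_sine(32) and max_error_for_sine(64) gives a quick check of the convergence order: for a smooth periodic function, halving the grid spacing should shrink the error by roughly a factor of 2^8 = 256. Unlike an FFT-based derivative, the stencil touches only eight neighbouring samples per point, which is the locality property that makes finite differences attractive on GPUs in general.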