DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation
Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens.
Saved in:
Main Authors: | Azad, Reza; Arimond, René; Aghdam, Ehsan Khodapanah; Kazerouni, Amirhossein; Merhof, Dorit |
Format: | Article |
Language: | eng |
Keywords: | Computer Science - Computer Vision and Pattern Recognition |
Online Access: | Order full text |
creator | Azad, Reza; Arimond, René; Aghdam, Ehsan Khodapanah; Kazerouni, Amirhossein; Merhof, Dorit |
description | Transformers have recently gained attention in the computer vision domain due
to their ability to model long-range dependencies. However, the self-attention
mechanism, which is the core part of the Transformer model, usually suffers
from quadratic computational complexity with respect to the number of tokens.
Many architectures attempt to reduce model complexity by limiting the
self-attention mechanism to local regions or by redesigning the tokenization
process. In this paper, we propose DAE-Former, a novel method that seeks to
provide an alternative perspective by efficiently designing the self-attention
mechanism. More specifically, we reformulate the self-attention mechanism to
capture both spatial and channel relations across the whole feature dimension
while staying computationally efficient. Furthermore, we redesign the skip
connection path by including the cross-attention module to ensure the feature
reusability and enhance the localization power. Our method outperforms
state-of-the-art methods on multi-organ cardiac and skin lesion segmentation
datasets without requiring pre-training weights. The code is publicly available
at https://github.com/mindflow-institue/DAEFormer. |
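The description above mentions reformulating self-attention so that it captures spatial and channel relations across the whole feature dimension at sub-quadratic cost. The sketch below is a minimal, hypothetical PyTorch illustration of two such attention flavours, linear ("efficient") spatial attention and transposed channel attention. It is not the authors' implementation (see the linked GitHub repository for that), and the module names, projections, and normalisation choices are assumptions made purely for illustration.

```python
# Hedged sketch, not the DAE-Former code: two attention variants that avoid
# forming an N x N token-to-token attention map. Names and shapes are
# illustrative assumptions.
import torch
import torch.nn as nn


class EfficientSpatialAttention(nn.Module):
    """Linear-complexity attention: softmax-normalised keys aggregate the
    values into a small d x d context, which the queries then read out."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, d)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q = q.softmax(dim=-1)                 # normalise over feature dim
        k = k.softmax(dim=-2)                 # normalise over token dim
        context = k.transpose(-2, -1) @ v     # (B, d, d), independent of N
        out = q @ context                     # (B, N, d)
        return self.proj(out)


class ChannelAttention(nn.Module):
    """Transposed attention: a d x d channel-similarity matrix re-weights
    the value channels, capturing cross-channel relations."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, d)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        attn = (q.transpose(-2, -1) @ k).softmax(dim=-1)  # (B, d, d)
        out = v @ attn.transpose(-2, -1)                  # (B, N, d)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 196, 64)  # batch of 2, 196 tokens, 64 channels
    print(EfficientSpatialAttention(64)(x).shape)  # torch.Size([2, 196, 64])
    print(ChannelAttention(64)(x).shape)           # torch.Size([2, 196, 64])
```

The point of the sketch is that neither module materialises an N x N map: the spatial variant builds a d x d context from the keys and values, and the channel variant computes a d x d channel similarity, so cost grows linearly with the number of tokens. |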
doi_str_mv | 10.48550/arxiv.2212.13504 |
format | Article |
fullrecord | raw arXiv (GOX) source record (abstract as above; creation date 2022-12-27; rights: http://creativecommons.org/licenses/by/4.0, free to read; full text: https://arxiv.org/abs/2212.13504) |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2212.13504 |
language | eng |
recordid | cdi_arxiv_primary_2212_13504 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T12%3A03%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DAE-Former:%20Dual%20Attention-guided%20Efficient%20Transformer%20for%20Medical%20Image%20Segmentation&rft.au=Azad,%20Reza&rft.date=2022-12-27&rft_id=info:doi/10.48550/arxiv.2212.13504&rft_dat=%3Carxiv_GOX%3E2212_13504%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |