Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer


Detailed description

Bibliographic details
Main authors: Flosdorf, Carolin; Engelker, Justin; Keller, Igor; Mohr, Nicolas
Format: Article
Language: English
Published: 2024-07-26
Online access: Order full text
description Skin cancer detection remains a major challenge in healthcare. Common detection methods can be lengthy and require human expertise, which is in short supply in many countries. Previous research demonstrates how convolutional neural networks (CNNs) can help effectively through automation and an accuracy comparable to the human level. However, despite the progress of previous decades, precision is still limited, leading to substantial misclassifications that have a serious impact on people's health. Hence, we employ a Vision Transformer (ViT), an architecture developed in recent years around the idea of a self-attention mechanism; specifically, we use two configurations of a pre-trained ViT. We generally find superior metrics for classifying skin lesions when comparing them to baseline models such as a decision tree classifier and a k-nearest neighbor (KNN) classifier, as well as to CNNs and less complex ViTs. In particular, we pay special attention to performance on melanoma, the most lethal type of skin cancer. The ViT-L32 model achieves an accuracy of 91.57% and a melanoma recall of 58.54%, while ViT-L16 achieves an accuracy of 92.79% and a melanoma recall of 56.10%. This offers a potential tool for faster and more accurate diagnoses and an overall improvement for the healthcare sector.
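The self-attention mechanism the abstract refers to can be sketched as follows. This is an illustrative NumPy implementation of single-head scaled dot-product attention, the core building block of a ViT; it is not the authors' code, and the patch count and embedding size in the toy example are arbitrary assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head self-attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v: arrays of shape (num_patches, d). In a ViT, each row is
    the embedding of one image patch (plus a learned class token).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # pairwise patch affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v                            # weighted mix of patch values

# Toy example: 4 "patches" with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because each output row is a convex combination of the value rows, every patch representation is re-expressed in terms of all other patches, which is how a ViT captures global context that a CNN's local convolutions only build up gradually.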
DOI: 10.48550/arxiv.2407.18554
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition