Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer


Detailed description

Bibliographic details
Main authors: Flosdorf, Carolin; Engelker, Justin; Keller, Igor; Mohr, Nicolas
Format: Article
Language: English
Published: 2024-07-26
Online access: Order full text
description Skin cancer detection remains a major challenge in healthcare. Common detection methods can be lengthy and require human expertise, which is in short supply in many countries. Previous research demonstrates how convolutional neural networks (CNNs) can help effectively through automation and an accuracy comparable to the human level. However, despite the progress of previous decades, precision is still limited, leading to substantial misclassifications that have a serious impact on people's health. Hence, we employ a Vision Transformer (ViT), an architecture developed in recent years around the idea of a self-attention mechanism; specifically, we use two configurations of a pre-trained ViT. We generally find superior metrics for classifying skin lesions when comparing them to baseline models such as a decision tree classifier and a k-nearest neighbor (KNN) classifier, as well as to CNNs and less complex ViTs. In particular, we pay special attention to performance on melanoma, the most lethal type of skin cancer. The ViT-L32 model achieves an accuracy of 91.57% and a melanoma recall of 58.54%, while ViT-L16 achieves an accuracy of 92.79% and a melanoma recall of 56.10%. This offers a potential tool for faster and more accurate diagnoses and an overall improvement for the healthcare sector.
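The self-attention mechanism the abstract refers to can be sketched as follows. This is an illustrative NumPy implementation of single-head scaled dot-product attention, the core building block of a ViT; it is not the authors' code, and the patch count and embedding size in the toy example are arbitrary assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head self-attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v: arrays of shape (num_patches, d). In a ViT, each row is
    the embedding of one image patch (plus a learned class token).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # pairwise patch affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v                            # weighted mix of patch values

# Toy example: 4 "patches" with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because each output row is a convex combination of the value rows, every patch representation is re-expressed in terms of all other patches, which is how a ViT captures global context that a CNN's local convolutions only build up gradually.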
DOI: 10.48550/arxiv.2407.18554
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition