AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific d...

Bibliographic details
Main authors:	Gu, Zhaopeng; Zhu, Bingke; Zhu, Guibo; Chen, Yingying; Tang, Ming; Wang, Jinqiao
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Gu, Zhaopeng; Zhu, Bingke; Zhu, Guibo; Chen, Yingying; Tang, Ming; Wang, Jinqiao
description	Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLM to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLM. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantic and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments, thus directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves the state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.
doi_str_mv	10.48550/arxiv.2308.15366
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2308_15366</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2308_15366</sourcerecordid><originalsourceid>FETCH-LOGICAL-a676-1dd44c97c1fb5554d23acce3847e6bfcdec49aa34629f1548e4448eb21afbcc93</originalsourceid><addsrcrecordid>eNotj81OwzAQhH3hgAoPwAm_QEIcr52EW1WgVEoFh8A12tjryFLqoDhF9O3p32VGoxmN9DH2ILIUSqWyJ5z-_G-ay6xMhZJa37LtMow7HA7rz-aZv9BMZvah55tg93GePA78MvAU-Vc8VTVOPfFvH_0YkhpDv8dj3o6WhnjHbhwOke6vvmDN22uzek_qj_VmtawT1IVOhLUApiqMcJ1SCmwu0RiSJRSkO2csGagQJei8ckJBSQBH6XKBrjOmkgv2eLk987Q_k9_hdGhPXO2ZS_4DQU5JFQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models</title><source>arXiv.org</source><creator>Gu, Zhaopeng ; Zhu, Bingke ; Zhu, Guibo ; Chen, Yingying ; Tang, Ming ; Wang, Jinqiao</creator><creatorcontrib>Gu, Zhaopeng ; Zhu, Bingke ; Zhu, Guibo ; Chen, Yingying ; Tang, Ming ; Wang, Jinqiao</creatorcontrib><description>Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLM to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLM. 
We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantic and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments, thus directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves the state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.</description><identifier>DOI: 10.48550/arxiv.2308.15366</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2023-08</creationdate><rights>http://creativecommons.org/licenses/by-nc-sa/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2308.15366$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2308.15366$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Gu, Zhaopeng</creatorcontrib><creatorcontrib>Zhu, Bingke</creatorcontrib><creatorcontrib>Zhu, Guibo</creatorcontrib><creatorcontrib>Chen, Yingying</creatorcontrib><creatorcontrib>Tang, Ming</creatorcontrib><creatorcontrib>Wang, Jinqiao</creatorcontrib><title>AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models</title><description>Large Vision-Language 
Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLM to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLM. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantic and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments, thus directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves the state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. 
Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj81OwzAQhH3hgAoPwAm_QEIcr52EW1WgVEoFh8A12tjryFLqoDhF9O3p32VGoxmN9DH2ILIUSqWyJ5z-_G-ay6xMhZJa37LtMow7HA7rz-aZv9BMZvah55tg93GePA78MvAU-Vc8VTVOPfFvH_0YkhpDv8dj3o6WhnjHbhwOke6vvmDN22uzek_qj_VmtawT1IVOhLUApiqMcJ1SCmwu0RiSJRSkO2csGagQJei8ckJBSQBH6XKBrjOmkgv2eLk987Q_k9_hdGhPXO2ZS_4DQU5JFQ</recordid><startdate>20230829</startdate><enddate>20230829</enddate><creator>Gu, Zhaopeng</creator><creator>Zhu, Bingke</creator><creator>Zhu, Guibo</creator><creator>Chen, Yingying</creator><creator>Tang, Ming</creator><creator>Wang, Jinqiao</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230829</creationdate><title>AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models</title><author>Gu, Zhaopeng ; Zhu, Bingke ; Zhu, Guibo ; Chen, Yingying ; Tang, Ming ; Wang, Jinqiao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a676-1dd44c97c1fb5554d23acce3847e6bfcdec49aa34629f1548e4448eb21afbcc93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Gu, Zhaopeng</creatorcontrib><creatorcontrib>Zhu, Bingke</creatorcontrib><creatorcontrib>Zhu, Guibo</creatorcontrib><creatorcontrib>Chen, Yingying</creatorcontrib><creatorcontrib>Tang, Ming</creatorcontrib><creatorcontrib>Wang, Jinqiao</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Gu, 
Zhaopeng</au><au>Zhu, Bingke</au><au>Zhu, Guibo</au><au>Chen, Yingying</au><au>Tang, Ming</au><au>Wang, Jinqiao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models</atitle><date>2023-08-29</date><risdate>2023</risdate><abstract>Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLM to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLM. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantic and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments, thus directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves the state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. 
Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.</abstract><doi>10.48550/arxiv.2308.15366</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2308.15366
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2308_15366
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T18%3A11%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AnomalyGPT:%20Detecting%20Industrial%20Anomalies%20Using%20Large%20Vision-Language%20Models&rft.au=Gu,%20Zhaopeng&rft.date=2023-08-29&rft_id=info:doi/10.48550/arxiv.2308.15366&rft_dat=%3Carxiv_GOX%3E2308_15366%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true