ADAGENT: Anomaly Detection Agent With Multimodal Large Models in Adverse Environments

Multimodal Language Models (MMLMs), such as LLaVA and GPT-4V, have shown zero-shot generalization capabilities for understanding images and text across various domains. However, their effectiveness in open-world visual tasks, particularly anomaly detection under challenging conditions such as low light or poor image quality, has yet to be thoroughly investigated.

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, p. 172061-172074
Main Authors: Zhang, Miao; Shen, Yiqing; Yin, Jun; Lu, Shuai; Wang, Xueqian
Format: Article
Language: English
Subjects:
Online Access: Full text
container_end_page 172074
container_issue
container_start_page 172061
container_title IEEE access
container_volume 12
creator Zhang, Miao
Shen, Yiqing
Yin, Jun
Lu, Shuai
Wang, Xueqian
description Multimodal Language Models (MMLMs), such as LLaVA and GPT-4V, have shown zero-shot generalization capabilities for understanding images and text across various domains. However, their effectiveness in open-world visual tasks, particularly anomaly detection under challenging conditions such as low light or poor image quality, has yet to be thoroughly investigated. Assessing the robustness and limitations of MMLMs in these scenarios is essential to ensuring their reliability and safety in real-world applications, where input image quality can vary significantly. To address this gap, we propose a benchmark comprising 460 images captured under challenging conditions, including low light and blurring, specifically designed to evaluate the anomaly detection capabilities of MMLMs. We assess the performance of state-of-the-art MMLMs, including Qwen-VL-Max-0809, GPT-4V, Gemini-1.5, Claude3-opus, ERNIE-Bot-4, and SparkDesk-v3.5, across six diverse scenes. Our evaluations indicate that these MMLMs struggle with error detection in adverse scenarios, highlighting the need for further investigation into the underlying causes and potential improvement strategies. To tackle these limitations, we introduce the Anomaly Detection Agent (ADAGENT), an AI agent framework that combines a "Chain of Critical Self-Reflection (CCS)", specialized toolsets, and "Heuristic Retrieval-Augmented Generation (RAG)" to enhance anomaly detection performance with MMLMs. ADAGENT sequentially evaluates abilities such as text generation, semantic understanding, contextual comprehension, key information extraction, reasoning, and logical thinking. By implementing this framework, we demonstrate a 15% to 30% improvement in top-3 accuracy for anomaly detection tasks under adverse conditions, compared with baseline approaches.
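The paper's implementation is not reproduced in this record, but the abstract implies a control flow: retrieval-augmented hints condition the first query, and a critical self-reflection loop escalates to image-enhancement tools when the model reports low confidence. Below is a minimal Python sketch of such a loop; every name (query_mmlm, enhance_image, retrieve_hints, MAX_REFLECTIONS, the prompt wording) is an illustrative assumption, not the authors' API.

# Hypothetical ADAGENT-style detection loop; all helpers are placeholders.
from dataclasses import dataclass

MAX_REFLECTIONS = 3  # assumed bound on the self-reflection loop


@dataclass
class Verdict:
    anomalies: list[str]   # ranked candidate anomalies
    confident: bool        # model's self-assessed confidence


def query_mmlm(image_path: str, prompt: str) -> Verdict:
    """Placeholder for a call to any multimodal model (GPT-4V, Qwen-VL, ...)."""
    raise NotImplementedError


def enhance_image(image_path: str) -> str:
    """Placeholder for the specialized toolset, e.g. low-light enhancement
    or deblurring applied before re-querying the model."""
    raise NotImplementedError


def retrieve_hints(scene: str) -> str:
    """Placeholder for heuristic RAG: fetch scene-specific anomaly heuristics
    from a small curated knowledge base."""
    raise NotImplementedError


def detect_anomaly(image_path: str, scene: str) -> list[str]:
    # Heuristic RAG: ground the first query in scene-specific knowledge.
    hints = retrieve_hints(scene)
    prompt = f"List anomalies in this {scene} scene. Known heuristics: {hints}"
    verdict = query_mmlm(image_path, prompt)

    # Chain of Critical Self-Reflection: re-examine low-confidence answers,
    # invoking an image-enhancement tool before each re-query.
    for _ in range(MAX_REFLECTIONS):
        if verdict.confident:
            break
        image_path = enhance_image(image_path)
        critique = f"Critically re-check your answer {verdict.anomalies}. {prompt}"
        verdict = query_mmlm(image_path, critique)
    return verdict.anomalies[:3]  # top-3 candidates, matching the paper's metric


def top3_accuracy(predictions: list[list[str]], labels: list[str]) -> float:
    """Top-3 accuracy: fraction of images whose true anomaly appears among
    the three returned candidates."""
    hits = sum(label in preds[:3] for preds, label in zip(predictions, labels))
    return hits / len(labels)

Under these assumptions, top3_accuracy is the quantity the abstract reports improving by 15% to 30% over baselines that query the MMLM once without retrieval or reflection.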
doi_str_mv 10.1109/ACCESS.2024.3480250
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2024, Vol.12, p.172061-172074
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_3131913658
source Directory of Open Access Journals; IEEE Xplore Open Access Journals; EZB Electronic Journals Library
subjects Accuracy
AI agent
Anomalies
Anomaly detection
Artificial intelligence
Benchmark testing
Benchmarks
Cognition
Context modeling
Error analysis
Error detection
Feature extraction
Image quality
Information retrieval
Lighting
Multimodal language model
Multisensory integration
Performance evaluation
Prompt engineering
Semantics
Training
Visual tasks
Visualization
title ADAGENT: Anomaly Detection Agent With Multimodal Large Models in Adverse Environments