ADAGENT: Anomaly Detection Agent With Multimodal Large Models in Adverse Environments
Multimodal Language Models (MMLMs), such as LLaVA and GPT-4V, have shown zero-shot generalization capabilities for understanding images and text across various domains. However, their effectiveness in open-world visual tasks, particularly anomaly detection under challenging conditions, such as low light or poor image quality, has yet to be thoroughly investigated.
Saved in:
Published in: | IEEE access 2024, Vol.12, p.172061-172074 |
---|---|
Main authors: | Zhang, Miao; Shen, Yiqing; Yin, Jun; Lu, Shuai; Wang, Xueqian |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 172074 |
---|---|
container_issue | |
container_start_page | 172061 |
container_title | IEEE access |
container_volume | 12 |
creator | Zhang, Miao; Shen, Yiqing; Yin, Jun; Lu, Shuai; Wang, Xueqian |
description | Multimodal Language Models (MMLMs), such as LLaVA and GPT-4V, have shown zero-shot generalization capabilities for understanding images and text across various domains. However, their effectiveness in open-world visual tasks, particularly anomaly detection under challenging conditions, such as low light or poor image quality, has yet to be thoroughly investigated. Assessing the robustness and limitations of MMLMs in these scenarios is essential to ensuring their reliability and safety in real-world applications, where input image quality can vary significantly. To address this gap, we propose a benchmark comprising 460 images captured under challenging conditions, including low light and blurring, specifically designed to evaluate the anomaly detection capabilities of MMLMs. We assess the performance of state-of-the-art MMLMs, such as Qwen-VL-Max-0809, GPT-4V, Gemini-1.5, Claude3-opus, ERNIE-Bot-4, and SparkDesk-v3.5, across six diverse scenes. Our evaluations indicate that these MMLMs struggle with error detection in adverse scenarios, highlighting the need for further investigation into the underlying causes and potential improvement strategies. To tackle these limitations, we introduce the Anomaly Detection Agent (ADAGENT), an AI agent framework that combines the "Chain of Critical Self-Reflection (CCS)", specialized toolsets, and "Heuristic Retrieval-Augmented Generation (RAG)" to enhance anomaly detection performance with MMLMs. ADAGENT sequentially evaluates abilities such as text generation, semantic understanding, contextual comprehension, key information extraction, reasoning, and logical thinking. By implementing this framework, we demonstrate a 15% to 30% improvement in top-3 accuracy for anomaly detection tasks under adverse conditions, compared with baseline approaches. |
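The improvement above is reported in top-3 accuracy, i.e. a sample counts as correct when the true anomaly label appears among the model's three highest-ranked answers. A minimal sketch of that metric (function and label names are illustrative, not taken from the paper):

```python
def top_k_accuracy(rankings, labels, k=3):
    """Fraction of samples whose true label appears in the top-k ranked predictions.

    rankings: list of lists, each ordered best-first (a model's ranked answers)
    labels:   list of ground-truth labels, one per sample
    """
    assert len(rankings) == len(labels) and labels, "one ranking per label required"
    hits = sum(1 for ranked, truth in zip(rankings, labels) if truth in ranked[:k])
    return hits / len(labels)

# Hypothetical example: 3 samples, truth must appear in the first k=3 guesses.
preds = [["smoke", "fog", "fire"],     # truth "fire" ranked 3rd  -> hit
         ["normal", "leak", "rust"],   # truth "crack" absent     -> miss
         ["crack", "dent", "normal"]]  # truth "crack" ranked 1st -> hit
truth = ["fire", "crack", "crack"]
print(top_k_accuracy(preds, truth))  # 2/3
```

Under this metric, a looser k forgives rank errors but not omissions, which is why the paper's adverse-condition gains show up here rather than in strict top-1 accuracy.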
doi_str_mv | 10.1109/ACCESS.2024.3480250 |
format | Article |
publisher | Piscataway: IEEE |
ieee_id | 10716620 |
orcid | 0000-0001-7866-3339; 0000-0003-3542-0593; 0009-0000-0551-9678 |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024, Vol.12, p.172061-172074 |
issn | 2169-3536 (print); 2169-3536 (electronic) |
language | eng |
recordid | cdi_proquest_journals_3131913658 |
source | Directory of Open Access Journals; IEEE Xplore Open Access Journals; EZB Electronic Journals Library |
subjects | Accuracy; AI agent; Anomalies; Anomaly detection; Artificial intelligence; Benchmark testing; Benchmarks; Cognition; Context modeling; Error analysis; Error detection; Feature extraction; Image quality; Information retrieval; Lighting; Multimodal language model; Multisensory integration; Performance evaluation; Prompt engineering; Semantics; Training; Visual tasks; Visualization |
title | ADAGENT: Anomaly Detection Agent With Multimodal Large Models in Adverse Environments |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T16%3A37%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ADAGENT:%20Anomaly%20Detection%20Agent%20With%20Multimodal%20Large%20Models%20in%20Adverse%20Environments&rft.jtitle=IEEE%20access&rft.au=Zhang,%20Miao&rft.date=2024&rft.volume=12&rft.spage=172061&rft.epage=172074&rft.pages=172061-172074&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3480250&rft_dat=%3Cproquest_cross%3E3131913658%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3131913658&rft_id=info:pmid/&rft_ieee_id=10716620&rft_doaj_id=oai_doaj_org_article_eb1ab84728ba408eb0a27adedd4ad095&rfr_iscdi=true |