A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs

Aligning the behaviors of Multimodal Large Language Models (MLLMs) with human preferences is crucial for developing robust and trustworthy AI systems. While recent attempts have employed human experts or powerful auxiliary AI systems to provide more accurate preference feedback, such as determining the preferable responses from MLLMs or directly rewriting hallucination-free responses, the extensive resource overhead compromises the scalability of feedback collection. In this work, we introduce Topic-level Preference Overwriting (TPO), a self-correctional approach that guides the model itself to mitigate its own hallucinations at the topic level. Through a deconfounded strategy that replaces each topic within the response with the best or worst alternatives generated by the model itself, TPO creates more contrasting pairwise preference feedback, enhancing the feedback quality without human or proprietary model intervention. Notably, the experimental results demonstrate that the proposed TPO achieves state-of-the-art performance in trustworthiness, significantly reducing object hallucinations by 92% and overall hallucinations by 38%. Code, model, and dataset are available now.
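The abstract describes the core mechanism concretely enough to sketch. Below is a minimal, hedged Python sketch of the topic-level overwriting step as described: split a response into topic spans, let the model itself resample each topic, then assemble the best and worst variants into a contrasting (preferred, rejected) pair. All helper names (sample_alternatives, score_topic) are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
# Hedged sketch of topic-level preference overwriting as outlined in the
# abstract. The helpers passed in (sample_alternatives, score_topic) are
# assumptions, not the authors' API.
from typing import Callable, List, Tuple

def build_preference_pair(
    response_topics: List[str],                      # response already split into topic spans
    sample_alternatives: Callable[[str], List[str]], # model resamples a single topic span
    score_topic: Callable[[str], float],             # higher score = more faithful, less hallucinated
) -> Tuple[str, str]:
    """Overwrite each topic with its best/worst self-generated alternative,
    yielding a contrasting (preferred, rejected) response pair."""
    best_topics, worst_topics = [], []
    for topic in response_topics:
        # Candidates include the original span plus the model's own rewrites,
        # so only one topic varies at a time (the "deconfounded" replacement).
        candidates = [topic] + sample_alternatives(topic)
        ranked = sorted(candidates, key=score_topic)
        worst_topics.append(ranked[0])   # most hallucinated variant
        best_topics.append(ranked[-1])   # most faithful variant
    preferred = " ".join(best_topics)
    rejected = " ".join(worst_topics)
    return preferred, rejected
```

Varying one topic at a time is the point of the deconfounded strategy: each preference pair then differs only in localized, topic-level content, which makes the hallucination signal in the pairwise feedback sharper than whole-response comparisons.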

Bibliographic details
Main authors: He, Lehan; Chen, Zeren; Shi, Zhelun; Yu, Tianyu; Shao, Jing; Sheng, Lu
Format: Article
Language: English
Subjects: Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2411.17265
DOI: 10.48550/arxiv.2411.17265
Date: 2024-11-26
Source: arXiv.org