A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs

Aligning the behaviors of Multimodal Large Language Models (MLLMs) with human preferences is crucial for developing robust and trustworthy AI systems. While recent attempts have employed human experts or powerful auxiliary AI systems to provide more accurate preference feedback, such as determining the preferable responses from MLLMs or directly rewriting hallucination-free responses, the extensive resource overhead compromises the scalability of feedback collection. In this work, we introduce Topic-level Preference Overwriting (TPO), a self-correctional approach that guides the model itself to mitigate its own hallucinations at the topic level. Through a deconfounded strategy that replaces each topic within the response with the best or worst alternatives generated by the model itself, TPO creates more contrasting pairwise preference feedback, enhancing the feedback quality without human or proprietary model intervention. Notably, the experimental results demonstrate that the proposed TPO achieves state-of-the-art performance in trustworthiness, significantly reducing object hallucinations by 92% and overall hallucinations by 38%. Code, model, and dataset are available now.
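The abstract describes the core mechanism concretely enough to sketch. Below is a minimal, hedged Python sketch of the topic-level overwriting step as described: split a response into topic spans, let the model itself resample each topic, then assemble the best and worst variants into a contrasting (preferred, rejected) pair. All helper names (sample_alternatives, score_topic) are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
# Hedged sketch of topic-level preference overwriting as outlined in the
# abstract. The helpers passed in (sample_alternatives, score_topic) are
# assumptions, not the authors' API.
from typing import Callable, List, Tuple

def build_preference_pair(
    response_topics: List[str],                      # response already split into topic spans
    sample_alternatives: Callable[[str], List[str]], # model resamples a single topic span
    score_topic: Callable[[str], float],             # higher score = more faithful, less hallucinated
) -> Tuple[str, str]:
    """Overwrite each topic with its best/worst self-generated alternative,
    yielding a contrasting (preferred, rejected) response pair."""
    best_topics, worst_topics = [], []
    for topic in response_topics:
        # Candidates include the original span plus the model's own rewrites,
        # so only one topic varies at a time (the "deconfounded" replacement).
        candidates = [topic] + sample_alternatives(topic)
        ranked = sorted(candidates, key=score_topic)
        worst_topics.append(ranked[0])   # most hallucinated variant
        best_topics.append(ranked[-1])   # most faithful variant
    preferred = " ".join(best_topics)
    rejected = " ".join(worst_topics)
    return preferred, rejected
```

Varying one topic at a time is the point of the deconfounded strategy: each preference pair then differs only in localized, topic-level content, which makes the hallucination signal in the pairwise feedback sharper than whole-response comparisons.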

Bibliographic details
Main authors: He, Lehan; Chen, Zeren; Shi, Zhelun; Yu, Tianyu; Shao, Jing; Sheng, Lu
Format: Article
Language: English
Subjects: Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2411.17265
DOI: 10.48550/arxiv.2411.17265
Date: 2024-11-26
Source: arXiv.org