A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Abstract: | Aligning the behaviors of Multimodal Large Language Models (MLLMs) with human preferences is crucial for developing robust and trustworthy AI systems. While recent attempts have employed human experts or powerful auxiliary AI systems to provide more accurate preference feedback, such as determining the preferable responses from MLLMs or directly rewriting hallucination-free responses, the extensive resource overhead compromises the scalability of the feedback collection. In this work, we introduce Topic-level Preference Overwriting (TPO), a self-correctional approach that guides the model itself to mitigate its own hallucinations at the topic level. Through a deconfounded strategy that replaces each topic within the response with the best or worst alternatives generated by the model itself, TPO creates more contrasting pairwise preference feedback, enhancing the feedback quality without human or proprietary model intervention. Notably, the experimental results demonstrate that the proposed TPO achieves state-of-the-art performance in trustworthiness, significantly reducing object hallucinations by 92% and overall hallucinations by 38%. Code, model, and dataset are now available. |
DOI: | 10.48550/arxiv.2411.17265 |
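
The abstract describes TPO's core step: decompose a response into topics, have the model itself generate alternatives for each topic, then overwrite each topic with its best or worst alternative to form a contrasting pairwise preference example. The sketch below illustrates that construction under stated assumptions; the `Topic` dataclass, the `build_preference_pair` helper, and the toy scoring function are illustrative names, not the paper's released implementation.

```python
# Hypothetical sketch of topic-level preference-pair construction in the spirit of TPO.
# All names here are assumptions for illustration, not the authors' API.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Topic:
    """A topic span inside a model response, plus self-generated alternatives."""
    span: str                 # original text of the topic in the response
    candidates: List[str]     # alternative phrasings sampled from the model itself


def build_preference_pair(
    response: str,
    topics: List[Topic],
    score: Callable[[str], float],   # e.g. the model's own estimate that a phrasing is grounded
) -> Tuple[str, str]:
    """Overwrite each topic with its best / worst candidate to form a contrasting pair."""
    chosen, rejected = response, response
    for topic in topics:
        pool = [topic.span] + topic.candidates
        best = max(pool, key=score)
        worst = min(pool, key=score)
        # Replace the original topic span in each branch of the pair.
        chosen = chosen.replace(topic.span, best, 1)
        rejected = rejected.replace(topic.span, worst, 1)
    return chosen, rejected


if __name__ == "__main__":
    # Toy example with a fake scorer that penalizes overconfident wording.
    def toy_score(text: str) -> float:
        return -text.count("definitely")

    topics = [Topic(span="a red umbrella",
                    candidates=["an umbrella", "definitely a red umbrella"])]
    resp = "The person is holding a red umbrella near the entrance."
    chosen, rejected = build_preference_pair(resp, topics, toy_score)
    print("chosen:  ", chosen)
    print("rejected:", rejected)
```

In this reading, the resulting (chosen, rejected) pairs would feed a standard pairwise preference objective (e.g. DPO-style training); the abstract does not specify the scoring model or objective, so both are left abstract here.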