A Novel Evaluation Framework for Image2Text Generation
Saved in:
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Evaluating the quality of automatically generated image descriptions is
challenging, requiring metrics that capture various aspects such as
grammaticality, coverage, correctness, and truthfulness. While human evaluation
offers valuable insights, its cost and time-consuming nature pose limitations.
Existing automated metrics like BLEU, ROUGE, METEOR, and CIDEr aim to bridge
this gap but often show weak correlations with human judgment. We address this
challenge by introducing a novel evaluation framework rooted in a modern large
language model (LLM), such as GPT-4 or Gemini, capable of image generation. In
our proposed framework, we begin by feeding an input image into a designated
image captioning model, chosen for evaluation, to generate a textual
description. Using this description, an LLM then creates a new image. By
extracting features from both the original and LLM-created images, we measure
their similarity using a designated similarity metric. A high similarity score
suggests that the image captioning model has accurately generated textual
descriptions, while a low similarity score indicates discrepancies, revealing
potential shortcomings in the model's performance. Human-annotated reference
captions are not required in our proposed evaluation framework, which serves as
a valuable tool for evaluating the effectiveness of image captioning models.
Its efficacy is confirmed through human evaluation. |
---|---|
DOI: | 10.48550/arxiv.2408.01723 |
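The framework described in the abstract can be sketched as a short pipeline: caption the original image, have an image-capable LLM regenerate an image from that caption, then compare feature vectors of the two images. The sketch below is a minimal illustration, not the paper's implementation: `extract_features` is a hypothetical placeholder standing in for a real encoder (the paper leaves the extractor and similarity metric as designated choices), and the captioning and image-generation steps are assumed to happen outside this code.

```python
import numpy as np

def extract_features(image: np.ndarray) -> np.ndarray:
    """Placeholder feature extractor: flatten the image and L2-normalize.
    A real system would use a learned encoder (e.g. a CLIP-style model)."""
    v = image.astype(float).ravel()
    return v / np.linalg.norm(v)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def evaluate_caption_model(original: np.ndarray, regenerated: np.ndarray) -> float:
    """Score a captioning model: compare the original image against the image
    an LLM regenerated from the model's caption. A high score suggests the
    caption preserved the image content; a low score flags discrepancies."""
    return cosine_similarity(extract_features(original),
                             extract_features(regenerated))
```

In use, `original` would be the input image and `regenerated` the LLM's rendering of the generated caption; identical images score 1.0, and the score drops as the regenerated image diverges from the original.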