BodyMetric: Evaluating the Realism of Human Bodies in Text-to-Image Generation
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Accurately generating images of human bodies from text remains a challenging
problem for state-of-the-art text-to-image models. Commonly observed
body-related artifacts include extra or missing limbs, unrealistic poses,
and blurred body parts. Currently, evaluation of such artifacts relies heavily
on time-consuming human judgments, limiting the ability to benchmark models at
scale. We address this by proposing BodyMetric, a learnable metric that
predicts body realism in images. BodyMetric is trained on realism labels and
multi-modal signals, including textual descriptions and 3D body
representations inferred from the input image. To facilitate this approach, we
design an annotation pipeline to collect expert ratings on human body realism,
leading to a new dataset for this task, BodyRealism. Ablation studies
support our architectural choices for BodyMetric and the importance of
leveraging a 3D human body prior in capturing body-related artifacts in 2D
images. In comparison to concurrent metrics, which evaluate general user
preference in images, BodyMetric specifically reflects body-related artifacts.
We demonstrate the utility of BodyMetric through applications that were
previously infeasible at scale. In particular, we use BodyMetric to benchmark
the ability of text-to-image models to generate realistic human
bodies. We also demonstrate the effectiveness of BodyMetric in ranking
generated images based on the predicted realism scores. |
---|---|
DOI: | 10.48550/arxiv.2412.04086 |