A multidimensional measurement of photorealistic avatar quality of experience
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Photorealistic avatars are human avatars that look, move, and talk like real
people. The performance of photorealistic avatars has significantly improved
recently based on objective metrics such as PSNR, SSIM, LPIPS, FID, and FVD.
However, recent photorealistic avatar publications do not provide subjective
tests of the avatars to measure human usability factors. We provide an open
source test framework to subjectively measure photorealistic avatar performance
in ten dimensions: realism, trust, comfortableness using, comfortableness
interacting with, appropriateness for work, creepiness, formality, affinity,
resemblance to the person, and emotion accuracy. We show that the correlation
of nine of these subjective metrics with PSNR, SSIM, LPIPS, FID, and FVD is
weak, and moderate for emotion accuracy. The crowdsourced subjective test
framework is highly reproducible and accurate when compared to a panel of
experts. We analyze a wide range of avatars from photorealistic to cartoon-like
and show that some photorealistic avatars are approaching real video
performance based on these dimensions. We also find that for avatars above a
certain level of realism, eight of these measured dimensions are strongly
correlated. This means that avatars that are not as realistic as real video
will have lower trust, comfortableness using, comfortableness interacting with,
appropriateness for work, formality, and affinity, and higher creepiness
compared to real video. In addition, because there is a strong linear
relationship between avatar affinity and realism, there is no uncanny valley
effect for photorealistic avatars in the telecommunication scenario. We provide
several extensions of this test framework for future work and discuss design
implications for telecommunication systems. The test framework is available at
https://github.com/microsoft/P.910. |
---|---|
DOI: | 10.48550/arxiv.2411.09066 |
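
The abstract reports how strongly each subjective dimension correlates with objective metrics such as PSNR. As a rough illustration of that kind of analysis, the sketch below averages per-avatar ratings into a mean opinion score and computes a Pearson correlation against an objective metric. It is not taken from the P.910 repository: the function names, the avatar identifiers, and all numbers are hypothetical placeholders, and the choice of Pearson correlation is an assumption, since the abstract does not name the coefficient used.

```python
import math


def mean_opinion_score(ratings):
    """Average a list of individual ratings into one score."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(ratings) / len(ratings)


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def demo():
    # Hypothetical crowdsourced ratings (1-5 scale) for one dimension,
    # e.g. "realism". These values are made up for illustration only.
    ratings = {
        "avatar_a": [4, 5, 4, 4],
        "avatar_b": [2, 3, 2, 3],
        "avatar_c": [3, 3, 4, 3],
    }
    # Hypothetical objective scores (e.g. PSNR in dB) for the same avatars.
    psnr = {"avatar_a": 31.2, "avatar_b": 29.8, "avatar_c": 27.5}

    names = sorted(ratings)
    mos = [mean_opinion_score(ratings[n]) for n in names]
    objective = [psnr[n] for n in names]
    print("Pearson r between MOS and PSNR:", pearson(mos, objective))


if __name__ == "__main__":
    demo()
```

With real data, the same calculation would be repeated for each of the ten subjective dimensions and each objective metric; a rank-based coefficient such as Spearman's could be substituted if a monotonic rather than linear relationship is of interest.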