GrainedCLIP and DiffusionGrainedCLIP: Text-Guided Advanced Models for Fine-Grained Attribute Face Image Processing
Published in: | IEEE Access, 2023, Vol. 11, pp. 99030-99045 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Text-guided image processing has made tremendous progress in recent years. Most existing methods rely on visual-language pre-training models for text-guided image processing. However, applying them to text-guided fine-grained attribute face image processing (e.g., editing a smiling face to change from showing teeth to a closed-mouth smile) leads to poor performance, because existing visual-language pre-training models learn only limited fine-grained semantic knowledge. To alleviate this problem, we propose a novel visual-language pre-training model based on fine-grained facial attribute features, which we call GrainedCLIP. Based on GrainedCLIP, we further propose a new text-guided fine-grained attribute face image processing model, which we call DiffusionGrainedCLIP. Our experimental results showed that GrainedCLIP outperformed existing methods, achieving 12.61 R@1 and 12.17 R@1 in text-to-image and image-to-text retrieval, respectively, on the FFHQ dataset. Furthermore, compared to state-of-the-art text-guided face image processing methods, DiffusionGrainedCLIP improved semantic consistency by 55.37% and face identity preservation by 49.38% on the FFHQ dataset. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2023.3313248 |