Shapley visual transformers for image-to-text generation

Detailed description

Bibliographic details
Published in: Applied Soft Computing, 2024-11, Vol. 166, p. 112205, Article 112205
Authors: Belhadi, Asma; Djenouri, Youcef; Belbachir, Ahmed Nabil; Michalak, Tomasz; Srivastava, Gautam
Format: Article
Language: English
Online access: Full text
Description
Abstract: In the contemporary landscape of the web, image-to-text generation stands out as a crucial information service. Recently, deep learning has emerged as the cutting-edge methodology for advancing image-to-text generation systems. However, these models are typically constructed using domain knowledge specific to the application at hand and a very particular data distribution; consequently, data scientists must be well versed in the relevant subject. In this research work, we target a new foundation for image-to-text generation systems by introducing a consensus method that facilitates self-adaptation and flexibility across different learning tasks and diverse data distributions. This paper presents I2T-SP (Image-to-Text Generation for Shapley Pruning), a consensus method for general-purpose intelligence that requires no assistance from a domain expert. The trained model is developed using a general deep-learning approach that quantifies the contribution of each model to the training process: multiple deep learning models are trained on each set of historical data, the Shapley Value is computed to measure the contribution of each subset of models, and the models are then pruned according to their contribution to the learning process. We evaluate the generality of I2T-SP on different datasets with varying shapes and complexities. The results reveal the effectiveness of I2T-SP compared to baseline image-to-text generation solutions. This research marks a significant step towards establishing a more adaptable and broadly applicable foundation for image-to-text generation systems.
•A novel generative AI model (I2T-SP) based on ensemble pruning is developed.
•Novel LSTM and transformer models for image-to-text generation are proposed.
•I2T-SP shows better performance than current image-to-text generation solutions.
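The Shapley-based ensemble pruning described in the abstract can be illustrated with a minimal sketch, not the authors' implementation: each trained model is treated as a player in a cooperative game, the characteristic function is the validation score of the ensemble a coalition of models forms, and the models with the lowest Shapley value are pruned. The helper names (value_fn, keep_ratio) and the exact enumeration of coalitions are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(models, value_fn):
    """Exact Shapley value of each model in an ensemble.

    models   : list of trained models (the "players").
    value_fn : assumed callable mapping a tuple of models to a
               validation score for the ensemble they form (e.g. a
               caption-quality metric of the averaged predictions);
               value_fn(()) must return the empty-ensemble score.
    """
    n = len(models)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            # standard Shapley weight |S|! (n-|S|-1)! / n!
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = tuple(models[j] for j in subset)
                # marginal contribution of model i to this coalition
                phi[i] += w * (value_fn(coalition + (models[i],))
                               - value_fn(coalition))
    return phi

def prune_ensemble(models, value_fn, keep_ratio=0.5):
    """Keep only the models with the highest Shapley contribution."""
    phi = shapley_values(models, value_fn)
    order = sorted(range(len(models)), key=phi.__getitem__, reverse=True)
    k = max(1, int(keep_ratio * len(models)))
    return [models[i] for i in order[:k]]
```

Exact computation evaluates every coalition (2^(n-1) per model), so it is only practical for small ensembles; for larger ones, Monte Carlo sampling over random permutations is the usual approximation.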
ISSN: 1568-4946
DOI: 10.1016/j.asoc.2024.112205