CNN-LSTM-BASED DEEP LEARNING FOR AUTOMATIC IMAGE CAPTIONING

Bibliographic details
Published in:	ARACÊ 2024-11, Vol.6 (3)
Main authors:	Ribeiro, Maria Vitória Sousa, Nogueira, Tiago do Carmo, da Cruz Junior, Gelson, Vinhal, Cássio Dener Noronha, Ullmann, Matheus Rudolfo Diedrich, Ferreira, Deller James, Carvalho, Caio Henrique Rodrigues, Santana, Danyele de Oliveira
Format:	Article
Language:	English
Online access:	Full text

Description
Summary:	Advances in Computer Vision and Machine Learning, driven by deep neural networks, have made techniques for describing images in natural language more efficient and accurate. This study used an encoder-decoder structure to identify objects in an input image and generate a caption for it. The proposed model used the VGG16 and Inception-V3 architectures as encoders and an LSTM as the decoder. The experiments were carried out on the Flickr8k dataset, which contains 8,000 images. The model was evaluated with the BLEU, METEOR, CIDEr, and ROUGE metrics. It achieved a score of 58.40% on the BLEU metric, indicating that the generated descriptions are understandable to humans.
ISSN:	2358-2472
DOI:	10.56238/arev6n3-145