CNN-LSTM-BASED DEEP LEARNING FOR AUTOMATIC IMAGE CAPTIONING
Published in: | ARACÊ 2024-11, Vol.6 (3) |
---|---|
Main Authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Online Access: | Full text |
Abstract: | Advances in Computer Vision and Machine Learning have made natural-language image description more efficient and accurate through deep neural networks. This study used an encoder-decoder architecture to identify and caption objects from an input image. The proposed model used the VGG16 and Inception-V3 architectures as encoders and an LSTM as the decoder. Experiments were carried out on the Flickr8k dataset of 8,000 images, and the model was evaluated with the BLEU, METEOR, CIDEr, and ROUGE metrics, achieving a BLEU score of 58.40% and producing human-understandable descriptions. |
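The encoder-decoder pipeline described in the abstract can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: a small convolutional stack stands in for the VGG16/Inception-V3 encoders, the image feature vector seeds an LSTM decoder, and all layer sizes are assumed for illustration.

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    # Stand-in for the VGG16 / Inception-V3 encoders used in the paper:
    # a small conv stack that maps an image to a fixed-size feature vector.
    def __init__(self, embed_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # global pooling -> (B, 32, 1, 1)
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, images):
        x = self.features(images).flatten(1)   # (B, 32)
        return self.fc(x)                      # (B, embed_dim)

class LSTMDecoder(nn.Module):
    # LSTM decoder: the image feature is fed as the first "token",
    # then word embeddings are consumed step by step to predict the
    # next word of the caption at each position.
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, captions):
        # Prepend the image feature to the embedded caption tokens.
        inputs = torch.cat([img_feats.unsqueeze(1),
                            self.embed(captions)], dim=1)  # (B, T+1, E)
        hidden, _ = self.lstm(inputs)                      # (B, T+1, H)
        return self.out(hidden)                            # (B, T+1, V)

if __name__ == "__main__":
    B, T, V = 2, 5, 100                       # toy batch / caption / vocab sizes
    enc, dec = CNNEncoder(), LSTMDecoder(vocab_size=V)
    images = torch.randn(B, 3, 64, 64)
    captions = torch.randint(0, V, (B, T))
    logits = dec(enc(images), captions)
    print(logits.shape)                       # torch.Size([2, 6, 100])
```

Training would minimize cross-entropy between these logits and the shifted ground-truth captions; in the paper's setup the encoder would be a pretrained VGG16 or Inception-V3 with its classification head removed.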
ISSN: | 2358-2472 |
DOI: | 10.56238/arev6n3-145 |