GAGPT-2: A Geometric Attention-based GPT-2 Framework for Image Captioning in Hindi

Bibliographic details
Published in: ACM Transactions on Asian and Low-Resource Language Information Processing, 2023-10, Vol. 22 (10), p. 1-16, Article 241
Main authors: Mishra, Santosh Kumar; Chakraborty, Soham; Saha, Sriparna; Bhattacharyya, Pushpak
Format: Article
Language: English
Description
Summary: Image captioning frameworks usually employ an encoder-decoder paradigm, in which the encoder receives abstract image feature vectors as input and the decoder performs language modeling. Nowadays, most prominent architectures employ features from region proposals derived from object detection modules. In this work, we propose a novel architecture for image captioning. We employ an object detection module integrated with a transformer architecture as the encoder and GPT-2 (Generative Pre-trained Transformer 2) as the decoder. The encoder utilizes information about the spatial relationships among detected objects. We introduce a unique methodology for image caption generation in Hindi, which is widely spoken in South Asia and India and is the world’s third most spoken language as well as India’s official language. In terms of BLEU scores, the proposed approach’s performance is comparable to that of other baselines, and the results illustrate that the proposed approach outperforms the other baselines. The efficacy of the proposed approach in generating correct captions is further determined by human assessment in terms of adequacy and fluency.
ISSN: 2375-4699, 2375-4702
DOI: 10.1145/3622936