GAGPT-2: A Geometric Attention-based GPT-2 Framework for Image Captioning in Hindi

Image captioning frameworks usually employ an encoder-decoder paradigm, with the encoder receiving abstract image feature vectors as input and the decoder for language modeling. Nowadays, most prominent architectures employ features from region proposals derived from object detection modules. In thi...

Bibliographic details
Published in:	ACM transactions on Asian and low-resource language information processing, 2023-10, Vol.22 (10), p.1-16, Article 241
Main authors:	Mishra, Santosh Kumar ; Chakraborty, Soham ; Saha, Sriparna ; Bhattacharyya, Pushpak
Format:	Article
Language:	eng
Subjects:	Computing methodologies ; Natural language generation
Online access:	Full text

container_end_page	16
container_issue	10
container_start_page	1
container_title	ACM transactions on Asian and low-resource language information processing
container_volume	22
creator	Mishra, Santosh Kumar ; Chakraborty, Soham ; Saha, Sriparna ; Bhattacharyya, Pushpak
description	Image captioning frameworks usually employ an encoder-decoder paradigm, with the encoder receiving abstract image feature vectors as input and the decoder for language modeling. Nowadays, most prominent architectures employ features from region proposals derived from object detection modules. In this work, we propose a novel architecture for image captioning. We employ the object detection module integrated with transformer architecture as an encoder and GPT-2 (Generative Pre-trained Transformer) as a decoder. The encoder utilizes the information of the spatial relationships among detected objects. We introduce a unique methodology for image caption generation in Hindi, which is widely spoken in South Asia and India and is the world’s third most spoken language as well as India’s official language. In terms of BLEU scores, the proposed approach’s performance is comparable to those of other baselines, and the results illustrate that the proposed approach outperforms the other baselines. The efficacy of the proposed approach in generating correct captions is further determined by human assessment in terms of adequacy and fluency.
doi_str_mv	10.1145/3622936
format	Article
fullrecord	<record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3622936</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3622936</sourcerecordid><originalsourceid>FETCH-LOGICAL-a202t-85a8a6669408ebcc3ca28d5bc66cb71ac560046cdd7cf979cf9e3718cd5713c13</originalsourceid><addsrcrecordid>eNo90M9LwzAUB_AgCo45vHvKzVM1P5qXxlsZrhsMFJnnkr6kI2rbkRTE_97NbV7ee_D98A5fQm45e-A8V48ShDASLshESK2yXDNxeb7BmGsyS-mDMcZzDcD4hLxVZfW6ycQTLWnlh86PMSAtx9H3Yxj6rLHJO_pH6CLazn8P8ZO2Q6Srzm49ndvdwYV-S0NPl6F34YZctfYr-dlpT8n74nkzX2brl2o1L9eZFUyMWaFsYQHA5KzwDaJEKwqnGgTARnOLChjLAZ3T2Bpt9sNLzQt0SnOJXE7J_fEvxiGl6Nt6F0Nn40_NWX1ooz61sZd3R2mx-0fn8BehuFdi</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GAGPT-2: A Geometric Attention-based GPT-2 Framework for Image Captioning in Hindi</title><source>Access via ACM Digital Library</source><creator>Mishra, Santosh Kumar ; Chakraborty, Soham ; Saha, Sriparna ; Bhattacharyya, Pushpak</creator><creatorcontrib>Mishra, Santosh Kumar ; Chakraborty, Soham ; Saha, Sriparna ; Bhattacharyya, Pushpak</creatorcontrib><description>Image captioning frameworks usually employ an encoder-decoder paradigm, with the encoder receiving abstract image feature vectors as input and the decoder for language modeling. Nowadays, most prominent architectures employ features from region proposals derived from object detection modules. In this work, we propose a novel architecture for image captioning. We employ the object detection module integrated with transformer architecture as an encoder and GPT-2 (Generative Pre-trained Transformer) as a decoder. The encoder utilizes the information of the spatial relationships among detected objects. 
We introduce a unique methodology for image caption generation in Hindi, which is widely spoken in South Asia and India and is the world’s third most spoken language as well as India’s official language. In terms of BLEU scores, the proposed approach’s performance is comparable to those of other baselines, and the results illustrate that the proposed approach outperforms the other baselines. The efficacy of the proposed approach in generating correct captions is further determined by human assessment in terms of adequacy and fluency.</description><identifier>ISSN: 2375-4699</identifier><identifier>EISSN: 2375-4702</identifier><identifier>DOI: 10.1145/3622936</identifier><language>eng</language><publisher>New York, NY: ACM</publisher><subject>Computing methodologies ; Natural language generation</subject><ispartof>ACM transactions on Asian and low-resource language information processing, 2023-10, Vol.22 (10), p.1-16, Article 241</ispartof><rights>Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 
Request permissions from</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a202t-85a8a6669408ebcc3ca28d5bc66cb71ac560046cdd7cf979cf9e3718cd5713c13</cites><orcidid>0000-0001-5458-9381 ; 0000-0003-4639-5506 ; 0000-0001-5319-5508 ; 0009-0004-2675-9418</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://dl.acm.org/doi/pdf/10.1145/3622936$$EPDF$$P50$$Gacm$$H</linktopdf><link.rule.ids>315,781,785,2283,27926,27927,40198,76230</link.rule.ids></links><search><creatorcontrib>Mishra, Santosh Kumar</creatorcontrib><creatorcontrib>Chakraborty, Soham</creatorcontrib><creatorcontrib>Saha, Sriparna</creatorcontrib><creatorcontrib>Bhattacharyya, Pushpak</creatorcontrib><title>GAGPT-2: A Geometric Attention-based GPT-2 Framework for Image Captioning in Hindi</title><title>ACM transactions on Asian and low-resource language information processing</title><addtitle>ACM TALLIP</addtitle><description>Image captioning frameworks usually employ an encoder-decoder paradigm, with the encoder receiving abstract image feature vectors as input and the decoder for language modeling. Nowadays, most prominent architectures employ features from region proposals derived from object detection modules. In this work, we propose a novel architecture for image captioning. We employ the object detection module integrated with transformer architecture as an encoder and GPT-2 (Generative Pre-trained Transformer) as a decoder. The encoder utilizes the information of the spatial relationships among detected objects. We introduce a unique methodology for image caption generation in Hindi, which is widely spoken in South Asia and India and is the world’s third most spoken language as well as India’s official language. 
In terms of BLEU scores, the proposed approach’s performance is comparable to those of other baselines, and the results illustrate that the proposed approach outperforms the other baselines. The efficacy of the proposed approach in generating correct captions is further determined by human assessment in terms of adequacy and fluency.</description><subject>Computing methodologies</subject><subject>Natural language generation</subject><issn>2375-4699</issn><issn>2375-4702</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNo90M9LwzAUB_AgCo45vHvKzVM1P5qXxlsZrhsMFJnnkr6kI2rbkRTE_97NbV7ee_D98A5fQm45e-A8V48ShDASLshESK2yXDNxeb7BmGsyS-mDMcZzDcD4hLxVZfW6ycQTLWnlh86PMSAtx9H3Yxj6rLHJO_pH6CLazn8P8ZO2Q6Srzm49ndvdwYV-S0NPl6F34YZctfYr-dlpT8n74nkzX2brl2o1L9eZFUyMWaFsYQHA5KzwDaJEKwqnGgTARnOLChjLAZ3T2Bpt9sNLzQt0SnOJXE7J_fEvxiGl6Nt6F0Nn40_NWX1ooz61sZd3R2mx-0fn8BehuFdi</recordid><startdate>20231014</startdate><enddate>20231014</enddate><creator>Mishra, Santosh Kumar</creator><creator>Chakraborty, Soham</creator><creator>Saha, Sriparna</creator><creator>Bhattacharyya, Pushpak</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-5458-9381</orcidid><orcidid>https://orcid.org/0000-0003-4639-5506</orcidid><orcidid>https://orcid.org/0000-0001-5319-5508</orcidid><orcidid>https://orcid.org/0009-0004-2675-9418</orcidid></search><sort><creationdate>20231014</creationdate><title>GAGPT-2: A Geometric Attention-based GPT-2 Framework for Image Captioning in Hindi</title><author>Mishra, Santosh Kumar ; Chakraborty, Soham ; Saha, Sriparna ; Bhattacharyya, Pushpak</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a202t-85a8a6669408ebcc3ca28d5bc66cb71ac560046cdd7cf979cf9e3718cd5713c13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computing 
methodologies</topic><topic>Natural language generation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mishra, Santosh Kumar</creatorcontrib><creatorcontrib>Chakraborty, Soham</creatorcontrib><creatorcontrib>Saha, Sriparna</creatorcontrib><creatorcontrib>Bhattacharyya, Pushpak</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on Asian and low-resource language information processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mishra, Santosh Kumar</au><au>Chakraborty, Soham</au><au>Saha, Sriparna</au><au>Bhattacharyya, Pushpak</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GAGPT-2: A Geometric Attention-based GPT-2 Framework for Image Captioning in Hindi</atitle><jtitle>ACM transactions on Asian and low-resource language information processing</jtitle><stitle>ACM TALLIP</stitle><date>2023-10-14</date><risdate>2023</risdate><volume>22</volume><issue>10</issue><spage>1</spage><epage>16</epage><pages>1-16</pages><artnum>241</artnum><issn>2375-4699</issn><eissn>2375-4702</eissn><abstract>Image captioning frameworks usually employ an encoder-decoder paradigm, with the encoder receiving abstract image feature vectors as input and the decoder for language modeling. Nowadays, most prominent architectures employ features from region proposals derived from object detection modules. In this work, we propose a novel architecture for image captioning. We employ the object detection module integrated with transformer architecture as an encoder and GPT-2 (Generative Pre-trained Transformer) as a decoder. The encoder utilizes the information of the spatial relationships among detected objects. We introduce a unique methodology for image caption generation in Hindi, which is widely spoken in South Asia and India and is the world’s third most spoken language as well as India’s official language. 
In terms of BLEU scores, the proposed approach’s performance is comparable to those of other baselines, and the results illustrate that the proposed approach outperforms the other baselines. The efficacy of the proposed approach in generating correct captions is further determined by human assessment in terms of adequacy and fluency.</abstract><cop>New York, NY</cop><pub>ACM</pub><doi>10.1145/3622936</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0001-5458-9381</orcidid><orcidid>https://orcid.org/0000-0003-4639-5506</orcidid><orcidid>https://orcid.org/0000-0001-5319-5508</orcidid><orcidid>https://orcid.org/0009-0004-2675-9418</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2375-4699
ispartof	ACM transactions on Asian and low-resource language information processing, 2023-10, Vol.22 (10), p.1-16, Article 241
issn	2375-4699 ; 2375-4702
language	eng
recordid	cdi_crossref_primary_10_1145_3622936
source	Access via ACM Digital Library
subjects	Computing methodologies ; Natural language generation
title	GAGPT-2: A Geometric Attention-based GPT-2 Framework for Image Captioning in Hindi
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T00%3A26%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GAGPT-2:%20A%20Geometric%20Attention-based%20GPT-2%20Framework%20for%20Image%20Captioning%20in%20Hindi&rft.jtitle=ACM%20transactions%20on%20Asian%20and%20low-resource%20language%20information%20processing&rft.au=Mishra,%20Santosh%20Kumar&rft.date=2023-10-14&rft.volume=22&rft.issue=10&rft.spage=1&rft.epage=16&rft.pages=1-16&rft.artnum=241&rft.issn=2375-4699&rft.eissn=2375-4702&rft_id=info:doi/10.1145/3622936&rft_dat=%3Cacm_cross%3E3622936%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true