CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
Visual Question Answering (VQA) is an important task in multimodal AI, often used to test the ability of vision-language models to understand and reason over knowledge present in both visual and textual data. However, most current VQA models rely on datasets that focus primarily on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered by VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or other approaches, they usually keep the images unchanged, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark designed to cover a rich set of languages and cultures, engaging native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from 30 countries on four continents, covering 31 languages with 13 scripts and providing a total of 10k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA and show that the dataset is challenging for current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models, and we hope it encourages more research toward increasing cultural awareness and linguistic diversity in this field.
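As a rough, self-contained illustration of how a benchmark like this is typically scored (not the authors' actual evaluation code), the Python sketch below computes multiple-choice accuracy for a vision-language model. The `VQAItem` schema, the `predict_choice` callable, and the toy examples are all assumptions for illustration, not CVQA's real API or data.

```python
# A minimal sketch of multiple-choice VQA scoring, in the style used to
# benchmark MLLMs on datasets like CVQA. Illustrative only: the item schema
# and the `predict_choice` callable are assumed names, not CVQA's actual API.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VQAItem:
    image_path: str         # path to a culturally grounded image
    question: str           # question text (local language or English)
    choices: Sequence[str]  # answer options
    answer_index: int       # index of the correct option


def accuracy(items: Sequence[VQAItem],
             predict_choice: Callable[[VQAItem], int]) -> float:
    """Fraction of items for which the model picks the correct option."""
    if not items:
        return 0.0
    correct = sum(predict_choice(item) == item.answer_index for item in items)
    return correct / len(items)


if __name__ == "__main__":
    # Toy items standing in for real benchmark entries.
    items = [
        VQAItem("img_0001.jpg", "Ano ang tawag sa pagkaing ito?",
                ["adobo", "sinigang", "lechon", "kare-kare"], 0),
        VQAItem("img_0002.jpg", "What festival is shown in the image?",
                ["Eid al-Fitr", "Diwali", "Tsagaan Sar", "Inti Raymi"], 2),
    ]
    # A trivial baseline that always answers with the first option.
    print(f"first-option baseline: {accuracy(items, lambda item: 0):.2f}")
```

With four options per question, random guessing sits around 25% accuracy, the usual reference floor against which a model's cultural knowledge is judged on this kind of benchmark.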
Main authors: | Romero, David; Lyu, Chenyang; Wibowo, Haryo Akbarianto; Lynn, Teresa; Hamed, Injy; Kishore, Aditya Nanda; Mandal, Aishik; Dragonetti, Alina; Abzaliev, Artem; Tonja, Atnafu Lambebo; Balcha, Bontu Fufa; Whitehouse, Chenxi; Salamea, Christian; Velasco, Dan John; Adelani, David Ifeoluwa; Meur, David Le; Villa-Cueva, Emilio; Koto, Fajri; Farooqui, Fauzan; Belcavello, Frederico; Batnasan, Ganzorig; Vallejo, Gisela; Caulfield, Grainne; Ivetta, Guido; Song, Haiyue; Ademtew, Henok Biadglign; Maina, Hernán; Lovenia, Holy; Azime, Israel Abebe; Cruz, Jan Christian Blaise; Gala, Jay; Geng, Jiahui; Ortiz-Barajas, Jesus-German; Baek, Jinheon; Dunstan, Jocelyn; Alemany, Laura Alonso; Nagasinghe, Kumaranage Ravindu Yasas; Benotti, Luciana; D'Haro, Luis Fernando; Viridiano, Marcelo; Estecha-Garitagoitia, Marcos; Cabrera, Maria Camila Buitrago; Rodríguez-Cantelar, Mario; Jouitteau, Mélanie; Mihaylov, Mihail; Imam, Mohamed Fazli Mohamed; Adilazuarda, Muhammad Farid; Gochoo, Munkhjargal; Otgonbold, Munkh-Erdene; Etori, Naome; Niyomugisha, Olivier; Silva, Paula Mónica; Chitale, Pranjal; Dabre, Raj; Chevi, Rendi; Zhang, Ruochen; Diandaru, Ryandito; Cahyawijaya, Samuel; Góngora, Santiago; Jeong, Soyeong; Purkayastha, Sukannya; Kuribayashi, Tatsuki; Clifford, Teresa; Jayakumar, Thanmay; Torrent, Tiago Timponi; Ehsan, Toqeer; Araujo, Vladimir; Kementchedjhieva, Yova; Burzo, Zara; Lim, Zheng Wei; Yong, Zheng Xin; Ignat, Oana; Nwatu, Joan; Mihalcea, Rada; Solorio, Thamar; Aji, Alham Fikri |
---|---|
Format: | Article |
Language: | English |
Online access: | Order full text |
identifier | DOI: 10.48550/arxiv.2406.05967 |
recordid | cdi_arxiv_primary_2406_05967 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning |
title | CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T23%3A06%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CVQA:%20Culturally-diverse%20Multilingual%20Visual%20Question%20Answering%20Benchmark&rft.au=Romero,%20David&rft.date=2024-06-09&rft_id=info:doi/10.48550/arxiv.2406.05967&rft_dat=%3Carxiv_GOX%3E2406_05967%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
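The url field is an OpenURL 1.0 (Z39.88-2004) link-resolver address: the citation itself travels in the `rft.*` query parameters. As a small sketch using only Python's standard library (the URL below is the record's link, abridged to its citation keys):

```python
# Pull the citation fields (rft.*) out of the record's OpenURL
# (Z39.88-2004) link-resolver URL using only the standard library.
from urllib.parse import parse_qs, urlsplit

# The record's url field, abridged here to its citation parameters.
openurl = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal"
    "&rft.genre=article"
    "&rft.atitle=CVQA:%20Culturally-diverse%20Multilingual%20Visual"
    "%20Question%20Answering%20Benchmark"
    "&rft.au=Romero,%20David"
    "&rft.date=2024-06-09"
    "&rft_id=info:doi/10.48550/arxiv.2406.05967"
)

query = parse_qs(urlsplit(openurl).query)
citation = {key[len("rft."):]: values[0]
            for key, values in query.items() if key.startswith("rft.")}
print(citation)
# {'genre': 'article', 'atitle': 'CVQA: Culturally-diverse Multilingual
#  Visual Question Answering Benchmark', 'au': 'Romero, David',
#  'date': '2024-06-09'}
```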