SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning
In scholarly documents, figures provide a straightforward way of communicating scientific findings to readers. Automating figure caption generation helps move models' understanding of scientific documents beyond text and will help authors write informative captions that facilitate communicating scientific findings. Unlike previous studies, we reframe scientific figure captioning as a knowledge-augmented image captioning task in which models need to utilize knowledge embedded across modalities for caption generation. To this end, we extended the large-scale SciCap dataset (Hsu et al., 2021) to SciCap+, which includes mention-paragraphs (paragraphs mentioning figures) and OCR tokens. Then, we conduct experiments with the M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline for our study. Our results indicate that mention-paragraphs serve as additional context knowledge, which significantly boosts the automatic standard image caption evaluation scores compared to the figure-only baselines. Human evaluations further reveal the challenges of generating figure captions that are informative to readers. The code and SciCap+ dataset will be publicly available at https://github.com/ZhishenYang/scientific_figure_captioning_dataset
Published in: | arXiv.org, 2023-06 |
---|---|
Main authors: | Yang, Zhishen; Dabre, Raj; Tanaka, Hideki; Okazaki, Naoaki |
Format: | Article |
Language: | English |
Subjects: | Communication; Data augmentation; Datasets; Documents |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Yang, Zhishen; Dabre, Raj; Tanaka, Hideki; Okazaki, Naoaki |
description | In scholarly documents, figures provide a straightforward way of communicating scientific findings to readers. Automating figure caption generation helps move models' understanding of scientific documents beyond text and will help authors write informative captions that facilitate communicating scientific findings. Unlike previous studies, we reframe scientific figure captioning as a knowledge-augmented image captioning task in which models need to utilize knowledge embedded across modalities for caption generation. To this end, we extended the large-scale SciCap dataset (Hsu et al., 2021) to SciCap+, which includes mention-paragraphs (paragraphs mentioning figures) and OCR tokens. Then, we conduct experiments with the M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline for our study. Our results indicate that mention-paragraphs serve as additional context knowledge, which significantly boosts the automatic standard image caption evaluation scores compared to the figure-only baselines. Human evaluations further reveal the challenges of generating figure captions that are informative to readers. The code and SciCap+ dataset will be publicly available at https://github.com/ZhishenYang/scientific_figure_captioning_dataset |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-06 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2823307342 |
source | Free E-Journals |
subjects | Communication; Data augmentation; Datasets; Documents |
title | SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning |
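
The abstract describes each SciCap+ entry as pairing a figure with its caption, the paragraphs that mention it (mention-paragraphs), and OCR tokens extracted from the figure image. The following is a minimal Python sketch of what one such entry might look like; the field names and example values are hypothetical illustrations, not the dataset's actual schema.

```python
# Hypothetical sketch of a single SciCap+ entry, based only on the
# abstract: a figure paired with its caption, mention-paragraphs,
# and OCR tokens. Field names and values are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SciCapPlusEntry:
    figure_path: str                  # path to the figure image
    caption: str                      # ground-truth caption (generation target)
    mention_paragraphs: List[str] = field(default_factory=list)  # paragraphs citing the figure
    ocr_tokens: List[str] = field(default_factory=list)          # text detected inside the figure

entry = SciCapPlusEntry(
    figure_path="figures/1234.5678-Figure2.png",
    caption="Figure 2: BLEU scores of the baseline across training epochs.",
    mention_paragraphs=[
        "As shown in Figure 2, the baseline's BLEU score plateaus after 20 epochs."
    ],
    ocr_tokens=["BLEU", "epochs", "0", "10", "20", "30"],
)
print(entry.caption)
```
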