Panel-Page-Aware Comic Genre Understanding

Using a sequence of discrete still images to tell a story or introduce a process has become a tradition in digital visual media. With the surge in such media and the demands of downstream tasks, there is an urgent need to identify their main topic or genre quickly. As a repr...

Full description


Bibliographic details
Published in:	IEEE transactions on image processing 2023-01, Vol.32, p.1-1
Main authors:	Xu, Chenshu; Xu, Xuemiao; Zhao, Nanxuan; Cai, Weiwei; Zhang, Huaidong; Li, Chengze; Liu, Xueting
Format:	Article
Language:	eng
Subjects:	attention mechanism; Classification; Comic; Comics; deep learning; Digital imaging; multi-image classification; Visual fields
Online access:	Order full text

container_end_page	1
container_issue
container_start_page	1
container_title	IEEE transactions on image processing
container_volume	32
creator	Xu, Chenshu; Xu, Xuemiao; Zhao, Nanxuan; Cai, Weiwei; Zhang, Huaidong; Li, Chengze; Liu, Xueting
description	Using a sequence of discrete still images to tell a story or introduce a process has become a tradition in digital visual media. With the surge in such media and the demands of downstream tasks, there is an urgent need to identify their main topic or genre quickly. Comics, a representative form of such media, have boomed since going digital. Unlike natural images, however, comic pages are divided into panels and are not visually consistent from page to page, so existing methods tailored to natural images perform poorly on comics. Because identifying a comic's genre is tied to its overall plot, the task calls for a long-term understanding that fully exploits the semantic interactions between multi-level comic fragments. In this paper, we propose P²Comic, a Panel-Page-aware Comic genre classification model, which takes page sequences of comics as input and produces class-wise probabilities. P²Comic uses detected panel boxes to extract panel representations and applies self-attention to build panel-page understanding, assisted by inter-dependent classifiers that model label correlation. We develop the first comic dataset for comic genre classification with multi-genre labels. Experiments show that our approach outperforms state-of-the-art methods on related tasks. We also validate that our network extends to the multi-modal scenario. Finally, we demonstrate the practicality of our approach by giving effective genre predictions for whole comic books.
doi_str_mv	10.1109/TIP.2023.3270105
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_37115827</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10112648</ieee_id><sourcerecordid>2812842690</sourcerecordid><originalsourceid>FETCH-LOGICAL-c348t-f7002e22bc1529da60bd02f8c9c795b6a712cb1c791415b43289fada76506aa53</originalsourceid><addsrcrecordid>eNpdkE1LAzEQhoMotlbvHkQKXkTYOpOPTXIsRWuhYA_tOWSz2bJlu1s3XcR_b0qriKeZged9GR5CbhFGiKCfl7PFiAJlI0YlIIgz0kfNMQHg9DzuIGQikeseuQphA4BcYHpJekwiCkVlnzwtbO2rZGHXPhl_2tYPJ822dMOpr-O-qnPfhr2t87JeX5OLwlbB35zmgKxeX5aTt2T-Pp1NxvPEMa72SSEBqKc0cyiozm0KWQ60UE47qUWWWonUZRgP5CgyzqjShc2tTAWk1go2II_H3l3bfHQ-7M22DM5XVfy06YKhCmJWacYi-vAP3TRdW8fvIoVUcZpqiBQcKdc2IbS-MLu23Nr2yyCYg0cTPZqDR3PyGCP3p-Iu2_r8N_AjLgJ3R6D03v_pQ6QpV-wb-DlzMw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2812842690</pqid></control><display><type>article</type><title>Panel-Page-Aware Comic Genre Understanding</title><source>IEEE Electronic Library (IEL)</source><creator>Xu, Chenshu ; Xu, Xuemiao ; Zhao, Nanxuan ; Cai, Weiwei ; Zhang, Huaidong ; Li, Chengze ; Liu, Xueting</creator><creatorcontrib>Xu, Chenshu ; Xu, Xuemiao ; Zhao, Nanxuan ; Cai, Weiwei ; Zhang, Huaidong ; Li, Chengze ; Liu, Xueting</creatorcontrib><description>Using a sequence of discrete still images to tell a story or introduce a process has become a tradition in the field of digital visual media. With the surge in these media and the requirements in downstream tasks, acquiring their main topic or genre in a very short time is urgently needed. As a representative form of the media, comic enjoys a huge boom as it has gone digital. However, different from natural images, comic images are divided by panels, and the images are not visually consistent from page to page. Therefore, existing works tailored for natural images perform poorly in analyzing comics. 
Considering the identification of comic genres is tied to the overall story plotting, a long-term understanding that makes full use of the semantic interactions between multi-level comic fragments needs to be fully exploited. In this paper, we propose P²Comic, a Panel-Page-aware Comic genre classification model, which takes page sequences of comics as the input and produces class-wise probabilities. P²Comic utilizes detected panel boxes to extract panel representations and deploys self-attention to construct panel-page understanding, assisted with inter-dependent classifiers to model label correlation. We develop the first comic dataset for the task of comic genre classification with multi-genre labels. Our approach is proved by experiments to outperform state-of-the-art methods on related tasks. We also validate the extensibility of our network to perform on the multi-modal scenario. Finally, we show the practicability of our approach by giving effective genre prediction results for whole comic books.</description><identifier>ISSN: 1057-7149</identifier><identifier>EISSN: 1941-0042</identifier><identifier>DOI: 10.1109/TIP.2023.3270105</identifier><identifier>PMID: 37115827</identifier><identifier>CODEN: IIPRE4</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>attention mechanism ; Classification ; Comic ; Comics ; deep learning ; Digital imaging ; multi-image classification ; Visual fields</subject><ispartof>IEEE transactions on image processing, 2023-01, Vol.32, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c348t-f7002e22bc1529da60bd02f8c9c795b6a712cb1c791415b43289fada76506aa53</citedby><cites>FETCH-LOGICAL-c348t-f7002e22bc1529da60bd02f8c9c795b6a712cb1c791415b43289fada76506aa53</cites><orcidid>0000-0003-0377-8646 ; 0000-0002-0665-1993 ; 0000-0002-0868-5353 ; 0000-0002-1519-750X ; 0000-0001-7662-9831 ; 0000-0002-8006-3663</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10112648$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,781,785,797,27929,27930,54763</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10112648$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37115827$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Xu, Chenshu</creatorcontrib><creatorcontrib>Xu, Xuemiao</creatorcontrib><creatorcontrib>Zhao, Nanxuan</creatorcontrib><creatorcontrib>Cai, Weiwei</creatorcontrib><creatorcontrib>Zhang, Huaidong</creatorcontrib><creatorcontrib>Li, Chengze</creatorcontrib><creatorcontrib>Liu, Xueting</creatorcontrib><title>Panel-Page-Aware Comic Genre Understanding</title><title>IEEE transactions on image processing</title><addtitle>TIP</addtitle><addtitle>IEEE Trans Image Process</addtitle><description>Using a sequence of discrete still images to tell a story or introduce a process has become a tradition in the field of digital visual media. With the surge in these media and the requirements in downstream tasks, acquiring their main topic or genre in a very short time is urgently needed. As a representative form of the media, comic enjoys a huge boom as it has gone digital. 
However, different from natural images, comic images are divided by panels, and the images are not visually consistent from page to page. Therefore, existing works tailored for natural images perform poorly in analyzing comics. Considering the identification of comic genres is tied to the overall story plotting, a long-term understanding that makes full use of the semantic interactions between multi-level comic fragments needs to be fully exploited. In this paper, we propose P²Comic, a Panel-Page-aware Comic genre classification model, which takes page sequences of comics as the input and produces class-wise probabilities. P²Comic utilizes detected panel boxes to extract panel representations and deploys self-attention to construct panel-page understanding, assisted with inter-dependent classifiers to model label correlation. We develop the first comic dataset for the task of comic genre classification with multi-genre labels. Our approach is proved by experiments to outperform state-of-the-art methods on related tasks. We also validate the extensibility of our network to perform on the multi-modal scenario. 
Finally, we show the practicability of our approach by giving effective genre prediction results for whole comic books.</description><subject>attention mechanism</subject><subject>Classification</subject><subject>Comic</subject><subject>Comics</subject><subject>deep learning</subject><subject>Digital imaging</subject><subject>multi-image classification</subject><subject>Visual fields</subject><issn>1057-7149</issn><issn>1941-0042</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE1LAzEQhoMotlbvHkQKXkTYOpOPTXIsRWuhYA_tOWSz2bJlu1s3XcR_b0qriKeZged9GR5CbhFGiKCfl7PFiAJlI0YlIIgz0kfNMQHg9DzuIGQikeseuQphA4BcYHpJekwiCkVlnzwtbO2rZGHXPhl_2tYPJ822dMOpr-O-qnPfhr2t87JeX5OLwlbB35zmgKxeX5aTt2T-Pp1NxvPEMa72SSEBqKc0cyiozm0KWQ60UE47qUWWWonUZRgP5CgyzqjShc2tTAWk1go2II_H3l3bfHQ-7M22DM5XVfy06YKhCmJWacYi-vAP3TRdW8fvIoVUcZpqiBQcKdc2IbS-MLu23Nr2yyCYg0cTPZqDR3PyGCP3p-Iu2_r8N_AjLgJ3R6D03v_pQ6QpV-wb-DlzMw</recordid><startdate>20230101</startdate><enddate>20230101</enddate><creator>Xu, Chenshu</creator><creator>Xu, Xuemiao</creator><creator>Zhao, Nanxuan</creator><creator>Cai, Weiwei</creator><creator>Zhang, Huaidong</creator><creator>Li, Chengze</creator><creator>Liu, Xueting</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-0377-8646</orcidid><orcidid>https://orcid.org/0000-0002-0665-1993</orcidid><orcidid>https://orcid.org/0000-0002-0868-5353</orcidid><orcidid>https://orcid.org/0000-0002-1519-750X</orcidid><orcidid>https://orcid.org/0000-0001-7662-9831</orcidid><orcidid>https://orcid.org/0000-0002-8006-3663</orcidid></search><sort><creationdate>20230101</creationdate><title>Panel-Page-Aware Comic Genre Understanding</title><author>Xu, Chenshu ; Xu, Xuemiao ; Zhao, Nanxuan ; Cai, Weiwei ; Zhang, Huaidong ; Li, Chengze ; Liu, Xueting</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c348t-f7002e22bc1529da60bd02f8c9c795b6a712cb1c791415b43289fada76506aa53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>attention mechanism</topic><topic>Classification</topic><topic>Comic</topic><topic>Comics</topic><topic>deep learning</topic><topic>Digital imaging</topic><topic>multi-image classification</topic><topic>Visual fields</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xu, Chenshu</creatorcontrib><creatorcontrib>Xu, Xuemiao</creatorcontrib><creatorcontrib>Zhao, Nanxuan</creatorcontrib><creatorcontrib>Cai, Weiwei</creatorcontrib><creatorcontrib>Zhang, Huaidong</creatorcontrib><creatorcontrib>Li, Chengze</creatorcontrib><creatorcontrib>Liu, Xueting</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on image processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xu, Chenshu</au><au>Xu, Xuemiao</au><au>Zhao, Nanxuan</au><au>Cai, Weiwei</au><au>Zhang, Huaidong</au><au>Li, Chengze</au><au>Liu, Xueting</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Panel-Page-Aware Comic Genre Understanding</atitle><jtitle>IEEE transactions on image processing</jtitle><stitle>TIP</stitle><addtitle>IEEE Trans Image Process</addtitle><date>2023-01-01</date><risdate>2023</risdate><volume>32</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>1057-7149</issn><eissn>1941-0042</eissn><coden>IIPRE4</coden><abstract>Using a sequence of discrete still images to tell a story or introduce a process has become a tradition in the field of digital visual media. With the surge in these media and the requirements in downstream tasks, acquiring their main topic or genre in a very short time is urgently needed. As a representative form of the media, comic enjoys a huge boom as it has gone digital. However, different from natural images, comic images are divided by panels, and the images are not visually consistent from page to page. Therefore, existing works tailored for natural images perform poorly in analyzing comics. 
Considering the identification of comic genres is tied to the overall story plotting, a long-term understanding that makes full use of the semantic interactions between multi-level comic fragments needs to be fully exploited. In this paper, we propose P²Comic, a Panel-Page-aware Comic genre classification model, which takes page sequences of comics as the input and produces class-wise probabilities. P²Comic utilizes detected panel boxes to extract panel representations and deploys self-attention to construct panel-page understanding, assisted with inter-dependent classifiers to model label correlation. We develop the first comic dataset for the task of comic genre classification with multi-genre labels. Our approach is proved by experiments to outperform state-of-the-art methods on related tasks. We also validate the extensibility of our network to perform on the multi-modal scenario. Finally, we show the practicability of our approach by giving effective genre prediction results for whole comic books.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>37115827</pmid><doi>10.1109/TIP.2023.3270105</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0003-0377-8646</orcidid><orcidid>https://orcid.org/0000-0002-0665-1993</orcidid><orcidid>https://orcid.org/0000-0002-0868-5353</orcidid><orcidid>https://orcid.org/0000-0002-1519-750X</orcidid><orcidid>https://orcid.org/0000-0001-7662-9831</orcidid><orcidid>https://orcid.org/0000-0002-8006-3663</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1057-7149
ispartof	IEEE transactions on image processing, 2023-01, Vol.32, p.1-1
issn	1057-7149 1941-0042
language	eng
recordid	cdi_pubmed_primary_37115827
source	IEEE Electronic Library (IEL)
subjects	attention mechanism; Classification; Comic; Comics; deep learning; Digital imaging; multi-image classification; Visual fields
title	Panel-Page-Aware Comic Genre Understanding
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T15%3A41%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Panel-Page-Aware%20Comic%20Genre%20Understanding&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Xu,%20Chenshu&rft.date=2023-01-01&rft.volume=32&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2023.3270105&rft_dat=%3Cproquest_RIE%3E2812842690%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2812842690&rft_id=info:pmid/37115827&rft_ieee_id=10112648&rfr_iscdi=true
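
The url field above is an OpenURL (ctx_ver=Z39.88-2004) whose query string repeats the citation as key/value pairs such as rft.atitle, rft.issn and rft_id. As a minimal sketch, assuming only Python's standard library, those pairs can be decoded as follows. The helper names referent_fields and identifier are hypothetical and introduced here for illustration, and OPENURL is a shortened copy of the link above that keeps only a few of its keys.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the record's resolver link (assumption: only a subset
# of the original query keys is reproduced here, in the original order).
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=Panel-Page-Aware%20Comic%20Genre%20Understanding"
    "&rft.jtitle=IEEE%20transactions%20on%20image%20processing"
    "&rft.date=2023-01-01&rft.volume=32"
    "&rft.issn=1057-7149&rft.eissn=1941-0042"
    "&rft_id=info:doi/10.1109/TIP.2023.3270105"
    "&rft_id=info:pmid/37115827"
)


def referent_fields(url):
    """Return the percent-decoded query keys of a URL as {key: [values]}."""
    return parse_qs(urlsplit(url).query)


def identifier(fields, scheme):
    """Return the first rft_id value of the form 'info:<scheme>/<value>', or None."""
    prefix = f"info:{scheme}/"
    for value in fields.get("rft_id", []):
        if value.startswith(prefix):
            return value[len(prefix):]
    return None


fields = referent_fields(OPENURL)
print(fields["rft.atitle"][0])     # article title, with %20 decoded to spaces
print(identifier(fields, "doi"))   # DOI taken from the repeated rft_id key
print(identifier(fields, "pmid"))  # PubMed ID taken from the repeated rft_id key
```

Because rft_id occurs more than once in the link, parse_qs is used here instead of building a plain dictionary, which would keep only one of the repeated values.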