A Video Captioning Method Based on Multi-Representation Switching for Sustainable Computing

Video captioning is the task of generating a natural-language sentence that describes a video. A video description includes not only words that name the objects in the video but also words that express the relationships between those objects, as well as grammatically necessary words. To reflect this characteristic explicitly in a deep learning model, we propose a multi-representation switching method. The method consists of three components: entity extraction, motion extraction, and textual feature extraction. The multi-representation switching mechanism allows the three components to efficiently extract the important information from a given video-description pair. In experiments on the Microsoft Research Video Description dataset, the proposed method outperformed most existing video captioning methods. This result was achieved without any computer-vision or natural-language preprocessing and without any additional loss function. Consequently, the method is highly general and can be extended to various domains in terms of sustainable computing.
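The record contains no implementation details beyond this abstract, but the core idea, switching among several learned representations while generating each word of the caption, can be sketched. The snippet below is a minimal, hypothetical illustration in PyTorch, not the authors' published architecture: the names (RepresentationSwitch, gate) and the softmax-gate design are assumptions; the abstract states only that three feature streams (entity, motion, textual) are combined by a switching mechanism.

    # Hypothetical sketch only: a learned softmax gate that "switches"
    # among three feature streams (entity, motion, textual) at each
    # decoding step. Based solely on the abstract, not on the paper.
    import torch
    import torch.nn as nn

    class RepresentationSwitch(nn.Module):
        """Soft switch over three representations (assumed design)."""

        def __init__(self, hidden_dim: int):
            super().__init__()
            # One gate score per representation, conditioned on the
            # decoder state at the current word position.
            self.gate = nn.Linear(hidden_dim, 3)

        def forward(self, decoder_state, entity_feat, motion_feat, text_feat):
            # decoder_state: (batch, hidden_dim); each *_feat: (batch, feat_dim)
            weights = torch.softmax(self.gate(decoder_state), dim=-1)  # (batch, 3)
            stacked = torch.stack([entity_feat, motion_feat, text_feat], dim=1)
            # Weighted sum over the three representations -> (batch, feat_dim)
            return (weights.unsqueeze(-1) * stacked).sum(dim=1)

    if __name__ == "__main__":
        switch = RepresentationSwitch(hidden_dim=512)
        state = torch.randn(4, 512)
        feats = [torch.randn(4, 256) for _ in range(3)]
        fused = switch(state, *feats)
        print(fused.shape)  # torch.Size([4, 256])

A soft gate of this kind would let the model lean on entity features for object words, motion features for relationship words, and textual features for grammatically necessary words, which matches the distinction the abstract draws between those three kinds of words in a description.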

Bibliographic Details
Published in: Sustainability, 19 February 2021, Vol. 13, Issue 4, p. 2250
Main authors: Kim, Heechan; Lee, Soowon
Format: Article
Language: English
Online access: Full text
DOI: 10.3390/su13042250
ORCID iDs: https://orcid.org/0000-0002-7564-2230; https://orcid.org/0000-0001-5863-1188
ISSN: 2071-1050
eISSN: 2071-1050
Source: MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals