A Video Captioning Method Based on Multi-Representation Switching for Sustainable Computing

Video captioning is the task of generating a natural-language sentence that describes a video. A video description includes not only words that name the objects in the video but also words that express the relationships between those objects, as well as grammatically necessary words. To reflect this characteristic explicitly in a deep learning model, we propose a multi-representation switching method. The method consists of three components: entity extraction, motion extraction, and textual feature extraction. The multi-representation switching mechanism allows the three components to efficiently extract the important information from a given video-description pair. In experiments on the Microsoft Research Video Description dataset, the proposed method outperformed most existing video captioning methods. This result was achieved without any computer-vision or natural-language preprocessing and without any additional loss function. Consequently, the method is highly general and can be extended to various domains in terms of sustainable computing.
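The record contains no implementation details beyond this abstract, but the core idea, switching among several learned representations while generating each word of the caption, can be sketched. The snippet below is a minimal, hypothetical illustration in PyTorch, not the authors' published architecture: the names (RepresentationSwitch, gate) and the softmax-gate design are assumptions; the abstract states only that three feature streams (entity, motion, textual) are combined by a switching mechanism.

    # Hypothetical sketch only: a learned softmax gate that "switches"
    # among three feature streams (entity, motion, textual) at each
    # decoding step. Based solely on the abstract, not on the paper.
    import torch
    import torch.nn as nn

    class RepresentationSwitch(nn.Module):
        """Soft switch over three representations (assumed design)."""

        def __init__(self, hidden_dim: int):
            super().__init__()
            # One gate score per representation, conditioned on the
            # decoder state at the current word position.
            self.gate = nn.Linear(hidden_dim, 3)

        def forward(self, decoder_state, entity_feat, motion_feat, text_feat):
            # decoder_state: (batch, hidden_dim); each *_feat: (batch, feat_dim)
            weights = torch.softmax(self.gate(decoder_state), dim=-1)  # (batch, 3)
            stacked = torch.stack([entity_feat, motion_feat, text_feat], dim=1)
            # Weighted sum over the three representations -> (batch, feat_dim)
            return (weights.unsqueeze(-1) * stacked).sum(dim=1)

    if __name__ == "__main__":
        switch = RepresentationSwitch(hidden_dim=512)
        state = torch.randn(4, 512)
        feats = [torch.randn(4, 256) for _ in range(3)]
        fused = switch(state, *feats)
        print(fused.shape)  # torch.Size([4, 256])

A soft gate of this kind would let the model lean on entity features for object words, motion features for relationship words, and textual features for grammatically necessary words, which matches the distinction the abstract draws between those three kinds of words in a description.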

Bibliographic Details
Published in: Sustainability, 19 February 2021, Vol. 13, Issue 4, p. 2250
Main authors: Kim, Heechan; Lee, Soowon
Format: Article
Language: English
Online access: Full text
DOI: 10.3390/su13042250
ORCID iDs: https://orcid.org/0000-0002-7564-2230; https://orcid.org/0000-0001-5863-1188
ISSN: 2071-1050
eISSN: 2071-1050
Source: MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals