Large-scale Video Classification guided by Batch Normalized LSTM Translator

The YouTube-8M dataset advances large-scale video recognition technology, much as the ImageNet dataset spurred image classification, recognition and detection in artificial intelligence. For this large video dataset, it is a challenging task to classify a huge amount of multi-labe...

Bibliographic Details
Main author:	Yoo, Jae Hyeon
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Yoo, Jae Hyeon
description	The YouTube-8M dataset advances large-scale video recognition technology, much as the ImageNet dataset spurred image classification, recognition and detection in artificial intelligence. Classifying the huge number of labels in this large multi-label video dataset is a challenging task. Changing perspective, we propose a novel method that regards labels as words. In detail, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks acting as a video-to-sentence translator. We designed the translator based on LSTMs and found that stochastic gating before the input of each LSTM cell can help us design the structural details. In addition, we adopted batch normalization to improve our LSTM models. Since our models are feature extractors, they can be used with other classifiers. Finally, we report improved validation results of our models on the large-scale YouTube-8M dataset and discuss further improvements.
doi_str_mv	10.48550/arxiv.1707.04045
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1707_04045</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1707_04045</sourcerecordid><originalsourceid>FETCH-LOGICAL-a675-b25cabdcdec072811b0fdbb75638bd1ef9962913c89e03da6ab42c48e6fe952f3</originalsourceid><addsrcrecordid>eNotz7tOwzAYhmEvDKhwAUz4BhJsx8cRIk4iwEDEGv0-FUtug-yAKFdPKUyf9A6f9CB0RknLtRDkAspX-mypIqolnHBxjB4GKOvQVAc54Nfkw4z7DLWmmBwsad7i9ce-emx3-AoW94af5rKBnL73bXgZH_FYYFszLHM5QUcRcg2n_7tC48312N81w_PtfX85NCCVaCwTDqx3PjiimKbUkuitVUJ22noaojGSGdo5bQLpPEiwnDmug4zBCBa7FTr_uz1opveSNlB2069qOqi6H4-5SDA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Large-scale Video Classification guided by Batch Normalized LSTM Translator</title><source>arXiv.org</source><creator>Yoo, Jae Hyeon</creator><creatorcontrib>Yoo, Jae Hyeon</creatorcontrib><description>Youtube-8M dataset enhances the development of large-scale video recognition technology as ImageNet dataset has encouraged image classification, recognition and detection of artificial intelligence fields. For this large video dataset, it is a challenging task to classify a huge amount of multi-labels. By change of perspective, we propose a novel method by regarding labels as words. In details, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks for video to sentence translator. We designed the translator based on LSTMs and found out that a stochastic gating before the input of each LSTM cell can help us to design the structural details. In addition, we adopted batch normalizations into our models to improve our LSTM models. Since our models are feature extractors, they can be used with other classifiers. 
Finally we report improved validation results of our models on large-scale Youtube-8M datasets and discussions for the further improvement.</description><identifier>DOI: 10.48550/arxiv.1707.04045</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2017-07</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1707.04045$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1707.04045$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Yoo, Jae Hyeon</creatorcontrib><title>Large-scale Video Classification guided by Batch Normalized LSTM Translator</title><description>Youtube-8M dataset enhances the development of large-scale video recognition technology as ImageNet dataset has encouraged image classification, recognition and detection of artificial intelligence fields. For this large video dataset, it is a challenging task to classify a huge amount of multi-labels. By change of perspective, we propose a novel method by regarding labels as words. In details, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks for video to sentence translator. We designed the translator based on LSTMs and found out that a stochastic gating before the input of each LSTM cell can help us to design the structural details. In addition, we adopted batch normalizations into our models to improve our LSTM models. 
Since our models are feature extractors, they can be used with other classifiers. Finally we report improved validation results of our models on large-scale Youtube-8M datasets and discussions for the further improvement.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz7tOwzAYhmEvDKhwAUz4BhJsx8cRIk4iwEDEGv0-FUtug-yAKFdPKUyf9A6f9CB0RknLtRDkAspX-mypIqolnHBxjB4GKOvQVAc54Nfkw4z7DLWmmBwsad7i9ce-emx3-AoW94af5rKBnL73bXgZH_FYYFszLHM5QUcRcg2n_7tC48312N81w_PtfX85NCCVaCwTDqx3PjiimKbUkuitVUJ22noaojGSGdo5bQLpPEiwnDmug4zBCBa7FTr_uz1opveSNlB2069qOqi6H4-5SDA</recordid><startdate>20170713</startdate><enddate>20170713</enddate><creator>Yoo, Jae Hyeon</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20170713</creationdate><title>Large-scale Video Classification guided by Batch Normalized LSTM Translator</title><author>Yoo, Jae Hyeon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a675-b25cabdcdec072811b0fdbb75638bd1ef9962913c89e03da6ab42c48e6fe952f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Yoo, Jae Hyeon</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yoo, Jae Hyeon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Large-scale Video Classification guided by Batch Normalized LSTM Translator</atitle><date>2017-07-13</date><risdate>2017</risdate><abstract>Youtube-8M dataset enhances the development of large-scale video 
recognition technology as ImageNet dataset has encouraged image classification, recognition and detection of artificial intelligence fields. For this large video dataset, it is a challenging task to classify a huge amount of multi-labels. By change of perspective, we propose a novel method by regarding labels as words. In details, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks for video to sentence translator. We designed the translator based on LSTMs and found out that a stochastic gating before the input of each LSTM cell can help us to design the structural details. In addition, we adopted batch normalizations into our models to improve our LSTM models. Since our models are feature extractors, they can be used with other classifiers. Finally we report improved validation results of our models on large-scale Youtube-8M datasets and discussions for the further improvement.</abstract><doi>10.48550/arxiv.1707.04045</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1707.04045
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_1707_04045
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	Large-scale Video Classification guided by Batch Normalized LSTM Translator
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T13%3A40%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Large-scale%20Video%20Classification%20guided%20by%20Batch%20Normalized%20LSTM%20Translator&rft.au=Yoo,%20Jae%20Hyeon&rft.date=2017-07-13&rft_id=info:doi/10.48550/arxiv.1707.04045&rft_dat=%3Carxiv_GOX%3E1707_04045%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true