A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition

In this paper, a comparative experimental assessment of computer vision-based methods for sign language recognition is conducted. By implementing the most recent deep neural network methods in this field, a thorough evaluation on multiple publicly available datasets is performed. The aim of the present study is to provide insights on sign language recognition, focusing on mapping non-segmented video streams to glosses. For this task, two new sequence training criteria, known from the fields of speech and scene text recognition, are introduced. Furthermore, a plethora of pretraining schemes is thoroughly discussed. Finally, a new RGB+D dataset for the Greek sign language is created. To the best of our knowledge, this is the first sign language dataset where three annotation levels are provided (individual gloss, sentence and spoken language) for the same set of video captures.
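The abstract's core technical task, mapping non-segmented video streams to gloss sequences, is commonly trained with a CTC-style sequence criterion. Below is a minimal, hypothetical PyTorch sketch of such a baseline; it shows only standard CTC, not the paper's two new criteria, and every module name, dimension, and vocabulary size is an assumption made for illustration.

```python
# Minimal sketch (not the authors' implementation): video-to-gloss training
# with standard CTC loss. Assumes pre-extracted per-frame features; all
# shapes and the gloss vocabulary size are illustrative.
import torch
import torch.nn as nn

class VideoToGloss(nn.Module):
    """Hypothetical backbone: temporal BiLSTM over frame features + gloss classifier."""
    def __init__(self, feat_dim=512, hidden=256, num_glosses=311):  # index 0 = CTC blank
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_glosses)

    def forward(self, frame_feats):             # (B, T, feat_dim)
        out, _ = self.rnn(frame_feats)          # (B, T, 2*hidden)
        return self.head(out).log_softmax(-1)   # per-frame gloss log-probabilities

model = VideoToGloss()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# Dummy batch: 2 clips of 100 frames with 512-d features each.
feats = torch.randn(2, 100, 512)
targets = torch.randint(1, 311, (2, 12))            # unsegmented gloss label sequences
input_lens = torch.full((2,), 100, dtype=torch.long)
target_lens = torch.full((2,), 12, dtype=torch.long)

log_probs = model(feats).transpose(0, 1)            # CTCLoss expects (T, B, C)
loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()                                     # gradients for one training step
```

CTC is a natural fit here because the video frames are not segmented per gloss: the loss marginalizes over all monotonic frame-to-gloss alignments, so only the ordered gloss sequence needs to be annotated.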

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE transactions on multimedia 2022, Vol.24, p.1750-1762
Main authors: Adaloglou, Nikolas, Chatzis, Theocharis, Papastratis, Ilias, Stergioulas, Andreas, Papadopoulos, Georgios Th, Zacharopoulou, Vassia, Xydopoulos, George J., Atzakas, Klimnis, Papazachariou, Dimitris, Daras, Petros
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 1762
container_issue
container_start_page 1750
container_title IEEE transactions on multimedia
container_volume 24
creator Adaloglou, Nikolas
Chatzis, Theocharis
Papastratis, Ilias
Stergioulas, Andreas
Papadopoulos, Georgios Th
Zacharopoulou, Vassia
Xydopoulos, George J.
Atzakas, Klimnis
Papazachariou, Dimitris
Daras, Petros
description In this paper, a comparative experimental assessment of computer vision-based methods for sign language recognition is conducted. By implementing the most recent deep neural network methods in this field, a thorough evaluation on multiple publicly available datasets is performed. The aim of the present study is to provide insights on sign language recognition, focusing on mapping non-segmented video streams to glosses. For this task, two new sequence training criteria, known from the fields of speech and scene text recognition, are introduced. Furthermore, a plethora of pretraining schemes is thoroughly discussed. Finally, a new RGB+D dataset for the Greek sign language is created. To the best of our knowledge, this is the first sign language dataset where three annotation levels are provided (individual gloss, sentence and spoken language) for the same set of video captures.
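As a purely illustrative aside on the dataset described above, the three annotation levels (individual gloss, sentence, and spoken language) attached to the same RGB+D capture could be modeled per sample as in the following sketch; every field name, path, and example value is hypothetical, not the dataset's actual schema.

```python
# Illustrative only: one possible record layout for a triple-annotated
# RGB+D sign language sample. Field names and paths are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class SignSample:
    rgb_path: str        # path to the RGB video capture
    depth_path: str      # path to the aligned depth stream
    glosses: List[str]   # gloss-level annotation, one label per sign
    sentence: str        # sentence-level annotation in sign order
    spoken: str          # free spoken-language translation

sample = SignSample(
    rgb_path="clips/0001_rgb.mp4",
    depth_path="clips/0001_depth.mp4",
    glosses=["HELLO", "HOW", "YOU"],
    sentence="HELLO HOW YOU",
    spoken="Hello, how are you?",
)
```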
doi_str_mv 10.1109/TMM.2021.3070438
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1520-9210
ispartof IEEE transactions on multimedia, 2022, Vol.24, p.1750-1762
issn 1520-9210
1941-0077
language eng
recordid cdi_proquest_journals_2647425692
source IEEE Electronic Library (IEL)
subjects Annotations
Artificial neural networks
Assistive technology
Computer vision
conditional entropy CTC
Datasets
Deep neural networks
Feature extraction
Gesture recognition
Gloss
Greek sign language
Hidden Markov models
Machine learning
Sign language
Sign Language Recognition
Speech recognition
stimulated CTC
Task analysis
Three-dimensional displays
Training
Video data
title A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T06%3A51%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Comprehensive%20Study%20on%20Deep%20Learning-Based%20Methods%20for%20Sign%20Language%20Recognition&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Adaloglou,%20Nikolas&rft.date=2022&rft.volume=24&rft.spage=1750&rft.epage=1762&rft.pages=1750-1762&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2021.3070438&rft_dat=%3Cproquest_RIE%3E2647425692%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2647425692&rft_id=info:pmid/&rft_ieee_id=9393618&rfr_iscdi=true