A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems
This work focuses on robust speech recognition in air traffic control (ATC) by designing a novel processing paradigm that integrates multilingual speech recognition into a single framework using three cascaded modules: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM converts ATC speech into phoneme-based text sequences, which the PM then translates into word-based sequences, the ultimate goal of this research. The LM corrects both phoneme- and word-based errors in the decoding results. The AM, which combines a convolutional neural network (CNN) and a recurrent neural network (RNN), captures the spatial and temporal dependencies of the speech features and is trained with the connectionist temporal classification (CTC) loss. To cope with radio transmission noise and diversity among speakers, a multiscale CNN architecture is proposed to fit the diverse data distributions and improve performance. Phoneme-to-word translation is addressed by a proposed machine translation PM with an encoder-decoder architecture. RNN-based LMs are trained to account for the code-switching specificity of ATC speech by building dependencies with common words. The proposed approach is validated on large amounts of real Chinese and English ATC recordings and achieves a 3.95% label error rate on Chinese characters and English words, outperforming other popular approaches. Its decoding efficiency is comparable to that of end-to-end models, and its generalizability is validated on several open corpora, making it suitable for real-time applications that further support ATC tasks such as ATC prediction and safety checking.
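The pipeline's first stage, as the abstract describes it, is an acoustic model that stacks a CNN front end on a recurrent network and is trained with the connectionist temporal classification (CTC) loss to emit phoneme sequences. The following is a minimal illustrative sketch of such a CTC-trained CNN + RNN acoustic model, not the authors' implementation: the framework (PyTorch), the layer sizes, the 80-dimensional feature input, and the 60-phoneme inventory are all assumptions made for the example, and the paper's multiscale CNN and the subsequent pronunciation and language models are not shown.

```python
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Maps speech features (batch, 1, time, feat_dim) to frame-level phoneme log-probabilities."""
    def __init__(self, n_phonemes: int, feat_dim: int = 80):
        super().__init__()
        # Convolutional front end captures local spectral ("spatial") patterns.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Recurrent layers model temporal dependencies along the utterance.
        self.rnn = nn.LSTM(32 * feat_dim, 256, num_layers=2,
                           batch_first=True, bidirectional=True)
        # One extra output class for the CTC blank symbol.
        self.proj = nn.Linear(2 * 256, n_phonemes + 1)

    def forward(self, x):                        # x: (batch, 1, time, feat_dim)
        h = self.cnn(x)                          # (batch, 32, time, feat_dim)
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h, _ = self.rnn(h)                       # (batch, time, 512)
        return self.proj(h).log_softmax(dim=-1)  # CTC expects log-probabilities

# CTC aligns the frame-level outputs with the shorter phoneme target sequence
# without requiring frame-by-frame labels.
model = AcousticModel(n_phonemes=60)
ctc_loss = nn.CTCLoss(blank=60, zero_infinity=True)

feats = torch.randn(4, 1, 200, 80)               # dummy batch: 4 utterances, 200 frames
targets = torch.randint(0, 60, (4, 30))          # dummy phoneme targets, 30 labels each
log_probs = model(feats).transpose(0, 1)         # CTCLoss expects (time, batch, classes)
loss = ctc_loss(log_probs, targets,
                torch.full((4,), 200), torch.full((4,), 30))
loss.backward()
```

In the full framework, the phoneme sequence decoded from these per-frame posteriors would be translated into words by the encoder-decoder pronunciation model and re-scored by the RNN-based language models.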
Published in: | IEEE Transactions on Neural Networks and Learning Systems, 2021-08, Vol. 32 (8), pp. 3608-3620 |
---|---|
Authors: | Lin, Yi; Guo, Dongyue; Zhang, Jianwei; Chen, Zhengmao; Yang, Bo |
Format: | Article |
Language: | English |
Publisher: | IEEE (United States) |
DOI: | 10.1109/TNNLS.2020.3015830 |
ISSN: | 2162-237X |
EISSN: | 2162-2388 |
PMID: | 32833649 |
CODEN: | ITNNAL |
Source: | IEEE Electronic Library (IEL) |
Subjects: | Acoustic model (AM); Acoustics; Air traffic control; air traffic control (ATC); Algorithms; Artificial neural networks; Atmospheric modeling; Aviation; Coders; Computer Systems; Control systems; Decoding; English language; Hidden Markov models; Language; Machine translation; machine translation pronunciation model (PM); multilingual; Multilingualism; multiscale CNN (MCNN); Neural networks; Neural Networks, Computer; Phonemes; Radio transmission; Real-time systems; Recurrent neural networks; Reproducibility of Results; robust speech recognition; Speech; Speech recognition; Speech Recognition Software; Task analysis; Translation; Translations; Vocabulary; Voice recognition; Words (language) |