A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems

This work focuses on robust speech recognition in air traffic control (ATC) by designing a novel processing paradigm to integrate multilingual speech recognition into a single framework using three cascaded modules: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The A...

Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2021-08, Vol. 32 (8), p. 3608-3620
Main authors: Lin, Yi; Guo, Dongyue; Zhang, Jianwei; Chen, Zhengmao; Yang, Bo
Format: Article
Language: English
container_end_page 3620
container_issue 8
container_start_page 3608
container_title IEEE Transactions on Neural Networks and Learning Systems
container_volume 32
creator Lin, Yi
Guo, Dongyue
Zhang, Jianwei
Chen, Zhengmao
Yang, Bo
description This work focuses on robust speech recognition in air traffic control (ATC) by designing a novel processing paradigm to integrate multilingual speech recognition into a single framework using three cascaded modules: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM converts ATC speech into phoneme-based text sequences that the PM then translates into a word-based sequence, which is the ultimate goal of this research. The LM corrects both phoneme- and word-based errors in the decoding results. The AM, including the convolutional neural network (CNN) and recurrent neural network (RNN), considers the spatial and temporal dependences of the speech features and is trained by the connectionist temporal classification loss. To cope with radio transmission noise and diversity among speakers, a multiscale CNN architecture is proposed to fit the diverse data distributions and improve the performance. Phoneme-to-word translation is addressed via a proposed machine translation PM with an encoder-decoder architecture. RNN-based LMs are trained to consider the code-switching specificity of the ATC speech by building dependences with common words. We validate the proposed approach using large amounts of real Chinese and English ATC recordings and achieve a 3.95% label error rate on Chinese characters and English words, outperforming other popular approaches. The decoding efficiency is also comparable to that of the end-to-end model, and its generalizability is validated on several open corpora, making it suitable for real-time approaches to further support ATC applications, such as ATC prediction and safety checking.
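The description reports a 3.95% label error rate over Chinese characters and English words but does not say how the metric is computed. The sketch below assumes the common definition: total edit distance between reference and hypothesis label sequences, divided by the total number of reference labels. The function names and the sample utterance are hypothetical illustrations, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(refs, hyps):
    """Assumed definition: summed edit distance over summed reference length."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_labels = sum(len(r) for r in refs)
    return total_errors / total_labels


# Hypothetical utterance: each label is an English word or a Chinese character.
reference = [["climb", "to", "flight", "level", "three", "two", "zero"]]
hypothesis = [["climb", "flight", "level", "three", "two", "zero"]]
ler = label_error_rate(reference, hypothesis)
```

Here one of seven reference labels is dropped, so `ler` is 1/7. Treating Chinese characters and English words as single labels is what lets one figure cover both languages.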
doi_str_mv 10.1109/TNNLS.2020.3015830
format Article
eissn 2162-2388
pmid 32833649
coden ITNNAL
publisher United States: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
orcid 0000-0002-7194-5023
0000-0002-5491-1745
date 2021-08-01
pages 3608-3620
tpages 13
abbreviated_title IEEE Trans Neural Netw Learn Syst
ieee_link https://ieeexplore.ieee.org/document/9174746
pubmed_link https://www.ncbi.nlm.nih.gov/pubmed/32833649
fulltext fulltext_linktorsrc
identifier ISSN: 2162-237X
ispartof IEEE Transactions on Neural Networks and Learning Systems, 2021-08, Vol.32 (8), p.3608-3620
issn 2162-237X
2162-2388
language eng
recordid cdi_ieee_primary_9174746
source IEEE Electronic Library (IEL)
subjects Acoustic model (AM)
Acoustics
Air traffic control
air traffic control (ATC)
Algorithms
Artificial neural networks
Atmospheric modeling
Aviation
Coders
Computer Systems
Control systems
Decoding
English language
Hidden Markov models
Language
Machine translation
machine translation pronunciation model (PM)
multilingual
Multilingualism
multiscale CNN (MCNN)
Neural networks
Neural Networks, Computer
Phonemes
Radio transmission
Real-time systems
Recurrent neural networks
Reproducibility of Results
robust speech recognition
Speech
Speech recognition
Speech Recognition Software
Task analysis
Translation
Translations
Vocabulary
Voice recognition
Words (language)
title A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T07%3A18%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Unified%20Framework%20for%20Multilingual%20Speech%20Recognition%20in%20Air%20Traffic%20Control%20Systems&rft.jtitle=IEEE%20transaction%20on%20neural%20networks%20and%20learning%20systems&rft.au=Lin,%20Yi&rft.date=2021-08-01&rft.volume=32&rft.issue=8&rft.spage=3608&rft.epage=3620&rft.pages=3608-3620&rft.issn=2162-237X&rft.eissn=2162-2388&rft.coden=ITNNAL&rft_id=info:doi/10.1109/TNNLS.2020.3015830&rft_dat=%3Cproquest_RIE%3E2557978591%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2557978591&rft_id=info:pmid/32833649&rft_ieee_id=9174746&rfr_iscdi=true