Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts

This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts, produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, ba...

Bibliographic details
Published in: IEEE transactions on audio, speech, and language processing, 2012-02, Vol.20 (2), p.474-485
Main authors: Batista, F., Moniz, H., Trancoso, I., Mamede, N.
Format: Article
Language: English
container_end_page 485
container_issue 2
container_start_page 474
container_title IEEE transactions on audio, speech, and language processing
container_volume 20
creator Batista, F.
Moniz, H.
Trancoso, I.
Mamede, N.
description This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts, produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage. Reported experiments were conducted both over Portuguese and English broadcast news data. Both force aligned and automatic transcripts were used, allowing to measure the impact of the speech recognition errors. Capitalized words and named entities are intrinsically related, and are influenced by time variation effects. For that reason, the so-called language dynamics have been addressed for the capitalization task. Language adaptation results indicate, for both languages, that the capitalization performance is affected by the temporal distance between the training and testing data. In what regards the punctuation task, this paper covers the three most frequent punctuation marks: full stop, comma, and question marks. Different methods were explored for improving the baseline results for full stop and comma. The first uses punctuation information extracted from large written corpora. The second applies different levels of linguistic structure, including lexical, prosodic, and speaker related features. The comma detection improved significantly in the first method, thus indicating that it depends more on lexical features. The second method provided even better results, for both languages and both punctuation marks, best results being achieved mainly for full stop. As for question marks, there is a small gain, but differences are not very significant, due to the relatively small number of question marks in the corpora.
doi_str_mv 10.1109/TASL.2011.2159594
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_6135544</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6135544</ieee_id><sourcerecordid>2570326581</sourcerecordid><originalsourceid>FETCH-LOGICAL-c421t-717a6dd15d920e4355fb5a23ab2c61aba1e38744b98464dc53226741f32044423</originalsourceid><addsrcrecordid>eNpdkF1LwzAUhosoOKc_QLwpguBNZ09ykraXc8wPGChuXpc0TTXSL5NWnL_elI4JXiXhPO_hzeN55xDOAMLkZjNfr2YkBJgRYAlL8MCbAGNxECUED_d34MfeibUfYYiUI0y86laXun7rRekvv1tldKXqzvpN7c_7rqlEp6X_omTzpczWbwp_IVrdiVL_uImDRJ37z30tu358O-Ivt26Vku_-xojaSqPbzp56R4UorTrbnVPv9W65WTwEq6f7x8V8FUgk0AURRILnObA8IaFCyliRMUGoyIjkIDIBisYRYpbEyDGXjBLCI4SCkhARCZ161-Pe1jSfvbJdWmkrVVmKWjW9TSF0zoAnCA69_Id-NL2pXbs0gSimrgF3EIyQNI21RhVp60wJs3Wb0sF_OvhPB__pzr_LXO0WCytFWTgLUtt9kDCMKI-HAhcjp5VS-zEH92tE-gtnvo5B</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>917839206</pqid></control><display><type>article</type><title>Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts</title><source>IEEE Electronic Library (IEL)</source><creator>Batista, F. ; Moniz, H. ; Trancoso, I. ; Mamede, N.</creator><creatorcontrib>Batista, F. ; Moniz, H. ; Trancoso, I. ; Mamede, N.</creatorcontrib><description>This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts, produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage. Reported experiments were conducted both over Portuguese and English broadcast news data. Both force aligned and automatic transcripts were used, allowing to measure the impact of the speech recognition errors. 
Capitalized words and named entities are intrinsically related, and are influenced by time variation effects. For that reason, the so-called language dynamics have been addressed for the capitalization task. Language adaptation results indicate, for both languages, that the capitalization performance is affected by the temporal distance between the training and testing data. In what regards the punctuation task, this paper covers the three most frequent punctuation marks: full stop, comma, and question marks. Different methods were explored for improving the baseline results for full stop and comma. The first uses punctuation information extracted from large written corpora. The second applies different levels of linguistic structure, including lexical, prosodic, and speaker related features. The comma detection improved significantly in the first method, thus indicating that it depends more on lexical features. The second method provided even better results, for both languages and both punctuation marks, best results being achieved mainly for full stop. 
As for question marks, there is a small gain, but differences are not very significant, due to the relatively small number of question marks in the corpora.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2011.2159594</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Adaptation models ; Alignment ; Applied sciences ; Automatic speech processing ; Automatic speech recognition ; capitalization ; Error analysis ; Exact sciences and technology ; Hidden Markov models ; Information, signal and communications theory ; language dynamics ; Linguistics ; Miscellaneous ; natural language processing ; Plugs ; punctuation marks ; rich transcription ; Signal processing ; Speech ; Speech processing ; Tasks ; Telecommunications and information theory ; Texts ; Training ; Training data ; Vocabulary</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2012-02, Vol.20 (2), p.474-485</ispartof><rights>2015 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Feb 2012</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c421t-717a6dd15d920e4355fb5a23ab2c61aba1e38744b98464dc53226741f32044423</citedby><cites>FETCH-LOGICAL-c421t-717a6dd15d920e4355fb5a23ab2c61aba1e38744b98464dc53226741f32044423</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6135544$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27922,27923,54756</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6135544$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=25473681$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Batista, F.</creatorcontrib><creatorcontrib>Moniz, H.</creatorcontrib><creatorcontrib>Trancoso, I.</creatorcontrib><creatorcontrib>Mamede, N.</creatorcontrib><title>Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts, produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage. Reported experiments were conducted both over Portuguese and English broadcast news data. Both force aligned and automatic transcripts were used, allowing to measure the impact of the speech recognition errors. 
Capitalized words and named entities are intrinsically related, and are influenced by time variation effects. For that reason, the so-called language dynamics have been addressed for the capitalization task. Language adaptation results indicate, for both languages, that the capitalization performance is affected by the temporal distance between the training and testing data. In what regards the punctuation task, this paper covers the three most frequent punctuation marks: full stop, comma, and question marks. Different methods were explored for improving the baseline results for full stop and comma. The first uses punctuation information extracted from large written corpora. The second applies different levels of linguistic structure, including lexical, prosodic, and speaker related features. The comma detection improved significantly in the first method, thus indicating that it depends more on lexical features. The second method provided even better results, for both languages and both punctuation marks, best results being achieved mainly for full stop. 
As for question marks, there is a small gain, but differences are not very significant, due to the relatively small number of question marks in the corpora.</description><subject>Adaptation models</subject><subject>Alignment</subject><subject>Applied sciences</subject><subject>Automatic speech processing</subject><subject>Automatic speech recognition</subject><subject>capitalization</subject><subject>Error analysis</subject><subject>Exact sciences and technology</subject><subject>Hidden Markov models</subject><subject>Information, signal and communications theory</subject><subject>language dynamics</subject><subject>Linguistics</subject><subject>Miscellaneous</subject><subject>natural language processing</subject><subject>Plugs</subject><subject>punctuation marks</subject><subject>rich transcription</subject><subject>Signal processing</subject><subject>Speech</subject><subject>Speech processing</subject><subject>Tasks</subject><subject>Telecommunications and information theory</subject><subject>Texts</subject><subject>Training</subject><subject>Training data</subject><subject>Vocabulary</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkF1LwzAUhosoOKc_QLwpguBNZ09ykraXc8wPGChuXpc0TTXSL5NWnL_elI4JXiXhPO_hzeN55xDOAMLkZjNfr2YkBJgRYAlL8MCbAGNxECUED_d34MfeibUfYYiUI0y86laXun7rRekvv1tldKXqzvpN7c_7rqlEp6X_omTzpczWbwp_IVrdiVL_uImDRJ37z30tu358O-Ivt26Vku_-xojaSqPbzp56R4UorTrbnVPv9W65WTwEq6f7x8V8FUgk0AURRILnObA8IaFCyliRMUGoyIjkIDIBisYRYpbEyDGXjBLCI4SCkhARCZ161-Pe1jSfvbJdWmkrVVmKWjW9TSF0zoAnCA69_Id-NL2pXbs0gSimrgF3EIyQNI21RhVp60wJs3Wb0sF_OvhPB__pzr_LXO0WCytFWTgLUtt9kDCMKI-HAhcjp5VS-zEH92tE-gtnvo5B</recordid><startdate>20120201</startdate><enddate>20120201</enddate><creator>Batista, F.</creator><creator>Moniz, H.</creator><creator>Trancoso, I.</creator><creator>Mamede, 
N.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20120201</creationdate><title>Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts</title><author>Batista, F. ; Moniz, H. ; Trancoso, I. ; Mamede, N.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c421t-717a6dd15d920e4355fb5a23ab2c61aba1e38744b98464dc53226741f32044423</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Adaptation models</topic><topic>Alignment</topic><topic>Applied sciences</topic><topic>Automatic speech processing</topic><topic>Automatic speech recognition</topic><topic>capitalization</topic><topic>Error analysis</topic><topic>Exact sciences and technology</topic><topic>Hidden Markov models</topic><topic>Information, signal and communications theory</topic><topic>language dynamics</topic><topic>Linguistics</topic><topic>Miscellaneous</topic><topic>natural language processing</topic><topic>Plugs</topic><topic>punctuation marks</topic><topic>rich transcription</topic><topic>Signal processing</topic><topic>Speech</topic><topic>Speech processing</topic><topic>Tasks</topic><topic>Telecommunications and information theory</topic><topic>Texts</topic><topic>Training</topic><topic>Training data</topic><topic>Vocabulary</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Batista, F.</creatorcontrib><creatorcontrib>Moniz, H.</creatorcontrib><creatorcontrib>Trancoso, I.</creatorcontrib><creatorcontrib>Mamede, 
N.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Batista, F.</au><au>Moniz, H.</au><au>Trancoso, I.</au><au>Mamede, N.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2012-02-01</date><risdate>2012</risdate><volume>20</volume><issue>2</issue><spage>474</spage><epage>485</epage><pages>474-485</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts, produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage. Reported experiments were conducted both over Portuguese and English broadcast news data. 
Both force aligned and automatic transcripts were used, allowing to measure the impact of the speech recognition errors. Capitalized words and named entities are intrinsically related, and are influenced by time variation effects. For that reason, the so-called language dynamics have been addressed for the capitalization task. Language adaptation results indicate, for both languages, that the capitalization performance is affected by the temporal distance between the training and testing data. In what regards the punctuation task, this paper covers the three most frequent punctuation marks: full stop, comma, and question marks. Different methods were explored for improving the baseline results for full stop and comma. The first uses punctuation information extracted from large written corpora. The second applies different levels of linguistic structure, including lexical, prosodic, and speaker related features. The comma detection improved significantly in the first method, thus indicating that it depends more on lexical features. The second method provided even better results, for both languages and both punctuation marks, best results being achieved mainly for full stop. As for question marks, there is a small gain, but differences are not very significant, due to the relatively small number of question marks in the corpora.</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TASL.2011.2159594</doi><tpages>12</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2012-02, Vol.20 (2), p.474-485
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_ieee_primary_6135544
source IEEE Electronic Library (IEL)
subjects Adaptation models
Alignment
Applied sciences
Automatic speech processing
Automatic speech recognition
capitalization
Error analysis
Exact sciences and technology
Hidden Markov models
Information, signal and communications theory
language dynamics
Linguistics
Miscellaneous
natural language processing
Plugs
punctuation marks
rich transcription
Signal processing
Speech
Speech processing
Tasks
Telecommunications and information theory
Texts
Training
Training data
Vocabulary
title Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T23%3A08%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bilingual%20Experiments%20on%20Automatic%20Recovery%20of%20Capitalization%20and%20Punctuation%20of%20Automatic%20Speech%20Transcripts&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Batista,%20F.&rft.date=2012-02-01&rft.volume=20&rft.issue=2&rft.spage=474&rft.epage=485&rft.pages=474-485&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2011.2159594&rft_dat=%3Cproquest_RIE%3E2570326581%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=917839206&rft_id=info:pmid/&rft_ieee_id=6135544&rfr_iscdi=true