Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks, such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, just a scalar attribute of the sequence. In these scenarios, where knowing the quality of a system's output (for example, to predict poor performance) matters more than knowing the output itself, is it possible to bypass autoregressive decoding? We propose Non-Autoregressive Proxy (NAP) models that can efficiently predict general scalar-valued sequence-level attributes. Importantly, NAPs predict these metrics directly from the encodings, avoiding the expensive autoregressive decoding stage. We consider two sequence-to-sequence tasks: Machine Translation (MT) and Automatic Speech Recognition (ASR). In OOD detection for MT, NAPs outperform a deep ensemble while being significantly faster. NAPs can also predict performance metrics such as BERTScore (MT) or word error rate (ASR). For downstream tasks such as data filtering and resource optimization, NAPs produce performance predictions that outperform predictive uncertainty while being highly inference-efficient.
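The core idea — regressing a scalar sequence-level attribute directly from encoder states, with no decoder pass — can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: the mean pooling, head shape, and the `nap_predict` function are assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def nap_predict(encoder_states, W, b):
    """Predict a scalar sequence-level attribute (e.g. an estimated
    BERTScore or word error rate) from encoder states of shape (T, d),
    bypassing autoregressive decoding entirely."""
    pooled = encoder_states.mean(axis=0)  # (d,) pool over time steps
    return float(pooled @ W + b)          # linear head -> scalar estimate

# Toy example: a 20-step encoding of dimension 8 with a random linear head.
T, d = 20, 8
states = rng.normal(size=(T, d))
W = rng.normal(size=d)
b = 0.0
score = nap_predict(states, W, b)
```

The point of the sketch is the cost profile: the proxy runs in a single forward pass over the encoder output, whereas autoregressive decoding would require one decoder step per output token.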

Bibliographic Details
Published in: arXiv.org, 2023-05
Main authors: Fathullah, Yassir; Radmard, Puria; Liusie, Adian; Gales, Mark J. F.
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Automatic speech recognition; Autoregressive models; Decoders; Decoding; Machine translation; Optimization; Performance measurement; Performance prediction; Resource allocation