Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks, such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, just a scalar attribute of the sequence. In these scenarios, where knowing the quality of a system's output (for example, to predict poor performance) matters more than knowing the output itself, is it possible to bypass autoregressive decoding? We propose Non-Autoregressive Proxy (NAP) models that can efficiently predict general scalar-valued sequence-level attributes. Importantly, NAPs predict these metrics directly from the encodings, avoiding the expensive autoregressive decoding stage. We consider two sequence-to-sequence tasks: Machine Translation (MT) and Automatic Speech Recognition (ASR). In OOD detection for MT, NAPs outperform a deep ensemble while being significantly faster. NAPs can also predict performance metrics such as BERTScore (MT) or word error rate (ASR). For downstream tasks such as data filtering and resource optimization, NAPs produce performance predictions that outperform predictive uncertainty while being highly inference-efficient.
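The core idea — regressing a scalar sequence-level attribute directly from encoder states, with no decoder pass — can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: the mean pooling, head shape, and the `nap_predict` function are assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def nap_predict(encoder_states, W, b):
    """Predict a scalar sequence-level attribute (e.g. an estimated
    BERTScore or word error rate) from encoder states of shape (T, d),
    bypassing autoregressive decoding entirely."""
    pooled = encoder_states.mean(axis=0)  # (d,) pool over time steps
    return float(pooled @ W + b)          # linear head -> scalar estimate

# Toy example: a 20-step encoding of dimension 8 with a random linear head.
T, d = 20, 8
states = rng.normal(size=(T, d))
W = rng.normal(size=d)
b = 0.0
score = nap_predict(states, W, b)
```

The point of the sketch is the cost profile: the proxy runs in a single forward pass over the encoder output, whereas autoregressive decoding would require one decoder step per output token.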

Bibliographic Details
Published in: arXiv.org, 2023-05
Main authors: Fathullah, Yassir; Radmard, Puria; Liusie, Adian; Gales, Mark J. F.
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Automatic speech recognition; Autoregressive models; Decoders; Decoding; Machine translation; Optimization; Performance measurement; Performance prediction; Resource allocation