Who Needs Decoders? Efficient Estimation of Sequence-level Attributes
State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, just a scalar attribute of this sequence...
Saved in:
Published in: | arXiv.org 2023-05 |
---|---|
Main authors: | Fathullah, Yassir; Radmard, Puria; Liusie, Adian; Gales, Mark J F |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_title | arXiv.org |
creator | Fathullah, Yassir; Radmard, Puria; Liusie, Adian; Gales, Mark J F |
description | State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, just a scalar attribute of this sequence. In these scenarios, where for example knowing the quality of a system's output to predict poor performance prevails over knowing the output itself, is it possible to bypass the autoregressive decoding? We propose Non-Autoregressive Proxy (NAP) models that can efficiently predict general scalar-valued sequence-level attributes. Importantly, NAPs predict these metrics directly from the encodings, avoiding the expensive autoregressive decoding stage. We consider two sequence-to-sequence tasks: Machine Translation (MT); and Automatic Speech Recognition (ASR). In OOD for MT, NAPs outperform a deep ensemble while being significantly faster. NAPs are also shown to be able to predict performance metrics such as BERTScore (MT) or word error rate (ASR). For downstream tasks, such as data filtering and resource optimization, NAPs generate performance predictions that outperform predictive uncertainty while being highly inference efficient. |
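The description outlines the core idea: predict a scalar sequence-level attribute (e.g. a quality or error-rate estimate) directly from the encoder states, skipping the autoregressive decoding loop. A minimal illustrative sketch in plain NumPy, which is NOT the paper's actual architecture (the mean-pooling and linear head here are assumptions made for illustration):

```python
import numpy as np

def nap_predict(encoder_states: np.ndarray, w: np.ndarray, b: float) -> float:
    """Toy non-autoregressive proxy: mean-pool the encoder states over
    time, then apply a linear head to produce one scalar attribute
    (e.g. a predicted quality score). The cost is a single forward
    pass over the encodings, with no token-by-token decoding."""
    pooled = encoder_states.mean(axis=0)  # (d,) summary of the whole sequence
    return float(pooled @ w + b)          # scalar attribute estimate

# Example: T=5 encoder frames, hidden size d=4
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 4))
score = nap_predict(states, w=np.ones(4), b=0.0)
```

The key contrast with autoregressive decoding is that the work done is independent of the output length: one pooling and one projection, rather than one decoder pass per generated token.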
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-05 |
issn | 2331-8422 |
language | eng |
source | Freely Accessible Journals |
subjects | Automatic speech recognition; Autoregressive models; Decoders; Decoding; Machine translation; Optimization; Performance measurement; Performance prediction; Resource allocation |
title | Who Needs Decoders? Efficient Estimation of Sequence-level Attributes |