Bidirectional Language Models Are Also Few-shot Learners

Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Utilizing the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models like GPT-3 and XGLM (Lin et al., 2021), despite mT5's approximately 50% fewer parameters. We further show SAP is effective on question answering and summarization. For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.
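
The abstract describes, at a high level, how a bidirectional span-corruption model can be prompted: a few labeled examples are written out as a text prompt, a sentinel token marks where the completion should go, and the output is built up over several short generation rounds rather than in one left-to-right pass. The sketch below illustrates that idea with Hugging Face Transformers; the prompt format, the checkpoint ("google/mt5-xl"), the per-round token budget, and the stopping heuristic are assumptions made for illustration, not the exact SAP procedure from the paper.

```python
# Illustrative sketch of few-shot, sequential prompting of a bidirectional
# span-corruption model (mT5). NOT the authors' reference implementation;
# the prompt format and stopping rule below are assumptions.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

MODEL_NAME = "google/mt5-xl"  # any mT5 checkpoint works for the sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = MT5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def sequential_prompt(few_shot_pairs, source_sentence, rounds=8, tokens_per_round=8):
    """Build up a completion for `source_sentence` over several generation rounds."""
    # Few-shot examples rendered as simple "source = target" lines (assumed format).
    prompt = "".join(f"{src} = {tgt}\n" for src, tgt in few_shot_pairs)
    prompt += f"{source_sentence} ="
    completion = ""
    for _ in range(rounds):
        # Ask the model to fill the <extra_id_0> sentinel appended after the
        # prompt plus whatever has been generated so far.
        inputs = tokenizer(prompt + completion + " <extra_id_0>", return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=tokens_per_round)
        # skip_special_tokens drops the sentinel tokens from the decoded span.
        new_text = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
        if not new_text:
            break  # nothing new was produced; stop early
        completion += " " + new_text
        if new_text.endswith((".", "!", "?")):
            break  # crude end-of-sentence heuristic for this sketch
    return completion.strip()

# Example: few-shot English -to-French translation
examples = [("The cat sleeps.", "Le chat dort."), ("I like tea.", "J'aime le thé.")]
print(sequential_prompt(examples, "The weather is nice today."))
```

The loop matters because a span-corruption model typically fills only a short masked span per call; feeding its partial output back in lets it emit a full completion, which is presumably what makes prompt-based learning with a bidirectional model feasible at all.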


Bibliographic details
Main authors: Patel, Ajay; Li, Bryan; Rasooli, Mohammad Sadegh; Constant, Noah; Raffel, Colin; Callison-Burch, Chris
Format: Article
Language: English
Subjects: Computer Science - Computation and Language; Computer Science - Learning
Online access: Order full text (arXiv: https://arxiv.org/abs/2209.14500)
DOI: 10.48550/arxiv.2209.14500
Source: arXiv.org
Published: 2022-09-28