Language Models in the Loop: Incorporating Prompting into Weak Supervision

We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data. Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach significantly improves over zero-shot performance, with an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.
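
To illustrate the labeling-function step described in the abstract, here is a minimal Python sketch. The query_llm helper, the yes/no prompt templates, and the sentiment task are hypothetical placeholders for illustration, not the prompts or tasks used in the paper.

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1  # Snorkel-style label convention: -1 means abstain

def query_llm(prompt: str) -> str:
    # Placeholder for a call to whichever large pre-trained language model is available.
    raise NotImplementedError("plug in a model or API here")

def lf_likes_movie(review: str) -> int:
    # One prompted labeling function: ask a yes/no question about the example
    # and map the response to a label vote, abstaining on anything unexpected.
    answer = query_llm(
        f"Review: {review}\nQuestion: Does the reviewer like the movie? Answer yes or no."
    ).strip().lower()
    if answer.startswith("yes"):
        return POSITIVE
    if answer.startswith("no"):
        return NEGATIVE
    return ABSTAIN

def lf_would_recommend(review: str) -> int:
    # A second, differently worded query about the same example; disagreements
    # between such functions are resolved later by the label model.
    answer = query_llm(
        f"Review: {review}\nQuestion: Would the reviewer recommend this movie? Answer yes or no."
    ).strip().lower()
    if answer.startswith("yes"):
        return POSITIVE
    if answer.startswith("no"):
        return NEGATIVE
    return ABSTAIN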

Bibliographic Details
Published in: ACM/IMS Journal of Data Science, 2024-06, Vol. 1 (2), pp. 1-30
Main authors: Smith, Ryan; Fries, Jason A.; Hancock, Braden; Bach, Stephen H.
Format: Article
Language: English
DOI: 10.1145/3617130
ISSN: 2831-3194
Online access: Full text
Detailed Description

Problem statement: The goal of this paper is to use large language models to create smaller, specialized models. These specialized models can be better suited to specific tasks because they are tuned for them and are less expensive to serve in production. Existing approaches create training data for specialized models by prompting large language models in a zero-shot fashion, i.e., they instruct the language model to solve the task of interest and treat the responses as ground truth. This approach can be unreliable when the language model has noisy outputs and is sensitive to the wording of the prompt.

Methods: We address the problems of noisy outputs and prompt sensitivity by proposing a new strategy. We treat large language models as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data.

Results: Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach significantly improves over zero-shot performance, with an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.

Significance: Large language models are increasingly the starting point in many areas of machine learning. Incorporating prompting into weak supervision can enable users to more easily and accurately adapt them to specialized tasks.
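
The denoise-and-train step can be sketched as follows, assuming the prompted labeling functions above have already been applied to every unlabeled example to form a vote matrix L_train (one column per labeling function, -1 for abstentions). The LabelModel call uses the open-source Snorkel API; the TF-IDF plus logistic-regression end model is an illustrative stand-in for whatever specialized classifier is being distilled, not the end models evaluated in the paper.

import numpy as np
from snorkel.labeling.model import LabelModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_end_classifier(texts, L_train, cardinality=2):
    # Denoise the prompted votes into probabilistic training labels.
    label_model = LabelModel(cardinality=cardinality, verbose=False)
    label_model.fit(L_train, n_epochs=500, seed=0)
    probs = label_model.predict_proba(L_train)

    # Keep only examples on which at least one labeling function voted.
    covered = (L_train != -1).any(axis=1)
    y_train = probs[covered].argmax(axis=1)

    # Train a small, cheap-to-serve end classifier on the denoised labels.
    vectorizer = TfidfVectorizer(max_features=20000)
    X_train = vectorizer.fit_transform(np.asarray(texts, dtype=object)[covered])
    classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return vectorizer, classifier

The hard argmax labels keep the sketch short; the label model's probabilistic outputs can also be used directly as soft training targets for an end model that accepts them.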