Language Models in the Loop: Incorporating Prompting into Weak Supervision
Published in: ACM / IMS Journal of Data Science, 2024-06, Vol. 1 (2), pp. 1-30
Authors: Ryan Smith, Jason A. Fries, Braden Hancock, Stephen H. Bach
Format: Article
Language: English
DOI: 10.1145/3617130
ISSN: 2831-3194
Abstract
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data. Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach can significantly improve over zero-shot performance, with an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.
Problem statement
The goal of this paper is to use large language models to create smaller, specialized models. These specialized models can be better suited to specific tasks because they are tuned for them and are less expensive to serve in production. Existing approaches create training data for specialized models by prompting large language models in a zero-shot fashion, i.e., they instruct the language model to solve the task of interest and treat the responses as ground truth. This approach can be unreliable when the language model has noisy outputs and is sensitive to the wording of the prompt.
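To make this baseline concrete, the sketch below shows zero-shot pseudo-labeling in roughly the form described: every model response is taken at face value as a training label. The spam task, the prompt wording, and the `query_llm` helper are illustrative assumptions, not details from the paper.

```python
# Rough sketch of the zero-shot labeling baseline, assuming a hypothetical
# query_llm(prompt) helper that returns the language model's text completion.
def zero_shot_label(texts, query_llm):
    labels = []
    for text in texts:
        resp = query_llm(
            "Is the following message spam? Answer yes or no.\n" + text
        )
        # The raw response is treated as ground truth, even when the model is
        # wrong or its answer flips with small changes to the prompt wording.
        labels.append(1 if resp.strip().lower().startswith("yes") else 0)
    return labels
```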
Methods
We address the problems of noisy outputs and prompt sensitivity by proposing a new strategy. We treat large language models as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data.
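The sketch below illustrates this pipeline: several prompted labeling functions each ask the model a distinct question, unparseable answers become abstentions, and Snorkel's label model combines the votes into probabilistic training labels. The label space, the prompt wordings, and the `query_llm` helper are illustrative assumptions; the `snorkel.labeling` calls follow that library's public API, but this is a minimal sketch rather than the authors' implementation.

```python
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, HAM, SPAM = -1, 0, 1  # example binary label space for a spam task


def query_llm(prompt: str) -> str:
    """Hypothetical helper that sends `prompt` to a large language model and
    returns its text completion (e.g., an API call or a local model)."""
    raise NotImplementedError


def to_vote(response: str, yes_label: int, no_label: int) -> int:
    """Map a free-text response to a label vote, abstaining when it is neither."""
    text = response.strip().lower()
    if text.startswith("yes"):
        return yes_label
    if text.startswith("no"):
        return no_label
    return ABSTAIN  # unparseable responses become abstentions, not labels


# Each labeling function asks the model one distinct question about the example;
# x is a row of a pandas DataFrame assumed to have a "text" column.
@labeling_function()
def lf_asks_for_money(x):
    resp = query_llm(f"Does this message ask for money? Answer yes or no.\n{x.text}")
    return to_vote(resp, yes_label=SPAM, no_label=HAM)


@labeling_function()
def lf_promotes_link(x):
    resp = query_llm(f"Does this message promote a link or product? Answer yes or no.\n{x.text}")
    return to_vote(resp, yes_label=SPAM, no_label=HAM)


@labeling_function()
def lf_personal_note(x):
    resp = query_llm(f"Is this message a personal note to a friend? Answer yes or no.\n{x.text}")
    return to_vote(resp, yes_label=HAM, no_label=SPAM)


def build_training_labels(df_unlabeled):
    """Apply the prompted labeling functions and denoise their votes with
    Snorkel's LabelModel to obtain probabilistic training labels."""
    lfs = [lf_asks_for_money, lf_promotes_link, lf_personal_note]
    L = PandasLFApplier(lfs=lfs).apply(df=df_unlabeled)  # (n_examples, n_lfs) votes

    label_model = LabelModel(cardinality=2, verbose=False)
    label_model.fit(L_train=L, n_epochs=500, seed=123)
    return label_model.predict_proba(L)  # soft labels for the end classifier
```

The returned probabilistic labels would then be used to train the smaller end classifier (for example, by fitting a logistic regression or fine-tuning a compact transformer on the soft labels), which is the specialized model that is actually served.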
Results
Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach can significantly improve over zero-shot performance, with an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.
Significance
Large language models are increasingly the starting point in many areas of machine learning. Incorporating prompting into weak supervision can enable users to more easily and accurately adapt them to specialized tasks.