Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings

Our goal is to enable machine learning systems to be trained interactively. This requires models that perform well and train quickly, without large amounts of hand-labeled data. We take a step forward in this direction by borrowing from weak supervision (WS), wherein models can be trained with noisy sources of signal instead of hand-labeled data. But WS relies on training downstream deep networks to extrapolate to unseen data points, which can take hours or days. Pre-trained embeddings can remove this requirement. We do not use the embeddings as features as in transfer learning (TL), which requires fine-tuning for high performance; instead, we use them to define a distance function on the data and extend WS source votes to nearby points. Theoretically, we provide a series of results studying how performance scales with changes in source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compare this rate to standard WS without extension and TL without fine-tuning. On six benchmark NLP and video tasks, our method outperforms WS without extension by 4.1 points, TL without fine-tuning by 12.8 points, and traditionally supervised deep networks by 13.1 points, and comes within 0.7 points of state-of-the-art weakly supervised deep networks, all while training in less than half a second.
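To make the vote-extension idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation: for each weak-supervision source, points where the source abstained inherit the vote of their nearest covered neighbor in embedding space, provided that neighbor is within a distance threshold. The function name, the cosine-distance choice, and the radius parameter are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of extending weak-supervision
# source votes to nearby points via pre-trained embeddings.
import numpy as np

def extend_votes(embeddings, votes, radius=0.5):
    """Propagate each labeling source's votes to uncovered points.

    embeddings: (n, d) array of pre-trained embeddings, one row per point.
    votes: (n, m) array of source votes in {-1, +1}, with 0 = abstain.
    radius: cosine-distance threshold within which a vote is extended.
    """
    # Normalize rows so that cosine distance = 1 - dot product.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    extended = votes.copy()
    for j in range(votes.shape[1]):          # one labeling source at a time
        covered = votes[:, j] != 0
        if not covered.any():
            continue
        # Cosine distance from every point to every covered point.
        dist = 1.0 - X @ X[covered].T
        nearest = dist.argmin(axis=1)
        near_enough = dist[np.arange(len(X)), nearest] <= radius
        # Copy the nearest covered point's vote where the source abstained
        # and that neighbor lies within the radius.
        fill = (~covered) & near_enough
        extended[fill, j] = votes[covered, j][nearest[fill]]
    return extended
```

Because this only requires nearest-neighbor lookups over fixed embeddings rather than training a downstream network, it is consistent with the sub-second training times the abstract reports, though the paper's actual procedure and theoretical conditions are more refined than this sketch.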

Bibliographic details

Main authors: Chen, Mayee F; Fu, Daniel Y; Sala, Frederic; Wu, Sen; Mullapudi, Ravi Teja; Poms, Fait; Fatahalian, Kayvon; Ré, Christopher
Format: Article
Language: English
Subjects: Computer Science - Learning; Statistics - Machine Learning
Online access: Order full text
DOI: 10.48550/arxiv.2006.15168
Record ID: cdi_arxiv_primary_2006_15168
Source: arXiv.org
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T17%3A31%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Train%20and%20You'll%20Miss%20It:%20Interactive%20Model%20Iteration%20with%20Weak%20Supervision%20and%20Pre-Trained%20Embeddings&rft.au=Chen,%20Mayee%20F&rft.date=2020-06-26&rft_id=info:doi/10.48550/arxiv.2006.15168&rft_dat=%3Carxiv_GOX%3E2006_15168%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true