SYSTEMS AND METHODS FOR FAULT TOLERANCE RECOVER DURING TRAINING OF A MODEL OF A CLASSIFIER USING A DISTRIBUTED SYSTEM
A distributed system for training a classifier is provided. The system comprises machine learning (ML) workers and a parameter server (PS). The PS is configured for parallel processing to provide the model to each of the ML workers, receive model updates from each of the ML workers, and iteratively...
Saved in:
Main authors: | PETERFREUND, Natan; WU, Zuguang; TALYANSKY, Roman; MELAMED, Zach |
---|---|
Format: | Patent |
Language: | English; French; German |
Subjects: | CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS |
Online access: | Order full text |
creator | PETERFREUND, Natan WU, Zuguang TALYANSKY, Roman MELAMED, Zach |
description | A distributed system for training a classifier is provided. The system comprises machine learning (ML) workers and a parameter server (PS). The PS is configured for parallel processing to provide the model to each of the ML workers, receive model updates from each of the ML workers, and iteratively update the model using each model update. The PS contains: gradient datasets, each associated with a respective ML worker, for storing a model-update identification (delta-M-ID) indicative of the computed model update together with the respective model update; a global dataset that stores the delta-M-ID, an identification of the ML worker (ML-worker-ID) that computed the model update, and a model version that marks a new model in the PS computed by merging the model update with a previous model in the PS; and a model download dataset that stores the ML-worker-ID and the model version of each transmitted model. |
format | Patent |
startdate | 2019-08-28 |
linktorsrc | https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20190828&DB=EPODOC&CC=EP&NR=3529754A1 |
oa | free_for_read |
fulltext | fulltext_linktorsrc |
identifier | EP3529754A1 |
language | eng ; fre ; ger |
recordid | cdi_epo_espacenet_EP3529754A1 |
source | esp@cenet |
subjects | CALCULATING COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING PHYSICS |
title | SYSTEMS AND METHODS FOR FAULT TOLERANCE RECOVER DURING TRAINING OF A MODEL OF A CLASSIFIER USING A DISTRIBUTED SYSTEM |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T18%3A21%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=PETERFREUND,%20Natan&rft.date=2019-08-28&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EEP3529754A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
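The description field above walks through the parameter server's bookkeeping: per-worker gradient datasets keyed by delta-M-ID, a global dataset logging each merge (delta-M-ID, ML-worker-ID, model version), and a model download dataset logging which model version each worker received. A minimal sketch of that bookkeeping, with all class and attribute names being illustrative assumptions rather than identifiers from the patent:

```python
# Illustrative sketch of the parameter-server (PS) bookkeeping described
# in the record's description field. Names are assumptions for clarity,
# not taken from the patent itself.

class ParameterServer:
    def __init__(self, initial_model):
        self.model = list(initial_model)   # current global model
        self.model_version = 0             # bumped on every merged update
        # Per-worker "gradient dataset": delta-M-ID -> stored model update.
        self.gradient_datasets = {}
        # "Global dataset": one row (delta-M-ID, ML-worker-ID, model version)
        # per update merged into the model.
        self.global_dataset = []
        # "Model download dataset": (ML-worker-ID, model version) for each
        # model transmitted to a worker.
        self.download_dataset = []

    def download_model(self, worker_id):
        """Transmit the current model to a worker and log the transmission."""
        self.download_dataset.append((worker_id, self.model_version))
        return list(self.model), self.model_version

    def apply_update(self, worker_id, delta_m_id, update):
        """Store a worker's update, merge it into the model, log the merge."""
        self.gradient_datasets.setdefault(worker_id, {})[delta_m_id] = update
        # Merge the update with the previous model (elementwise add here).
        self.model = [m + u for m, u in zip(self.model, update)]
        self.model_version += 1            # marks the newly merged model
        self.global_dataset.append((delta_m_id, worker_id, self.model_version))
```

Under these assumptions, the three logs together support fault-tolerant recovery: after a failure, comparing a worker's last entry in the download dataset against the global dataset identifies exactly which merged updates its model already reflects, so only the missing deltas (still held in the gradient datasets) need to be replayed.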