Ordering computations of a machine learning network in a machine learning accelerator for efficient memory usage

A compiler efficiently manages memory usage in the machine learning accelerator by intelligently ordering computations of a machine learning network (MLN). The compiler identifies a set of partial networks of the MLN representing portions of the network across multiple layers on which an output or set of outputs depends. Because any given output may depend on only a limited subset of intermediate outputs from the prior layers, each partial network may include only a small fraction of the intermediate outputs from each layer. Instead of implementing the MLN by computing one layer at a time, the compiler schedules instructions to sequentially implement partial networks. As each layer of a partial network is completed, the intermediate outputs can be released from memory. The described technique enables intermediate outputs to be directly streamed between processing elements of the machine learning accelerator without requiring large transfers to and from external memory.
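The abstract describes an ordering algorithm: group each output with the slice of intermediate values it transitively depends on (a "partial network"), schedule those groups sequentially, and free each intermediate as soon as its last consumer has run. The following minimal Python sketch illustrates that idea under an assumed toy dependency model (a sliding window, as in a 1-D convolution); the dependency model and all names are illustrative, not taken from the patent.

from collections import defaultdict

def schedule_partial_networks(num_layers, outputs_per_layer, fan_in):
    # Assumed toy dependency model (not from the patent): output j of
    # layer l reads outputs j .. j+fan_in-1 of layer l-1, like a 1-D conv.
    def parents(node):
        layer, idx = node
        if layer == 0:
            return set()
        return {(layer - 1, j)
                for j in range(idx, min(idx + fan_in, outputs_per_layer))}

    def partial_network(node):
        # Transitive closure of dependencies for one final output.
        closure, frontier = {node}, {node}
        while frontier:
            frontier = set().union(*(parents(n) for n in frontier)) - closure
            closure |= frontier
        return closure

    # Step 1: emit computes partial-network-by-partial-network, skipping
    # anything an earlier partial network already produced.
    computes, done = [], set()
    for out_idx in range(outputs_per_layer):
        for node in sorted(partial_network((num_layers - 1, out_idx))):
            if node not in done:
                computes.append(node)
                done.add(node)

    # Step 2: a value is dead right after its last consumer runs, so
    # record the final position at which each value is read.
    last_use = {}
    for pos, node in enumerate(computes):
        for p in parents(node):
            last_use[p] = pos            # later reads overwrite earlier ones

    dies_at = defaultdict(list)
    for node, pos in last_use.items():
        dies_at[pos].append(node)        # final-layer outputs never die here

    # Step 3: interleave frees with computes, so memory only ever holds
    # the live slice of each layer rather than whole layers at a time.
    schedule = []
    for pos, node in enumerate(computes):
        schedule.append(("compute", node))
        for dead in dies_at[pos]:
            schedule.append(("free", dead))
    return schedule

if __name__ == "__main__":
    for step in schedule_partial_networks(num_layers=3,
                                          outputs_per_layer=4, fan_in=2):
        print(step)

Running the sketch prints an interleaved list of ("compute", node) and ("free", node) steps, making visible that only a narrow band of each layer is live at any moment, in contrast to layer-at-a-time execution, which keeps every intermediate output of a layer resident until the next layer finishes.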

Detailed Description

Bibliographic Details
Main Authors: Shah, Nishit; Kotler, Reed
Format: Patent
Language: English
Subjects: CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS
Online Access: Order full text
creator Shah, Nishit; Kotler, Reed
description A compiler efficiently manages memory usage in the machine learning accelerator by intelligently ordering computations of a machine learning network (MLN). The compiler identifies a set of partial networks of the MLN representing portions of the network across multiple layers on which an output or set of outputs depends. Because any given output may depend on only a limited subset of intermediate outputs from the prior layers, each partial network may include only a small fraction of the intermediate outputs from each layer. Instead of implementing the MLN by computing one layer at a time, the compiler schedules instructions to sequentially implement partial networks. As each layer of a partial network is completed, the intermediate outputs can be released from memory. The described technique enables intermediate outputs to be directly streamed between processing elements of the machine learning accelerator without requiring large transfers to and from external memory.
format Patent
language eng
recordid cdi_epo_espacenet_US11586894B2
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
PHYSICS
title Ordering computations of a machine learning network in a machine learning accelerator for efficient memory usage
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T17%3A09%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Shah,%20Nishit&rft.date=2023-02-21&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS11586894B2%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true