Proactive Embedding on Cold Data for Deep Learning Recommendation Model Training

Deep learning recommendation models (DLRMs) are an important class of deep learning networks commonly used in many applications. DLRM presents unique challenges, especially for scale-out training, since it not only has compute- and memory-intensive components but the communication between the mu...

Bibliographic Details
Published in:	IEEE computer architecture letters 2024-07, Vol.23 (2), p.203-206
Main authors:	Cho, Haeyoon; Son, Hyojun; Choi, Jungmin; Koh, Byungil; Ha, Minho; Kim, John
Format:	Article
Language:	eng
Subjects:	Backpropagation; Data models; Deep learning; Graphics processing units; Parallel processing; Pipelines; recommendation system; Training

container_end_page	206
container_issue	2
container_start_page	203
container_title	IEEE computer architecture letters
container_volume	23
creator	Cho, Haeyoon; Son, Hyojun; Choi, Jungmin; Koh, Byungil; Ha, Minho; Kim, John
description	Deep learning recommendation models (DLRMs) are an important class of deep learning networks commonly used in many applications. DLRM presents unique challenges, especially for scale-out training, since it not only has compute- and memory-intensive components but the communication between the multiple GPUs is also on the critical path. In this work, we show how cold data in DLRM embedding tables can be exploited to enable proactive embedding. In particular, proactive embedding allows embedding table accesses to be done in advance, reducing the impact of memory access latency by overlapping the embedding access with communication. Our analysis of proactive embedding demonstrates that it can improve overall training performance by 46%.
doi_str_mv	10.1109/LCA.2024.3445948
format	Article
eissn	1556-6064
coden	ICALC3
pub	IEEE
tpages	4
orcid	https://orcid.org/0000-0003-3958-3891 https://orcid.org/0000-0002-5199-9139
fulltext	fulltext_linktorsrc
identifier	ISSN: 1556-6056
ispartof	IEEE computer architecture letters, 2024-07, Vol.23 (2), p.203-206
issn	1556-6056 1556-6064
language	eng
recordid	cdi_ieee_primary_10654665
source	IEEE Electronic Library (IEL)
subjects	Backpropagation; Data models; Deep learning; Graphics processing units; Parallel processing; Pipelines; recommendation system; Training
title	Proactive Embedding on Cold Data for Deep Learning Recommendation Model Training
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T12%3A02%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Proactive%20Embedding%20on%20Cold%20Data%20for%20Deep%20Learning%20Recommendation%20Model%20Training&rft.jtitle=IEEE%20computer%20architecture%20letters&rft.au=Cho,%20Haeyoon&rft.date=2024-07&rft.volume=23&rft.issue=2&rft.spage=203&rft.epage=206&rft.pages=203-206&rft.issn=1556-6056&rft.eissn=1556-6064&rft.coden=ICALC3&rft_id=info:doi/10.1109/LCA.2024.3445948&rft_dat=%3Ccrossref_RIE%3E10_1109_LCA_2024_3445948%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10654665&rfr_iscdi=true
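
Illustrative note: the description field states that proactive embedding performs embedding table accesses in advance so that they overlap with communication. The sketch below is a toy latency model of that idea, not the authors' method or analysis. The function name `step_time`, the `cold_fraction` parameter, the assumption that only cold-row lookups can be issued early, and all timing values are hypothetical, and the numbers are not intended to reproduce the 46% figure reported in the record.

```python
def step_time(t_embed, t_comm, t_compute, cold_fraction=0.0):
    """Toy per-iteration latency (hypothetical model, arbitrary time units).

    t_embed       -- total embedding-table access time
    t_comm        -- inter-GPU communication time
    t_compute     -- remaining compute time
    cold_fraction -- assumed share of embedding access time spent on cold
                     rows that can be fetched ahead of time
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    t_cold = t_embed * cold_fraction   # issued early, overlaps communication
    t_hot = t_embed - t_cold           # still on the critical path
    # The overlapped phase costs whichever is longer: communication or
    # the cold lookups running alongside it.
    return t_hot + max(t_comm, t_cold) + t_compute


# Hypothetical timings: embedding 4, communication 5, compute 3.
baseline = step_time(4.0, 5.0, 3.0)                        # nothing overlapped
proactive = step_time(4.0, 5.0, 3.0, cold_fraction=0.75)   # cold lookups overlapped
speedup = baseline / proactive
print(f"baseline={baseline}, proactive={proactive}, speedup={speedup:.2f}x")
```

In this model the gain is bounded by the smaller of the communication time and the cold-lookup time, since only the shorter of the two can be fully hidden behind the other.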