TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEEs) are considered a promising solution because of their comparatively low overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing because the CPU and NPU use fine-grained and mismatched memory granularities. 1) The cacheline granularity of the CPU TEE intensifies memory pressure through extra memory accesses, 2) the cacheline-granularity MAC of the NPU escalates the pressure on its limited memory storage, and 3) data transfer across heterogeneous enclaves relies on transit through non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in the CPU TEE to eliminate off-chip metadata accesses by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption or scheduling dilemmas. Our evaluation is built on an enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.
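
To make the memory-overhead argument in the abstract concrete, the following is a minimal back-of-the-envelope sketch in Python. It is illustrative only and not taken from the paper or its artifacts: the 64-byte cacheline, 8-byte MAC, and 4096x4096 fp16 tensor shape are assumed example parameters. It shows why per-cacheline MAC metadata grows with tensor size while a single per-tensor MAC stays constant.

# Illustrative comparison (assumed parameters, not figures from TensorTEE):
# integrity-metadata volume for one tensor under cacheline-granularity
# versus tensor-granularity MAC protection.

CACHELINE_BYTES = 64   # assumed cacheline size
MAC_BYTES = 8          # assumed MAC size per protected unit

def mac_overhead(tensor_bytes: int, unit_bytes: int) -> int:
    """Bytes of MAC metadata when each unit_bytes chunk gets its own MAC."""
    units = (tensor_bytes + unit_bytes - 1) // unit_bytes  # ceiling division
    return units * MAC_BYTES

# A 4096 x 4096 fp16 weight tensor (32 MiB), a size typical of LLM layers.
tensor_bytes = 4096 * 4096 * 2

cacheline_mac = mac_overhead(tensor_bytes, CACHELINE_BYTES)  # one MAC per cacheline
tensor_mac = mac_overhead(tensor_bytes, tensor_bytes)        # one MAC per tensor

print(f"cacheline-granularity MAC metadata: {cacheline_mac / 2**20:.2f} MiB")
print(f"tensor-granularity MAC metadata:    {tensor_mac} bytes")

Under these assumptions the per-cacheline scheme needs about 4 MiB of MAC metadata for a single 32 MiB tensor, while a per-tensor MAC needs only a few bytes, which is the storage pressure the abstract refers to.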

Bibliographic details

Published in: arXiv.org, 2024-07
Main authors: Han, Husheng; Zheng, Xinyao; Wen, Yuanbo; Hao, Yifan; Feng, Erhu; Liang, Ling; Mu, Jianan; Li, Xiaqing; Ma, Tianyun; Jin, Pengwei; Song, Xinkai; Du, Zidong; Guo, Qi; Hu, Xing
Format: Article
Language: English
Subjects: Collaboration; Computation; Computer memory; Computer Science - Artificial Intelligence; Computer Science - Cryptography and Security; Computer Science - Hardware Architecture; Data transfer (computers); Large language models; Performance enhancement; Performance evaluation; Scheduling; Tensors
Online access: Full text
DOI: 10.48550/arXiv.2407.08903
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Rights: CC BY-NC-SA 4.0 (http://creativecommons.org/licenses/by-nc-sa/4.0)
Published version: DOI 10.1145/3622781.3674168