TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEEs) are considered a promising solution because of their comparatively low overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing because the CPU and NPU use fine-grained and mismatched memory granularities. 1) The cacheline granularity of the CPU TEE intensifies memory pressure through extra memory accesses, 2) the cacheline-granularity MAC of the NPU escalates the pressure on its limited memory storage, and 3) data transfer across heterogeneous enclaves relies on transit through non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in the CPU TEE to eliminate off-chip metadata accesses by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption or scheduling dilemmas. Our evaluation is built on an enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.
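
To make the memory-overhead argument in the abstract concrete, the following is a minimal back-of-the-envelope sketch in Python. It is illustrative only and not taken from the paper or its artifacts: the 64-byte cacheline, 8-byte MAC, and 4096x4096 fp16 tensor shape are assumed example parameters. It shows why per-cacheline MAC metadata grows with tensor size while a single per-tensor MAC stays constant.

# Illustrative comparison (assumed parameters, not figures from TensorTEE):
# integrity-metadata volume for one tensor under cacheline-granularity
# versus tensor-granularity MAC protection.

CACHELINE_BYTES = 64   # assumed cacheline size
MAC_BYTES = 8          # assumed MAC size per protected unit

def mac_overhead(tensor_bytes: int, unit_bytes: int) -> int:
    """Bytes of MAC metadata when each unit_bytes chunk gets its own MAC."""
    units = (tensor_bytes + unit_bytes - 1) // unit_bytes  # ceiling division
    return units * MAC_BYTES

# A 4096 x 4096 fp16 weight tensor (32 MiB), a size typical of LLM layers.
tensor_bytes = 4096 * 4096 * 2

cacheline_mac = mac_overhead(tensor_bytes, CACHELINE_BYTES)  # one MAC per cacheline
tensor_mac = mac_overhead(tensor_bytes, tensor_bytes)        # one MAC per tensor

print(f"cacheline-granularity MAC metadata: {cacheline_mac / 2**20:.2f} MiB")
print(f"tensor-granularity MAC metadata:    {tensor_mac} bytes")

Under these assumptions the per-cacheline scheme needs about 4 MiB of MAC metadata for a single 32 MiB tensor, while a per-tensor MAC needs only a few bytes, which is the storage pressure the abstract refers to.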

Bibliographic details

Published in: arXiv.org, 2024-07
Main authors: Han, Husheng; Zheng, Xinyao; Wen, Yuanbo; Hao, Yifan; Feng, Erhu; Liang, Ling; Mu, Jianan; Li, Xiaqing; Ma, Tianyun; Jin, Pengwei; Song, Xinkai; Du, Zidong; Guo, Qi; Hu, Xing
Format: Article
Language: English
Subjects: Collaboration; Computation; Computer memory; Computer Science - Artificial Intelligence; Computer Science - Cryptography and Security; Computer Science - Hardware Architecture; Data transfer (computers); Large language models; Performance enhancement; Performance evaluation; Scheduling; Tensors
Online access: Full text
DOI: 10.48550/arXiv.2407.08903
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Rights: CC BY-NC-SA 4.0 (http://creativecommons.org/licenses/by-nc-sa/4.0)
Published version: DOI 10.1145/3622781.3674168