TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing
Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEEs) are considered a promising solution because of their comparatively lower overhead...
Saved in:
Published in: | arXiv.org 2024-07 |
---|---|
Main Authors: | Han, Husheng; Zheng, Xinyao; Wen, Yuanbo; Hao, Yifan; Feng, Erhu; Liang, Ling; Mu, Jianan; Li, Xiaqing; Ma, Tianyun; Jin, Pengwei; Song, Xinkai; Du, Zidong; Guo, Qi; Hu, Xing |
Format: | Article |
Language: | eng |
Subjects: | Collaboration; Computation; Computer memory; Computer Science - Artificial Intelligence; Computer Science - Cryptography and Security; Computer Science - Hardware Architecture; Data transfer (computers); Large language models; Performance enhancement; Performance evaluation; Scheduling; Tensors |
Online Access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Han, Husheng; Zheng, Xinyao; Wen, Yuanbo; Hao, Yifan; Feng, Erhu; Liang, Ling; Mu, Jianan; Li, Xiaqing; Ma, Tianyun; Jin, Pengwei; Song, Xinkai; Du, Zidong; Guo, Qi; Hu, Xing |
description | Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEEs) are considered a promising solution because of their comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing because of the fine-grained and mismatched memory granularities of the CPU and NPU: 1) the cacheline granularity of the CPU TEE intensifies memory pressure due to its extra off-chip metadata accesses; 2) the cacheline-granularity MACs of the NPU escalate the pressure on its limited memory capacity; and 3) data transfer across heterogeneous enclaves relies on transit through non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in the CPU TEE, eliminating off-chip metadata accesses by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption or scheduling dilemmas. Our evaluation is built on an enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training. (A back-of-envelope sketch of the MAC-storage argument follows the field list below.) |
doi_str_mv | 10.48550/arxiv.2407.08903 |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-07 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2407_08903 |
source | arXiv.org; Free E-Journals |
subjects | Collaboration; Computation; Computer memory; Computer Science - Artificial Intelligence; Computer Science - Cryptography and Security; Computer Science - Hardware Architecture; Data transfer (computers); Large language models; Performance enhancement; Performance evaluation; Scheduling; Tensors |
title | TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T15%3A54%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TensorTEE:%20Unifying%20Heterogeneous%20TEE%20Granularity%20for%20Efficient%20Secure%20Collaborative%20Tensor%20Computing&rft.jtitle=arXiv.org&rft.au=Han,%20Husheng&rft.date=2024-07-12&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2407.08903&rft_dat=%3Cproquest_arxiv%3E3080873583%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3080873583&rft_id=info:pmid/&rfr_iscdi=true |
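The abstract argues that cacheline-granularity MACs inflate off-chip integrity-metadata storage, while a single per-tensor MAC makes that storage negligible. The Python sketch below is not from the paper; it is a minimal back-of-envelope illustration of that storage argument, and the 64 B cacheline and 8 B MAC sizes are assumed typical values rather than TensorTEE parameters.

```python
# Illustrative sketch only: compares the MAC storage needed to protect one
# tensor under cacheline granularity vs. a single per-tensor MAC. The line
# and MAC sizes below are assumed typical values, not TensorTEE's design.

CACHELINE_BYTES = 64   # assumed cacheline size
MAC_BYTES = 8          # assumed size of one MAC

def mac_storage_bytes(tensor_bytes: int, granularity: str) -> int:
    """Return the off-chip MAC storage (in bytes) needed for one tensor."""
    if granularity == "cacheline":
        # one MAC per cacheline covering the tensor
        num_lines = -(-tensor_bytes // CACHELINE_BYTES)  # ceiling division
        return num_lines * MAC_BYTES
    if granularity == "tensor":
        # a single MAC authenticates the whole tensor
        return MAC_BYTES
    raise ValueError(f"unknown granularity: {granularity}")

if __name__ == "__main__":
    # Example: a 4096 x 4096 fp16 weight tensor (32 MiB), typical of LLM layers.
    tensor_bytes = 4096 * 4096 * 2
    for g in ("cacheline", "tensor"):
        storage = mac_storage_bytes(tensor_bytes, g)
        print(f"{g:>9} granularity: {storage:>10} B "
              f"({storage / tensor_bytes:.4%} of tensor size)")
```

With these assumed sizes, cacheline granularity spends 8 B of MAC per 64 B of data (12.5% of the tensor's footprint held off-chip as metadata), whereas a single per-tensor MAC is effectively free; this is the storage pressure the abstract says tensor-granularity MAC management removes.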