Trade-Offs Between Energy and Depth of Neural Networks
We present an investigation of threshold circuits and other discretized neural networks in terms of four computational resources: size (the number of gates), depth (the number of layers), weight (weight resolution), and energy, where energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all input assignments. As our main result, we prove that if a threshold circuit C of size s, depth d, energy e, and weight w computes a Boolean function f (i.e., a classification task) of n variables, then log(rk(f)) ≤ ed(log s + log w + log n) regardless of the algorithm employed by C to compute f, where rk(f) is a parameter determined solely by the scale of f and defined as the maximum rank of a communication matrix for f, taken over all possible partitions of the n input variables. For example, for the Boolean function CD_n(ξ) = ⋁_{i=1}^{n/2} (ξ_i ∧ ξ_{n/2+i}), we can prove that n/2 ≤ ed(log s + log w + log n) holds for any circuit C computing CD_n. While the left-hand side is linear in n, the right-hand side is bounded by the product of logarithmic factors of s, w, n and linear factors of d, e. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: n/2 must be smaller than the product of e and d. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and the weight resolution.
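To make the quantities in the abstract concrete, the following minimal Python sketch (not from the paper; the particular depth-2 circuit, the choice n = 8, the reading of w as the largest integer weight or threshold used, and base-2 logarithms are illustrative assumptions) builds a naive threshold circuit for CD_n, measures its energy e by brute force over all inputs, and checks the inequality n/2 ≤ ed(log s + log w + log n).

```python
# Hypothetical sketch (not from the paper): a depth-2 threshold circuit for
# CD_n(x) = OR_{i=1..n/2} (x_i AND x_{n/2+i}), used to illustrate the "energy"
# measure (maximum number of gates outputting a nonzero value, over all input
# assignments) and to check the bound n/2 <= e * d * (log s + log w + log n).
from itertools import product
from math import log2

n = 8                  # number of input variables (even), chosen for illustration
d = 2                  # depth: one layer of AND gates plus one output OR gate
s = n // 2 + 1         # size: n/2 bottom gates plus the top gate
w = 2                  # assumed weight resolution: largest integer weight/threshold used


def circuit(x):
    """Evaluate the circuit; return (output bit, number of gates firing)."""
    # Bottom layer: gate i is the threshold gate [x_i + x_{n/2+i} >= 2], i.e. AND.
    bottom = [int(x[i] + x[n // 2 + i] >= 2) for i in range(n // 2)]
    # Top gate: threshold gate [sum(bottom) >= 1], i.e. OR of the bottom gates.
    top = int(sum(bottom) >= 1)
    return top, sum(bottom) + top


def cd(x):
    """Reference definition of CD_n."""
    return int(any(x[i] and x[n // 2 + i] for i in range(n // 2)))


energy = 0
for x in product((0, 1), repeat=n):
    out, firing = circuit(x)
    assert out == cd(x)              # the circuit really computes CD_n
    energy = max(energy, firing)     # energy = max number of gates outputting nonzero

bound = energy * d * (log2(s) + log2(w) + log2(n))
print(f"s={s}, d={d}, w={w}, energy e={energy}")
print(f"check: n/2 = {n // 2} <= e*d*(log s + log w + log n) = {bound:.2f}")
```

This naive circuit has high energy (e = n/2 + 1, since every bottom gate fires on the all-ones input), so the inequality holds with large slack; the paper's point concerns circuits with small e and d, for which the product ed cannot fall below roughly n/2 once the logarithmic factors are treated as negligible.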
Saved in:
Published in: | Neural computation, 2024-07, Vol. 36 (8), p. 1541-1567 |
---|---|
Main authors: | Uchizawa, Kei; Abe, Haruki |
Format: | Article |
Language: | English |
Online access: | Full text |
DOI: | 10.1162/neco_a_01683 |
PMID: | 39028954 |
Rights: | © 2024 Massachusetts Institute of Technology |
ISSN: | 0899-7667 (print); 1530-888X (electronic) |
Source: | MIT Press Journals |