Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery

We propose a novel algorithm for combined unit/filter and layer pruning of deep neural networks that operates during training and does not require a pre-trained network. Our algorithm optimally trades off learning accuracy and pruning levels while balancing layer vs. unit/filter pruning and computational vs. parameter complexity using only three user-defined parameters, which are easy to interpret and tune. The optimal network structure is found as the solution of a stochastic optimization problem over the network weights and the parameters of variational Bernoulli distributions for 0/1 random variables scaling the units and layers of the network. Pruning occurs when a variational parameter converges to 0, rendering the corresponding structure permanently inactive and thus saving computation during training and prediction. A key contribution of our approach is a cost function that combines the objectives of prediction accuracy and network pruning in a computational/parameter complexity-aware manner, together with the automatic selection of the many regularization parameters. We show that the solutions of the optimization problem to which the algorithm converges are deterministic networks. We analyze the ODE system that underlies our stochastic optimization algorithm and establish domains of attraction around zero for the dynamics of the network parameters. These results provide theoretical support for safely pruning units/filters and/or layers during training and lead to practical pruning conditions. We evaluate our method on the CIFAR-10/100 and ImageNet datasets using ResNet architectures and demonstrate that it improves upon layer-only or unit-only pruning and competes favorably, in terms of pruning ratios and test accuracy, with combined unit/filter and layer pruning algorithms that require pre-trained networks.
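
The following is a minimal, hypothetical sketch of the gating mechanism the abstract describes: a layer whose output units are scaled by 0/1 Bernoulli random variables with trainable variational parameters. It is written in PyTorch for concreteness (the paper does not prescribe a framework); the straight-through gradient estimator is one common choice standing in for the paper's actual stochastic optimization scheme, and the class name and initialization are assumptions made for illustration.

import torch
import torch.nn as nn

class BernoulliGatedLinear(nn.Module):
    """Linear layer whose output units are scaled by 0/1 Bernoulli gates.

    theta[j] is the variational parameter P(gate_j = 1); when it
    converges to 0, unit j becomes permanently inactive and can be
    pruned. Illustrative only, not the authors' implementation.
    """

    def __init__(self, in_features, out_features, init_prob=0.9):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One variational Bernoulli parameter per output unit.
        self.theta = nn.Parameter(torch.full((out_features,), init_prob))

    def forward(self, x):
        h = self.linear(x)
        probs = self.theta.clamp(0.0, 1.0)
        if self.training:
            # Sample 0/1 gates; the straight-through trick routes the
            # loss gradient to theta despite the discrete sample.
            z = torch.bernoulli(probs.detach()) + probs - probs.detach()
        else:
            # At the solutions the algorithm converges to, each theta is
            # 0 or 1, so prediction uses a deterministic network.
            z = (probs > 0.5).float()
        return h * z

Applying the same scalar 0/1 gate to an entire residual block rather than to individual units gives the layer-pruning half of the combined scheme.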

Bibliographic Details

Main Authors: Guenter, Valentin Frank Ingmar; Sideris, Athanasios
Format: Article
Language: eng
Subjects: Computer Science - Learning
Online Access: Order full text
creator Guenter, Valentin Frank Ingmar; Sideris, Athanasios
description We propose a novel algorithm for combined unit/filter and layer pruning of deep neural networks that operates during training and does not require a pre-trained network. Our algorithm optimally trades off learning accuracy and pruning levels while balancing layer vs. unit/filter pruning and computational vs. parameter complexity using only three user-defined parameters, which are easy to interpret and tune. The optimal network structure is found as the solution of a stochastic optimization problem over the network weights and the parameters of variational Bernoulli distributions for 0/1 random variables scaling the units and layers of the network. Pruning occurs when a variational parameter converges to 0, rendering the corresponding structure permanently inactive and thus saving computation during training and prediction. A key contribution of our approach is a cost function that combines the objectives of prediction accuracy and network pruning in a computational/parameter complexity-aware manner, together with the automatic selection of the many regularization parameters. We show that the solutions of the optimization problem to which the algorithm converges are deterministic networks. We analyze the ODE system that underlies our stochastic optimization algorithm and establish domains of attraction around zero for the dynamics of the network parameters. These results provide theoretical support for safely pruning units/filters and/or layers during training and lead to practical pruning conditions. We evaluate our method on the CIFAR-10/100 and ImageNet datasets using ResNet architectures and demonstrate that it improves upon layer-only or unit-only pruning and competes favorably, in terms of pruning ratios and test accuracy, with combined unit/filter and layer pruning algorithms that require pre-trained networks.
doi_str_mv 10.48550/arxiv.2411.09127
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2411.09127
language eng
recordid cdi_arxiv_primary_2411_09127
source arXiv.org
subjects Computer Science - Learning
title Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T12%3A13%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Complexity-Aware%20Training%20of%20Deep%20Neural%20Networks%20for%20Optimal%20Structure%20Discovery&rft.au=Guenter,%20Valentin%20Frank%20Ingmar&rft.date=2024-11-13&rft_id=info:doi/10.48550/arxiv.2411.09127&rft_dat=%3Carxiv_GOX%3E2411_09127%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
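
The abstract notes that pruning occurs when a variational parameter converges to 0 and that the analysis of the underlying ODE system yields practical pruning conditions. A hypothetical post-step hook in the spirit of that description (the fixed threshold eps is a stand-in for the paper's derived domains of attraction, and the gate attribute name follows the sketch given after the abstract above):

import torch

@torch.no_grad()
def prune_converged_gates(model, eps=1e-3):
    """Clamp near-zero variational gate parameters to exactly 0.

    Once a parameter enters a neighbourhood of 0 it can be fixed there,
    rendering the corresponding unit/filter or layer permanently
    inactive and saving computation for the remainder of training and
    at prediction time. Illustrative threshold, not the paper's
    derived condition.
    """
    for module in model.modules():
        theta = getattr(module, "theta", None)
        if theta is not None:
            theta.data[theta.data < eps] = 0.0

# Typical use, after each optimizer step:
#   optimizer.step()
#   prune_converged_gates(model)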