Approximating the Gradient of Cross-Entropy Loss Function

A loss function has two crucial roles in training a conventional discriminant deep neural network (DNN): (i) it measures the goodness of classification and (ii) it generates the gradients that drive the training of the network. In this paper, we approximate the gradients of the cross-entropy loss, which is the most frequently used loss function in classification DNNs. The proposed approximations are noise-free, meaning they depend only on the labels of the training set. They have a fixed length, which avoids the vanishing-gradient problem of the cross-entropy loss. By skipping the forward pass, the computational complexity of the proposed approximations is reduced to O(n), where n is the batch size. Two claims are established based on experiments training DNNs with the proposed approximations: (i) it is possible to train a discriminant network without explicitly defining a loss function; (ii) the success of training does not imply the convergence of the network parameters to fixed values. The experiments show that the proposed gradient approximations achieve classification accuracy comparable to the conventional loss functions and can accelerate the training process on multiple datasets.
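For orientation, the minimal sketch below contrasts the exact cross-entropy gradient with respect to the logits (softmax(z) minus the one-hot label, which requires the forward pass) with a label-only, fixed-length surrogate. The helper label_only_grad is a hypothetical illustration of the idea stated in the abstract, not the approximation derived in the paper.

    import numpy as np

    def softmax(z):
        # Numerically stable softmax along the class axis.
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def cross_entropy_grad(logits, labels, num_classes):
        # Exact gradient of the mean cross-entropy loss w.r.t. the logits:
        # dL/dz = (softmax(z) - one_hot(y)) / batch_size. Needs the forward pass.
        one_hot = np.eye(num_classes)[labels]
        return (softmax(logits) - one_hot) / logits.shape[0]

    def label_only_grad(labels, num_classes):
        # Hypothetical label-only surrogate (NOT the paper's formula): a direction
        # that raises the true-class logit and lowers the others, rescaled per
        # sample to unit length so it never vanishes and never needs the logits.
        one_hot = np.eye(num_classes)[labels]
        g = 1.0 / num_classes - one_hot
        return g / np.linalg.norm(g, axis=1, keepdims=True)

    logits = np.random.randn(4, 10)   # batch of 4 samples, 10 classes
    labels = np.array([3, 1, 7, 0])
    print(cross_entropy_grad(logits, labels, 10).shape)  # (4, 10)
    print(label_only_grad(labels, 10).shape)             # (4, 10)

The surrogate's cost depends only on the labels, which is consistent with the abstract's O(n) claim for a batch of n samples.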

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Access 2020, Vol. 8, p. 111626-111635
Main Authors: Li, Li; Doroslovacki, Milos; Loew, Murray H.
Format: Article
Language: English
Subjects:
Online Access: Full Text
container_end_page 111635
container_issue
container_start_page 111626
container_title IEEE access
container_volume 8
creator Li, Li
Doroslovacki, Milos
Loew, Murray H.
description A loss function has two crucial roles in training a conventional discriminant deep neural network (DNN): (i) it measures the goodness of classification and (ii) it generates the gradients that drive the training of the network. In this paper, we approximate the gradients of the cross-entropy loss, which is the most frequently used loss function in classification DNNs. The proposed approximations are noise-free, meaning they depend only on the labels of the training set. They have a fixed length, which avoids the vanishing-gradient problem of the cross-entropy loss. By skipping the forward pass, the computational complexity of the proposed approximations is reduced to O(n), where n is the batch size. Two claims are established based on experiments training DNNs with the proposed approximations: (i) it is possible to train a discriminant network without explicitly defining a loss function; (ii) the success of training does not imply the convergence of the network parameters to fixed values. The experiments show that the proposed gradient approximations achieve classification accuracy comparable to the conventional loss functions and can accelerate the training process on multiple datasets.
doi_str_mv 10.1109/ACCESS.2020.3001531
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2020, Vol.8, p.111626-111635
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2020_3001531
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Acceleration
Artificial neural networks
Backpropagation
Classification
cross-entropy
Deep neural networks
Entropy
gradient
loss function
Loss measurement
Neural networks
Propagation losses
Training
title Approximating the Gradient of Cross-Entropy Loss Function
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T20%3A32%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Approximating%20the%20Gradient%20of%20Cross-Entropy%20Loss%20Function&rft.jtitle=IEEE%20access&rft.au=Li,%20Li&rft.date=2020&rft.volume=8&rft.spage=111626&rft.epage=111635&rft.pages=111626-111635&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2020.3001531&rft_dat=%3Cproquest_cross%3E2454615425%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2454615425&rft_id=info:pmid/&rft_ieee_id=9113308&rft_doaj_id=oai_doaj_org_article_156340beb6de4632b54d56a97050df34&rfr_iscdi=true