Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation

The softmax function is widely used in artificial neural networks for multiclass classification problems: the softmax transformation enforces the outputs to be positive and sum to one, and the corresponding loss function allows the model to be optimized with the maximum likelihood principle. However, softmax leaves the loss function a large margin to optimize over when it comes to high-dimensional classification, which degrades performance to some extent. In this paper, we provide an empirical study of a simple and concise softmax variant, namely sparse-softmax, to alleviate the problems that traditional softmax encounters in high-dimensional classification. We evaluate our approach on several interdisciplinary tasks; the experimental results show that sparse-softmax is simpler, faster, and produces better results than the baseline models.
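This record carries only the abstract, so the paper's exact definition of sparse-softmax is not reproduced here. The sketch below is a minimal Python/NumPy illustration of the general idea the abstract describes: `softmax` is the standard transformation, and `topk_sparse_softmax` is one plausible sparsified variant (the function names, the top-k rule, and the default k=3 are assumptions for illustration, not the paper's formulation) that normalizes over only the largest logits instead of the whole high-dimensional output.

```python
import numpy as np

def softmax(logits):
    """Standard softmax: positive outputs that sum to one."""
    z = logits - np.max(logits)      # shift by the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def topk_sparse_softmax(logits, k=3):
    """Hypothetical top-k sparsification (illustrative, not the paper's exact
    method): renormalize only over the k largest logits and assign every other
    class exactly zero probability."""
    logits = np.asarray(logits, dtype=float)
    topk_idx = np.argpartition(logits, -k)[-k:]    # indices of the k largest logits
    probs = np.zeros_like(logits)
    z = logits[topk_idx] - logits[topk_idx].max()  # stabilize before exponentiating
    exp_z = np.exp(z)
    probs[topk_idx] = exp_z / exp_z.sum()
    return probs

# A high-dimensional logit vector: dense softmax spreads probability mass over
# every class, while the top-k variant concentrates it on a handful of classes.
logits = np.random.randn(10_000)
dense = softmax(logits)
sparse = topk_sparse_softmax(logits, k=3)
print(dense.sum(), (dense > 0).sum())    # ~1.0, 10000 nonzero entries
print(sparse.sum(), (sparse > 0).sum())  # ~1.0, 3 nonzero entries
```

With thousands of classes, a sparsified transformation of this kind keeps the cross-entropy loss from having to push down every near-zero tail probability, which is roughly the inefficiency the abstract attributes to the traditional softmax.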

Bibliographic Details
Main authors: Sun, Shaoshi; Zhang, Zhenyuan; Huang, BoCheng; Lei, Pengbin; Su, Jianlin; Pan, Shengfeng; Cao, Jiarun
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language; Computer Science - Learning
Online access: Full text at https://arxiv.org/abs/2112.12433
DOI: 10.48550/arxiv.2112.12433
Published: 2021-12-23 (arXiv)
Source: arXiv.org