An improved multiclass LogitBoost using adaptive-one-vs-one

Bibliographic Details
Published in: Machine Learning, 2014-12, Vol. 97 (3), p. 295-326
Main authors: Sun, Peng; Reid, Mark D.; Zhou, Jie
Format: Article
Language: English
Online access: Full text
Abstract:
LogitBoost is a popular Boosting variant that can be applied to either binary or multi-class classification. From a statistical viewpoint, LogitBoost can be seen as additive tree regression that minimizes the logistic loss. In this setting it is still non-trivial to devise a sound multi-class LogitBoost, compared with devising its binary counterpart. The difficulties are due to two important factors arising in the multiclass logistic loss. The first is the invariance property implied by the logistic loss: the optimal classifier output is not unique, since adding a constant to each component of the output vector does not change the loss value. The second is the density of the Hessian matrices that arise when computing tree node split gain and node value fittings. Oversimplification of this learning problem can lead to degraded performance; for example, the original LogitBoost algorithm is outperformed by ABC-LogitBoost thanks to the latter's more careful treatment of the above two factors. In this paper we propose new techniques to address the two main difficulties of the multiclass LogitBoost setting: (1) we adopt a vector tree model (i.e. each node value is a vector) in which a unique classifier output is guaranteed by adding a sum-to-zero constraint, and (2) we use an adaptive block coordinate descent that exploits the dense Hessian when computing tree split gains and node values. Higher classification accuracy and faster convergence rates are observed for a range of public data sets when compared to both the original and the ABC-LogitBoost implementations. We also discuss another possibility for coping with LogitBoost's dense Hessian matrix: we derive a loss similar to the multi-class logistic loss but one that guarantees a diagonal Hessian matrix. While this makes the optimization (by Newton descent) easier, we unfortunately observe degraded performance for this modification. We argue that working with the dense Hessian is likely unavoidable, therefore making techniques like those proposed in this paper necessary for efficient implementations.
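To make the two difficulties concrete, the following is a minimal NumPy sketch written for this record, not taken from the paper: it checks the shift invariance of the multiclass logistic loss, forms the dense per-example Hessian H = diag(p) - p p^T, and performs one two-class Newton update along the direction e_r - e_s, which keeps the sum of the outputs fixed and so respects a sum-to-zero constraint. The pair-selection rule and the helper names (logistic_loss, softmax, pair_gain) are illustrative assumptions that capture only the flavour of the adaptive one-vs-one idea; the paper's actual tree-fitting procedure differs.

# Illustrative sketch only (not the authors' code).
import numpy as np

def logistic_loss(F, y):
    # L(F, y) = log(sum_k exp(F_k)) - F_y, computed stably
    m = F.max()
    return m + np.log(np.exp(F - m).sum()) - F[y]

def softmax(F):
    e = np.exp(F - F.max())
    return e / e.sum()

K = 4
rng = np.random.default_rng(0)
F = rng.normal(size=K)   # classifier output vector for one example
y = 2                    # its true class

# (1) Invariance: shifting every component by a constant leaves the loss
# unchanged, so the minimizer is unique only up to a shift.
assert np.isclose(logistic_loss(F, y), logistic_loss(F + 3.7, y))

# (2) Dense Hessian: w.r.t. F, grad = p - one_hot(y) and H = diag(p) - p p^T,
# whose off-diagonal entries -p_r * p_s are generally nonzero.
p = softmax(F)
g = p - np.eye(K)[y]
H = np.diag(p) - np.outer(p, p)

# (3) Adaptive one-vs-one flavour: for each class pair (r, s), estimate the
# second-order gain of a Newton step along d = e_r - e_s (a direction with
# zero component sum), then update the best pair.
def pair_gain(r, s):
    num = g[r] - g[s]                      # directional gradient g . d
    den = H[r, r] + H[s, s] - 2 * H[r, s]  # directional curvature d^T H d
    return 0.5 * num**2 / max(den, 1e-12)

r, s = max(((a, b) for a in range(K) for b in range(a + 1, K)),
           key=lambda t: pair_gain(*t))
alpha = -(g[r] - g[s]) / max(H[r, r] + H[s, s] - 2 * H[r, s], 1e-12)
F2 = F.copy(); F2[r] += alpha; F2[s] -= alpha
print(logistic_loss(F, y), "->", logistic_loss(F2, y))  # the loss should drop

Note that the off-diagonal curvature H[r, s] = -p_r * p_s enters the step's denominator; a diagonal approximation would change the step sizes, which is the trade-off the abstract describes for the modified, diagonal-Hessian loss.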
DOI: 10.1007/s10994-014-5434-3
ISSN: 0885-6125
EISSN: 1573-0565
Source: Springer Nature - Complete Springer Journals
Subjects:
Applied sciences
Artificial Intelligence
Classification
Computer Science
Computer science; control theory; systems
Control
Data processing. List processing. Character string processing
Degradation
Descent
Exact sciences and technology
Genes
Logistics
Mathematical analysis
Mathematical models
Mathematical programming
Mechatronics
Memory organisation. Data processing
Natural Language Processing (NLP)
Operational research and scientific management
Operational research. Management science
Robotics
Simulation and Modeling
Software
Statistical analysis
Trees
Vectors (mathematics)