An improved multiclass LogitBoost using adaptive-one-vs-one
LogitBoost is a popular boosting variant that can be applied to either binary or multi-class classification. From a statistical viewpoint, LogitBoost can be seen as additive tree regression that minimizes the logistic loss. In this setting, devising a sound multi-class...
Saved in:
Published in: | Machine learning, 2014-12, Vol. 97 (3), pp. 295-326 |
---|---|
Main authors: | Sun, Peng; Reid, Mark D.; Zhou, Jie |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
End page | 326 |
---|---|
Issue | 3 |
Start page | 295 |
Journal | Machine learning |
Volume | 97 |
Creators | Sun, Peng; Reid, Mark D.; Zhou, Jie |
Abstract | LogitBoost is a popular boosting variant that can be applied to either binary or multi-class classification. From a statistical viewpoint, LogitBoost can be seen as additive tree regression that minimizes the logistic loss. In this setting, devising a sound multi-class LogitBoost is still non-trivial compared with devising its binary counterpart. The difficulties stem from two important factors arising in the multiclass logistic loss. The first is the invariance property implied by the logistic loss: the optimal classifier output is not unique, since adding a constant to each component of the output vector does not change the loss value. The second is the density of the Hessian matrices that arise when computing tree node split gains and node value fits. Oversimplifying this learning problem can degrade performance; for example, the original LogitBoost algorithm is outperformed by ABC-LogitBoost thanks to the latter's more careful treatment of the above two factors. In this paper we propose new techniques to address the two main difficulties of the multiclass LogitBoost setting: (1) we adopt a vector tree model (i.e. each node value is a vector), in which a unique classifier output is guaranteed by adding a sum-to-zero constraint, and (2) we use an adaptive block coordinate descent that exploits the dense Hessian when computing tree split gains and node values. Higher classification accuracy and faster convergence rates are observed on a range of public data sets when compared with both the original LogitBoost and ABC-LogitBoost implementations. We also discuss another possibility for coping with LogitBoost's dense Hessian matrix: we derive a loss similar to the multi-class logistic loss but which guarantees a diagonal Hessian matrix. While this makes the optimization (by Newton descent) easier, we unfortunately observe degraded performance for this modification. We argue that working with the dense Hessian is likely unavoidable, and that techniques like those proposed in this paper are therefore necessary for efficient implementations. |
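The abstract turns on two concrete properties of the multiclass logistic loss: its invariance to adding a constant to every component of the classifier output (hence the sum-to-zero constraint), and the density of its Hessian (hence block coordinate descent over classes). The NumPy sketch below illustrates both; it is not the authors' implementation, and the fixed class pair `(r, s)` stands in for the paper's adaptive pair selection, which is not specified in the abstract.

```python
import numpy as np

def softmax(f):
    """Class probabilities p_k = exp(f_k) / sum_j exp(f_j)."""
    z = np.exp(f - f.max())                 # shift for numerical stability
    return z / z.sum()

def logistic_loss(f, y):
    """Multiclass logistic loss for one sample: -f_y + log sum_k exp(f_k)."""
    m = f.max()
    return -f[y] + m + np.log(np.sum(np.exp(f - m)))

def gradient(f, y):
    """Per-sample gradient: p - e_y."""
    g = softmax(f)
    g[y] -= 1.0
    return g

def hessian(f):
    """Per-sample Hessian: diag(p) - p p^T. Off-diagonals couple the classes."""
    p = softmax(f)
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
K, y = 5, 2
f = rng.normal(size=K)

# (1) Invariance: adding a constant c to every component leaves the loss
# unchanged, so the minimizer is only defined up to such shifts; the
# sum-to-zero constraint picks out a unique representative.
c = 3.7
assert np.isclose(logistic_loss(f, y), logistic_loss(f + c, y))
f_centered = f - f.mean()                   # projection onto sum-to-zero subspace
assert np.isclose(f_centered.sum(), 0.0)

# (2) Density: the off-diagonal entries -p_i p_j are all non-zero, so a
# Newton step on one class's output interacts with every other class.
H = hessian(f)
print("max off-diagonal |H_ij|:", np.abs(H - np.diag(np.diag(H))).max())

# (3) A two-coordinate Newton step along the direction e_r - e_s. It uses
# the exact 2x2 block of the dense Hessian and, because it adds delta to
# f_r and subtracts delta from f_s, it never leaves the sum-to-zero
# subspace. The pair (r, s) is fixed here purely for illustration.
g = gradient(f, y)
r, s = 0, 1
curvature = H[r, r] + H[s, s] - 2.0 * H[r, s]   # (e_r - e_s)^T H (e_r - e_s) > 0
delta = (g[s] - g[r]) / curvature               # 1-D Newton step size
f_new = f.copy()
f_new[r] += delta
f_new[s] -= delta
print("loss before/after pair update:",
      logistic_loss(f, y), logistic_loss(f_new, y))
```

Because the update adds `delta` to one component and subtracts it from another, the sum of the output vector is preserved, so iterates stay in the sum-to-zero subspace that the vector tree model works in; this is why restricting each update to a class pair is compatible with the uniqueness constraint.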
DOI | 10.1007/s10994-014-5434-3 |
Format | Article |
Identifier | ISSN: 0885-6125 |
Is part of | Machine learning, 2014-12, Vol. 97 (3), pp. 295-326 |
ISSN | 0885-6125 (print); 1573-0565 (electronic) |
Language | English |
Source | Springer Nature - Complete Springer Journals |
Subjects | Applied sciences; Artificial Intelligence; Classification; Computer Science; Computer science, control theory, systems; Control; Data processing, list processing, character string processing; Degradation; Descent; Exact sciences and technology; Genes; Logistics; Mathematical analysis; Mathematical models; Mathematical programming; Mechatronics; Memory organisation, data processing; Natural Language Processing (NLP); Operational research and scientific management; Operational research, management science; Robotics; Simulation and Modeling; Software; Statistical analysis; Trees; Vectors (mathematics) |
Title | An improved multiclass LogitBoost using adaptive-one-vs-one |