Semi-Supervised Text Classification With Universum Learning

Universum, a collection of nonexamples that do not belong to any class of interest, has become a new research topic in machine learning. This paper devises a semi-supervised algorithm for learning with Universum based on a boosting technique, and focuses on situations where only a few labeled examples are...

Bibliographic details
Published in:	IEEE transactions on cybernetics 2016-02, Vol.46 (2), p.462-473
Main authors:	Liu, Chien-Liang ; Hsaio, Wen-Hoar ; Lee, Chia-Hoang ; Chang, Tao-Hsing ; Kuo, Tsung-Hsun
Format:	Article
Language:	English
Subjects:	AdaBoost ; Algorithm design and analysis ; Algorithms ; Approximation ; Boosting ; Classification ; Clustering algorithms ; Errors ; Learning ; learning with Universum ; Machine learning ; Semisupervised learning ; Support vector machines ; text classification ; Texts ; Training

container_end_page	473
container_issue	2
container_start_page	462
container_title	IEEE transactions on cybernetics
container_volume	46
creator	Liu, Chien-Liang ; Hsaio, Wen-Hoar ; Lee, Chia-Hoang ; Chang, Tao-Hsing ; Kuo, Tsung-Hsun
description	Universum, a collection of nonexamples that do not belong to any class of interest, has become a new research topic in machine learning. This paper devises a semi-supervised algorithm for learning with Universum based on a boosting technique, and focuses on situations where only a few labeled examples are available. We also show that the training error of AdaBoost with Universum is bounded by the product of the normalization factors, and that the training error drops exponentially fast when each weak classifier is slightly better than random guessing. Finally, the experiments use four data sets with several combinations. Experimental results indicate that the proposed algorithm can benefit from Universum examples and outperform several alternative methods, particularly when insufficient labeled examples are available. When the number of labeled examples is insufficient to estimate the parameters of classification functions, the Universum can be used to approximate the prior distribution of the classification functions. The experimental results can be explained using the concept of Universum introduced by Vapnik, that is, Universum examples implicitly specify a prior distribution on the set of classification functions.
doi_str_mv	10.1109/TCYB.2015.2403573
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_1756714232</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7051235</ieee_id><sourcerecordid>3921450391</sourcerecordid><originalsourceid>FETCH-LOGICAL-c382t-c0feea82117919458ee3d3551e28dd72a2c65876c1dca65d269e496b0a86b2e33</originalsourceid><addsrcrecordid>eNqNkU1Lw0AQhhdRrNT-ABEk4MVL6u5s9gtPGvyCgoe2iKewTaa6pUnqblL035vS2oMn5zLDzDMvzLyEnDE6ZIya60n6djcEysQQEsqF4gfkBJjUMYASh_taqh4ZhLCgXeiuZfQx6UGHU83NCbkZY-nicbtCv3YBi2iCX02ULm0Ibu5y27i6il5d8xFNK7dGH9oyGqH1laveT8nR3C4DDna5T6YP95P0KR69PD6nt6M45xqaOKdzRKuBMWWYSYRG5AUXgiHoolBgIZdCK5mzIrdSFCANJkbOqNVyBsh5n1xtdVe-_mwxNFnpQo7Lpa2wbkPGlJYAiTb0H6ikhkqebFQv_6CLuvVVd0hHCalYAhw6im2p3NcheJxnK-9K678zRrOND9nGh2zjQ7bzodu52Cm3sxKL_cbv1zvgfAs4RNyPFRUMuOA_Dr6JYA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1756714232</pqid></control><display><type>article</type><title>Semi-Supervised Text Classification With Universum Learning</title><source>IEEE Electronic Library (IEL)</source><creator>Liu, Chien-Liang ; Hsaio, Wen-Hoar ; Lee, Chia-Hoang ; Chang, Tao-Hsing ; Kuo, Tsung-Hsun</creator><creatorcontrib>Liu, Chien-Liang ; Hsaio, Wen-Hoar ; Lee, Chia-Hoang ; Chang, Tao-Hsing ; Kuo, Tsung-Hsun</creatorcontrib><description>Universum, a collection of nonexamples that do not belong to any class of interest, has become a new research topic in machine learning. This paper devises a semi-supervised learning with Universum algorithm based on boosting technique, and focuses on situations where only a few labeled examples are available. We also show that the training error of AdaBoost with Universum is bounded by the product of normalization factor, and the training error drops exponentially fast when each weak classifier is slightly better than random guessing. Finally, the experiments use four data sets with several combinations. 
Experimental results indicate that the proposed algorithm can benefit from Universum examples and outperform several alternative methods, particularly when insufficient labeled examples are available. When the number of labeled examples is insufficient to estimate the parameters of classification functions, the Universum can be used to approximate the prior distribution of the classification functions. The experimental results can be explained using the concept of Universum introduced by Vapnik, that is, Universum examples implicitly specify a prior distribution on the set of classification functions.</description><identifier>ISSN: 2168-2267</identifier><identifier>EISSN: 2168-2275</identifier><identifier>DOI: 10.1109/TCYB.2015.2403573</identifier><identifier>PMID: 25730839</identifier><identifier>CODEN: ITCEB8</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>AdaBoost ; Algorithm design and analysis ; Algorithms ; Approximation ; Boosting ; Classification ; Clustering algorithms ; Errors ; Learning ; learning with Universum ; Machine learning ; Semisupervised learning ; Support vector machines ; text classification ; Texts ; Training</subject><ispartof>IEEE transactions on cybernetics, 2016-02, Vol.46 (2), p.462-473</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2016</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c382t-c0feea82117919458ee3d3551e28dd72a2c65876c1dca65d269e496b0a86b2e33</citedby><cites>FETCH-LOGICAL-c382t-c0feea82117919458ee3d3551e28dd72a2c65876c1dca65d269e496b0a86b2e33</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7051235$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7051235$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/25730839$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Liu, Chien-Liang</creatorcontrib><creatorcontrib>Hsaio, Wen-Hoar</creatorcontrib><creatorcontrib>Lee, Chia-Hoang</creatorcontrib><creatorcontrib>Chang, Tao-Hsing</creatorcontrib><creatorcontrib>Kuo, Tsung-Hsun</creatorcontrib><title>Semi-Supervised Text Classification With Universum Learning</title><title>IEEE transactions on cybernetics</title><addtitle>TCYB</addtitle><addtitle>IEEE Trans Cybern</addtitle><description>Universum, a collection of nonexamples that do not belong to any class of interest, has become a new research topic in machine learning. This paper devises a semi-supervised learning with Universum algorithm based on boosting technique, and focuses on situations where only a few labeled examples are available. We also show that the training error of AdaBoost with Universum is bounded by the product of normalization factor, and the training error drops exponentially fast when each weak classifier is slightly better than random guessing. Finally, the experiments use four data sets with several combinations. 
Experimental results indicate that the proposed algorithm can benefit from Universum examples and outperform several alternative methods, particularly when insufficient labeled examples are available. When the number of labeled examples is insufficient to estimate the parameters of classification functions, the Universum can be used to approximate the prior distribution of the classification functions. The experimental results can be explained using the concept of Universum introduced by Vapnik, that is, Universum examples implicitly specify a prior distribution on the set of classification functions.</description><subject>AdaBoost</subject><subject>Algorithm design and analysis</subject><subject>Algorithms</subject><subject>Approximation</subject><subject>Boosting</subject><subject>Classification</subject><subject>Clustering algorithms</subject><subject>Errors</subject><subject>Learning</subject><subject>learning with Universum</subject><subject>Machine learning</subject><subject>Semisupervised learning</subject><subject>Support vector machines</subject><subject>text classification</subject><subject>Texts</subject><subject>Training</subject><issn>2168-2267</issn><issn>2168-2275</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNqNkU1Lw0AQhhdRrNT-ABEk4MVL6u5s9gtPGvyCgoe2iKewTaa6pUnqblL035vS2oMn5zLDzDMvzLyEnDE6ZIya60n6djcEysQQEsqF4gfkBJjUMYASh_taqh4ZhLCgXeiuZfQx6UGHU83NCbkZY-nicbtCv3YBi2iCX02ULm0Ibu5y27i6il5d8xFNK7dGH9oyGqH1laveT8nR3C4DDna5T6YP95P0KR69PD6nt6M45xqaOKdzRKuBMWWYSYRG5AUXgiHoolBgIZdCK5mzIrdSFCANJkbOqNVyBsh5n1xtdVe-_mwxNFnpQo7Lpa2wbkPGlJYAiTb0H6ikhkqebFQv_6CLuvVVd0hHCalYAhw6im2p3NcheJxnK-9K678zRrOND9nGh2zjQ7bzodu52Cm3sxKL_cbv1zvgfAs4RNyPFRUMuOA_Dr6JYA</recordid><startdate>201602</startdate><enddate>201602</enddate><creator>Liu, Chien-Liang</creator><creator>Hsaio, Wen-Hoar</creator><creator>Lee, Chia-Hoang</creator><creator>Chang, 
Tao-Hsing</creator><creator>Kuo, Tsung-Hsun</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope></search><sort><creationdate>201602</creationdate><title>Semi-Supervised Text Classification With Universum Learning</title><author>Liu, Chien-Liang ; Hsaio, Wen-Hoar ; Lee, Chia-Hoang ; Chang, Tao-Hsing ; Kuo, Tsung-Hsun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c382t-c0feea82117919458ee3d3551e28dd72a2c65876c1dca65d269e496b0a86b2e33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>AdaBoost</topic><topic>Algorithm design and analysis</topic><topic>Algorithms</topic><topic>Approximation</topic><topic>Boosting</topic><topic>Classification</topic><topic>Clustering algorithms</topic><topic>Errors</topic><topic>Learning</topic><topic>learning with Universum</topic><topic>Machine learning</topic><topic>Semisupervised learning</topic><topic>Support vector machines</topic><topic>text classification</topic><topic>Texts</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Chien-Liang</creatorcontrib><creatorcontrib>Hsaio, Wen-Hoar</creatorcontrib><creatorcontrib>Lee, Chia-Hoang</creatorcontrib><creatorcontrib>Chang, Tao-Hsing</creatorcontrib><creatorcontrib>Kuo, Tsung-Hsun</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on cybernetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Liu, Chien-Liang</au><au>Hsaio, Wen-Hoar</au><au>Lee, Chia-Hoang</au><au>Chang, Tao-Hsing</au><au>Kuo, Tsung-Hsun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Semi-Supervised Text Classification With Universum Learning</atitle><jtitle>IEEE transactions on cybernetics</jtitle><stitle>TCYB</stitle><addtitle>IEEE Trans Cybern</addtitle><date>2016-02</date><risdate>2016</risdate><volume>46</volume><issue>2</issue><spage>462</spage><epage>473</epage><pages>462-473</pages><issn>2168-2267</issn><eissn>2168-2275</eissn><coden>ITCEB8</coden><abstract>Universum, a collection of nonexamples that do not belong to any class of interest, has become a new research topic in machine learning. This paper devises a semi-supervised learning with Universum algorithm based on boosting technique, and focuses on situations where only a few labeled examples are available. 
We also show that the training error of AdaBoost with Universum is bounded by the product of normalization factor, and the training error drops exponentially fast when each weak classifier is slightly better than random guessing. Finally, the experiments use four data sets with several combinations. Experimental results indicate that the proposed algorithm can benefit from Universum examples and outperform several alternative methods, particularly when insufficient labeled examples are available. When the number of labeled examples is insufficient to estimate the parameters of classification functions, the Universum can be used to approximate the prior distribution of the classification functions. The experimental results can be explained using the concept of Universum introduced by Vapnik, that is, Universum examples implicitly specify a prior distribution on the set of classification functions.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>25730839</pmid><doi>10.1109/TCYB.2015.2403573</doi><tpages>12</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 2168-2267
ispartof	IEEE transactions on cybernetics, 2016-02, Vol.46 (2), p.462-473
issn	2168-2267 ; 2168-2275
language	eng
recordid	cdi_proquest_journals_1756714232
source	IEEE Electronic Library (IEL)
subjects	AdaBoost ; Algorithm design and analysis ; Algorithms ; Approximation ; Boosting ; Classification ; Clustering algorithms ; Errors ; Learning ; learning with Universum ; Machine learning ; Semisupervised learning ; Support vector machines ; text classification ; Texts ; Training
title	Semi-Supervised Text Classification With Universum Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T17%3A01%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semi-Supervised%20Text%20Classification%20With%20Universum%20Learning&rft.jtitle=IEEE%20transactions%20on%20cybernetics&rft.au=Liu,%20Chien-Liang&rft.date=2016-02&rft.volume=46&rft.issue=2&rft.spage=462&rft.epage=473&rft.pages=462-473&rft.issn=2168-2267&rft.eissn=2168-2275&rft.coden=ITCEB8&rft_id=info:doi/10.1109/TCYB.2015.2403573&rft_dat=%3Cproquest_RIE%3E3921450391%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1756714232&rft_id=info:pmid/25730839&rft_ieee_id=7051235&rfr_iscdi=true