Semisupervised Learning for a Hybrid Generative/Discriminative Classifier based on the Maximum Entropy Principle

This paper presents a method for designing semisupervised classifiers trained on labeled and unlabeled samples. We focus on a probabilistic semisupervised classifier design for multiclass and single-labeled classification problems and propose a hybrid approach that takes advantage of generative and...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence 2008-03, Vol.30 (3), p.424-437
Main authors: Fujino, A., Ueda, N., Saito, K.
Format: Article
Language: eng
container_end_page 437
container_issue 3
container_start_page 424
container_title IEEE transactions on pattern analysis and machine intelligence
container_volume 30
creator Fujino, A.
Ueda, N.
Saito, K.
description This paper presents a method for designing semisupervised classifiers trained on labeled and unlabeled samples. We focus on a probabilistic semisupervised classifier design for multiclass and single-labeled classification problems and propose a hybrid approach that takes advantage of generative and discriminative approaches. In our approach, we first consider a generative model trained by using labeled samples and introduce a bias correction model, where these models belong to the same model family but have different parameters. Then, we construct a hybrid classifier by combining these models based on the maximum entropy principle. To enable us to apply our hybrid approach to text classification problems, we employed naive Bayes models as the generative and bias correction models. Our experimental results for four text data sets confirmed that the generalization ability of our hybrid classifier was much improved by using a large number of unlabeled samples for training when there were too few labeled samples to obtain good performance. We also confirmed that our hybrid approach significantly outperformed the generative and discriminative approaches when the performance of the generative and discriminative approaches was comparable. Moreover, we examined the performance of our hybrid classifier when the labeled and unlabeled data distributions were different.
doi_str_mv 10.1109/TPAMI.2007.70710
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_862350991</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4359332</ieee_id><sourcerecordid>875044254</sourcerecordid><originalsourceid>FETCH-LOGICAL-c469t-dfd1c2303e2cbc18e21cd02ae651c44ffd30a324ecdeeee5917e2179b54570d63</originalsourceid><addsrcrecordid>eNqFks1r3DAQxUVpaTZp74VCEYW2J29GX5Z1DNs0CWxooOnZyPK4VfDajmSH7n8feXdJoYdGl0HMbx7MvEfIOwZLxsCc3t6cXV8tOYBeatAMXpAFM8JkQgnzkiyA5TwrCl4ckeMY7wCYVCBekyNWMKOk0Asy_MCNj9OA4cFHrOkabeh894s2faCWXm6r4Gt6gR0GO_oHPP3qowt-47vdl65aG6NvPAZa2Vmg7-j4G-m1_eM304aed2Pohy29Cb5zfmjxDXnV2Dbi20M9IT-_nd-uLrP194ur1dk6czI3Y1Y3NXNcgEDuKscK5MzVwC3mijkpm6YWYAWX6GpMTxmmE6JNpaTSUOfihHzZ6w6hv58wjmXa02Hb2g77KZYGRC654vJZstAKZEJn8vN_SQ2cMS7yZ0Ehpc4h5wn8-A9410-hS4cpi9RWYAxLEOwhF_oYAzblkBywYVsyKOcclLsclHMOyl0O0siHg-5UbbD-O3AwPgGfDoCNzrZNsMme-MQlKaWlmTd5v-d8uvJTWwplhODiERF1w-o</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>862350991</pqid></control><display><type>article</type><title>Semisupervised Learning for a Hybrid Generative/Discriminative Classifier based on the Maximum Entropy Principle</title><source>IEEE Electronic Library (IEL)</source><creator>Fujino, A. ; Ueda, N. ; Saito, K.</creator><creatorcontrib>Fujino, A. ; Ueda, N. ; Saito, K.</creatorcontrib><description>This paper presents a method for designing semisupervised classifiers trained on labeled and unlabeled samples. We focus on a probabilistic semisupervised classifier design for multiclass and single-labeled classification problems and propose a hybrid approach that takes advantage of generative and discriminative approaches. In our approach, we first consider a generative model trained by using labeled samples and introduce a bias correction model, where these models belong to the same model family but have different parameters. 
Then, we construct a hybrid classifier by combining these models based on the maximum entropy principle. To enable us to apply our hybrid approach to text classification problems, we employed naive Bayes models as the generative and bias correction models. Our experimental results for four text data sets confirmed that the generalization ability of our hybrid classifier was much improved by using a large number of unlabeled samples for training when there were too few labeled samples to obtain good performance. We also confirmed that our hybrid approach significantly outperformed the generative and discriminative approaches when the performance of the generative and discriminative approaches was comparable. Moreover, we examined the performance of our hybrid classifier when the labeled and unlabeled data distributions were different.</description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>DOI: 10.1109/TPAMI.2007.70710</identifier><identifier>PMID: 18195437</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>Los Alamitos, CA: IEEE</publisher><subject>Algorithms ; Applied sciences ; Artificial Intelligence ; Bias ; bias correction ; Classification ; Classifiers ; Computer science; control theory; systems ; Computer Simulation ; Design engineering ; Design methodology ; Discriminant Analysis ; Entropy ; Exact sciences and technology ; generative model ; Hidden Markov models ; Hybrid power systems ; Information Storage and Retrieval - methods ; Learning ; Machine learning ; Mathematical models ; Maximum entropy ; maximum entropy principle ; Models, Statistical ; Pattern recognition ; Pattern Recognition, Automated - methods ; Predictive models ; Reproducibility of Results ; Semisupervised learning ; Sensitivity and Specificity ; Speech and sound recognition and synthesis. 
Linguistics ; Studies ; Supervised learning ; Text categorization ; text classification ; Texts ; unlabeled samples</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2008-03, Vol.30 (3), p.424-437</ispartof><rights>2008 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c469t-dfd1c2303e2cbc18e21cd02ae651c44ffd30a324ecdeeee5917e2179b54570d63</citedby><cites>FETCH-LOGICAL-c469t-dfd1c2303e2cbc18e21cd02ae651c44ffd30a324ecdeeee5917e2179b54570d63</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4359332$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4359332$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=20057496$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/18195437$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Fujino, A.</creatorcontrib><creatorcontrib>Ueda, N.</creatorcontrib><creatorcontrib>Saito, K.</creatorcontrib><title>Semisupervised Learning for a Hybrid Generative/Discriminative Classifier based on the Maximum Entropy Principle</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description>This paper presents a method for designing semisupervised classifiers trained on labeled and unlabeled samples. 
We focus on a probabilistic semisupervised classifier design for multiclass and single-labeled classification problems and propose a hybrid approach that takes advantage of generative and discriminative approaches. In our approach, we first consider a generative model trained by using labeled samples and introduce a bias correction model, where these models belong to the same model family but have different parameters. Then, we construct a hybrid classifier by combining these models based on the maximum entropy principle. To enable us to apply our hybrid approach to text classification problems, we employed naive Bayes models as the generative and bias correction models. Our experimental results for four text data sets confirmed that the generalization ability of our hybrid classifier was much improved by using a large number of unlabeled samples for training when there were too few labeled samples to obtain good performance. We also confirmed that our hybrid approach significantly outperformed the generative and discriminative approaches when the performance of the generative and discriminative approaches was comparable. 
Moreover, we examined the performance of our hybrid classifier when the labeled and unlabeled data distributions were different.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Artificial Intelligence</subject><subject>Bias</subject><subject>bias correction</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Computer science; control theory; systems</subject><subject>Computer Simulation</subject><subject>Design engineering</subject><subject>Design methodology</subject><subject>Discriminant Analysis</subject><subject>Entropy</subject><subject>Exact sciences and technology</subject><subject>generative model</subject><subject>Hidden Markov models</subject><subject>Hybrid power systems</subject><subject>Information Storage and Retrieval - methods</subject><subject>Learning</subject><subject>Machine learning</subject><subject>Mathematical models</subject><subject>Maximum entropy</subject><subject>maximum entropy principle</subject><subject>Models, Statistical</subject><subject>Pattern recognition</subject><subject>Pattern Recognition, Automated - methods</subject><subject>Predictive models</subject><subject>Reproducibility of Results</subject><subject>Semisupervised learning</subject><subject>Sensitivity and Specificity</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Studies</subject><subject>Supervised learning</subject><subject>Text categorization</subject><subject>text classification</subject><subject>Texts</subject><subject>unlabeled samples</subject><issn>0162-8828</issn><issn>1939-3539</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNqFks1r3DAQxUVpaTZp74VCEYW2J29GX5Z1DNs0CWxooOnZyPK4VfDajmSH7n8feXdJoYdGl0HMbx7MvEfIOwZLxsCc3t6cXV8tOYBeatAMXpAFM8JkQgnzkiyA5TwrCl4ckeMY7wCYVCBekyNWMKOk0Asy_MCNj9OA4cFHrOkabeh894s2faCWXm6r4Gt6gR0GO_oHPP3qowt-47vdl65aG6NvPAZa2Vmg7-j4G-m1_eM304aed2Pohy29Cb5zfmjxDXnV2Dbi20M9IT-_nd-uLrP194ur1dk6czI3Y1Y3NXNcgEDuKscK5MzVwC3mijkpm6YWYAWX6GpMTxmmE6JNpaTSUOfihHzZ6w6hv58wjmXa02Hb2g77KZYGRC654vJZstAKZEJn8vN_SQ2cMS7yZ0Ehpc4h5wn8-A9410-hS4cpi9RWYAxLEOwhF_oYAzblkBywYVsyKOcclLsclHMOyl0O0siHg-5UbbD-O3AwPgGfDoCNzrZNsMme-MQlKaWlmTd5v-d8uvJTWwplhODiERF1w-o</recordid><startdate>20080301</startdate><enddate>20080301</enddate><creator>Fujino, A.</creator><creator>Ueda, N.</creator><creator>Saito, K.</creator><general>IEEE</general><general>IEEE Computer Society</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope><scope>7X8</scope></search><sort><creationdate>20080301</creationdate><title>Semisupervised Learning for a Hybrid Generative/Discriminative Classifier based on the Maximum Entropy Principle</title><author>Fujino, A. ; Ueda, N. 
; Saito, K.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c469t-dfd1c2303e2cbc18e21cd02ae651c44ffd30a324ecdeeee5917e2179b54570d63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Artificial Intelligence</topic><topic>Bias</topic><topic>bias correction</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Computer science; control theory; systems</topic><topic>Computer Simulation</topic><topic>Design engineering</topic><topic>Design methodology</topic><topic>Discriminant Analysis</topic><topic>Entropy</topic><topic>Exact sciences and technology</topic><topic>generative model</topic><topic>Hidden Markov models</topic><topic>Hybrid power systems</topic><topic>Information Storage and Retrieval - methods</topic><topic>Learning</topic><topic>Machine learning</topic><topic>Mathematical models</topic><topic>Maximum entropy</topic><topic>maximum entropy principle</topic><topic>Models, Statistical</topic><topic>Pattern recognition</topic><topic>Pattern Recognition, Automated - methods</topic><topic>Predictive models</topic><topic>Reproducibility of Results</topic><topic>Semisupervised learning</topic><topic>Sensitivity and Specificity</topic><topic>Speech and sound recognition and synthesis. 
Linguistics</topic><topic>Studies</topic><topic>Supervised learning</topic><topic>Text categorization</topic><topic>text classification</topic><topic>Texts</topic><topic>unlabeled samples</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fujino, A.</creatorcontrib><creatorcontrib>Ueda, N.</creatorcontrib><creatorcontrib>Saito, K.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fujino, A.</au><au>Ueda, N.</au><au>Saito, K.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Semisupervised Learning for a Hybrid Generative/Discriminative Classifier based on the Maximum Entropy Principle</atitle><jtitle>IEEE 
transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2008-03-01</date><risdate>2008</risdate><volume>30</volume><issue>3</issue><spage>424</spage><epage>437</epage><pages>424-437</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><coden>ITPIDJ</coden><abstract>This paper presents a method for designing semisupervised classifiers trained on labeled and unlabeled samples. We focus on a probabilistic semisupervised classifier design for multiclass and single-labeled classification problems and propose a hybrid approach that takes advantage of generative and discriminative approaches. In our approach, we first consider a generative model trained by using labeled samples and introduce a bias correction model, where these models belong to the same model family but have different parameters. Then, we construct a hybrid classifier by combining these models based on the maximum entropy principle. To enable us to apply our hybrid approach to text classification problems, we employed naive Bayes models as the generative and bias correction models. Our experimental results for four text data sets confirmed that the generalization ability of our hybrid classifier was much improved by using a large number of unlabeled samples for training when there were too few labeled samples to obtain good performance. We also confirmed that our hybrid approach significantly outperformed the generative and discriminative approaches when the performance of the generative and discriminative approaches was comparable. Moreover, we examined the performance of our hybrid classifier when the labeled and unlabeled data distributions were different.</abstract><cop>Los Alamitos, CA</cop><pub>IEEE</pub><pmid>18195437</pmid><doi>10.1109/TPAMI.2007.70710</doi><tpages>14</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0162-8828
ispartof IEEE transactions on pattern analysis and machine intelligence, 2008-03, Vol.30 (3), p.424-437
issn 0162-8828
1939-3539
language eng
recordid cdi_proquest_journals_862350991
source IEEE Electronic Library (IEL)
subjects Algorithms
Applied sciences
Artificial Intelligence
Bias
bias correction
Classification
Classifiers
Computer science; control theory; systems
Computer Simulation
Design engineering
Design methodology
Discriminant Analysis
Entropy
Exact sciences and technology
generative model
Hidden Markov models
Hybrid power systems
Information Storage and Retrieval - methods
Learning
Machine learning
Mathematical models
Maximum entropy
maximum entropy principle
Models, Statistical
Pattern recognition
Pattern Recognition, Automated - methods
Predictive models
Reproducibility of Results
Semisupervised learning
Sensitivity and Specificity
Speech and sound recognition and synthesis. Linguistics
Studies
Supervised learning
Text categorization
text classification
Texts
unlabeled samples
title Semisupervised Learning for a Hybrid Generative/Discriminative Classifier based on the Maximum Entropy Principle
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T00%3A47%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semisupervised%20Learning%20for%20a%20Hybrid%20Generative/Discriminative%20Classifier%20based%20on%20the%20Maximum%20Entropy%20Principle&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Fujino,%20A.&rft.date=2008-03-01&rft.volume=30&rft.issue=3&rft.spage=424&rft.epage=437&rft.pages=424-437&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2007.70710&rft_dat=%3Cproquest_RIE%3E875044254%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=862350991&rft_id=info:pmid/18195437&rft_ieee_id=4359332&rfr_iscdi=true
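
The abstract above describes combining a generative model and a bias correction model, both naive Bayes models with different parameters, into a single hybrid classifier by the maximum entropy principle. The record does not give the formula, so the following is only a minimal sketch under a stated assumption: that the combination takes the log-linear form that maximum entropy under expectation constraints usually produces, R(k|x) ∝ p(x|k; θ)^λ1 · p(x|k; ψ)^λ2 · exp(μ_k). All names here (`nb_log_likelihood`, `hybrid_posterior`, `theta_log`, `psi_log`, `lam1`, `lam2`, `mu`) are hypothetical, and the training of the parameters and weights is not shown.

```python
import math
from typing import List, Sequence


def nb_log_likelihood(counts: Sequence[int], word_log_probs: Sequence[float]) -> float:
    """Multinomial naive Bayes log-likelihood of a document for one class.

    The multinomial coefficient is omitted because it is the same for every
    class and cancels when the scores are normalized.
    """
    return sum(c * lp for c, lp in zip(counts, word_log_probs))


def hybrid_posterior(
    counts: Sequence[int],
    theta_log: Sequence[Sequence[float]],  # per-class log word probabilities, generative model
    psi_log: Sequence[Sequence[float]],    # per-class log word probabilities, bias correction model
    lam1: float,                           # weight of the generative model (assumed form)
    lam2: float,                           # weight of the bias correction model (assumed form)
    mu: Sequence[float],                   # per-class bias terms (assumed form)
) -> List[float]:
    """Class posterior of the assumed log-linear combination of the two models."""
    scores = [
        lam1 * nb_log_likelihood(counts, theta_log[k])
        + lam2 * nb_log_likelihood(counts, psi_log[k])
        + mu[k]
        for k in range(len(mu))
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In this sketch, setting `lam2` to zero reduces the classifier to a reweighted version of the generative model alone, which is one way to see the bias correction model as an additive adjustment in log space.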