A New PU Learning Algorithm for Text Classification

This paper studies the problem of building text classifiers from positive and unlabeled examples. The primary challenge of this problem, compared with the classical text classification problem, is that no labeled negative documents are available in the training set. We call this problem PU-Oriented Text Classification. Our text classifier adopts the traditional two-step approach, making use of both positive and unlabeled examples. In the first step, we improve the 1-DNF algorithm so that it identifies many more reliable negative documents with a very low error rate. In the second step, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented at each iteration. Unlike previous work on PU-oriented text classification, we construct the final classifier from a weighted vote of all the classifiers generated in the iteration steps, instead of choosing one of them as the final classifier. Experimental results on the Reuters data set show that our method improves classifier performance (F1-measure) by 1.734 percent compared with PEBL.

Detailed description

Saved in:
Bibliographic details
Main authors: Yu, Hailong; Zuo, Wanli; Peng, Tao
Format: Book chapter
Language: English
Subject terms:
Online access: Full text
container_end_page 832
container_issue
container_start_page 824
container_title
container_volume
creator Yu, Hailong
Zuo, Wanli
Peng, Tao
description This paper studies the problem of building text classifiers from positive and unlabeled examples. The primary challenge of this problem, compared with the classical text classification problem, is that no labeled negative documents are available in the training set. We call this problem PU-Oriented Text Classification. Our text classifier adopts the traditional two-step approach, making use of both positive and unlabeled examples. In the first step, we improve the 1-DNF algorithm so that it identifies many more reliable negative documents with a very low error rate. In the second step, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented at each iteration. Unlike previous work on PU-oriented text classification, we construct the final classifier from a weighted vote of all the classifiers generated in the iteration steps, instead of choosing one of them as the final classifier. Experimental results on the Reuters data set show that our method improves classifier performance (F1-measure) by 1.734 percent compared with PEBL.
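The first step the description mentions — extracting reliable negatives from the unlabeled set with a 1-DNF-style rule — can be sketched roughly as follows. This is a minimal plain-Python illustration, not the paper's exact procedure: the document-frequency comparison used to pick positive-indicative features and the toy tokenized documents are assumptions.

```python
from collections import Counter

def positive_features(pos_docs, unl_docs):
    # 1-DNF-style rule (assumed threshold): a feature is
    # "positive-indicative" if its document frequency in the positive
    # set exceeds its document frequency in the unlabeled set.
    pos_df = Counter(w for d in pos_docs for w in set(d))
    unl_df = Counter(w for d in unl_docs for w in set(d))
    n_pos, n_unl = len(pos_docs), len(unl_docs)
    return {w for w, c in pos_df.items()
            if c / n_pos > unl_df.get(w, 0) / n_unl}

def reliable_negatives(pos_docs, unl_docs):
    # Unlabeled documents containing none of the positive-indicative
    # features are taken as reliable negatives.
    pf = positive_features(pos_docs, unl_docs)
    return [d for d in unl_docs if not (set(d) & pf)]

# Toy tokenized documents (hypothetical data).
pos = [["svm", "text"], ["svm", "learn"]]
unl = [["svm", "text"], ["car", "road"], ["car", "svm"]]
print(reliable_negatives(pos, unl))  # -> [['car', 'road']]
```

The reliable negatives would then seed the iterative SVM training that the abstract's second step describes.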
doi_str_mv 10.1007/11579427_84
format Book Chapter
contributor Terashima-Marín, Hugo
de Albornoz, Álvaro
Gelbukh, Alexander
publisher Berlin, Heidelberg: Springer Berlin Heidelberg
relation Lecture Notes in Computer Science
rights Springer-Verlag Berlin Heidelberg 2005
isbn 9783540298960
3540298967
eisbn 3540316531
9783540316534
tpages 9
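The second step — combining the classifiers produced across iterations by a weighted vote instead of picking a single one — can be sketched as follows. This is a plain-Python illustration: the stand-in classifiers are callables returning +1/-1 and the weights are assumptions, since the record does not state the paper's weighting scheme.

```python
def weighted_vote(classifiers, weights, doc):
    # Final prediction is the sign of the weight-weighted sum of the
    # individual classifiers' +1/-1 votes on the document.
    score = sum(w * clf(doc) for clf, w in zip(classifiers, weights))
    return 1 if score >= 0 else -1

# Stand-in classifiers (hypothetical): each maps a document to +1/-1.
clfs = [lambda d: 1, lambda d: -1, lambda d: 1]
print(weighted_vote(clfs, [0.5, 0.2, 0.3], "some doc"))  # -> 1
print(weighted_vote(clfs, [0.1, 0.8, 0.1], "some doc"))  # -> -1
```

In the abstract's scheme each vote would come from one SVM trained during the iterative augmentation, so no single iteration's classifier has to be trusted as final.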
fulltext fulltext
identifier ISSN: 0302-9743
ispartof MICAI 2005: Advances in Artificial Intelligence, 2005, p.824-832
issn 0302-9743
1611-3349
language eng
recordid cdi_springer_books_10_1007_11579427_84
source Springer Books
subjects Final Classifier
Negative Data
Probably Approximately Correct
Unlabeled Data
Weighted Vote
title A New PU Learning Algorithm for Text Classification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T17%3A35%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-springer&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=A%20New%20PU%20Learning%20Algorithm%20for%20Text%20Classification&rft.btitle=MICAI%202005:%20Advances%20in%20Artificial%20Intelligence&rft.au=Yu,%20Hailong&rft.date=2005&rft.spage=824&rft.epage=832&rft.pages=824-832&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540298960&rft.isbn_list=3540298967&rft_id=info:doi/10.1007/11579427_84&rft_dat=%3Cspringer%3Espringer_books_10_1007_11579427_84%3C/springer%3E%3Curl%3E%3C/url%3E&rft.eisbn=3540316531&rft.eisbn_list=9783540316534&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true