Using bipartite heterogeneous networks to speed up inductive semi-supervised learning and improve automatic text categorization

Due to the volume of texts available in digital form, the organization, management and knowledge extraction are laborious and frequently impossible to be handled. To automatically cope with these tasks, usually classification models are generated through supervised learning techniques. Unfortunately...

Bibliographic details
Published in:	Knowledge-based systems 2017-09, Vol.132, p.94-118
Main authors:	Geraldeli Rossi, Rafael ; Andrade Lopes, Alneu de ; Oliveira Rezende, Solange
Format:	Article
Language:	eng
Subjects:	Algorithms ; Analogies ; Bipartite heterogeneous network ; Classification ; Graph-based learning ; Label propagation ; Machine learning ; Networks ; Semi-supervised learning ; Text categorization ; Text classification ; Texts ; Transductive learning
Online access:	Full text

container_end_page	118
container_issue
container_start_page	94
container_title	Knowledge-based systems
container_volume	132
creator	Geraldeli Rossi, Rafael ; Andrade Lopes, Alneu de ; Oliveira Rezende, Solange
description	Due to the volume of texts available in digital form, organizing, managing and extracting knowledge from them is laborious and frequently impossible to handle. To cope with these tasks automatically, classification models are usually generated through supervised learning techniques. Unfortunately, this type of learning usually demands a huge human effort to label a large volume of texts in order to build accurate classification models. Since collecting unlabeled texts is easy and inexpensive in several domains, the generation of classification models through inductive semi-supervised learning has received attention in recent years. Inductive semi-supervised learning allows a classification model to be built from labeled and unlabeled texts. In this scenario, the goal is to augment the set of labeled documents with unlabeled documents to better discriminate class patterns. Hence, fewer texts must be labeled beforehand. However, semi-supervised learning algorithms that consider texts represented in a vector space model usually obtain unsatisfactory classification performance and are surpassed by semi-supervised learning algorithms that consider texts represented in a network. Nevertheless, despite their classification performance, effective network-based approaches generate the networks from the similarities among documents, and the classification of a new document is also based on the computation of similarities. This implies setting parameters and computing similarities both to generate the networks and to classify new documents. Such an approach cannot deliver fast responses and consequently cannot classify a huge volume of texts. In this article, we propose an approach to induce a classification model through semi-supervised learning considering text collections represented by bipartite heterogeneous networks. Bipartite networks are easily and quickly generated, lead to classification performance equivalent to or better than that of other approaches based on networks or the vector space model, and allow fast classification of new documents. The results presented in this article demonstrate that the proposed approach is able to (i) speed up semi-supervised learning, (ii) speed up the classification of new documents and (iii) surpass the classification performance of other existing inductive semi-supervised learning techniques.
doi_str_mv	10.1016/j.knosys.2017.06.016
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1941699199</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0950705117302903</els_id><sourcerecordid>1941699199</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-45f0c00f2cf1663cdbcc6a389721aa8781537fd17f27f825ebbe7530d93329f13</originalsourceid><addsrcrecordid>eNp9UMtqHDEQFMaGrB9_kIMg5xm3pJnR6BIIJn6AIRf7LLSa1kZrrzSRNJs4F_96tGzOPjVUV1VXFyGfGbQM2HC9bV9CzG-55cBkC0NbwROyYqPkjexAnZIVqB4aCT37RM5z3gIA52xckffn7MOGrv1sUvEF6U8smOIGA8Yl04Dld0wvmZZI84w40WWmPkyLLX6PNOPON3mZMe19rstXNCkc_EyYqN_NKVaSWUrcmeItLfinUGsKbmLyfysUwyU5c-Y149X_eUGeb78_3dw3jz_uHm6-PTZWiK40Xe_AAjhuHRsGYae1tYMRo5KcGTPKkfVCuolJx6UbeY_rNcpewKSE4MoxcUG-HH1rpl8L5qK3cUmhntRMdWxQiilVWd2RZVPMOaHTc_I7k940A32oWm_1sWp9qFrDoCtYZV-PMqwf7D0mna3HYHHyCW3RU_QfG_wDa1GNeA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1941699199</pqid></control><display><type>article</type><title>Using bipartite heterogeneous networks to speed up inductive semi-supervised learning and improve automatic text categorization</title><source>Access via ScienceDirect (Elsevier)</source><creator>Geraldeli Rossi, Rafael ; Andrade Lopes, Alneu de ; Oliveira Rezende, Solange</creator><creatorcontrib>Geraldeli Rossi, Rafael ; Andrade Lopes, Alneu de ; Oliveira Rezende, Solange</creatorcontrib><description>Due to the volume of texts available in digital form, the organization, management and knowledge extraction are laborious and frequently impossible to be handled. To automatically cope with these tasks, usually classification models are generated through supervised learning techniques. Unfortunately, this type of learning usually demands a huge human effort to label large volume of texts to build accurate classification models. 
Since collecting unlabeled texts is easy and inexpensive in several domains, the generation of classification models through inductive semi-supervised learning has been highlighted in recent years. Inductive semi-supervised learning allows to build a classification model using labeled and unlabeled texts. In this scenario, the goal is to augment the set of labeled documents with unlabeled documents to better discriminate class patterns. Hence, fewer texts must be previously labeled. However, semi-supervised learning algorithms that consider texts represented in a vector space model usually obtain unsatisfactory classification performances and are surpassed by semi-supervised learning algorithms that consider texts represented in a network. Nevertheless, despite the classification performances, effective approaches based on networks are generated through the similarities among documents and the classification of a new document are also based on the computation of similarities. This implies to set parameters and compute similarities to both generation the networks and classification of new documents. This approach is not feasible to generate fast responses and consequently to classify a huge volume of texts. In this article, we propose an approach to induce a classification model through semi-supervised learning considering text collections represented by bipartite heterogeneous networks. Bipartite networks are easily and quickly generated, leading to classification performance equivalent or better than other approaches based on network or vector space model and allows a fast classification of new documents. 
The results presented in this article demonstrate that the proposed approach is able to (i) speed up semi-supervised learning, (ii) speed up the classification of new documents and (iii) surpass classification performance of other existing inductive semi-supervised learning techniques.</description><identifier>ISSN: 0950-7051</identifier><identifier>EISSN: 1872-7409</identifier><identifier>DOI: 10.1016/j.knosys.2017.06.016</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Algorithms ; Analogies ; Bipartite heterogeneous network ; Classification ; Graph-based learning ; Label propagation ; Machine learning ; Networks ; Semi-supervised learning ; Text categorization ; Text classification ; Texts ; Transductive learning</subject><ispartof>Knowledge-based systems, 2017-09, Vol.132, p.94-118</ispartof><rights>2017</rights><rights>Copyright Elsevier Science Ltd. Sep 15, 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-45f0c00f2cf1663cdbcc6a389721aa8781537fd17f27f825ebbe7530d93329f13</citedby><cites>FETCH-LOGICAL-c334t-45f0c00f2cf1663cdbcc6a389721aa8781537fd17f27f825ebbe7530d93329f13</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.knosys.2017.06.016$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>315,782,786,3552,27931,27932,46002</link.rule.ids></links><search><creatorcontrib>Geraldeli Rossi, Rafael</creatorcontrib><creatorcontrib>Andrade Lopes, Alneu de</creatorcontrib><creatorcontrib>Oliveira Rezende, Solange</creatorcontrib><title>Using bipartite heterogeneous networks to speed up inductive semi-supervised learning and improve automatic text categorization</title><title>Knowledge-based systems</title><description>Due to the volume of texts available in digital form, the 
organization, management and knowledge extraction are laborious and frequently impossible to be handled. To automatically cope with these tasks, usually classification models are generated through supervised learning techniques. Unfortunately, this type of learning usually demands a huge human effort to label large volume of texts to build accurate classification models. Since collecting unlabeled texts is easy and inexpensive in several domains, the generation of classification models through inductive semi-supervised learning has been highlighted in recent years. Inductive semi-supervised learning allows to build a classification model using labeled and unlabeled texts. In this scenario, the goal is to augment the set of labeled documents with unlabeled documents to better discriminate class patterns. Hence, fewer texts must be previously labeled. However, semi-supervised learning algorithms that consider texts represented in a vector space model usually obtain unsatisfactory classification performances and are surpassed by semi-supervised learning algorithms that consider texts represented in a network. Nevertheless, despite the classification performances, effective approaches based on networks are generated through the similarities among documents and the classification of a new document are also based on the computation of similarities. This implies to set parameters and compute similarities to both generation the networks and classification of new documents. This approach is not feasible to generate fast responses and consequently to classify a huge volume of texts. In this article, we propose an approach to induce a classification model through semi-supervised learning considering text collections represented by bipartite heterogeneous networks. 
Bipartite networks are easily and quickly generated, leading to classification performance equivalent or better than other approaches based on network or vector space model and allows a fast classification of new documents. The results presented in this article demonstrate that the proposed approach is able to (i) speed up semi-supervised learning, (ii) speed up the classification of new documents and (iii) surpass classification performance of other existing inductive semi-supervised learning techniques.</description><subject>Algorithms</subject><subject>Analogies</subject><subject>Bipartite heterogeneous network</subject><subject>Classification</subject><subject>Graph-based learning</subject><subject>Label propagation</subject><subject>Machine learning</subject><subject>Networks</subject><subject>Semi-supervised learning</subject><subject>Text categorization</subject><subject>Text classification</subject><subject>Texts</subject><subject>Transductive learning</subject><issn>0950-7051</issn><issn>1872-7409</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNp9UMtqHDEQFMaGrB9_kIMg5xm3pJnR6BIIJn6AIRf7LLSa1kZrrzSRNJs4F_96tGzOPjVUV1VXFyGfGbQM2HC9bV9CzG-55cBkC0NbwROyYqPkjexAnZIVqB4aCT37RM5z3gIA52xckffn7MOGrv1sUvEF6U8smOIGA8Yl04Dld0wvmZZI84w40WWmPkyLLX6PNOPON3mZMe19rstXNCkc_EyYqN_NKVaSWUrcmeItLfinUGsKbmLyfysUwyU5c-Y149X_eUGeb78_3dw3jz_uHm6-PTZWiK40Xe_AAjhuHRsGYae1tYMRo5KcGTPKkfVCuolJx6UbeY_rNcpewKSE4MoxcUG-HH1rpl8L5qK3cUmhntRMdWxQiilVWd2RZVPMOaHTc_I7k940A32oWm_1sWp9qFrDoCtYZV-PMqwf7D0mna3HYHHyCW3RU_QfG_wDa1GNeA</recordid><startdate>20170915</startdate><enddate>20170915</enddate><creator>Geraldeli Rossi, Rafael</creator><creator>Andrade Lopes, Alneu de</creator><creator>Oliveira Rezende, Solange</creator><general>Elsevier B.V</general><general>Elsevier Science 
Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20170915</creationdate><title>Using bipartite heterogeneous networks to speed up inductive semi-supervised learning and improve automatic text categorization</title><author>Geraldeli Rossi, Rafael ; Andrade Lopes, Alneu de ; Oliveira Rezende, Solange</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-45f0c00f2cf1663cdbcc6a389721aa8781537fd17f27f825ebbe7530d93329f13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithms</topic><topic>Analogies</topic><topic>Bipartite heterogeneous network</topic><topic>Classification</topic><topic>Graph-based learning</topic><topic>Label propagation</topic><topic>Machine learning</topic><topic>Networks</topic><topic>Semi-supervised learning</topic><topic>Text categorization</topic><topic>Text classification</topic><topic>Texts</topic><topic>Transductive learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Geraldeli Rossi, Rafael</creatorcontrib><creatorcontrib>Andrade Lopes, Alneu de</creatorcontrib><creatorcontrib>Oliveira Rezende, Solange</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>Knowledge-based systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Geraldeli Rossi, Rafael</au><au>Andrade Lopes, Alneu de</au><au>Oliveira Rezende, Solange</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Using bipartite heterogeneous networks to speed up inductive semi-supervised learning and improve automatic text categorization</atitle><jtitle>Knowledge-based systems</jtitle><date>2017-09-15</date><risdate>2017</risdate><volume>132</volume><spage>94</spage><epage>118</epage><pages>94-118</pages><issn>0950-7051</issn><eissn>1872-7409</eissn><abstract>Due to the volume of texts available in digital form, the organization, management and knowledge extraction are laborious and frequently impossible to be handled. To automatically cope with these tasks, usually classification models are generated through supervised learning techniques. Unfortunately, this type of learning usually demands a huge human effort to label large volume of texts to build accurate classification models. Since collecting unlabeled texts is easy and inexpensive in several domains, the generation of classification models through inductive semi-supervised learning has been highlighted in recent years. Inductive semi-supervised learning allows to build a classification model using labeled and unlabeled texts. In this scenario, the goal is to augment the set of labeled documents with unlabeled documents to better discriminate class patterns. Hence, fewer texts must be previously labeled. However, semi-supervised learning algorithms that consider texts represented in a vector space model usually obtain unsatisfactory classification performances and are surpassed by semi-supervised learning algorithms that consider texts represented in a network. 
Nevertheless, despite the classification performances, effective approaches based on networks are generated through the similarities among documents and the classification of a new document are also based on the computation of similarities. This implies to set parameters and compute similarities to both generation the networks and classification of new documents. This approach is not feasible to generate fast responses and consequently to classify a huge volume of texts. In this article, we propose an approach to induce a classification model through semi-supervised learning considering text collections represented by bipartite heterogeneous networks. Bipartite networks are easily and quickly generated, leading to classification performance equivalent or better than other approaches based on network or vector space model and allows a fast classification of new documents. The results presented in this article demonstrate that the proposed approach is able to (i) speed up semi-supervised learning, (ii) speed up the classification of new documents and (iii) surpass classification performance of other existing inductive semi-supervised learning techniques.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.knosys.2017.06.016</doi><tpages>25</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0950-7051
ispartof	Knowledge-based systems, 2017-09, Vol.132, p.94-118
issn	0950-7051 1872-7409
language	eng
recordid	cdi_proquest_journals_1941699199
source	Access via ScienceDirect (Elsevier)
subjects	Algorithms ; Analogies ; Bipartite heterogeneous network ; Classification ; Graph-based learning ; Label propagation ; Machine learning ; Networks ; Semi-supervised learning ; Text categorization ; Text classification ; Texts ; Transductive learning
title	Using bipartite heterogeneous networks to speed up inductive semi-supervised learning and improve automatic text categorization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T16%3A17%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20bipartite%20heterogeneous%20networks%20to%20speed%20up%20inductive%20semi-supervised%20learning%20and%20improve%20automatic%20text%20categorization&rft.jtitle=Knowledge-based%20systems&rft.au=Geraldeli%20Rossi,%20Rafael&rft.date=2017-09-15&rft.volume=132&rft.spage=94&rft.epage=118&rft.pages=94-118&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2017.06.016&rft_dat=%3Cproquest_cross%3E1941699199%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1941699199&rft_id=info:pmid/&rft_els_id=S0950705117302903&rfr_iscdi=true