Optimization and label propagation in bipartite heterogeneous networks to improve transductive classification of texts

Bibliographic details
Published in:	Information processing & management 2016-03, Vol. 52 (2), pp. 217-257
Main authors:	Rossi, Rafael Geraldeli; Lopes, Alneu de Andrade; Rezende, Solange Oliveira
Format:	Article
Language:	English
Subjects:	Algorithms; Bipartite heterogeneous network; Classification; Collection; Computer networks; Data mining; Graph-based learning; Label propagation; Labels; Mathematical models; Networks; Optimization; Optimization algorithms; Propagation; Studies; Text classification; Text mining; Texts; Transductive learning; Vector space; Vector spaces
Online access:	Full text

Description
Summary:
• Scalable algorithm based on bipartite networks to perform transduction.
• Unlabeled data effectively employed to improve classification performance.
• Better performance than algorithms based on the vector space model or on networks.
• Rigorous evaluation to show the drawbacks of the existing transductive algorithms.
• Trade-off analysis between inductive supervised and transductive classification.

Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms have been proposed to perform transductive classification on text collections represented in a vector space model. However, these algorithms are infeasible in practical applications because they assume independence among instances or terms and have other drawbacks. Network-based algorithms have emerged to avoid the drawbacks of algorithms based on the vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which some labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful for representing text collections as networks and performing label propagation. Generating this type of network avoids requirements such as collections with hyperlinks or citations, the computation of similarities among all texts in the collection, and the setup of a number of parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and the connections are given by the occurrences of terms in documents. Label propagation is performed iteratively from documents to terms and then from terms to documents.

In this article, instead of using terms merely as a means of label propagation, we propose using the bipartite network structure to define the relevance scores of terms for classes through an optimization process, and then propagating these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms, which in turn redefine the labels of unlabeled documents, in an iterative process. We demonstrate that the proposed approach surpasses algorithms for transductive classification based on the vector space model or on networks. Moreover, we demonstrate that the proposed algorithm makes effective use of unlabeled documents to improve classification and that it is faster than other transductive algorithms.
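The basic document-to-term-to-document propagation described in the summary can be illustrated with a minimal Python sketch. This is a generic label propagation baseline on a bipartite document-term network, not the optimization-based algorithm proposed in the article. The function name `propagate_labels`, its parameters, the weighted-average update rule, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(doc_terms, seed_labels, classes, n_iter=20):
    """Generic label propagation on a bipartite document-term network.

    doc_terms:   dict doc_id -> dict term -> positive weight (e.g. term frequency)
    seed_labels: dict doc_id -> class, for the labeled documents only
    classes:     list of class names
    Returns a dict doc_id -> predicted class.
    """
    # Labeled documents start with a one-hot score; unlabeled ones with zeros.
    doc_scores = {d: {c: 0.0 for c in classes} for d in doc_terms}
    for d, c in seed_labels.items():
        doc_scores[d][c] = 1.0

    for _ in range(n_iter):
        # Step 1: documents -> terms. Each term receives the weighted
        # average of the class scores of the documents it occurs in.
        term_scores = defaultdict(lambda: {c: 0.0 for c in classes})
        term_weight = defaultdict(float)
        for d, terms in doc_terms.items():
            for t, w in terms.items():
                term_weight[t] += w
                for c in classes:
                    term_scores[t][c] += w * doc_scores[d][c]
        for t, total in term_weight.items():
            if total > 0:
                for c in classes:
                    term_scores[t][c] /= total

        # Step 2: terms -> documents. Each unlabeled document receives the
        # weighted average of its terms' scores; labeled documents are clamped.
        for d, terms in doc_terms.items():
            if d in seed_labels:
                continue
            total = sum(terms.values())
            if total == 0:
                continue
            doc_scores[d] = {
                c: sum(w * term_scores[t][c] for t, w in terms.items()) / total
                for c in classes
            }

    # Assign each document the class with the highest score.
    return {d: max(classes, key=lambda c: doc_scores[d][c]) for d in doc_terms}


# Toy collection (hypothetical data): two labeled and two unlabeled documents.
docs = {
    "d1": {"goal": 2, "match": 1},
    "d2": {"election": 2, "vote": 1},
    "d3": {"goal": 1, "team": 1},
    "d4": {"vote": 1, "party": 1},
}
seeds = {"d1": "sports", "d2": "politics"}
labels = propagate_labels(docs, seeds, ["sports", "politics"])
print(labels["d3"], labels["d4"])
```

In this toy collection, "d3" shares the term "goal" with the labeled sports document and "d4" shares "vote" with the labeled politics document, so the labels reach them through those shared terms. No similarity computation between documents and no hyperlink or citation data is needed, which is the property the summary attributes to bipartite networks.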
ISSN:	0306-4573, 1873-5371
DOI:	10.1016/j.ipm.2015.07.004