Incremental feature selection for efficient classification of dynamic graph bags


Bibliographic details
Published in: Concurrency and Computation: Practice and Experience, September 2020, Vol. 32, No. 18, p. n/a
Authors: Chae, Dong‐Kyu; Kim, Bo‐Kyum; Kim, Seung‐Ho; Kim, Sang‐Wook
Format: Article
Language: English
Abstract: Learning and analyzing graph data is one of the most fundamental research areas in machine learning and data mining. Among the numerous graph-based data structures, this paper focuses on the graph bag (simply, bag), a training object that contains one or more graphs and for which a label is available only at the bag level. Bags can represent many real-world objects, such as drugs, web pages, XML documents, and images, and many models have been proposed for learning from this type of data. Within this research context, we define a novel problem, dynamic graph bag classification, and propose an algorithm to solve it. Dynamic bag classification aims to build a classification model for bags that arrive in a streaming fashion, i.e., new bags or graphs frequently emerge over time. Given such changes to the bag dataset, our algorithm incrementally updates the top-m most discriminative features instead of searching for them from scratch. We propose incremental gSpan and incremental gScore as the core components of the algorithm for processing a stream of bags efficiently. We evaluate the algorithm on two real-world datasets in terms of both feature selection time and classification accuracy. The experimental results demonstrate that our algorithm derives an informative feature set much faster than an existing algorithm originally designed for static bag data, with little loss of accuracy.
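The abstract describes keeping the top-m discriminative features up to date as bags arrive, but it does not give the gScore formula or the details of incremental gSpan. The following Python sketch illustrates only the general idea of updating per-feature statistics incrementally instead of recounting the whole dataset. It rests on hypothetical assumptions that are not taken from the paper: subgraph features are already mined and given as canonical strings, a bag contains a feature if any of its graphs contains it, and the score is a simple difference in bag-level support between classes. This score is a stand-in and is not the authors' gScore. The class name `IncrementalFeatureScorer` and its methods are likewise illustrative.

```python
import heapq
from collections import defaultdict
from typing import Iterable, List, Set


class IncrementalFeatureScorer:
    """Hypothetical sketch (not the paper's algorithm): keep per-feature bag
    counts for each class so the top-m features can be refreshed when new
    bags arrive, without rescanning previously seen bags.

    Assumption: each graph is represented only by the set of subgraph
    patterns (canonical strings) it contains; mining them is out of scope.
    """

    def __init__(self) -> None:
        self.pos_bags = 0
        self.neg_bags = 0
        # feature -> number of positive / negative bags containing it
        self.pos_count = defaultdict(int)
        self.neg_count = defaultdict(int)

    def add_bag(self, graphs: Iterable[Set[str]], label: int) -> None:
        # Bag-level feature set: a bag has a feature if ANY graph has it,
        # so each feature is counted at most once per bag.
        features = set().union(*graphs)
        if label == 1:
            self.pos_bags += 1
            counts = self.pos_count
        else:
            self.neg_bags += 1
            counts = self.neg_count
        # Incremental update: only the counts touched by the new bag change.
        for f in features:
            counts[f] += 1

    def score(self, f: str) -> float:
        # Stand-in score: |support in positive bags - support in negative bags|.
        # This is an assumption for illustration, NOT the paper's gScore.
        p = self.pos_count.get(f, 0) / self.pos_bags if self.pos_bags else 0.0
        n = self.neg_count.get(f, 0) / self.neg_bags if self.neg_bags else 0.0
        return abs(p - n)

    def top_m(self, m: int) -> List[str]:
        # Candidates are all features seen so far; sorting first makes ties
        # resolve deterministically in lexicographic order.
        candidates = sorted(set(self.pos_count) | set(self.neg_count))
        return heapq.nlargest(m, candidates, key=self.score)


if __name__ == "__main__":
    scorer = IncrementalFeatureScorer()
    scorer.add_bag([{"A-B", "B-C"}, {"C-D"}], label=1)
    scorer.add_bag([{"A-B"}], label=0)
    print(scorer.top_m(2))  # ['B-C', 'C-D']
    # A new negative bag arrives; only its features' counts are updated.
    scorer.add_bag([{"B-C"}], label=0)
    print(scorer.top_m(2))  # ['C-D', 'A-B']
```

In this sketch only the counting is incremental; `top_m` still rescores every known feature. A fuller approach would also avoid re-mining subgraph patterns for previously seen graphs, which is the role the abstract assigns to incremental gSpan.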
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.5502