Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams
Information Sciences, 2021
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Abstract: | The large-scale data stream problem refers to a high-speed information
flow that cannot be processed in a scalable manner on a traditional computing
platform. This problem also imposes a high labelling cost, making the
deployment of fully supervised algorithms infeasible. However, semi-supervised
learning on large-scale data streams has received little attention in the
literature, because most existing works are designed for traditional
single-node computing environments and are fully supervised. This paper
proposes the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet)
to cope with the scarcity of labelled samples and large-scale data streams
simultaneously. WeScatterNet is built on the Apache Spark distributed computing
platform and applies a data-free model fusion strategy that compresses the
models after the parallel computing stage. It features an open network
structure that addresses both global and local drift, and it integrates a data
augmentation, annotation and auto-correction ($DA^3$) method for handling
partially labelled data streams. The performance of WeScatterNet is evaluated
numerically on six large-scale data stream problems with a label proportion of
only $25\%$. It shows highly competitive performance even when compared with
fully supervised learners given a $100\%$ label proportion. |
DOI: | 10.48550/arxiv.2107.02943 |
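The abstract describes parallel training followed by a data-free model fusion step, evaluated with only 25% of the labels. The sketch below illustrates that general pattern under loudly stated assumptions and does not reproduce WeScatterNet. No Apache Spark is used: plain list partitions stand in for parallel workers. Each worker fits an ordinary logistic-regression model on its labelled samples only, and the local models are fused by element-wise parameter averaging. That is a generic data-free technique swapped in here because the abstract does not specify WeScatterNet's fusion rule. The helpers `mask_labels`, `train_logreg` and `average_models` are hypothetical, and unlabelled samples are simply skipped rather than handled by anything like $DA^3$.

```python
import math
import random
from typing import List, Optional, Tuple

Sample = Tuple[List[float], Optional[int]]


def mask_labels(samples: List[Tuple[List[float], int]], label_fraction: float,
                rng: random.Random) -> List[Sample]:
    """Hypothetical helper: keep labels on only `label_fraction` of the samples."""
    n_keep = round(len(samples) * label_fraction)
    keep = set(rng.sample(range(len(samples)), n_keep))
    return [(x, y if i in keep else None) for i, (x, y) in enumerate(samples)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(data: List[Sample], n_features: int,
                 lr: float = 0.1, epochs: int = 20) -> List[float]:
    """Hypothetical local learner: SGD logistic regression on labelled samples only."""
    w = [0.0] * (n_features + 1)  # last entry is the bias term
    for _ in range(epochs):
        for x, y in data:
            if y is None:
                continue  # this sketch simply skips unlabelled samples
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - y
            for j in range(n_features):
                w[j] -= lr * g * x[j]
            w[-1] -= lr * g
    return w


def average_models(models: List[List[float]]) -> List[float]:
    """Data-free fusion by element-wise parameter averaging (a generic stand-in)."""
    return [sum(params) / len(models) for params in zip(*models)]


def accuracy(w: List[float], samples: List[Tuple[List[float], int]]) -> float:
    hits = 0
    for x, y in samples:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0 else 0
        hits += int(pred == y)
    return hits / len(samples)


rng = random.Random(0)
# Synthetic "stream": 2-D points labelled by which side of x0 + x1 = 0 they fall on.
stream = []
for _ in range(400):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    stream.append((x, 1 if x[0] + x[1] > 0 else 0))

# Round-robin split across 4 partitions standing in for parallel workers.
partitions = [stream[i::4] for i in range(4)]
# Each worker sees only 25% of its labels, echoing the label proportion in the abstract.
local_models = [train_logreg(mask_labels(p, 0.25, rng), n_features=2) for p in partitions]
fused = average_models(local_models)
print(f"fused model accuracy: {accuracy(fused, stream):.3f}")
```

Running it prints the fused model's accuracy on the synthetic stream; the exact value depends on the seed. Parameter averaging is used only because it needs no training data at fusion time, and it is sensible here because every local model shares one architecture and one initialisation.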