High Performance GPU Tensor Completion With Tubal-Sampling Pattern

Data completion is a problem of filling missing or unobserved elements of partially observed datasets. Data completion algorithms have received wide attention and achievements in diverse domains including data mining, signal processing, and computer vision. We observe a ubiquitous tubal-sampling pat...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on parallel and distributed systems 2020-07, Vol.31 (7), p.1724-1739
Hauptverfasser: Zhang, Tao, Liu, Xiao-Yang, Wang, Xiaodong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Data completion is a problem of filling missing or unobserved elements of partially observed datasets. Data completion algorithms have received wide attention and achievements in diverse domains including data mining, signal processing, and computer vision. We observe a ubiquitous tubal-sampling pattern in big data and Internet of Things (IoT) applications, which is introduced by many reasons such as high data acquisition cost, downsampling for data compression, sensor node failures, and packet losses in low-power wireless transmissions. To meet the time and accuracy requirements of applications, data completion methods are expected to be accurate as well as fast. However, the existing methods for data completion with the tubal-sampling pattern are either accurate or fast, but not both. In this article, we propose high-performance graphics processing unit (GPU) tensor completion for data completion with the tubal-sampling pattern. First, by exploiting the convolution theorem, we split a tensor least-squares minimization problem into multiple least-squares sub-problems in the frequency domain. In this way, massive parallelisms are exposed for many-core GPU architectures while still preserving high recovery accuracy. Second, we propose computing slice-level and tube-level tasks in batches to improve GPU utilization. Third, we reduce the data transfer cost by eliminating the accesses to the CPU memory inside algorithm loop structures. The experimental results show that the proposed tensor completion is both fast and accurate. Using synthetic data of varying sizes, the proposed GPU tensor completion achieves maximum 248.18 \times 248.18× , 7,403.27 \times 7,403.27× , and 33.27 \times 33.27× speedups over the CPU MATLAB implementation, GPU elem
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2020.2975196