High Performance GPU Tensor Completion With Tubal-Sampling Pattern
Data completion is a problem of filling missing or unobserved elements of partially observed datasets. Data completion algorithms have received wide attention and achievements in diverse domains including data mining, signal processing, and computer vision. We observe a ubiquitous tubal-sampling pat...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on parallel and distributed systems 2020-07, Vol.31 (7), p.1724-1739 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Data completion is a problem of filling missing or unobserved elements of partially observed datasets. Data completion algorithms have received wide attention and achievements in diverse domains including data mining, signal processing, and computer vision. We observe a ubiquitous tubal-sampling pattern in big data and Internet of Things (IoT) applications, which is introduced by many reasons such as high data acquisition cost, downsampling for data compression, sensor node failures, and packet losses in low-power wireless transmissions. To meet the time and accuracy requirements of applications, data completion methods are expected to be accurate as well as fast. However, the existing methods for data completion with the tubal-sampling pattern are either accurate or fast, but not both. In this article, we propose high-performance graphics processing unit (GPU) tensor completion for data completion with the tubal-sampling pattern. First, by exploiting the convolution theorem, we split a tensor least-squares minimization problem into multiple least-squares sub-problems in the frequency domain. In this way, massive parallelisms are exposed for many-core GPU architectures while still preserving high recovery accuracy. Second, we propose computing slice-level and tube-level tasks in batches to improve GPU utilization. Third, we reduce the data transfer cost by eliminating the accesses to the CPU memory inside algorithm loop structures. The experimental results show that the proposed tensor completion is both fast and accurate. Using synthetic data of varying sizes, the proposed GPU tensor completion achieves maximum 248.18 \times 248.18× , 7,403.27 \times 7,403.27× , and 33.27 \times 33.27× speedups over the CPU MATLAB implementation, GPU elem |
---|---|
ISSN: | 1045-9219 1558-2183 |
DOI: | 10.1109/TPDS.2020.2975196 |