Fast and memory-efficient algorithms for high-order Tucker decomposition
Published in: | Knowledge and Information Systems 2020-07, Vol.62 (7), p.2765-2794 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
Summary: | Multi-aspect data appear frequently in web-related applications. For example, product reviews are quadruplets of the form (user, product, keyword, timestamp), and search-engine logs are quadruplets of the form (user, keyword, location, timestamp). How can we analyze such web-scale multi-aspect data on an off-the-shelf workstation with a limited amount of memory? Tucker decomposition has been used widely for discovering patterns in such multi-aspect data, which are naturally expressed as large but sparse tensors. However, existing Tucker decomposition algorithms have limited scalability, failing to decompose large-scale high-order (≥ 4) tensors, since they explicitly materialize intermediate data, whose size grows exponentially with the order. To address this problem, which we call the “Materialization Bottleneck,” we propose S-HOT, a scalable algorithm for high-order Tucker decomposition. S-HOT minimizes materialized intermediate data by using on-the-fly computation, and it is optimized for disk-resident tensors that are too large to fit in memory. We theoretically analyze the amount of memory and the number of data scans required by S-HOT. Moreover, we empirically show that S-HOT handles tensors with higher order, dimensionality, and rank than baselines. For example, S-HOT successfully decomposes a real-world tensor from the Microsoft Academic Graph on an off-the-shelf workstation, while all baselines fail. In particular, in terms of dimensionality, S-HOT decomposes 1000× larger tensors than baselines. |
---|---|
ISSN: | 0219-1377 0219-3116 |
DOI: | 10.1007/s10115-019-01435-1 |
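
The summary's point about intermediate data growing exponentially with the order can be illustrated with a small sketch. This is not the S-HOT algorithm as published; it only contrasts the size of a dense intermediate with computing one of its rows directly from sparse entries. It assumes the usual setting in which a sparse tensor is multiplied along every mode except one by a factor matrix of shape (dimension × rank). The function names `dense_intermediate_entries` and `row_on_the_fly` are hypothetical and chosen for illustration.

```python
import numpy as np

def dense_intermediate_entries(dim_n, rank, order):
    """Entry count of a dense dim_n x rank^(order-1) intermediate matrix.

    Assumption: every mode uses the same rank, for simplicity.
    """
    return dim_n * rank ** (order - 1)

def row_on_the_fly(entries, factors, mode, row):
    """Compute a single row of the mode-`mode` intermediate from sparse entries.

    entries: iterable of (index_tuple, value) pairs for the nonzeros.
    factors: list of arrays; factors[m] has shape (dimension of mode m, rank of mode m).
    Only one row (length = product of the other modes' ranks) is held in memory.
    """
    other = [m for m in range(len(factors)) if m != mode]
    width = int(np.prod([factors[m].shape[1] for m in other]))
    out = np.zeros(width)
    for idx, val in entries:
        if idx[mode] != row:
            continue  # this nonzero contributes to a different row
        vec = np.array([val], dtype=float)
        for m in other:
            # Kronecker product of the factor rows selected by this nonzero
            vec = np.kron(vec, factors[m][idx[m]])
        out += vec
    return out
```

Under these assumptions, `dense_intermediate_entries(1_000_000, 10, 5)` gives 10^10 entries, roughly 80 GB at 8 bytes per entry, whereas `row_on_the_fly` only ever holds a single row of length 10^4.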