Fast and memory-efficient algorithms for high-order Tucker decomposition

Bibliographic Details
Published in:	Knowledge and information systems 2020-07, Vol.62 (7), p.2765-2794
Main Authors:	Zhang, Jiyuan; Oh, Jinoh; Shin, Kijung; Papalexakis, Evangelos E.; Faloutsos, Christos; Yu, Hwanjo
Format:	Article
Language:	English
Subjects:	Algorithms; Computer Science; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Data Mining and Knowledge Discovery; Database Management; Decomposition; Information Storage and Retrieval; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Mathematical analysis; Product reviews; Regular Paper; Science & Technology; Search engines; Technology; Tensors; Work stations; Workstations
Online Access:	Full text

Description
Summary:	Multi-aspect data appear frequently in web-related applications. For example, product reviews are quadruplets of the form (user, product, keyword, timestamp), and search-engine logs are quadruplets of the form (user, keyword, location, timestamp). How can we analyze such web-scale multi-aspect data on an off-the-shelf workstation with a limited amount of memory? Tucker decomposition has been widely used for discovering patterns in such multi-aspect data, which are naturally expressed as large but sparse tensors. However, existing Tucker decomposition algorithms have limited scalability and fail to decompose large-scale high-order (≥ 4) tensors, since they explicitly materialize intermediate data whose size grows exponentially with the order. To address this problem, which we call the "Materialization Bottleneck," we propose S-HOT, a scalable algorithm for high-order Tucker decomposition. S-HOT minimizes materialized intermediate data by using on-the-fly computation, and it is optimized for disk-resident tensors that are too large to fit in memory. We theoretically analyze the amount of memory and the number of data scans required by S-HOT. Moreover, we empirically show that S-HOT handles tensors with higher order, dimensionality, and rank than baselines. For example, S-HOT successfully decomposes a real-world tensor from the Microsoft Academic Graph on an off-the-shelf workstation, while all baselines fail. In particular, in terms of dimensionality, S-HOT decomposes tensors 1000× larger than those the baselines can handle.
ISSN:	0219-1377 0219-3116
DOI:	10.1007/s10115-019-01435-1
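
The materialization issue described in the summary can be illustrated with a minimal Python sketch. It is not the S-HOT algorithm itself. It only assumes the standard setting in which a sparse tensor is multiplied by the transposed factor matrices of every mode except one, producing a dense intermediate matrix with one row per index of the remaining mode. The function names `intermediate_entries` and `intermediate_row`, the coordinate-list input format, and the column ordering of the Kronecker product are hypothetical choices made for illustration.

```python
def intermediate_entries(dim, rank, order):
    """Number of entries in the dense intermediate matrix for one mode.

    Assumes every mode has the same dimensionality `dim` and the same
    rank `rank`, so the matrix has shape dim x rank**(order - 1).
    """
    return dim * rank ** (order - 1)


def intermediate_row(nonzeros, factors, mode, row):
    """Compute a single row of the intermediate matrix on the fly.

    nonzeros: iterable of (index_tuple, value) pairs of a sparse tensor.
    factors:  one factor matrix per mode, each given as a list of rows.
    mode:     the mode that is left out of the multiplication.
    row:      the index along `mode` whose row is requested.
    Only one row is held in memory; the full matrix is never built.
    """
    width = 1
    for m, factor in enumerate(factors):
        if m != mode:
            width *= len(factor[0])
    out = [0.0] * width
    for index, value in nonzeros:
        if index[mode] != row:
            continue
        # Kronecker product of the factor rows selected by this nonzero.
        kron = [1.0]
        for m, factor in enumerate(factors):
            if m != mode:
                kron = [a * b for a in kron for b in factor[index[m]]]
        for k, term in enumerate(kron):
            out[k] += value * term
    return out


# Hypothetical sizes: dimensionality 10**6, rank 10, order 5.
entries = intermediate_entries(10**6, 10, 5)
gigabytes = entries * 8 / 1e9  # 8 bytes per double-precision entry
```

With these illustrative sizes the dense intermediate matrix has 10^10 entries, roughly 80 GB in double precision, and each additional mode multiplies that by the rank. Computing rows as needed trades this memory for repeated passes over the nonzeros, which is the kind of trade-off between memory and data scans that the summary says the paper analyzes.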