Large Scale Network Embedding: A Separable Approach

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2022-04, Vol. 34 (4), pp. 1829-1842
Authors: Song, Guojie; Zhang, Liang; Li, Ziyao; Li, Yi
Format: Article
Language: English
Abstract: Many successful methods have been proposed for learning low-dimensional representations of large-scale networks, yet almost all existing methods are designed as inseparable processes, learning embeddings for the entire network even when only a small proportion of the nodes is of interest. This causes great inconvenience, especially on large-scale or dynamic networks, where such methods become almost impossible to apply. In this paper, we formalize the problem of separated matrix factorization (SMF), based on which we elaborate a novel objective function that preserves both local and global information. We compare our SMF framework with approximate SVD algorithms and demonstrate that SMF can capture more information when factorizing a given matrix. We further propose SepNE, a simple and flexible network embedding algorithm that independently learns representations for different subsets of nodes in separated processes. By implementing separability, our algorithm avoids the redundant effort of embedding irrelevant nodes, yielding scalability to large networks. To incorporate more complex information into SepNE, we discuss several methods for leveraging high-order proximities in large networks. We demonstrate the effectiveness of SepNE on several real-world networks of different scales and subjects. With comparable accuracy, our approach significantly outperforms state-of-the-art baselines in running time on large networks.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2020.3002700
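To make the separability idea in the abstract concrete, here is a minimal sketch — not the authors' SepNE implementation — of embedding only a subset of nodes by factorizing a single block of a proximity matrix. The proximity definition (mixed first- and second-order transition probabilities), the landmark columns, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.maximum(A, A.T)                 # symmetric adjacency (assumed toy graph)
deg = A.sum(axis=1, keepdims=True) + 1e-9
P = A / deg                            # one-step transition matrix (local proximity)
M = P + P @ P                          # mix first- and second-order proximity (assumption)

subset = np.arange(30)                 # only these nodes are of interest
landmarks = rng.choice(n, 50, replace=False)  # global reference columns (assumption)

# Factorize only the small (30 x 50) block instead of the full n x n matrix:
block = M[np.ix_(subset, landmarks)]
U, s, Vt = np.linalg.svd(block, full_matrices=False)
emb = U[:, :d] * np.sqrt(s[:d])        # embeddings for the subset only
print(emb.shape)                       # (30, 16)
```

Because only the block for the nodes of interest is ever built and factorized, other subsets could be processed in fully independent runs — which is the separability property the paper's algorithm is designed around.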