DeepWalk with Reinforcement Learning (DWRL) for node embedding

Bibliographic details
Published in: Expert Systems with Applications, 2024-06, Vol. 243, p. 122819, Article 122819
Main authors: Jeyaraj, Rathinaraja; Balasubramaniam, Thirunavukarasu; Balasubramaniam, Anandkumar; Paul, Anand
Format: Article
Language: English
Description
Abstract: DeepWalk is used to convert nodes in an original graph into equivalent vectors in a latent space for performing various predictive tasks. To ensure second-order structural similarity between nodes in the original graph and their vectors in the latent space, dot products are computed for each pair of nodes explored on a random walk (RW) in the latent space. However, computing these dot products for graphs with millions of nodes and billions of edges is computationally expensive. To minimize the computation time required to calculate second-order structural similarity, DeepWalk with reinforcement learning (DWRL) is proposed herein. In DWRL, a level pointer is prepared for each node in the original graph. By identifying the common nodes between each pair of nodes in the original graph, DWRL reduces the number of dot-product computations in the latent space while still ensuring second-order structural similarity. Additionally, repeatedly selecting the same node during RWs produces redundant training samples. Therefore, a subsampling technique chooses the next node based on its degree, which improves the generalization of node representations in the latent space and increases accuracy. The proposed techniques are applied to popular datasets for multilabel classification and link prediction, and their efficiency in reducing computation time is verified. For large graphs, DWRL reduces the computation time needed to build latent vectors by 47% and improves the average micro- and macro-F1 scores by up to 12%. Link prediction performance also increases by up to 20%.
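The abstract does not specify how the level pointer is built. The Python sketch below illustrates only the general idea of checking the original graph for common neighbors before spending a dot product in the latent space. It assumes that plain adjacency sets stand in for the level pointer and that a walk pair with no shared neighbor is simply skipped. The helper names (`shared_neighbors`, `window_pairs`, `pruned_scores`) and the toy graph are hypothetical.

```python
import random

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def shared_neighbors(adj, u, v):
    """Nodes adjacent to both u and v in the original graph."""
    return adj[u] & adj[v]

def window_pairs(walk, window):
    """Yield (center, context) pairs of distinct nodes within `window` steps on a walk."""
    for i, u in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i and walk[j] != u:
                yield u, walk[j]

def pruned_scores(adj, walk, emb, window=2):
    """Score walk pairs by dot product, skipping pairs with no common neighbor.

    Hypothetical assumption: a pair with no shared neighbor in the original
    graph carries no second-order similarity, so its dot product is skipped.
    """
    scores, skipped = {}, 0
    for u, v in window_pairs(walk, window):
        if shared_neighbors(adj, u, v):
            scores[(u, v)] = dot(emb[u], emb[v])
        else:
            skipped += 1
    return scores, skipped

# Toy undirected graph as adjacency sets, with random 8-dimensional embeddings.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
rng = random.Random(0)
emb = {n: [rng.gauss(0.0, 1.0) for _ in range(8)] for n in adj}

scores, skipped = pruned_scores(adj, [0, 1, 2, 3], emb, window=1)
print(len(scores), "dot products computed,", skipped, "skipped")
# prints: 4 dot products computed, 2 skipped
```

Here the pairs (2, 3) and (3, 2) are skipped because nodes 2 and 3 share no neighbor, even though they are adjacent. The abstract does not say whether DWRL treats such pairs this way.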
Highlights:
• DWRL is proposed to minimize the computation time during similarity calculation.
• Subsampling is used to select nodes with fewer edges more frequently.
• For language modeling, DWRL can be used to find similar words in the latent space.
• It can be used to find the distance between two nodes in a large graph.
• Node classification and link prediction can be achieved with this method.
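The degree-based subsampling can be illustrated with a degree-biased random walk. This is a minimal sketch that assumes an undirected graph and a transition probability to each neighbor proportional to the inverse of that neighbor's degree, so nodes with fewer edges are chosen more often. The paper's exact weighting is not given here, and `transition_probs`, `next_node`, and `degree_biased_walk` are hypothetical names.

```python
import random

def transition_probs(adj, current):
    """Probability of stepping to each neighbor, proportional to 1/degree(neighbor).

    Assumption: inverse-degree weighting is an illustrative stand-in for the
    paper's degree-based subsampling; low-degree neighbors are favored.
    """
    neighbors = sorted(adj[current])
    inv = [1.0 / max(len(adj[n]), 1) for n in neighbors]
    total = sum(inv)
    return {n: w / total for n, w in zip(neighbors, inv)}

def next_node(adj, current, rng):
    """Sample the next node of the walk from the inverse-degree distribution."""
    probs = transition_probs(adj, current)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]

def degree_biased_walk(adj, start, length, rng):
    """Generate a random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length and adj[walk[-1]]:
        walk.append(next_node(adj, walk[-1], rng))
    return walk

# Node 0 links to a hub (node 1, degree 4) and a leaf (node 2, degree 1).
adj = {0: {1, 2}, 1: {0, 3, 4, 5}, 2: {0}, 3: {1}, 4: {1}, 5: {1}}
print(transition_probs(adj, 0))  # {1: 0.2, 2: 0.8}
print(degree_biased_walk(adj, 0, 10, random.Random(42)))
```

From node 0, the leaf is chosen with probability 0.8 and the hub with probability 0.2, which reduces how often the walk keeps returning to the same high-degree node.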
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.122819