gGN: Representing the Gene Ontology as low-rank Gaussian distributions
Published in: | Computers in biology and medicine 2024-12, Vol.183, p.109234, Article 109234 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Computational representations of knowledge graphs are critical for several tasks in bioinformatics, including large-scale graph analysis and gene function characterization. In this study, we introduce gGN, an unsupervised neural network for learning node representations as Gaussian distributions. Unlike prior efforts, where the covariance matrices of these distributions are simplified to diagonal, we propose representing them with a low-rank approximation. This representation not only maintains manageable learning complexity, allowing for scaling to large graphs, but is also more effective for modeling the structural features of knowledge graphs, such as their hierarchical and directional relationships between nodes. To learn the low-rank Gaussian distributions, we introduce a semantic-based loss function that effectively preserves these structural features. Systematic experiments reveal that gGN preserves structural features more effectively than existing approaches and scales efficiently on large knowledge graphs. Furthermore, applying gGN to represent the Gene Ontology, a widely used knowledge graph in bioinformatics, outperformed multiple baseline methods in ubiquitous gene characterization tasks. Altogether, the proposed low-rank Gaussian distributions not only effectively represent knowledge graphs but also open new avenues for enhancing bioinformatics tasks. gGN is publicly available as an easily installable package at https://github.com/aedera/ggn.
• A knowledge graph can be modeled using a set of Gaussian distributions efficiently.
• Low-rank Gaussian distributions offer an accurate approach to graph representation.
• This approach outperforms those using diagonal Gaussian distributions in accuracy.
• A semantic-based loss function is proposed to learn low-rank representations.
• The resulting representations are beneficial for semantic similarity analysis. |
---|---|
ISSN: | 0010-4825, 1879-0534 |
DOI: | 10.1016/j.compbiomed.2024.109234 |
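
Note: The summary contrasts diagonal covariance matrices with a low-rank approximation but does not state the exact parameterization. As a minimal sketch only, the following assumes the common "diagonal plus low-rank" form Σ = diag(d) + P Pᵀ, with P an n × r matrix. The function names `low_rank_covariance` and `parameter_counts` are hypothetical and are not taken from the gGN package. The sketch illustrates why such a form can capture correlations between dimensions (non-zero off-diagonal entries) while keeping the number of covariance parameters linear in n.

```python
def low_rank_covariance(diag, factors):
    """Return Sigma = diag(diag) + P P^T as a list of lists.

    Assumed parameterization (not confirmed by the source):
    diag    -- n positive floats, the per-dimension variances d
    factors -- n rows of r floats each, the n x r matrix P
    """
    n = len(diag)
    # Entry (i, j) of P P^T is the dot product of rows i and j of P.
    sigma = [
        [sum(a * b for a, b in zip(factors[i], factors[j])) for j in range(n)]
        for i in range(n)
    ]
    # Add the diagonal term d on top of the low-rank part.
    for i in range(n):
        sigma[i][i] += diag[i]
    return sigma


def parameter_counts(n, r):
    """Covariance parameters per node under three parameterizations."""
    return {
        "diagonal": n,             # variances only
        "low_rank": n + n * r,     # d plus the n x r factor P
        "full": n * (n + 1) // 2,  # unconstrained symmetric matrix
    }


# Toy example: 3 dimensions, rank r = 1.
diag = [1.0, 1.0, 1.0]
factors = [[0.5], [0.5], [0.0]]
sigma = low_rank_covariance(diag, factors)

# Parameter counts for a 100-dimensional Gaussian with rank 2.
counts = parameter_counts(n=100, r=2)
```

In the toy example, dimensions 0 and 1 share a factor, so `sigma[0][1]` is non-zero, which a purely diagonal covariance cannot express. For n = 100 and r = 2, the assumed low-rank form needs 300 covariance parameters per node, compared with 100 for a diagonal matrix and 5050 for a full one.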