Topical: Learning Repository Embeddings from Source Code using Attention
Format: | Article |
---|---|
Language: | English |
Abstract: | This paper presents Topical, a novel deep neural network for repository-level
embeddings. By using an attention mechanism, Topical outperforms existing
methods, which rely on natural-language documentation or naive aggregation
techniques. The attention mechanism generates repository-level representations
from source code, full dependency graphs, and script-level textual data.
Trained on publicly accessible GitHub repositories, Topical surpasses multiple
baselines in tasks such as repository auto-tagging, highlighting the attention
mechanism's efficacy over traditional aggregation methods. Topical also
demonstrates scalability and efficiency, making it a valuable contribution to
the computation of repository-level representations. For further research, the
accompanying tools, code, and training dataset are provided at:
https://github.com/jpmorganchase/topical. |
DOI: | 10.48550/arxiv.2208.09495 |
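The abstract contrasts attention-based aggregation with naive aggregation of script-level embeddings. The Python sketch below illustrates that contrast under stated assumptions:

- Each script has already been embedded as a fixed-length vector.
- The naive baseline is mean pooling.
- "Attention" is a single-query, scaled dot-product pooling.

The names `mean_pool` and `attention_pool` and the `query` vector are hypothetical illustrations, not part of the Topical codebase. The paper's actual architecture (including how it uses dependency graphs) may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(script_embeddings: Sequence[Vector]) -> Vector:
    """Naive baseline (assumed): average the script embeddings dimension-wise."""
    n = len(script_embeddings)
    dim = len(script_embeddings[0])
    return [sum(vec[j] for vec in script_embeddings) / n for j in range(dim)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(script_embeddings: Sequence[Vector], query: Vector) -> Vector:
    """Attention pooling with one query vector (hypothetical; would be learned).

    Script embedding h_i gets the score (query . h_i) / sqrt(d). The softmax of
    the scores gives the weights of a weighted sum, which is the repository embedding.
    """
    dim = len(query)
    scale = math.sqrt(dim)
    scores = [sum(q * h for q, h in zip(query, vec)) / scale
              for vec in script_embeddings]
    weights = softmax(scores)
    return [sum(w * vec[j] for w, vec in zip(weights, script_embeddings))
            for j in range(dim)]


if __name__ == "__main__":
    # Toy repository: one "distinctive" script and two near-duplicate scripts.
    scripts = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print("mean pooling:     ", mean_pool(scripts))
    # A query aligned with the first script shifts most of the weight onto it.
    print("attention pooling:", attention_pool(scripts, [4.0, 0.0]))
```

In this toy example, mean pooling lets the two duplicate scripts dominate the repository vector. The attention weights instead concentrate on the script that best matches the query. With an all-zero query, every score is equal, so attention pooling reduces to mean pooling. This is one way to see why a learned weighting can only add flexibility over plain averaging.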