A NORMALIZATION SCHEME FOR SELF-ATTENTION NEURAL NETWORKS
Format: Patent
Language: English; French; German
Abstract: Described is a data processing device for performing an attention-based operation on a graph neural network. The device is configured to receive one or more input graphs, each having a plurality of nodes, and to, for at least one of the input graphs: form an input node representation for each node in the respective input graph, wherein a respective norm is defined for each input node representation; form a set of attention parameters; multiply each of the input node representations with each of the set of attention parameters to form a score function of the respective input graph; normalize the score function based on a maximum of the norms of the input node representations to form a normalized score function; and form a weighted node representation by weighting each node in the respective input graph using a respective element of the normalized score function. The normalization of the score function enforces Lipschitz continuity, which enables deep attention-based neural networks to perform better.
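To make the scheme concrete, the following is a minimal PyTorch sketch of the idea described in the abstract. It is not the patented implementation: the class name `MaxNormScaledAttention`, the use of query/key/value projections as the "set of attention parameters", the adjacency mask, and the choice to divide the scores by the squared maximum node norm are all illustrative assumptions the abstract does not specify.

```python
import torch
import torch.nn as nn

class MaxNormScaledAttention(nn.Module):
    """Illustrative sketch of the abstract's normalization idea: attention
    scores are divided by the maximum norm of the input node representations
    before the softmax. The exact scaling (dividing by max_i ||x_i||^2) is
    an assumption, not taken from the patent text."""

    def __init__(self, dim: int):
        super().__init__()
        # "Set of attention parameters": here, learned query/key/value maps.
        self.w_q = nn.Linear(dim, dim, bias=False)
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (n, dim) input node representations of one graph
        # adj: (n, n) adjacency mask (nonzero where an edge exists)
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)

        # Score function: each node representation multiplied with the
        # attention parameters, then paired against every other node.
        scores = q @ k.transpose(-2, -1)  # (n, n)

        # Normalize based on the maximum norm of the input node
        # representations. Squaring it bounds the bilinear form q_i^T k_j
        # by a constant depending only on the weight matrices.
        max_norm = x.norm(dim=-1).max().clamp_min(1e-12)
        scores = scores / max_norm.pow(2)

        # Restrict attention to graph edges, then weight each node by the
        # corresponding element of the normalized score function.
        scores = scores.masked_fill(adj == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v  # weighted node representations


# Usage: a toy 4-node path graph with self-loops.
x = torch.randn(4, 8)
adj = torch.tensor([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1],
                    [0, 0, 1, 1]], dtype=torch.float32)
out = MaxNormScaledAttention(8)(x, adj)
print(out.shape)  # torch.Size([4, 8])
```

Dividing the scores by the squared maximum node norm is one plausible reading of "normalize the score function based on a maximum of the norms": it makes the pre-softmax scores bounded in the inputs, which is the kind of Lipschitz-style control the abstract attributes to the scheme.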