STRUCTURAL ENCODING AND ATTENTION PARADIGMS FOR SEQUENCE MODELING


Bibliographic Details
Main Authors: Hua, Nan; Wang, Renshen; Perot, Vincent; Ainslie, Joshua; Lee, Chen-Yu; Li, Chun-Liang; Su, Guolong; Dozat, Timothy; Pfister, Tomas; Fujii, Yasuhisa
Format: Patent
Language: English
Description
Summary: Systems and methods for providing a structure-aware sequence model that can interpret a document's text without first inferring the document's proper reading order. In some examples, the model may use a graph convolutional network to generate a contextualized "supertoken" embedding for each token. These embeddings are then fed to a transformer that employs a sparse attention paradigm in which the attention weights for at least some supertokens are modified based on the differences between the predicted and actual order of, and distance between, the attender and attendee supertokens.
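The abstract's attention modification can be illustrated with a minimal sketch. The patent does not disclose a concrete formula here; the additive-penalty form, the names `alpha` and `beta`, and the use of absolute differences are all assumptions made for illustration — the only claim taken from the abstract is that attention weights depend on the gap between predicted and actual order/distance between supertoken pairs.

```python
import numpy as np

def structure_biased_attention(q, k, v,
                               actual_order, actual_dist,
                               pred_order, pred_dist,
                               alpha=1.0, beta=1.0):
    """Toy single-head attention whose logits are penalized by the
    mismatch between predicted and actual pairwise structure.

    q, k, v: (n, d) arrays of supertoken queries, keys, values.
    actual_order, actual_dist: (n, n) observed pairwise order/distance.
    pred_order, pred_dist: (n, n) model-predicted pairwise order/distance.
    alpha, beta: illustrative penalty strengths (not from the patent).
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Assumed functional form: the larger the prediction error for a
    # (attender, attendee) pair, the smaller its attention weight.
    logits = logits - alpha * np.abs(pred_order - actual_order)
    logits = logits - beta * np.abs(pred_dist - actual_dist)
    # Numerically stable row-wise softmax.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With zero prediction error the penalties vanish and this reduces to ordinary scaled dot-product attention; a real implementation would learn the predictors and apply a sparse attention pattern rather than the dense one shown here.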