Open information extraction from low resource languages

Bibliographic details
Main authors: Lawrence, Carolin; Gashteovski, Kiril; Kotnis, Bhushan
Format: Patent
Language: English
Description
Summary: A method is provided for extracting machine-readable data structures from unstructured, low-resource language input text. The method includes obtaining a corpus of high-resource language data structures, filtering that corpus to obtain a filtered corpus of high-resource language data structures, obtaining entity types for each entity of each filtered high-resource language data structure, performing type substitution for each obtained entity by replacing each entity with an entity of the same type to generate type-substituted data structures, and replacing each entity with a corresponding, equivalent low-resource language data structure entity to generate code-switched sentences. The method further includes generating an augmented data structure corpus, training a multi-head self-attention transformer model, and providing the unstructured low-resource language input text to the trained model to extract the machine-readable data structures.
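
The augmentation steps named in the summary (filtering, entity typing, type substitution, code switching) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Triple` structure, the toy `ENTITY_TYPES` table, the placeholder `LR_ENTITY` dictionary (which tags entities instead of giving real translations), and the choice to code-switch both the filtered and the type-substituted structures. It also works on subject-relation-object triples rather than full sentences, and it omits the transformer training step.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical machine-readable data structure: subject, relation, object."""
    subject: str
    relation: str
    obj: str


# Assumed toy entity-type table; a real system would use an entity typing resource.
ENTITY_TYPES = {
    "Marie Curie": "PERSON",
    "Albert Einstein": "PERSON",
    "Paris": "CITY",
    "Berlin": "CITY",
}

# Placeholder stand-in for a bilingual entity dictionary (not real translations).
LR_ENTITY = {name: f"<LR:{name}>" for name in ENTITY_TYPES}


def filter_corpus(triples):
    """Keep only triples whose subject and object both have a known entity type."""
    return [t for t in triples if t.subject in ENTITY_TYPES and t.obj in ENTITY_TYPES]


def same_type_replacement(entity, rng):
    """Pick a different entity of the same type, or keep the entity if none exists."""
    wanted = ENTITY_TYPES[entity]
    candidates = sorted(e for e, t in ENTITY_TYPES.items() if t == wanted and e != entity)
    return rng.choice(candidates) if candidates else entity


def type_substitute(triple, rng):
    """Replace subject and object with other entities of the same type."""
    return Triple(
        same_type_replacement(triple.subject, rng),
        triple.relation,
        same_type_replacement(triple.obj, rng),
    )


def code_switch(triple):
    """Replace subject and object with their low-resource language counterparts."""
    return Triple(LR_ENTITY[triple.subject], triple.relation, LR_ENTITY[triple.obj])


def build_augmented_corpus(triples, seed=0):
    """Combine filtered, type-substituted, and code-switched triples into one corpus."""
    rng = random.Random(seed)
    filtered = filter_corpus(triples)
    substituted = [type_substitute(t, rng) for t in filtered]
    switched = [code_switch(t) for t in filtered + substituted]
    return filtered + substituted + switched
```

Calling `build_augmented_corpus` on a list of `Triple` objects returns the filtered originals first, then their type-substituted variants, then the code-switched versions of both. In the method described above, a corpus of this kind would serve as training data for the multi-head self-attention transformer model.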