Schema-informed extraction for unstructured data

Bibliographic details
Main authors: Vendryes, Delphine; Relyea, David; Theisen, Matthew; Curme, Chester; Tong, Baojia
Format: Patent
Language: English
Description
Summary: A method of extracting data from documents is provided. The method comprises receiving input of a number of documents and input of a schema of data items available for extraction from the documents. The documents are parsed into a machine-readable representation, and data items in the machine-readable representation are identified according to the schema. Interpretations of data items are propagated within the documents to disambiguate identified data items, and identified data items are matched with other data items in the documents according to the schema. Only identified data items that include a minimal set of interpretations specified by the schema are extracted.
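
The steps named in the summary (parse, identify, propagate, match, extract) can be illustrated with a toy Python sketch. This is not the patented implementation: the record gives no details beyond the abstract, so the schema layout, the "label: value" line format, the regular expressions, the choice of currency and year as interpretations, and all function names below are assumptions made purely for illustration.

```python
import re

# Hypothetical schema: each data item has a pattern for finding it and a
# minimal set of interpretations it must carry before it may be extracted.
SCHEMA = {
    "revenue": {"pattern": re.compile(r"revenue:\s*(\d+(?:\.\d+)?)", re.I),
                "required": {"value", "currency", "year"}},
    "headcount": {"pattern": re.compile(r"headcount:\s*(\d+)", re.I),
                  "required": {"value", "year"}},
}
CURRENCY = re.compile(r"\b(USD|EUR|GBP)\b")
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")


def parse(document):
    """Turn raw text into (line_number, text) pairs, skipping blank lines."""
    return [(i, line.strip()) for i, line in enumerate(document.splitlines())
            if line.strip()]


def context_of(text):
    """Collect the interpretations (currency, year) stated in a piece of text."""
    found = {}
    if (m := CURRENCY.search(text)):
        found["currency"] = m.group(1)
    if (m := YEAR.search(text)):
        found["year"] = int(m.group(1))
    return found


def identify(lines):
    """Find schema items line by line, with any interpretations on the same line."""
    items = []
    for line_no, text in lines:
        for name, spec in SCHEMA.items():
            m = spec["pattern"].search(text)
            if m:
                interp = {"value": float(m.group(1))}
                # Look for context only outside the matched "label: value" span.
                interp.update(context_of(text[:m.start()] + text[m.end():]))
                items.append({"name": name, "line": line_no, "interp": interp})
    return items


def propagate(lines, items):
    """Fill missing interpretations from the first statement found in the document."""
    doc_context = {}
    for _, text in lines:
        for key, value in context_of(text).items():
            doc_context.setdefault(key, value)
    for item in items:
        for key, value in doc_context.items():
            item["interp"].setdefault(key, value)
    return items


def match(items):
    """Link each item to the other items that share its year (an assumed rule)."""
    by_year = {}
    for item in items:
        year = item["interp"].get("year")
        if year is not None:
            by_year.setdefault(year, []).append(item["name"])
    for item in items:
        peers = by_year.get(item["interp"].get("year"), [])
        item["matched_with"] = [n for n in peers if n != item["name"]]
    return items


def extract(documents):
    """Run the pipeline and keep only items with their minimal interpretation set."""
    results = []
    for doc_id, document in enumerate(documents):
        lines = parse(document)
        for item in match(propagate(lines, identify(lines))):
            if SCHEMA[item["name"]]["required"].issubset(item["interp"]):
                results.append({"doc": doc_id, **item})
    return results


documents = [
    "Annual report 2023 (all amounts in USD)\nRevenue: 12.5\nHeadcount: 40\n",
    "Revenue: 7\n",  # no currency or year anywhere, so this item is dropped
]
records = extract(documents)
```

In this sketch the first document states its currency and year once in a header line, and propagation attaches them to the revenue and headcount items found further down. The second document never states either, so its revenue figure lacks the required interpretations and is filtered out at the extraction step.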