Systems and Methods for the Automatic Classification of Documents

Systems and computer implemented methods for classifying documents are provided that include: pretraining and then fine tuning a machine learning model with a domain specific dataset that includes a plurality of documents each annotated with at one label selected from a plurality of predefined label...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: SONG, Dezhao, MADAN, Kanika, SCHILDER, Frank, VOLD, Andrew
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Systems and computer implemented methods for classifying documents are provided that include: pretraining and then fine tuning a machine learning model with a domain specific dataset that includes a plurality of documents each annotated with at one label selected from a plurality of predefined labels for a given domain; and predicting using the trained/fine tuned machine learning model, at least one label from the plurality of labels for at least one other document. The machine learning model is preferably fine tuned using a label attention multi-task learning process that includes: a first task for training the machine learning model with respect to all labels used for the plurality of documents in the dataset, and a second task for training the machine learning model with respect to a subset of all of the labels used for the plurality of documents in the dataset.