Systems and Methods for the Automatic Classification of Documents

Bibliographic Details
Main Authors:	SONG, Dezhao, MADAN, Kanika, SCHILDER, Frank, VOLD, Andrew
Format:	Patent
Language:	eng
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Summary:	Systems and computer-implemented methods for classifying documents are provided that include: pretraining and then fine-tuning a machine learning model with a domain-specific dataset that includes a plurality of documents, each annotated with at least one label selected from a plurality of predefined labels for a given domain; and predicting, using the trained/fine-tuned machine learning model, at least one label from the plurality of labels for at least one other document. The machine learning model is preferably fine-tuned using a label attention multi-task learning process that includes: a first task for training the machine learning model with respect to all labels used for the plurality of documents in the dataset, and a second task for training the machine learning model with respect to a subset of all of the labels used for the plurality of documents in the dataset.
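
The two-task fine-tuning described in the summary can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the patented method: the record does not state the loss function, how the two tasks are weighted, how the label subset is chosen, or how the label attention is computed. Here both tasks use multi-label binary cross-entropy, the second task has its own hypothetical output head (`logits_subset`), and `subset_weight` is an invented parameter. The label attention component is not modeled.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(logits, targets):
    """Mean binary cross-entropy over the labels of one document."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


def multitask_loss(logits_all, targets_all, subset_idx, logits_subset,
                   subset_weight=0.5):
    """Combine two tasks into one training loss (assumed formulation).

    Task 1 scores every label in the dataset.
    Task 2 scores only the labels at positions `subset_idx`, using a
    separate (hypothetical) output head whose logits are `logits_subset`.
    """
    loss_all = bce(logits_all, targets_all)
    targets_subset = [targets_all[i] for i in subset_idx]
    loss_subset = bce(logits_subset, targets_subset)
    total = loss_all + subset_weight * loss_subset
    return total, loss_all, loss_subset


# Hypothetical example: four predefined labels, one annotated document.
labels = ["contract", "tax", "litigation", "ip"]  # invented label names
targets = [1, 0, 1, 0]        # the document carries labels 0 and 2
subset = [0, 2]               # assumed subset used by the second task
total, loss_all, loss_subset = multitask_loss(
    logits_all=[2.0, -1.5, 0.3, -2.0],
    targets_all=targets,
    subset_idx=subset,
    logits_subset=[1.8, 0.1],
)
print(f"total={total:.4f} all={loss_all:.4f} subset={loss_subset:.4f}")
```

In a real training loop the logits would come from the fine-tuned model, and the combined loss would be backpropagated through the parameters shared by both tasks. The weighted sum is only one common way to combine task losses.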