Classification Evaluation and Improvement in Machine Learning Models

Persistent storage contains a training dataset and a test dataset, each with units of text labelled from a plurality of categories. A machine learning model has been trained with the training dataset to classify input units of text into the plurality of categories. One or more processors are configu...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Le, Uyen Diana Vu, Marinier, Joseph Béchard, Tyler, Christopher John, Marquez Ayala, Orlando, Branchaud-Charron, Frédéric, Atighehchian, Parmida, Gauthier-Melançon, Gabrielle, Brin, Lindsay Devon
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Persistent storage contains a training dataset and a test dataset, each with units of text labelled from a plurality of categories. A machine learning model has been trained with the training dataset to classify input units of text into the plurality of categories. One or more processors are configured to: read the training dataset or the test dataset; determine distributional properties of the training dataset or the test dataset; determine, using the machine learning model, saliency maps for tokens in the training dataset or the test dataset; perturb, by way of token insertion, token deletion, or token replacement, the training dataset or the test dataset into an expanded dataset; obtain, using the machine learning model, classifications into the plurality of categories for the expanded dataset; and based on the distributional properties, the saliency maps, and the classifications, identify causes of failure for the machine learning model.