Tracking missing data using provenance traces and data simulation
Methods, systems, and computer program products for tracking missing data using provenance traces and data simulation are provided herein. A computer-implemented method includes generating, for each of multiple stages in a data curation sequence, a machine learning model of the data curation sequenc...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Methods, systems, and computer program products for tracking missing data using provenance traces and data simulation are provided herein. A computer-implemented method includes generating, for each of multiple stages in a data curation sequence, a machine learning model of the data curation sequence, wherein the model is based on historical input records within the data curation sequence, historical output records within the data curation sequence, and provenance data within the data curation sequence; creating a simulated output record based on a detected anomaly corresponding to the data curation sequence; predicting the content of absent input records that precede the simulated output record in the data curation sequence and provenance data corresponding to the simulated output record; and outputting, to a user, in response to a query pertaining to the detected anomaly, the predicted input records and information relating the predicted input records to the detected anomaly. |
---|