Systems and methods for predicting correct or missing data and data anomalies
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng ; heb |
Subjects: | |
Online access: | Order full text |
Summary: | The present disclosure is directed to systems and methods for predicting and correcting data anomalies. In one example aspect, data is received by the system. The system may analyze the data by profiling it for certain statistics (e.g., min, max, mean, cardinality). At least one machine-learning algorithm (e.g., a Random Forest algorithm) may be applied to the profiled data to identify potential relationships among certain columns in the data. Once relationships are identified, the related data may be extracted to form an itemset. A second machine-learning algorithm (e.g., the Frequent Pattern Growth algorithm) may be applied to the itemset to determine how frequently related values occur together. Low-frequency values may indicate anomalies in the dataset. If an anomaly is detected, the system may be configured to provide an intelligent remedial action, such as substituting certain values and/or filling in a missing value. |
---|
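
The pipeline in the summary can be illustrated with a minimal Python sketch. This is not the patented method. It assumes the relationship between two columns (`city` and `country`) is already known, so the Random Forest step is skipped, and it substitutes plain pair-frequency counting for the Frequent Pattern Growth algorithm. The toy records, the function names, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records; column names and values are illustrative only.
rows = [
    {"city": "Paris", "country": "France", "age": 31},
    {"city": "Paris", "country": "France", "age": 45},
    {"city": "Paris", "country": "France", "age": 28},
    {"city": "Paris", "country": "Frence", "age": 39},  # likely typo
    {"city": "Haifa", "country": "Israel", "age": 52},
    {"city": "Haifa", "country": "Israel", "age": 24},
    {"city": "Haifa", "country": None, "age": 36},      # missing value
]


def profile(rows, column):
    """Profiling statistics for one column; min/max/mean only if numeric."""
    values = [r[column] for r in rows if r[column] is not None]
    stats = {"count": len(values), "cardinality": len(set(values))}
    if values and all(isinstance(v, (int, float)) for v in values):
        stats.update(min=min(values), max=max(values), mean=mean(values))
    return stats


def pair_support(rows, key_col, val_col):
    """Share of each (key, value) pair among the non-missing rows of its key."""
    pair_counts = Counter(
        (r[key_col], r[val_col]) for r in rows if r[val_col] is not None
    )
    key_counts = Counter()
    for (key, _value), n in pair_counts.items():
        key_counts[key] += n
    return {pair: n / key_counts[pair[0]] for pair, n in pair_counts.items()}


def repair(rows, key_col, val_col, min_support=0.3):
    """Flag rare or missing values and substitute the most frequent value.

    min_support is an assumed threshold, not a value from the source.
    Returns (repaired copies of the rows, indices of flagged rows).
    """
    support = pair_support(rows, key_col, val_col)
    best = {}  # most frequent value seen for each key
    for (key, value), share in support.items():
        if key not in best or share > support[(key, best[key])]:
            best[key] = value
    fixed, flagged = [], []
    for i, row in enumerate(rows):
        value = row[val_col]
        if value is None or support[(row[key_col], value)] < min_support:
            flagged.append(i)
            # Keep the original value if the key has no known replacement.
            row = {**row, val_col: best.get(row[key_col], value)}
        fixed.append(row)
    return fixed, flagged


age_stats = profile(rows, "age")
fixed_rows, flagged_rows = repair(rows, "city", "country")
print(age_stats)
print(flagged_rows)
```

In this sketch the misspelled and the missing `country` values are flagged because their pair frequency falls below the threshold (or is undefined), and each is replaced by the most common value for the same `city`. A production system would need a principled way to choose the threshold and to discover related columns in the first place.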