Cognitive data pseudonymization
Main authors: | , , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Summary: | Computer systems, methods, and program products for automating pseudonymization of personal identifying information (PII), using machine learning, metadata, and crowdsourced patterns to identify and replace PII. Machine learning models are trained on metadata to classify known column names or key names for processing. Each column or key name is classified as unprocessed, anonymized, or pseudonymized, so that a pseudonymizer can process the data without revealing PII or scrubbing it into a useless format. A library of crowdsourced patterns is used to match PII to data values within columns or keys, and each PII type is mapped to a replacement method. Feedback from user annotations retrains the algorithms to improve classification accuracy, and deep learning algorithms automate the identification of PII by generating regular expressions that concisely articulate how pseudonymizers search for PII patterns within a data set. PII replacement is mapped consistently across entire data packages, and the crowdsourced pattern library is updated with the generated regular expressions. |
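
The pattern-matching and consistent-replacement steps named in the summary can be illustrated with a minimal Python sketch. It is not the patented method: the pattern names, the regular expressions, the `SECRET_KEY`, and the helper functions `pseudonym`, `pseudonymize_text`, and `pseudonymize_records` are all hypothetical, and a keyed hash (HMAC) is swapped in here as one simple way to obtain the same replacement for the same PII value across a whole data package.

```python
import hashlib
import hmac
import re

# Hypothetical stand-in for a crowdsourced pattern library.
# Names and regular expressions are illustrative assumptions only.
PATTERN_LIBRARY = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Assumption: a secret key held by the pseudonymizer (demo value only).
SECRET_KEY = b"demo-key-not-for-production"


def pseudonym(kind: str, value: str) -> str:
    """Return a stable token: the same (key, value) always maps to the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"<{kind}:{digest[:10]}>"


def pseudonymize_text(text: str) -> str:
    """Replace every library match in a string with its pseudonym token."""
    for kind, pattern in PATTERN_LIBRARY.items():
        text = pattern.sub(lambda m, k=kind: pseudonym(k, m.group(0)), text)
    return text


def pseudonymize_records(records: list[dict]) -> list[dict]:
    """Apply the replacement to every string value in a list of records."""
    return [
        {key: pseudonymize_text(val) if isinstance(val, str) else val
         for key, val in record.items()}
        for record in records
    ]


if __name__ == "__main__":
    package = [
        {"id": 1, "note": "Contact alice@example.com"},
        {"id": 2, "note": "alice@example.com, SSN 123-45-6789"},
    ]
    for row in pseudonymize_records(package):
        print(row)
```

Because the token is derived from the value rather than drawn at random, the same email address receives the same replacement in both records, which keeps joins and counts meaningful after pseudonymization. The classification of column or key names and the learned generation of new regular expressions described in the summary are outside the scope of this sketch.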