STRUCTURING DATA FOR PRIVACY RISKS ASSESSMENTS

Bibliographic details
Main authors:	Suppan, Santiago Reinhard; Jain, Shivani; Cuellar Jaramillo, Jorge Ricardo; Rosenbaum, Ute
Format:	Patent
Language:	English; French; German
Keywords:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Abstract:	A computer-implemented method for assessing a person re-identification risk in an application domain is provided. In the application domain, a corresponding personal record is stored in a database for each of a plurality of persons. Each record comprises a set of attributes, and each attribute comprises a corresponding attribute name and a corresponding attribute value. The method comprises:
- providing (202) a plurality of text documents relating to the application domain;
- automatically determining (204) a plurality of text snippets based on the plurality of text documents;
- automatically assigning (206) to each of the text snippets one label of a plurality of labels, the label being a word representing the text snippet;
- providing (208) a plurality of main objects in the application domain;
- automatically clustering (210) the text snippets and the labels based on the main objects to obtain a plurality of clusters;
- automatically clustering (212), within each cluster, the text snippets with related information into sub-clusters, and assigning the labels of the clustered text snippets to the corresponding sub-cluster;
- for each attribute, automatically assigning (214) the attribute to one of the sub-clusters based on a similarity between (i) at least one of the attribute name and an attribute description of the attribute, and (ii) the text snippets and labels of the sub-clusters; and
- assessing (216), for each sub-cluster, a corresponding re-identification risk based on the value types and the attribute descriptions of the attributes assigned to that sub-cluster.
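The abstract does not specify how the similarity in step (214) or the risk in step (216) is computed. The following Python sketch illustrates both steps under explicitly stated assumptions: the sub-clusters produced by steps (202) to (212) are given as hard-coded example data, a bag-of-words cosine similarity stands in for whatever similarity measure the method actually uses, and the value-type weights and the combination rule 1 - prod(1 - w) are invented for illustration. All names (SUB_CLUSTERS, VALUE_TYPE_WEIGHT, tokens, cosine, assign_attribute, assess_risks) are hypothetical and not taken from the patent.

```python
import math
import re
from collections import Counter

# Hypothetical output of steps (202) to (212): each sub-cluster keeps its
# labels and the text snippets grouped under it. All data here is invented.
SUB_CLUSTERS = {
    "patient/identity": {
        "labels": ["name", "birthdate"],
        "snippets": ["the patient's full name and date of birth",
                     "birth date recorded at admission"],
    },
    "patient/location": {
        "labels": ["address", "zipcode"],
        "snippets": ["home address including street and zip code"],
    },
    "patient/treatment": {
        "labels": ["diagnosis", "medication"],
        "snippets": ["diagnosis code and prescribed medication"],
    },
}

# Assumed weights per value type (not from the patent): how much a single
# attribute of that type is taken to contribute to re-identification.
VALUE_TYPE_WEIGHT = {"identifier": 0.9, "quasi_identifier": 0.4, "sensitive": 0.1}


def tokens(text: str) -> Counter:
    """Lower-cased bag of words; a simple stand-in for a text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_attribute(name: str, description: str = "") -> str:
    """Step (214) sketch: return the sub-cluster whose labels and snippets
    are most similar to the attribute name plus its description."""
    query = tokens(name.replace("_", " ") + " " + description)

    def score(key: str) -> float:
        sub = SUB_CLUSTERS[key]
        return cosine(query, tokens(" ".join(sub["labels"] + sub["snippets"])))

    return max(SUB_CLUSTERS, key=score)


def assess_risks(attributes: list[tuple[str, str, str]]) -> dict[str, float]:
    """Step (216) sketch: per-sub-cluster risk = 1 - prod(1 - w) over the
    weights of its assigned attributes (assumes independent contributions)."""
    weights: dict[str, list[float]] = {key: [] for key in SUB_CLUSTERS}
    for name, description, value_type in attributes:
        weights[assign_attribute(name, description)].append(VALUE_TYPE_WEIGHT[value_type])
    return {key: 1 - math.prod(1 - w for w in ws) for key, ws in weights.items()}


if __name__ == "__main__":
    attrs = [
        ("full_name", "Patient's full name", "identifier"),
        ("date_of_birth", "Date of birth of the patient", "quasi_identifier"),
        ("zip_code", "", "quasi_identifier"),
        ("diagnosis", "ICD-10 diagnosis code", "sensitive"),
    ]
    print({k: round(v, 2) for k, v in assess_risks(attrs).items()})
    # {'patient/identity': 0.94, 'patient/location': 0.4, 'patient/treatment': 0.1}
```

Scoring each attribute against the pooled labels and snippets of a sub-cluster follows the wording of step (214); a practical system would more likely use sentence embeddings. This sketch uses the attribute descriptions only for assignment, whereas the abstract also lets them inform the risk assessment in step (216).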