SYNTHETIC AND TRADITIONAL DATA STEWARDS FOR SELECTING, OPTIMIZING, VERIFYING AND RECOMMENDING ONE OR MORE DATASETS

Bibliographic details
Main authors:	ROGERS, Robert; CHALK, Mary; CZESZYNSKI, Alan
Format:	Patent
Language:	English; French

Description
Abstract:	Systems and methods for the verification of cohort sample sets are provided. In some embodiments, a sample dataset is received and used to generate a sample vector set. The sample vector is computed by encoding the dataset according to a set of classes, generating a matrix of the encoded dataset (where the rows of the matrix correspond to patients and the columns to a class or subclass), and converting the matrix into a series of vector spaces. An example vector set is received, and the difference between the sample vector set and the example vector set is calculated. The difference is calculated by framing the distance as a p-value in a hypothesis test and comparing it against a threshold. When the p-value is above the threshold, the sample dataset is rejected. Systems and methods for the confirmation of a selection of data in a zero-trust environment are also provided. In some embodiments, the dataset(s) are received at a data steward. This may be a traditional data steward or a synthetic data steward. Additionally, a script is received from the algorithm developer. The dataset(s) and script(s) reside within a secure computing node and are therefore inaccessible to any party. The script(s) are executed, resulting in at least one confirmation about the data within the dataset(s). The script(s) confirm any of the following: a format for data in the at least one dataset, the expected class values for data within the at least one dataset, an overall characterization and completeness of the at least one dataset, and/or an expected class membership for different data attributes within the at least one dataset.
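
The cohort verification steps in the abstract (encode by class, build a patient-by-class matrix, reduce to vectors, frame the distance as a p-value) can be illustrated with a minimal Python sketch. It is not the patented method. The abstract does not say how the matrix becomes "a series of vector spaces", which distance is used, or which hypothesis test is applied, so the sketch assumes one class label per patient, column means as the vector, Euclidean distance, and a permutation test. All function names are hypothetical.

```python
import math
import random


def encode(records, classes):
    """One-hot encode records: one row per patient, one column per class.

    Assumption: each record is a single class label for one patient.
    """
    return [[1 if r == c else 0 for c in classes] for r in records]


def summary_vector(matrix):
    """Column means of the encoded matrix, i.e. class frequencies.

    Assumption: stands in for the abstract's unspecified matrix-to-vector step.
    """
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def permutation_p_value(sample, example, classes, n_perm=2000, seed=0):
    """Permutation-test p-value for the distance between two cohorts.

    Null hypothesis (assumed): both cohorts come from the same population.
    Returns the share of random re-splits of the pooled records whose
    distance is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = distance(summary_vector(encode(sample, classes)),
                        summary_vector(encode(example, classes)))
    pooled = list(sample) + list(example)
    k = len(sample)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = distance(summary_vector(encode(pooled[:k], classes)),
                     summary_vector(encode(pooled[k:], classes)))
        if d >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(sample_labels, example_labels, classes)` returns a value in (0, 1] that a caller would compare against a chosen threshold. Under the null hypothesis assumed here, a small p-value signals a mismatch between the cohorts. The abstract states its rule the other way round (reject when the p-value is above the threshold), which suggests its test is framed differently, so the sketch returns only the p-value and leaves the accept/reject rule to the caller.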