COMPUTER-IMPLEMENTED METHOD, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR PROCESSING A DATA SET

Bibliographic details
Main authors: Susaiyah, Allmin Pradhap Singh; Patil, Meru Adagouda
Format: Patent
Language: English
Description
Summary: According to an aspect, there is provided a computer-implemented method for processing a data set, the data set comprising respective data subsets for a plurality of subjects, each data subset comprising a plurality of data entries, each entry comprising respective parameter values for each of a plurality of parameters at a respective time point, wherein for a first data subset relating to a first subject in the plurality of subjects, one or more parameter values for at least a first parameter in the plurality of parameters is missing from the first data subset, the method comprising, for a first missing parameter value in a first data entry in the first data subset:
(a) determining completeness scores for the first parameter, wherein each completeness score indicates a level of completeness of the data entries in the first data subset for the first parameter and a respective one of the other parameters in the plurality of parameters;
(b) determining correlation scores for the first parameter, wherein each correlation score indicates a level of correlation between the parameter values in the data set for the first parameter and the parameter values in the data set for a respective one of the other parameters in the plurality of parameters;
(c) determining a subset of the plurality of parameters to use to form regression trees based on the determined completeness scores and the determined correlation scores;
(d) forming a plurality of regression trees, wherein each regression tree relates to a respective parameter combination of the first parameter and one or more of the other parameters in the determined subset, and each regression tree is trained to predict a parameter value for the first parameter based on input parameter values for the one or more other parameters in the parameter combination, wherein each regression tree is trained using training data comprising parameter values for the parameters in the respective parameter combination, wherein the training data includes the parameter values in any data entry in the first data subset for which a parameter value is present for all of the parameters in the respective parameter combination;
(e) using each regression tree to predict a parameter value for the first parameter based on parameter values in the first data entry for the one or more other parameters in the parameter combination; and
(f) combining the predicted parameter values to estimate the first missing parameter value.
A corresponding apparatu
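Steps (a) to (f) of the summary can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the summary does not state: each data entry is a dict mapping every parameter name to a number or None; the completeness score is the fraction of the subject's entries in which both parameters are present; the correlation score is the absolute Pearson correlation over the whole data set; the thresholds `min_complete` and `min_corr` are hypothetical; the regression tree is a small variance-reduction tree; and the predictions are combined by a plain mean. All function names are illustrative.

```python
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; 0.0 if a series is constant or too short."""
    if len(xs) < 2:
        return 0.0
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def complete_rows(entries, params):
    """Entries in which every listed parameter has a value."""
    return [e for e in entries if all(e.get(p) is not None for p in params)]

def completeness(subset, target, other):
    # Step (a), assumed form: share of the subject's entries with both values.
    return len(complete_rows(subset, [target, other])) / len(subset)

def correlation(data_set, target, other):
    # Step (b), assumed form: |Pearson r| over all subjects' entries.
    pooled = [e for entries in data_set.values() for e in entries]
    rows = complete_rows(pooled, [target, other])
    return abs(pearson([r[target] for r in rows], [r[other] for r in rows]))

def sse(rows, target):
    """Sum of squared errors around the mean target value."""
    m = mean(r[target] for r in rows)
    return sum((r[target] - m) ** 2 for r in rows)

def fit_tree(rows, target, features, depth=2, min_leaf=2):
    """Tiny regression tree as nested dicts; leaves hold a mean target."""
    leaf = {"value": mean(r[target] for r in rows)}
    if depth == 0 or len(rows) < 2 * min_leaf:
        return leaf
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[f] <= t]
            right = [r for r in rows if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left, target) + sse(right, target)
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return leaf
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": fit_tree(left, target, features, depth - 1, min_leaf),
            "right": fit_tree(right, target, features, depth - 1, min_leaf)}

def predict(tree, entry):
    while "value" not in tree:
        go_left = entry[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]

def impute(data_set, subject, index, target, min_complete=0.5, min_corr=0.3):
    """Estimate the missing `target` value in data_set[subject][index]."""
    subset = data_set[subject]
    entry = subset[index]
    # Step (c): keep parameters that pass both (hypothetical) thresholds
    # and that are present in the entry being repaired.
    chosen = [p for p in entry
              if p != target and entry[p] is not None
              and completeness(subset, target, p) >= min_complete
              and correlation(data_set, target, p) >= min_corr]
    predictions = []
    for k in range(1, len(chosen) + 1):
        for combo in combinations(chosen, k):
            # Step (d): train on the subject's entries complete for the combo.
            rows = complete_rows(subset, [target, *combo])
            if rows:
                tree = fit_tree(rows, target, combo)
                # Step (e): predict from the entry's other parameter values.
                predictions.append(predict(tree, entry))
    # Step (f), assumed form: combine by averaging.
    return mean(predictions) if predictions else None
```

A call such as `impute(data_set, "subject_1", 4, "heart_rate")` (hypothetical names) returns the estimate, or None when no parameter passes the thresholds. Training one tree per parameter combination means a prediction remains available whenever at least one combination is fully present in the entry.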