Identifying cancer sub-types from genomic scale data sets using confidence based integration (CBI)

Bibliographic details
Published in: Journal of biomedical informatics 2022-02, Vol.126, p.103997-103997, Article 103997
Main authors: Sreekumar, R., Khursheed, Farida
Format: Article
Language: English
Description
Highlights:
• Disease subtyping involves extracting finer differences between samples.
• The extraction of differences is a consensual process.
• At every transit phase, decisions are taken based on the confidence of each participating feature.
• This research accommodates outliers in data sets by a smooth transition process.
• A measure of self confidence is devised based on the assumption that “close neighbors have common neighbors”.

Abstract: Precision medicine involves refined diagnosis of patients and a search for causes that go unseen within patient cohorts whose members otherwise have largely similar health conditions. As technology has evolved to extract features from a wide variety of sources, including genetics, a large quantity of data has become available to researchers for fine-grained studies of diseases and cures. In cancer research, integrative methods using genomic data sets have become a major area of interest. The petabytes of data available at The Cancer Genome Atlas (TCGA), a program run jointly by the National Cancer Institute (NCI) and the National Human Genome Research Institute, have made more nuanced research in cancer genomics possible. Our method, Confidence Based Integration (CBI), is an integration method that extracts similar as well as complementary information from genomic data sets. This information provides insight into the status of patients and their prospects. We used gene expression, miRNA expression, and DNA methylation data sets in our fusion experiments on five different cancer types. After fusion, these data sets are clustered using the spectral clustering algorithm, which derives the clusters that form the disease subtypes. The survival properties of each subtype support treating the samples within it as highly similar. We report that CBI performs better, in terms of the P-value of the log-rank test, than other methods such as similarity network fusion (SNF) in forming clusters of significance. In most of the experiments, individual features clustered extremely poorly compared to CBI.
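
The abstract evaluates clusters by the P-value of a log-rank test on survival data. As a rough illustration of that evaluation step only, the sketch below computes a standard two-group log-rank test using the Python standard library. It is not the authors' code: the function name `logrank_test`, its arguments, and the restriction to two groups are assumptions made for this example, and comparing more than two subtypes would need the k-group form of the test.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative sketch, not the paper's code).

    times_*  : follow-up times for each sample in the group
    events_* : 1 if the event (death) was observed, 0 if censored
    Returns (chi_square, p_value) with one degree of freedom.
    """
    # Pool both groups, tagging each sample with its group (0 = a, 1 = b).
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Only time points with at least one observed event contribute.
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Samples still at risk just before time t, per group.
        n_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        n_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        n = n_a + n_b
        # Observed events at time t, in group a and overall.
        d_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        d = sum(1 for s, e, _ in samples if s == t and e)

        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-a event count.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0

    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

A smaller P-value indicates that the survival curves of the two groups differ more than chance would explain, which is the sense in which the abstract compares CBI with SNF.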
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2022.103997