Biomarker identification by reversing the learning mechanism of an autoencoder and recursive feature elimination

Bibliographic details
Published in: Molecular Omics, 2022-08, Vol. 18 (7), p. 652-661
Main authors: Al Abir, Fuad; Shovan, S. M.; Hasan, Md. Al Mehedi; Sayeed, Abu; Shin, Jungpil
Format: Article
Language: English
Online access: Full text
Description
Abstract: RNA-Seq has made significant contributions to various fields, particularly cancer research. Recent studies on differential gene expression analysis and the discovery of novel cancer biomarkers have used RNA-Seq data extensively. Identifying new biomarkers is essential for advancing cancer research, and early cancer diagnosis improves patients' chances of recovery and increases life expectancy. Both areas are urgent and leave room for improvement. In this paper, we developed an autoencoder-based biomarker identification method that reverses the learning mechanism of the trained encoders. We devised an explainable post hoc methodology for identifying influential genes with a high likelihood of becoming biomarkers. We then applied recursive feature elimination (RFE) to shorten the list further and present 17 potential biomarkers that identify cancer types with 99.93% accuracy using a support vector machine (SVM) on the UCI gene expression cancer RNA-Seq dataset, which comprises five cancerous tumor types. Our methodology outperforms all of the state-of-the-art methods, confirming the potential of the newly identified biomarkers as well as the efficacy of the biomarker identification procedure. Moreover, we evaluated the methodology on six independent RNA-Seq gene expression datasets for several tasks, namely classifying tumors versus non-tumors, detecting the origin of circulating tumor cells (CTCs), and predicting whether metastasis occurs. Our methodology achieved encouraging results on these tasks as well. The source code of this project is available at https://github.com/fuad021/biomarker-identification. In summary, this is a rigorous filtration methodology for selecting the best cancer biomarker candidates, using an autoencoder whose learning mechanism is reversed, assisted by RFE and SVM.
ISSN: 2515-4184
DOI: 10.1039/d1mo00467k
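
The abstract mentions recursive feature elimination driven by a support vector machine but does not spell out the procedure. The following is a minimal sketch of the general idea only, not the authors' implementation (their code is in the repository linked in the abstract). Everything in it is an assumption made for illustration: the toy data, the binary labels (the paper works with five tumor types), the small hand-written linear SVM used as a stand-in for a library SVM, and the hypothetical names train_linear_svm, recursive_feature_elimination and make_toy_data.

```python
import random


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Fit weights and bias by full-batch subgradient descent on the
    L2-regularised hinge loss (a simple stand-in for a library SVM)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly retrain and drop the feature with the smallest |weight|
    until only n_keep feature indices remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining


def make_toy_data(n_samples=40, n_noise=4, seed=0):
    """Toy 'expression matrix': columns 0 and 1 track the label,
    the remaining columns are pure noise (assumed data, not from the paper)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        informative = [2.0 * label + rng.gauss(0, 0.1),
                       -2.0 * label + rng.gauss(0, 0.1)]
        noise = [rng.gauss(0, 0.3) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected = recursive_feature_elimination(X, y, n_keep=2)
print(selected)  # indices of the features that survive elimination
```

On real expression data one would more likely rely on an established library; scikit-learn, for example, provides an RFE class in sklearn.feature_selection that can wrap a linear-kernel SVC. Whether the authors used that route is not stated in this record.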