Causal Gene Identification Using Non-Linear Regression-Based Independence Tests

Bibliographic Details
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023-01, Vol. 20 (1), p. 185-195
Main authors: Zhang, Hao; Yan, Chuanxu; Xia, Yewei; Guan, Jihong; Zhou, Shuigeng
Format: Article
Language: English
Description
Abstract: With the development of biomedical techniques over the past decades, causal gene identification has become one of the most promising applications in human genome-based business, helping doctors evaluate the risk of certain genetic diseases and provide further treatment recommendations for potential patients. When controlled experiments cannot be performed, machine learning techniques such as causal inference-based methods are generally used to identify causal genes. Unfortunately, most existing methods detect disease-related genes by ranking-based strategies or feature selection techniques, which generally return a superset of the real causal genes. Some causal inference-based methods can identify a subset of the real causal genes from those supersets, but they return only a few causal genes. This contradicts existing knowledge, as many results from controlled experiments have demonstrated that a given disease, especially cancer, is usually related to dozens or hundreds of genes. In this work, we present an effective approach for identifying causal genes from gene expression data using a new search strategy based on non-linear regression-based independence tests, which greatly reduces the search space and simultaneously establishes the causal relationships from the candidate genes to the disease variable. Extensive experiments on real-world cancer datasets show that our method is superior to existing causal inference-based methods in three respects: 1) our method can identify dozens of causal genes, and 1/3 to 1/2 of the discovered causal genes can be verified by existing works as directly related to the corresponding disease; 2) the discovered causal genes are able to distinguish the status or disease subtype of the target patient; 3) most of the discovered causal genes are closely relevant to the disease variable.
ISSN: 1545-5963, 1557-9964
DOI: 10.1109/TCBB.2022.3149864