SARA: A memetic algorithm for high-dimensional biomedical data


Bibliographic details
Published in: Applied Soft Computing 2021-03, Vol. 101, p. 107009, Article 107009
Main authors: Baliarsingh, Santos Kumar; Muhammad, Khan; Bakshi, Sambit
Format: Article
Language: English
Description
Abstract: Over the past two decades, large amounts of biomedical and clinical data have been generated. These high-dimensional datasets contain thousands of genes, many of them irrelevant, and the irrelevant genes can reduce the predictive accuracy of diagnosis. Gene selection and classification algorithms are therefore needed to select the relevant genes from the dataset and to identify the patterns in those genes accurately. This work proposes a hybrid algorithm that combines simulated annealing (SA) and the Rao algorithm (RA) to select the optimal gene subset and classify cancer. SA serves as a local search strategy and RA as the global optimization framework; SA is embedded in RA to improve RA's exploitation capability. The proposed method consists of two stages. In the first stage, minimum redundancy maximum relevance (mRMR) is employed to select relevant gene subsets from the microarray dataset. In the second stage, SA is hybridized with RA to improve the quality of solutions after every iteration of RA. A log-sigmoidal function is introduced as an encoding scheme to transform the continuous simulated annealing-Rao algorithm (SARA) into a discrete optimization algorithm. The approach is tested on three binary-class and four multi-class datasets and compared with eighteen existing techniques. The experiments show that the proposed approach selects discriminating genes with high classification accuracy. In particular, it achieves a classification accuracy of 99.81% on the SRBCT dataset using only five informative genes.
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2020.107009
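
The abstract describes the hybrid only in outline, so the following is a minimal Python sketch of how such a loop might fit together, not the authors' implementation. Several things are assumptions: the Rao-1 update rule (the abstract does not say which Rao variant is used), the 0.5 threshold applied to the log-sigmoid output, the SA schedule parameters, and the function names. The hypothetical `toy_fitness` merely counts mismatches against a made-up target mask; in the paper the fitness would come from classification performance on genes pre-filtered by mRMR, which is not reproduced here.

```python
import math
import random

def logsig(x):
    """Log-sigmoid transfer function (numerically stable): maps a real value into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def to_mask(position, threshold=0.5):
    """Binarize a continuous position; gene j is selected when logsig(x_j) > threshold (assumed rule)."""
    return [1 if logsig(x) > threshold else 0 for x in position]

def toy_fitness(position, target=(1, 0, 1, 0, 0, 1)):
    """Hypothetical stand-in for classifier error: mismatches against a made-up target mask."""
    return sum(m != t for m, t in zip(to_mask(position), target))

def rao1_step(pop, fitness, rng):
    """One Rao-1 move (assumed variant): x' = x + r * (best - worst), kept only if not worse."""
    scores = [fitness(p) for p in pop]
    best = pop[scores.index(min(scores))]
    worst = pop[scores.index(max(scores))]
    new_pop = []
    for p, s in zip(pop, scores):
        cand = [x + rng.random() * (b - w) for x, b, w in zip(p, best, worst)]
        new_pop.append(cand if fitness(cand) <= s else p)
    return new_pop

def sa_refine(x, fitness, rng, temp=1.0, cooling=0.9, steps=20):
    """Simulated-annealing local search around x with Metropolis acceptance; returns the best point seen."""
    current, cur_f = list(x), fitness(x)
    best, best_f = list(current), cur_f
    for _ in range(steps):
        cand = list(current)
        cand[rng.randrange(len(cand))] += rng.gauss(0.0, 1.0)  # perturb one coordinate
        cand_f = fitness(cand)
        if cand_f <= cur_f or rng.random() < math.exp((cur_f - cand_f) / temp):
            current, cur_f = cand, cand_f
            if cur_f < best_f:
                best, best_f = list(current), cur_f
        temp *= cooling  # geometric cooling (assumed schedule)
    return best

def sara_sketch(n_genes=6, pop_size=8, iters=30, seed=0):
    """Global Rao step followed by SA refinement of every solution, repeated each iteration."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(iters):
        pop = rao1_step(pop, toy_fitness, rng)
        pop = [sa_refine(p, toy_fitness, rng) for p in pop]
    best = min(pop, key=toy_fitness)
    return to_mask(best), toy_fitness(best)
```

Calling `sara_sketch()` returns a 0/1 gene mask and its toy fitness. The search itself stays in continuous space, and the log-sigmoid mapping is applied only when a solution is evaluated, which is one plausible reading of the "encoding scheme" mentioned in the abstract.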