A Microarray Data Pre-processing Method for Cancer Classification

Bibliographic details
Published in:	JOIV: International Journal on Informatics Visualization (Online), 2022-12, Vol. 6 (4), pp. 784-790
Main authors:	Hui, Tay Xin; Kasim, Shahreen; Md Fudzee, Mohd Farhan; Abdullah, Zubaile; Hassan, Rohayanti; Erianda, Aldo
Format:	Article
Language:	English
Keywords:	data pre-processing; gene expression data; GenePattern; microarray data

Description
Summary:	The development of microarray technology has led to significant improvements and research in various fields. With the help of machine learning techniques and statistical methods, it is now possible to organize, analyze, and interpret large amounts of biological data to uncover significant patterns of interest. Exploiting microarray data remains a great challenge for many researchers. Raw gene expression data are often affected by missing values, noise, incompleteness, and inconsistency. Hence, it is important to pre-process the data before they are used for cancer classification. To extract the biological significance of microarray gene expression data, data pre-processing is a necessary step for obtaining valuable information for further analysis and for addressing important hypotheses. This study presents a detailed description of a data pre-processing method for cancer classification. The proposed method consists of three phases: data cleaning, transformation, and filtering. A combination of the GenePattern software tool and RStudio was used to implement the proposed method. To demonstrate its feasibility for cancer classification, the method was applied to six gene expression datasets: lung, stomach, liver, kidney, thyroid, and breast cancer. A comparison illustrates the differences between the datasets before and after data pre-processing.
ISSN:	2549-9610, 2549-9904
DOI:	10.30630/joiv.6.4.1523
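
The summary names three pre-processing phases (data cleaning, transformation, and filtering) but the record gives no implementation details. The following is a minimal Python sketch of what such a pipeline could look like, not the authors' GenePattern and RStudio workflow. Everything specific in it is an assumption made for illustration: mean imputation as the cleaning step, clipping followed by a log2 transform as the transformation step, a range-based variation filter as the filtering step, all threshold values, the function names, and the toy gene identifiers.

```python
import math
from statistics import mean


def clean(rows, max_missing_frac=0.5):
    """Phase 1, cleaning (assumed approach): drop genes with too many
    missing values and fill the remaining gaps with the gene's mean."""
    cleaned = {}
    for gene, values in rows.items():
        present = [v for v in values if v is not None]
        missing_frac = 1 - len(present) / len(values)
        if not present or missing_frac > max_missing_frac:
            continue  # too sparse to impute reliably
        fill = mean(present)
        cleaned[gene] = [fill if v is None else v for v in values]
    return cleaned


def transform(rows, floor=20.0, ceiling=20000.0):
    """Phase 2, transformation (assumed approach): clip intensities to
    [floor, ceiling], then take log2. The bounds are illustrative."""
    return {
        gene: [math.log2(min(max(v, floor), ceiling)) for v in values]
        for gene, values in rows.items()
    }


def filter_genes(rows, min_log2_range=1.0):
    """Phase 3, filtering (assumed approach): keep genes whose log2
    expression range across samples reaches the threshold."""
    return {
        gene: values
        for gene, values in rows.items()
        if max(values) - min(values) >= min_log2_range
    }


def preprocess(rows):
    """Run the three phases in the order given in the summary."""
    return filter_genes(transform(clean(rows)))


# Hypothetical toy data: gene -> expression values across four samples.
toy = {
    "GENE_A": [100.0, None, 400.0, 1600.0],  # one gap, strong variation
    "GENE_B": [50.0, 52.0, 51.0, 49.0],      # nearly constant
    "GENE_C": [None, None, None, 10.0],      # mostly missing
    "GENE_D": [5.0, 80.0, 30000.0, 640.0],   # out-of-range values
}
result = preprocess(toy)
print(sorted(result))
```

In this toy run, GENE_C is removed during cleaning because three of its four values are missing, and GENE_B is removed during filtering because it barely varies across samples. In a real study the thresholds would need to be chosen per platform and dataset.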