Comparison of supervised models in hepatocellular carcinoma tumor classification based on expression data using principal component analysis (PCA)

Hepatocellular carcinoma is a cancer with a high mortality rate. Whether a person has a hepatocellular carcinoma tumor can be determined by observing gene expression. The gene expression data were obtained from a microarray, a laboratory tool that measures expression through gene probes. In this cas...

Bibliographic details
Main authors:	Siregar, Anggrainy Togi Marito ; Siswantining, Titin ; Bustamam, Alhadi ; Sarwinda, Devvi
Format:	Conference proceeding
Language:	eng
Subjects:	Cancer ; Classifiers ; Correlation analysis ; Deoxyribonucleic acid ; DNA ; Gene expression ; Genes ; Liver cancer ; Machine learning ; Multivariate analysis ; Principal components analysis ; Reduction ; Regression analysis ; Regression models ; Support vector machines ; Tumors ; Variance
Online access:	Full text

container_end_page
container_issue	1
container_start_page
container_title
container_volume	2264
creator	Siregar, Anggrainy Togi Marito ; Siswantining, Titin ; Bustamam, Alhadi ; Sarwinda, Devvi
description	Hepatocellular carcinoma is a cancer with a high mortality rate. Whether a person has a hepatocellular carcinoma tumor can be determined by observing gene expression. The gene expression data here were obtained from a microarray, a laboratory tool that measures expression through gene probes. The data set contains 54,675 gene expression features for 40 samples (Homo sapiens). With so many features, classifying a person as affected or not affected by a hepatocellular carcinoma tumor is difficult, so the number of features must be reduced without losing the information in the data. One dimensionality reduction tool in machine learning is principal component analysis (PCA). PCA is a multivariate analysis that transforms correlated original features into new features that are uncorrelated with each other; keeping fewer of these new features gives a smaller dimension that still explains most of the variance of the original features. The objective of this research is to find the best percentage of variance to retain in the features generated by PCA before fitting several models. The models used are a logistic regression classifier, a support vector machine (SVM) classifier, and a random forest classifier. Logistic regression reaches the best accuracy, 0.875, starting from 40% of the variance retained by PCA. The random forest classifier and the SVM reach an accuracy of 0.875 only when the retained variance is above 60%. The results can guide the choice of the retained-variance percentage when PCA is used for dimensionality reduction, especially for microarray gene expression data.
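The "percentage of variance" in the description can be read as a cumulative explained-variance threshold: keep the smallest number of principal components whose eigenvalues add up to at least the chosen fraction of the total. The following is a minimal sketch of that selection rule only, not the authors' code. The function name `components_for_variance` and the eigenvalue spectrum are hypothetical and chosen purely for illustration.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, fraction):
    """Return the smallest k such that the k largest eigenvalues
    explain at least `fraction` of the total variance."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    # Walk the cumulative sum until the requested share is reached.
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= fraction - 1e-12:  # tolerance for rounding
            return k
    return len(ordered)


# Hypothetical eigenvalue spectrum (illustrative only, not from the paper).
spectrum = [5.0, 2.0, 1.0, 1.0, 0.5, 0.5]
for pct in (0.4, 0.6, 0.9):
    print(pct, components_for_variance(spectrum, pct))
```

In scikit-learn, a comparable rule is applied when a float between 0 and 1 is passed as `n_components` to `PCA` with `svd_solver="full"`. Whether the authors used that library is not stated in this record.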
doi_str_mv	10.1063/5.0023931
format	Conference Proceeding
coden	APCPCS
collection	Technology Research Database ; Aerospace Database ; Advanced Technologies Database with Aerospace
contributor	Akimenko, Vitalii ; Apri, Mochamad
creationdate	2020-09-22
genre	proceeding
linktohtml	https://pubs.aip.org/acp/article-lookup/doi/10.1063/5.0023931
oa	free_for_read
publisher	Melville: American Institute of Physics
rights	2020 Author(s). Published by AIP Publishing.
toplevel	peer_reviewed ; online_resources
tpages	8
fulltext	fulltext
identifier	ISSN: 0094-243X
ispartof	AIP conference proceedings, 2020, Vol.2264 (1)
issn	0094-243X ; 1551-7616
language	eng
recordid	cdi_proquest_journals_2444999402
source	AIP Journals Complete
subjects	Cancer ; Classifiers ; Correlation analysis ; Deoxyribonucleic acid ; DNA ; Gene expression ; Genes ; Liver cancer ; Machine learning ; Multivariate analysis ; Principal components analysis ; Reduction ; Regression analysis ; Regression models ; Support vector machines ; Tumors ; Variance
title	Comparison of supervised models in hepatocellular carcinoma tumor classification based on expression data using principal component analysis (PCA)
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T16%3A37%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_scita&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Comparison%20of%20supervised%20models%20in%20hepatocellular%20carcinoma%20tumor%20classification%20based%20on%20expression%20data%20using%20principal%20component%20analysis%20(PCA)&rft.btitle=AIP%20conference%20proceedings&rft.au=Siregar,%20Anggrainy%20Togi%20Marito&rft.date=2020-09-22&rft.volume=2264&rft.issue=1&rft.issn=0094-243X&rft.eissn=1551-7616&rft.coden=APCPCS&rft_id=info:doi/10.1063/5.0023931&rft_dat=%3Cproquest_scita%3E2444999402%3C/proquest_scita%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2444999402&rft_id=info:pmid/&rfr_iscdi=true