Identifying redundant features using unsupervised learning for high-dimensional data

Bibliographic details

Published in: SN Applied Sciences, 2020-08, Vol. 2 (8), p. 1367, Article 1367
Main authors: Danasingh, Asir Antony Gnana Singh; Subramanian, Appavu alias Balamurugan; Epiphany, Jebamalar Leavline
Format: Article
Language: English
Publisher: Springer International Publishing, Cham
Online access: Full text

Abstract: In the digital era, classifiers play a vital role in various machine learning applications such as medical diagnosis, weather prediction and pattern recognition. Classifiers are built by classification algorithms from data, and nowadays that data is often high dimensional because it is generated massively through advances in information and communication technology. The high-dimensional space contains irrelevant and redundant features; both reduce classification accuracy and increase the space and building time of the classifiers. The relevancy and redundancy analysis mechanisms of the feature selection process remove these irrelevant and redundant features. Identifying the irrelevant features is a simple task, since it only requires measuring the relevancy between each feature and the target class of a dataset with a statistical or information-theoretic measure. Identifying the redundant features is considerably harder, especially in high-dimensional space, because it requires measuring the relevancy among the features themselves; this increases the computational complexity, and an inappropriate relevancy measure can degrade the classification accuracy. To overcome these problems, this paper presents an unsupervised learning-based redundancy analysis mechanism for feature selection, evaluating various clustering techniques in terms of average redundancy rate and runtime.
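
The abstract describes the approach only at a high level and the record contains no code, so the following is a minimal sketch, in Python, of the general idea of clustering-based redundancy analysis: features are grouped by an unsupervised clustering algorithm, one representative feature is kept per group, and the resulting subset is scored by its average redundancy rate. The use of k-means, the absolute Pearson correlation as the redundancy measure, the synthetic data, and all function names are illustrative assumptions, not the authors' exact procedure.

# Hypothetical sketch of clustering-based redundancy analysis: group similar
# features with an unsupervised clusterer, keep one representative per cluster,
# and score the selected subset by its average redundancy rate.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification


def redundancy(a, b):
    # Absolute Pearson correlation as a simple feature-feature redundancy measure.
    return abs(np.corrcoef(a, b)[0, 1])


def select_by_clustering(X, n_clusters):
    # Treat each feature (column of X) as a point and cluster the features.
    F = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize columns
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(F.T)
    selected = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        # Keep the member most similar, on average, to the rest of its cluster.
        scores = [np.mean([redundancy(X[:, i], X[:, j]) for j in members if j != i] or [0.0])
                  for i in members]
        selected.append(int(members[int(np.argmax(scores))]))
    return sorted(selected)


def average_redundancy_rate(X, features):
    # Mean pairwise redundancy among the selected features (lower is better).
    pairs = [(i, j) for i in features for j in features if i < j]
    return float(np.mean([redundancy(X[:, i], X[:, j]) for i, j in pairs]))


# Toy usage on synthetic data with deliberately redundant features.
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_redundant=15, random_state=0)
subset = select_by_clustering(X, n_clusters=10)
print("selected features:", subset)
print("average redundancy rate:", round(average_redundancy_rate(X, subset), 3))

The paper itself compares several clustering techniques on average redundancy rate and runtime; in a sketch like this, any clusterer exposing a fit_predict interface could be substituted for the KMeans call.
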
DOI: 10.1007/s42452-020-3157-6
ISSN: 2523-3963
EISSN: 2523-3971
Source: Elektronische Zeitschriftenbibliothek (freely accessible e-journals)
Subjects:
Accuracy
Algorithms
Applied and Technical Physics
Artificial intelligence
Chemistry/Food Science
Classification
Classifiers
Clustering
Computer applications
Connectivity
Datasets
Decision making
Earth Sciences
Engineering
Engineering: Industrial Informatics: Data Analytics in Remote Sensing and Cyber-Physical Systems
Environment
Feature selection
Information theory
Machine learning
Materials Science
Pattern recognition
Probability
Probability distribution
Redundancy
Research Article
Similarity measures
Social networks
Unsupervised learning
Weather forecasting