Filter-based relevance and instance selection

Feature selection is an important technique in building prediction systems. In various searches, the judgment of the relevancy of a feature is often calculated using all the instances of the considered sample. However, when the dataset size grows, some of the instances are not useful for weighting t...

Full description

Bibliographic details
Main authors:	Mourtji, Basma El; Ouaderhman, Tayeb; Chamlal, Hasna
Format:	Conference proceeding
Language:	eng
Subjects:	Algorithms; Clustering; Datasets
Online access:	Full text

container_end_page
container_issue	1
container_start_page
container_title
container_volume	3034
creator	Mourtji, Basma El; Ouaderhman, Tayeb; Chamlal, Hasna
description	Feature selection is an important technique in building prediction systems. In various search procedures, the relevancy of a feature is often judged using all the instances of the sample under consideration. However, as the dataset grows, some of the instances are no longer useful for weighting the features. In this paper, we rank features according to a fitness function that relies on the relevancy computed over all instances and on the relevancy computed over a maximal number of significant instances (those that actually contribute to the feature's positive relevancy with the target variable). The relevancy is based on preordonnance theory, in which the instances are expressed in pairs. The proposed algorithm is divided into three main steps: (a) eliminating all features that are in disagreement with the target feature; (b) finding, for each feature, the best subset of instances, one that maximizes the relevancy and whose cardinality tends to that of the original dataset; and (c) ranking the features. The second step divides the dataset (the instances of each feature) into several consistent regions by fuzzy clustering, then performs GA-based instance selection independently within each cluster, and finally aggregates the partial results by ensemble voting. Experimental results verify the effectiveness of the proposed method.
doi_str_mv	10.1063/5.0194692
format	Conference Proceeding
contributor	Belhamadia, Youssef; Seaid, Mohammed
publisher	Melville: American Institute of Physics
date	2024-03-05
eissn	1551-7616
coden	APCPCS
tpages	8
rights	2024 Author(s). Published by AIP Publishing.
oa	free_for_read
toplevel	peer_reviewed; online_resources
collection	Technology Research Database; Aerospace Database; Advanced Technologies Database with Aerospace
fulltext	fulltext
identifier	ISSN: 0094-243X
ispartof	AIP conference proceedings, 2024, Vol.3034 (1)
issn	0094-243X 1551-7616
language	eng
recordid	cdi_scitation_primary_10_1063_5_0194692
source	AIP Journals Complete
subjects	Algorithms; Clustering; Datasets
title	Filter-based relevance and instance selection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T10%3A29%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_scita&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Filter-based%20relevance%20and%20instance%20selection&rft.btitle=AIP%20conference%20proceedings&rft.au=Mourtji,%20Basma%20El&rft.date=2024-03-05&rft.volume=3034&rft.issue=1&rft.issn=0094-243X&rft.eissn=1551-7616&rft.coden=APCPCS&rft_id=info:doi/10.1063/5.0194692&rft_dat=%3Cproquest_scita%3E2937429968%3C/proquest_scita%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2937429968&rft_id=info:pmid/&rfr_iscdi=true
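
The description field above outlines a three-step ranking procedure. As a rough illustration only, steps (a) and (c) can be sketched in Python as follows. The record does not give the paper's preordonnance-based relevancy measure, so this sketch substitutes a plain pairwise concordance score (in the style of Kendall's tau-a) as an assumption. Step (b), the fuzzy clustering with GA-based instance selection and ensemble voting, is not attempted; the hypothetical `instances` argument only shows how a relevancy could be evaluated on a chosen subset of instances. All function and variable names are illustrative.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pairwise_relevancy(feature, target, instances=None):
    """Mean ordering agreement between feature and target over instance pairs.

    ASSUMPTION: a Kendall tau-a style concordance score stands in for the
    paper's preordonnance-based relevancy, which the record does not define.
    `instances` optionally restricts the score to a subset of instance indices.
    """
    idx = list(range(len(feature))) if instances is None else list(instances)
    pairs = list(combinations(idx, 2))
    if not pairs:
        return 0.0
    agreement = sum(
        sign(feature[i] - feature[j]) * sign(target[i] - target[j])
        for i, j in pairs
    )
    return agreement / len(pairs)


def rank_features(columns, target):
    """Drop features that disagree with the target, then rank the rest."""
    scores = {name: pairwise_relevancy(col, target) for name, col in columns.items()}
    # Step (a): keep only features whose relevancy with the target is positive.
    kept = {name: s for name, s in scores.items() if s > 0}
    # Step (c): rank the surviving features by decreasing relevancy.
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented purely for illustration.
target = [1, 2, 3, 4, 5]
columns = {
    "f_up": [10, 20, 30, 40, 50],   # same ordering as the target
    "f_noisy": [1, 3, 2, 5, 4],     # mostly the same ordering
    "f_down": [5, 4, 3, 2, 1],      # reversed ordering, removed in step (a)
}

ranking = rank_features(columns, target)
print(ranking)

# Relevancy of one feature restricted to a hand-picked subset of instances.
subset_score = pairwise_relevancy(columns["f_noisy"], target, instances=[0, 1, 3])
print(subset_score)
```

On the toy data, the reversed feature is discarded and the two remaining features are ordered by their concordance with the target. Restricting `f_noisy` to instances 0, 1 and 3 removes its discordant pairs, which is the kind of gain an instance-selection step would search for.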