Filter-based relevance and instance selection

Bibliographic Details
Main Authors:	Mourtji, Basma El; Ouaderhman, Tayeb; Chamlal, Hasna
Format:	Conference Proceeding
Language:	English
Subjects:	Algorithms; Clustering; Datasets
Online Access:	Full text

Description
Summary:	Feature selection is an important technique in building prediction systems. In many studies, the relevance of a feature is computed using all the instances of the sample under consideration. However, as the dataset grows, some instances are no longer useful for weighting the features. In this paper, we rank features according to a fitness function that combines the relevance computed on all instances with the relevance computed on a maximal number of significant instances (those that actually contribute to the positive relevance of the feature with respect to the target variable). The relevance measure is based on preordonnance theory, in which instances are expressed in pairs. The proposed algorithm has three main steps: (a) eliminating all features that are in disagreement with the target feature; (b) finding, for each feature, the subset of instances that maximizes the relevance and whose cardinality approaches that of the original dataset; and (c) ranking the features. The second step divides the dataset (the instances of each feature) into several consistent regions by fuzzy clustering, then performs genetic algorithm (GA) based instance selection independently within each cluster, and finally aggregates the partial results by ensemble voting. Experimental results verify the effectiveness of the proposed method.
ISSN:	0094-243X 1551-7616
DOI:	10.1063/5.0194692
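
Illustrative sketch (not part of the catalog record): the summary describes a relevance measure computed over pairs of instances, followed by elimination of features that disagree with the target and a final ranking. The record does not give the paper's fitness function, so the Python below substitutes a Kendall-style pairwise concordance score as a stand-in. The function names `pairwise_agreement` and `rank_features`, the "score greater than zero" cut-off, and the toy data are all assumptions made for illustration. Step (b), the fuzzy clustering and GA-based instance selection, is not modeled.

```python
from itertools import combinations


def pairwise_agreement(feature, target):
    """Stand-in relevance: (concordant - discordant) pairs / all pairs.

    Assumption: a Kendall tau-a style score, NOT the paper's
    preordonnance-based measure, which the record does not specify.
    """
    score = 0
    pairs = 0
    for i, j in combinations(range(len(feature)), 2):
        # Positive product: the feature orders the pair like the target does.
        prod = (feature[i] - feature[j]) * (target[i] - target[j])
        score += (prod > 0) - (prod < 0)
        pairs += 1
    return score / pairs if pairs else 0.0


def rank_features(columns, target):
    """Drop features that disagree with the target, then rank the rest.

    Assumption: "disagreement" is read here as a non-positive score.
    """
    scores = {name: pairwise_agreement(vals, target)
              for name, vals in columns.items()}
    kept = {name: s for name, s in scores.items() if s > 0}
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: three features, four instances.
columns = {
    "a": [1, 2, 3, 4],   # same order as the target
    "b": [4, 3, 2, 1],   # reversed order, removed by the filter
    "c": [1, 3, 2, 4],   # one discordant pair
}
target = [1, 2, 3, 4]
ranked = rank_features(columns, target)
```

In this toy run, feature `b` is filtered out because every pair is discordant, and `a` ranks ahead of `c`. Because the score enumerates all pairs, its cost grows quadratically with the number of instances, which is consistent with the summary's motivation for selecting a subset of instances on large datasets.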