Comparison of multiclass classification techniques using dry bean dataset


Bibliographic details
Published in: International journal of cognitive computing in engineering, 2023-06, Vol. 4, p. 6-20
Main authors: Salauddin Khan, Md; Nath, Tushar Deb; Murad Hossain, Md; Mukherjee, Arnab; Bin Hasnath, Hafiz; Manhaz Meem, Tahera; Khan, Umama
Format: Article
Language: English
Description
Summary:
• Multiclass classification approaches are used for the classification in this study.
• This paper uses the statistical IQR approach and a heatmap for data preprocessing.
• The Adaptive Synthetic (ADASYN) approach is applied to construct balanced classes from imbalanced classes.
• The XGB method demonstrated the preferable result in categorizing dry beans and determining the factors, because it combines flexibility, high performance, precision, and accuracy.

The application of classification methods through multivariate and machine learning techniques has enormous significance in the agricultural sector. It is vital to classify the various types of seeds and to identify seed quality, which has a great impact on crop production. There is a wide range of genetic variation in dry beans all over the world. Many previous studies have used various datasets to identify the sorts of dry beans; however, most of them focused on machine learning techniques with binary classification. The aim of this study is to identify a reliable classifier that is least affected by noise and to establish an effective algorithm for dry bean classification. This paper focuses on outlier removal, oversampling with the Adaptive Synthetic (ADASYN) algorithm, and finding the best classifier to achieve the highest possible accuracy. The raw dataset for this study was accessed from the UCI Machine Learning Repository. The dataset describes grains with 16 features: 12 dimensions and 4 shape forms. To eliminate outliers from the dataset, the interquartile range (IQR) was applied using Python. Eight of the most popular classifiers were used in this study, on both balanced and imbalanced classes: Logistic Regression (LR), Naïve Bayes (NB), k-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGB), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The authors used frequency tables, bar diagrams, boxplots, and analysis of variance for descriptive analysis as well as data preprocessing. The XGB classifier outperformed the other classifiers with both balanced and imbalanced distributions of dry beans within each class. It achieved an accuracy (ACC) of 93.0% and 95.4% on imbalanced and balanced classes, respectively. In the case of the balanced dataset, after applying the ADASYN algorithm, both the KNN and RF techniques also performed well regarding Classification Accuracy (ACC), Sensitivity
ISSN:2666-3074
DOI:10.1016/j.ijcce.2023.01.002
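
Note on the preprocessing step named in the summary: the record does not give the authors' exact IQR procedure, so the following is only a minimal sketch of one common form of IQR-based outlier filtering. The function names `iqr_bounds` and `drop_iqr_outliers`, the fence multiplier `k=1.5`, and the choice of inclusive quartiles are all assumptions made for illustration, not details taken from the article.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences for one feature.

    k=1.5 is the conventional multiplier; the article's value is not
    stated in this record, so treat it as an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_iqr_outliers(rows, k=1.5):
    """Keep only rows whose every feature lies within its IQR fences.

    rows: list of equal-length numeric tuples (one tuple per sample).
    """
    columns = list(zip(*rows))  # transpose to per-feature columns
    bounds = [iqr_bounds(col, k) for col in columns]
    return [
        row
        for row in rows
        if all(lo <= v <= hi for v, (lo, hi) in zip(row, bounds))
    ]


if __name__ == "__main__":
    # Toy data, not taken from the dry bean dataset.
    samples = [(1.0, 10.0), (2.0, 11.0), (3.0, 12.0), (4.0, 13.0), (100.0, 14.0)]
    print(drop_iqr_outliers(samples))
```

In this sketch a sample is dropped if any single feature falls outside its fences, which is a strict policy; a per-feature or class-wise variant would be equally plausible for the study described above.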