Making data classification more effective: An automated deep forest model

Bibliographic details
Published in: Journal of industrial information integration, 2024-11, Vol. 42, p. 100738, Article 100738
Main authors: Guo, Jingwei; Guo, Xiang; Tian, Yihui; Zhan, Hao; Chen, Zhen-Song; Deveci, Muhammet
Format: Article
Language: English
Online access: Full text
Description
Summary:
•Propose an automated deep forest model (ATDF) to improve classification automation.
•Develop a standardized process to determine the basic classifiers.
•Design an NMI-based HC algorithm to identify the optimal forest learner type.
•Employ a TPE-based BO to determine the optimal number of forest learners for each type.
•Verify the proposed ATDF model through seven publicly available datasets.

Although the deep forest model and its variants carry only a small risk of overfitting, they cannot automatically match data features; forest learners are selected through manual experience and comparative experiments. This study proposes an automated deep forest model (ATDF) that enhances deep forest automation by determining the types and numbers of forest learners automatically from the training data. First, the model introduces a forest learner variability measure based on normalized mutual information (NMI), which serves as the theoretical foundation for the automated process in deep forests. Then, a novel hierarchical clustering (HC) algorithm based on NMI groups forest learners at different granularities to determine the optimal forest learner types. This method can determine the model structure of stacking models in general, including deep forests. Finally, with the goal of maximizing cross-validation scores, a Bayesian optimization (BO) algorithm based on the tree-structured Parzen estimator (TPE) determines the ideal number of forest learners for each type. Additionally, a standardized method for identifying forest learners is developed to guarantee the consistency of model outcomes. A series of comparative experiments on seven datasets from the UCI Machine Learning Repository confirms the effectiveness and superiority of the proposed model. The results demonstrate that, besides offering a high level of automation, the proposed model adapts well to new data and tasks and performs excellently in classification.
ISSN:2452-414X
DOI:10.1016/j.jii.2024.100738
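
The summary describes grouping forest learners by an NMI-based variability measure and hierarchical clustering. The following is a minimal Python sketch of that general idea, not the authors' algorithm: it assumes NMI is computed between the predicted labels of two learners and normalized by the geometric mean of their entropies, uses 1 - NMI as the distance, and applies plain average-linkage agglomerative clustering with a hypothetical cut-off `threshold`. The function names, learner names, and example predictions are all illustrative, and the TPE-based step that chooses the number of learners per type is not covered.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI of two label sequences: I(A;B) / sqrt(H(A) * H(B)).

    The geometric-mean normalization is an assumption; the summary
    does not state which normalization the paper uses.
    """
    n = len(a)
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # A constant sequence carries no information.
        return 1.0 if ha == hb else 0.0
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(c * n / (pa[x] * pb[y]))
             for (x, y), c in pab.items())
    return mi / sqrt(ha * hb)


def cluster_learners(preds, threshold=0.5):
    """Average-linkage clustering of learners with distance 1 - NMI.

    preds maps a learner name to its predicted labels on a shared
    validation set. Merging stops once the closest pair of clusters
    is farther apart than `threshold` (a hypothetical parameter).
    """
    names = sorted(preds)
    dist = {(p, q): 1.0 - nmi(preds[p], preds[q])
            for p in names for q in names}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return sum(dist[p, q] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > threshold:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Illustrative predictions from three hypothetical learners.
example_preds = {
    "forest_a": [0, 0, 1, 1, 0, 1, 0, 1],
    "forest_b": [0, 0, 1, 1, 0, 1, 0, 1],
    "forest_c": [0, 1, 0, 1, 0, 1, 0, 1],
}
groups = cluster_learners(example_preds)
print(groups)
```

In this toy input, `forest_a` and `forest_b` predict identically and end up in one group, while `forest_c` stays separate. Under the reading sketched here, each resulting group would correspond to one candidate learner type.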