Empowering Machine Learning with Scalable Feature Engineering and Interpretable AutoML

Bibliographic details
Published in: IEEE Transactions on Artificial Intelligence 2024, p.1-16
Main authors: Eldeeb, Hassan; Elshawi, Radwa
Format: Article
Language: English
Description
Summary: Automated feature engineering has gained considerable attention in academia and industry. Nevertheless, existing systems often lack practical scalability and efficiency. This paper introduces BigFeat, a scalable and interpretable framework that streamlines three critical phases of the machine learning pipeline: feature engineering, model selection, and hyperparameter tuning. BigFeat offers two execution modes: a standalone feature engineering framework, BigFeat-FE, and an AutoML framework, BigFeat-AutoML. BigFeat-FE optimizes the quality of the input features to maximize predictive performance on a user-defined metric. It uses a dynamic feature generation and selection mechanism that systematically creates a set of expressive features, which improve prediction performance while remaining interpretable. BigFeat-FE also uses a meta-learning technique to warm-start the optimization process, resulting in significant overall performance gains. BigFeat-AutoML, which handles algorithm selection and hyperparameter tuning, applies random search over the space of interpretable models. Extensive experiments show that BigFeat-FE consistently outperforms state-of-the-art automated feature engineering frameworks across a wide range of datasets, with an average performance improvement of 8.65% over AutoFeat and 4.71% over SAFE. BigFeat-AutoML also outperforms TPOT and Autosklearn, with average improvements of 0.74% and 2.25%, respectively. Finally, BigFeat scales well: its complexity is linear, and it runs on average 20 times faster than AutoFeat and 14 times faster than SAFE.
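The summary describes a feature generation and selection mechanism without giving its details. The following is a minimal, generic sketch of the generate-then-select idea in plain Python, not BigFeat's actual algorithm: the operator sets, the correlation-based scoring, and every name (`generate_and_select`, `pearson`, `UNARY`, `BINARY`) are assumptions made for illustration. Generated features carry readable names such as `mul(x1,x2)`, which is one simple way to keep them interpretable.

```python
import math
import random

# Hypothetical operator sets; the summary does not list BigFeat's operators.
UNARY = {"square": lambda a: a * a, "abs": abs, "log1p_abs": lambda a: math.log1p(abs(a))}
BINARY = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def generate_and_select(columns, target, n_candidates=50, keep=3, seed=0):
    """Generate named candidate features, then keep the `keep` best by |correlation|.

    columns: dict mapping feature name -> list of floats (at least two features)
    target:  list of floats, same length as each column
    """
    rng = random.Random(seed)
    names = sorted(columns)
    candidates = dict(columns)  # original features compete with generated ones
    for _ in range(n_candidates):
        if rng.random() < 0.5:
            # Unary transform of one randomly chosen input feature.
            op = rng.choice(sorted(UNARY))
            a = rng.choice(names)
            name = f"{op}({a})"
            values = [UNARY[op](v) for v in columns[a]]
        else:
            # Binary combination of two distinct input features.
            op = rng.choice(sorted(BINARY))
            a, b = rng.sample(names, 2)
            name = f"{op}({a},{b})"
            values = [BINARY[op](u, v) for u, v in zip(columns[a], columns[b])]
        candidates[name] = values
    # Selection step: rank by absolute correlation with the target (an assumed score).
    ranked = sorted(candidates, key=lambda k: abs(pearson(candidates[k], target)), reverse=True)
    return {name: candidates[name] for name in ranked[:keep]}


# Toy data: the target is exactly x1 * x2, which neither input captures alone.
data_rng = random.Random(42)
x1 = [data_rng.uniform(-1, 1) for _ in range(200)]
x2 = [data_rng.uniform(-1, 1) for _ in range(200)]
y = [a * b for a, b in zip(x1, x2)]
selected = generate_and_select({"x1": x1, "x2": x2}, y)
print(list(selected))
```

On this toy data, a product feature should rank at the top whenever the random generation step happens to sample it. A real system would score candidates with the user-defined metric on held-out data rather than with a raw correlation.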
ISSN:2691-4581
DOI:10.1109/TAI.2024.3400752