Feature Encodings for Gradient Boosting with Automunge
Format: | Article |
---|---|
Language: | English |
Summary: | Automunge is a tabular preprocessing library that encodes dataframes for supervised learning. When selecting a default feature encoding strategy for gradient boosted learning, one may consider metrics of training duration and achieved predictive performance associated with the feature representations. Automunge offers a default of binarization for categoric features and z-score normalization for numeric. The presented study sought to validate those defaults by way of benchmarking on a series of diverse data sets by encoding variations with tuned gradient boosted learning. We found that on average our chosen defaults were top performers both from a tuning duration and a model performance standpoint. Another key finding was that one hot encoding did not perform in a manner consistent with suitability to serve as a categoric default in comparison to categoric binarization. We present here these and further benchmarks. |
DOI: | 10.48550/arxiv.2209.12309 |
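
The summary contrasts three encodings: z-score normalization for numeric features, and one hot encoding versus binarization for categoric features. The following is a minimal sketch of what those terms usually mean, written in plain Python. It does not use or reproduce the Automunge API; the function names `zscore`, `one_hot`, and `binarize` are hypothetical, and details such as category ordering and the handling of missing values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """One hot encoding: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def binarize(values):
    """Categoric binarization: each category maps to a distinct binary code
    spread over roughly log2(number of categories) columns."""
    categories = sorted(set(values))
    # Number of bits needed to give every category a unique code.
    width = max(1, (len(categories) - 1).bit_length())
    codes = {c: format(i, f"0{width}b") for i, c in enumerate(categories)}
    return [[int(bit) for bit in codes[v]] for v in values]


# Hypothetical toy data, not taken from the study.
colors = ["red", "green", "blue", "green", "teal", "plum"]
sizes = [1.0, 2.0, 3.0]

one_hot_rows = one_hot(colors)    # 5 distinct categories -> 5 columns
binary_rows = binarize(colors)    # 5 distinct categories -> 3 columns
scaled_sizes = zscore(sizes)
```

With five distinct categories, the one hot form needs five columns while the binarized form needs three, which illustrates why the two encodings can differ in training cost as the number of categories grows.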