Parsed Categoric Encodings with Automunge
Main author: | |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | The Automunge open-source Python library for tabular data
pre-processing automates feature engineering transformations, namely numerical
encoding and missing data infill, for received tidy data. Transformations are
fit to properties of columns in a designated train set so that they can be
applied consistently and efficiently in subsequent data pipelines, such as for
inference. Transformations may be applied to distinct columns in "family tree"
sets with generations and branches of derivations. Included in the library of
transformations are methods to extract structure from bounded categorical
string sets by way of automated string parsing, in which entries in the set of
unique values are compared to identify character subset overlaps. These
overlaps may be encoded by appended columns of boolean overlap detection
activations or by replacing string entries with identified overlap partitions.
Further string parsing options, which may also be applied to unbounded
categoric sets, include extraction of numeric substring partitions from entries
and search functions to identify the presence of specified substring partitions.
The aggregation of these methods into "family tree" sets of transformations is
demonstrated as a way to automatically extract structure from categoric string
compositions in relation to the set of entries in a column, for example to
prepare categoric string set encodings for machine learning without
human intervention. |
---|---|
DOI: | 10.48550/arxiv.2202.09498 |
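
The string parsing ideas described in the summary can be illustrated with a minimal sketch. This is not the Automunge API: the function names (`find_overlaps`, `overlap_activations`, `extract_number`, `contains_substring`), the `min_len` threshold, and the stripping of delimiter characters are all assumptions made for illustration, and only the Python standard library is used. The sketch compares unique entries pairwise to find shared substrings, encodes them as boolean columns, extracts a numeric substring, and searches for a specified substring.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def find_overlaps(values, min_len=3):
    """Collect the longest shared substring of each pair of unique entries.

    Hypothetical helper: min_len and the delimiter stripping are
    illustrative choices, not taken from the paper.
    """
    overlaps = set()
    for a, b in combinations(sorted(set(values)), 2):
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        match = matcher.find_longest_match(0, len(a), 0, len(b))
        # Trim delimiter characters from the edges of the shared substring.
        shared = a[match.a:match.a + match.size].strip(" _-")
        if len(shared) >= min_len:
            overlaps.add(shared)
    # Longest overlaps first, ties broken alphabetically.
    return sorted(overlaps, key=lambda s: (-len(s), s))


def overlap_activations(column, overlaps):
    """One boolean list per overlap: True where the entry contains it."""
    return {f"overlap_{ov}": [ov in entry for entry in column] for ov in overlaps}


def extract_number(entry):
    """Return the first numeric substring as a float, or None if absent."""
    found = re.search(r"\d+(?:\.\d+)?", entry)
    return float(found.group()) if found else None


def contains_substring(column, target):
    """Boolean search for a specified substring in each entry."""
    return [target in entry for entry in column]


# Example data (made up for illustration).
column = ["red_apple", "green_apple", "red_grape", "green_grape"]
overlaps = find_overlaps(column)
activations = overlap_activations(column, overlaps)
sizes = [extract_number(s) for s in ["pack of 12", "pack of 6", "single"]]
```

In this sketch the overlaps are derived only from the set of unique values, which corresponds to the bounded categoric case in the summary; `extract_number` and `contains_substring` work on one entry at a time and so would also apply to unbounded sets.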