Fast and approximate exhaustive variable selection for generalised linear models with APES


Bibliographic details
Published in: Australian & New Zealand journal of statistics, 2019-12, Vol. 61 (4), p. 445-465
Main authors: Wang, Kevin YX; Tarr, Garth; Yang, Jean YH; Mueller, Samuel
Format: Article
Language: English
Online access: Full text
Description
Summary: We present APproximated Exhaustive Search (APES), which enables fast, approximate exhaustive variable selection in Generalised Linear Models (GLMs). While exhaustive variable selection remains the gold standard in many model selection contexts, a traditional exhaustive search is often computationally infeasible. More precisely, there is often a high cost associated with computing maximum likelihood estimates (MLEs) for all subsets of a GLM. Efficient algorithms for exhaustive searches exist for linear models, most notably the leaps‐and‐bounds algorithm and, more recently, the mixed integer optimisation (MIO) algorithm. The APES method learns from observational weights in a generalised linear regression super‐model and reformulates the GLM problem as a linear regression problem. In this way, APES can approximate a true exhaustive search in the original GLM space. Where exhaustive variable selection is not computationally feasible, we propose a best‐subset search, which also closely approximates a true exhaustive search. APES is available both as a standalone R package and as part of the existing mplot package. We present APES, a fast, approximate exhaustive variable selection method for generalised linear models that is feasible for hundreds of variables.
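
The reformulation described in the summary can be illustrated with a minimal Python sketch. It is not the authors' R implementation and does not use the APES or mplot API. It assumes a logistic GLM, assumes that the "observational weights" are the usual iteratively reweighted least squares (IRLS) weights from the full model, and replaces leaps‐and‐bounds or MIO with brute-force enumeration, which is only practical for a handful of variables. The function names `fit_full_logistic` and `best_subsets_wls` and the simulated data are hypothetical.

```python
import itertools
import numpy as np


def fit_full_logistic(X, y, n_iter=25):
    """Fit the full logistic model by IRLS (assumed stand-in for the super-model).

    Returns the working response z and the observation weights w
    evaluated at the fitted full model.
    """
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])  # add an intercept column
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = Xd @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(mu * (1.0 - mu), 1e-6, None)  # guard against zero weights
        z = eta + (y - mu) / w
        # Weighted least squares update of the coefficients.
        beta = np.linalg.solve(Xd.T @ (w[:, None] * Xd), Xd.T @ (w * z))
    eta = Xd @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = np.clip(mu * (1.0 - mu), 1e-6, None)
    z = eta + (y - mu) / w
    return z, w


def best_subsets_wls(X, z, w, max_size):
    """For each size k, find the subset minimising the weighted RSS of z on X."""
    n, p = X.shape
    sw = np.sqrt(w)
    best = {}
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = np.column_stack([np.ones(n), X[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(Xs * sw[:, None], z * sw, rcond=None)
            rss = float(np.sum(w * (z - Xs @ coef) ** 2))
            if k not in best or rss < best[k][1]:
                best[k] = (subset, rss)
    return best


# Simulated example (hypothetical data): only variables 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
eta_true = 0.5 + 2.0 * X[:, 0] - 2.0 * X[:, 3]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

z, w = fit_full_logistic(X, y)
best = best_subsets_wls(X, z, w, max_size=3)
for k, (subset, rss) in best.items():
    print(k, subset, round(rss, 2))
```

In this sketch the GLM is fitted only once, for the full model. Every candidate subset is then scored by a linear least squares fit, which is what makes a fast linear-model search algorithm applicable in place of the enumeration used here.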
ISSN: 1369-1473, 1467-842X
DOI: 10.1111/anzs.12276