Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | In traditional logistic regression models, the link function is often assumed
to be linear and continuous in the predictors. Here, we consider a threshold model
in which all continuous features are discretized into ordinal levels, which in turn
determine the binary responses. Both the threshold points and the regression
coefficients are unknown and must be estimated. For high-dimensional data, we
propose a fusion penalized logistic threshold regression (FILTER) model, in which
a fused lasso penalty is employed to control the total variation and to shrink the
coefficients to zero as a means of variable selection. Under mild conditions
on the estimates of the unknown threshold points, we establish a non-asymptotic
error bound for coefficient estimation and model selection consistency.
With a careful characterization of the error propagation, we also show
that tree-based methods, such as CART, fulfill the threshold estimation
conditions. We find that the FILTER model is well suited to the early
detection and prediction of chronic diseases such as diabetes, using physical
examination data. The finite-sample behavior of the proposed method is also
explored and compared in extensive Monte Carlo studies, which support our
theoretical findings. |
---|---|
DOI: | 10.48550/arxiv.2202.08441 |
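
The two ingredients named in the summary, a tree-based threshold estimate and a fused lasso penalized logistic loss, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`best_split`, `discretize`, `penalized_loss`), the per-level coefficient parameterization, the tuning parameters `lam_sparse` and `lam_fuse`, and the toy data are all assumptions made for illustration. The sketch only estimates one cut point with a single CART-style Gini split and evaluates the penalized objective; it does not fit the coefficients.

```python
import math

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(x, y):
    """CART-style single split: threshold minimizing the weighted Gini impurity."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score = score
            best_t = 0.5 * (pairs[i - 1][0] + pairs[i][0])  # midpoint
    return best_t

def discretize(value, thresholds):
    """Ordinal level = number of thresholds that the value exceeds."""
    return sum(value > t for t in thresholds)

def penalized_loss(levels, y, intercept, beta, lam_sparse, lam_fuse):
    """Mean logistic loss plus a fused lasso penalty for one discretized feature.

    beta[k] is the effect of ordinal level k (an assumed parameterization).
    """
    nll = 0.0
    for k, label in zip(levels, y):
        eta = intercept + beta[k]
        # log(1 + exp(eta)) - label * eta, computed in a numerically stable way
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - label * eta
    nll /= len(y)
    sparsity = sum(abs(b) for b in beta)  # shrinks coefficients to zero
    variation = sum(abs(beta[k] - beta[k - 1]) for k in range(1, len(beta)))
    return nll + lam_sparse * sparsity + lam_fuse * variation

# Toy data, invented for illustration only (not taken from the paper).
x = [4.8, 5.1, 5.4, 5.9, 6.3, 6.8, 7.2, 7.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]

t = best_split(x, y)                         # estimated cut point
levels = [discretize(v, [t]) for v in x]     # ordinal levels 0 or 1
loss_null = penalized_loss(levels, y, 0.0, [0.0, 0.0], 0.1, 0.1)
loss_fit = penalized_loss(levels, y, 0.0, [-1.0, 1.0], 0.1, 0.1)
print(t, levels, loss_null, loss_fit)
```

In this sketch the total-variation term penalizes differences between coefficients of adjacent ordinal levels, so neighboring levels with similar effects are encouraged to merge, while the plain absolute-value term drives uninformative coefficients to zero. A full fit would minimize `penalized_loss` over `intercept` and `beta`, which is outside the scope of this illustration.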