Really Doing Great at Model Evaluation for CATE Estimation? A Critical Consideration of Current Model Evaluation Practices in Treatment Effect Estimation
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | This paper critically examines current methodologies for evaluating models in
Conditional and Average Treatment Effect (CATE/ATE) estimation, identifying
several key pitfalls in existing practices. Current practice over-relies on
specific metrics and empirical means and lacks statistical tests, which calls
for a more rigorous evaluation approach. We propose an automated
algorithm for selecting appropriate statistical tests, addressing the
trade-offs and assumptions inherent in these tests. Additionally, we emphasize
the importance of reporting empirical standard deviations alongside performance
metrics and advocate for using Squared Error for Coverage (SEC) and Absolute
Error for Coverage (AEC) metrics and empirical histograms of the coverage
results as supplementary metrics. These enhancements provide a more
comprehensive understanding of model performance in heterogeneous
data-generating processes (DGPs). The practical implications are demonstrated
through two examples, showcasing the benefits of these methodological
improvements, which can significantly improve the robustness and accuracy of
future research in statistical models for CATE and ATE estimation. |
---|---|
DOI: | 10.48550/arxiv.2409.05161 |
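
The summary recommends reporting empirical standard deviations alongside means and adding SEC and AEC as coverage metrics, but this record does not define them. The following is a minimal sketch under an assumed reading: AEC is taken as the mean absolute deviation of empirical coverage from the nominal level across DGPs, and SEC as the mean squared deviation. The function names, the nominal level of 0.95, and the coverage values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def coverage_errors(coverages, nominal=0.95):
    """Return (AEC, SEC) under the ASSUMED definitions:
    AEC = mean |coverage - nominal|, SEC = mean (coverage - nominal)^2,
    averaged over data-generating processes (DGPs)."""
    aec = mean(abs(c - nominal) for c in coverages)
    sec = mean((c - nominal) ** 2 for c in coverages)
    return aec, sec


def mean_and_sd(values):
    """Report the empirical mean together with the sample standard deviation."""
    return mean(values), stdev(values)


# Hypothetical empirical coverage of 95% intervals on four DGPs.
coverages = [0.85, 1.00, 0.95, 1.00]

avg, sd = mean_and_sd(coverages)
aec, sec = coverage_errors(coverages)

print(f"mean coverage = {avg:.3f} (sd = {sd:.3f})")
print(f"AEC = {aec:.4f}, SEC = {sec:.5f}")
```

In this made-up example the mean coverage equals the nominal level even though three of the four DGPs miss it. The nonzero standard deviation, AEC, and SEC show the spread that the mean alone hides, which is the kind of gap the summary argues supplementary metrics should reveal.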