Logs with Zeros? Some Problems and Solutions

Bibliographic Details
Published in: The Quarterly Journal of Economics, 2024-05, Vol. 139 (2), p. 891-936
Main authors: Chen, Jiafeng; Roth, Jonathan
Format: Article
Language: English
Description
Summary: When studying an outcome Y that is weakly positive but can equal zero (e.g., earnings), researchers frequently estimate an average treatment effect (ATE) for a “log-like” transformation that behaves like log(Y) for large Y but is defined at zero (e.g., log(1 + Y), $\operatorname{arcsinh}(Y)$). We argue that ATEs for log-like transformations should not be interpreted as approximating percentage effects, since unlike a percentage, they depend on the units of the outcome. In fact, we show that if the treatment affects the extensive margin, one can obtain a treatment effect of any magnitude simply by rescaling the units of Y before taking the log-like transformation. This arbitrary unit dependence arises because an individual-level percentage effect is not well-defined for individuals whose outcome changes from zero to nonzero when receiving treatment, and the units of the outcome implicitly determine how much weight the ATE for a log-like transformation places on the extensive margin. We further establish a trilemma: when the outcome can equal zero, there is no treatment effect parameter that is an average of individual-level treatment effects, unit invariant, and point identified. We discuss several alternative approaches that may be sensible in settings with an intensive and extensive margin, including (i) expressing the ATE in levels as a percentage (e.g., using Poisson regression), (ii) explicitly calibrating the value placed on the intensive and extensive margins, and (iii) estimating separate effects for the two margins (e.g., using Lee bounds). We illustrate these approaches in three empirical applications.
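The unit dependence described in the summary can be illustrated with a small numerical sketch. The data below are hypothetical and are not taken from the article: two of four control outcomes are zero and become positive under treatment, so the treatment moves the extensive margin. The helper names `log_like_ate` and `levels_pct_effect` are assumptions introduced only for this illustration. The first is a simple difference in means of log(1 + scale * Y); the second is a sample analogue of approach (i), the effect in levels expressed as a share of the control mean.

```python
import numpy as np

# Hypothetical outcomes (not from the article). Two control units have Y = 0
# and positive outcomes under treatment, so the extensive margin moves.
y_control = np.array([0.0, 0.0, 10.0, 20.0])
y_treated = np.array([5.0, 8.0, 10.0, 20.0])


def log_like_ate(y_t, y_c, scale=1.0):
    """Difference in means of log(1 + scale * Y) between the two groups."""
    return np.mean(np.log1p(scale * y_t)) - np.mean(np.log1p(scale * y_c))


def levels_pct_effect(y_t, y_c, scale=1.0):
    """Difference in mean levels, expressed as a share of the control mean."""
    return np.mean(scale * y_t) / np.mean(scale * y_c) - 1.0


# Rescaling Y is only a change of units (e.g., dollars versus cents).
scales = [0.001, 1.0, 1000.0, 1_000_000.0]
ates = [log_like_ate(y_treated, y_control, s) for s in scales]
pcts = [levels_pct_effect(y_treated, y_control, s) for s in scales]

for s, ate, pct in zip(scales, ates, pcts):
    print(f"scale={s:>12g}  ATE of log(1 + Y)={ate:.4f}  levels effect={pct:.4f}")
```

In this sketch the estimate for log(1 + Y) keeps growing as the units get finer, while the levels-based percentage is the same at every scale, which matches the contrast the summary draws between log-like transformations and unit-invariant alternatives.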
ISSN: 0033-5533, 1531-4650
DOI: 10.1093/qje/qjad054