Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement

Bibliographic Details
Published in:	Marketing science (Providence, R.I.), 2023-07, Vol.42 (4), p.768-793
Main author:	Gordon, Brett R
Format:	Article
Language:	eng
Subjects:	Access; Advertising campaigns; advertising measurement; causal inference; Clinical trials; digital advertising; double ML; field experiments; Heat transfer; Machine learning; Measurement; Measures; observational methods; Popularity; Propensity; Social networks
Online access:	Full text

Description
Abstract:	A large-scale comparison of experimental advertising effects and those obtained using two state-of-the-art methods. Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data are thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-experimental approaches need to “undo” this selection. We analyze 663 large-scale experiments at Facebook to investigate whether this is possible with the data typically logged at large ad platforms. With access to over 5,000 user-level features, these data are richer than what most advertisers or their measurement partners can access. We investigate how accurately two non-experimental methods, double/debiased machine learning (DML) and stratified propensity score matching (SPSM), can recover the experimental effects. Although DML performs better than SPSM, neither method performs well, even using flexible deep learning models to implement the propensity and outcome models. The median RCT lifts are 29%, 18%, and 5% for the upper, middle, and lower funnel outcomes, respectively. Using DML (SPSM), the median lift by funnel is 83% (173%), 58% (176%), and 24% (64%), respectively, indicating significant relative measurement errors. We further characterize the circumstances under which each method performs comparatively better. Overall, despite having access to large-scale experiments and rich user-level data, we are unable to reliably estimate an ad campaign’s causal effect. History: Olivier Toubia served as the senior editor for this article. Funding: To be allowed to access the data required for this paper, B. R. Gordon and F. Zettelmeyer were part-time employees of Facebook with the title of Academic Researchers, employed for three hours per week. R. Moakler is an employee of Meta Platforms, Inc. and owns stock in the company.
ISSN:	0732-2399 1526-548X
DOI:	10.1287/mksc.2022.1413