Fusion of hard and soft information in nonparametric density estimation

Bibliographic details
Published in: European journal of operational research 2015-12, Vol.247 (2), p.532-547
Main authors: Royset, Johannes O., Wets, Roger J-B
Format: Article
Language: English
Online access: Full text
Description
Summary:
• We address how to fuse hard and soft information for probability density estimation.
• We formulate the problem as a stochastic optimization model.
• We examine convexity, consistency, and asymptotics.
• We illustrate the approach in a variety of settings.

This paper discusses univariate density estimation in situations when the sample (hard information) is supplemented by “soft” information about the random phenomenon. These situations arise broadly in operations research and management science where practical and computational reasons severely limit the sample size, but problem structure and past experiences could be brought in. In particular, density estimation is needed for generation of input densities to simulation and stochastic optimization models, in analysis of simulation output, and when instantiating probability models. We adopt a constrained maximum likelihood estimator that incorporates any, possibly random, soft information through an arbitrary collection of constraints. We illustrate the breadth of possibilities by discussing soft information about shape, support, continuity, smoothness, slope, location of modes, symmetry, density values, neighborhood of known density, moments, and distribution functions. The maximization takes place over spaces of extended real-valued semicontinuous functions and therefore allows us to consider essentially any conceivable density as well as convenient exponential transformations. The infinite dimensionality of the optimization problem is overcome by approximating splines tailored to these spaces. To facilitate the treatment of small samples, the construction of these splines is decoupled from the sample. We discuss existence and uniqueness of the estimator, examine consistency under increasing hard and soft information, and give rates of convergence. Numerical examples illustrate the value of soft information, the ability to generate a family of diverse densities, and the effect of misspecification of soft information.
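
To make the idea of constrained maximum likelihood with soft shape information concrete, the following is a minimal sketch and not the authors' estimator. It assumes a piecewise-constant density on a fixed grid chosen independently of the sample (a much cruder stand-in for the splines mentioned above), known support [0, 4], and one piece of soft information: the density is nonincreasing. Under these assumptions the constrained maximizer can be obtained by pooling adjacent violators on the unconstrained bin heights. All function names, the grid, and the sample are hypothetical.

```python
import math

def histogram_mle(sample, edges):
    """Unconstrained MLE of a piecewise-constant density on fixed bins."""
    if any(x < edges[0] or x > edges[-1] for x in sample):
        raise ValueError("sample point outside the assumed support")
    m = len(edges) - 1
    counts = [0] * m
    for x in sample:
        # bins are [edges[k], edges[k+1]); the last bin also includes its right end
        k = next((j for j in range(m) if x < edges[j + 1]), m - 1)
        counts[k] += 1
    widths = [edges[k + 1] - edges[k] for k in range(m)]
    heights = [c / (len(sample) * w) for c, w in zip(counts, widths)]
    return heights, widths, counts

def nonincreasing_mle(heights, widths):
    """Width-weighted pool-adjacent-violators for a nonincreasing density."""
    blocks = []  # each block holds [pooled height, total width, number of bins]
    for h, w in zip(heights, widths):
        blocks.append([h, w, 1])
        # merge while a later block is higher than the one before it
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            h2, w2, c2 = blocks.pop()
            h1, w1, c1 = blocks.pop()
            blocks.append([(h1 * w1 + h2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for h, _, c in blocks:
        fitted.extend([h] * c)
    return fitted

def log_likelihood(heights, counts):
    """Log-likelihood of the sample under a piecewise-constant density."""
    return sum(c * math.log(h) for c, h in zip(counts, heights) if c > 0)

# Hypothetical small sample (hard information) and grid fixed in advance.
sample = [0.1, 0.4, 1.2, 1.5, 1.7, 2.3, 3.5, 3.6]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]

raw, widths, counts = histogram_mle(sample, edges)
fitted = nonincreasing_mle(raw, widths)  # soft information: nonincreasing shape

print("unconstrained:", raw)
print("nonincreasing:", fitted)
print("log-likelihoods:", log_likelihood(raw, counts), log_likelihood(fitted, counts))
```

Pooling preserves total mass, so the constrained estimate still integrates to one, and its log-likelihood cannot exceed that of the unconstrained histogram. The richer kinds of soft information listed in the summary (moments, smoothness, bounds on density values, and so on) would in general require a numerical optimization solver rather than this closed-form pooling step.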
ISSN:0377-2217
1872-6860
DOI:10.1016/j.ejor.2015.06.034