Sequential Community Mode Estimation
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | We consider a population, partitioned into a set of communities, and study
the problem of identifying the largest community within the population via
sequential, random sampling of individuals. There are multiple sampling
domains, referred to as \emph{boxes}, which also partition the population. Each
box may consist of individuals of different communities, and each community may
in turn be spread across multiple boxes. The learning agent can, at any time,
sample (with replacement) a random individual from any chosen box; when this is
done, the agent learns the community the sampled individual belongs to, and
also whether or not this individual has been sampled before. The goal of the
agent is to minimize the probability of mis-identifying the largest community
in a \emph{fixed budget} setting, by optimizing both the sampling strategy and
the decision rule. We propose and analyse novel algorithms for this
problem, and also establish information theoretic lower bounds on the
probability of error under any algorithm. In several cases of interest, the
exponential decay rates of the probability of error under our algorithms are
shown to be optimal up to constant factors. The proposed algorithms are further
validated via simulations on real-world datasets. |
---|---|
DOI: | 10.48550/arxiv.2111.08535 |
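
The sampling model described in the summary can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not one of the algorithms proposed in the paper: the names `sample_boxes` and `estimate_mode` are hypothetical, the budget is split round-robin across boxes, and the decision rule simply picks the community with the most distinct individuals seen. Each box is modelled as a list of community labels, one per individual.

```python
import random
from collections import defaultdict

def sample_boxes(boxes, budget, rng):
    """Spend `budget` samples round-robin over the boxes (an assumed, naive strategy).

    Returns a dict mapping each community to the set of distinct
    individuals (box index, position in box) observed from it.
    """
    seen = defaultdict(set)
    for t in range(budget):
        b = t % len(boxes)                 # which box to sample this round
        i = rng.randrange(len(boxes[b]))   # uniform sample, with replacement
        community = boxes[b][i]            # feedback: the individual's community
        # Feedback: a repeat is recognisable, so each individual counts once.
        seen[community].add((b, i))
    return seen

def estimate_mode(boxes, budget, seed=0):
    """Guess the largest community; requires budget >= 1."""
    rng = random.Random(seed)
    seen = sample_boxes(boxes, budget, rng)
    # Assumed decision rule: most distinct individuals seen,
    # with ties broken in favour of the alphabetically first label.
    return max(sorted(seen), key=lambda c: len(seen[c]))

if __name__ == "__main__":
    # Three boxes; community "a" has 6 members, "b" has 4, "c" has 3.
    boxes = [["a"] * 5 + ["b"] * 2, ["b"] * 2 + ["a"], ["c"] * 3]
    print(estimate_mode(boxes, budget=3000))
```

With a budget that is large relative to the population, nearly every individual is seen at least once and this rule recovers the true largest community; the fixed-budget question studied in the paper concerns how fast the error probability decays when the budget is limited and the sampling strategy is chosen more carefully.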