Top-[Formula Omitted] Rank Aggregation From [Formula Omitted]-Wise Comparisons

Bibliographic details
Published in:	IEEE journal of selected topics in signal processing 2018-01, Vol.12 (5), p.989
Main authors:	Jang, Minje; Kim, Sunghyun; Suh, Changho
Format:	Article
Language:	English
Subjects:	Algorithms; Complexity; Datasets; Lower bounds; Ranking; Robustness (mathematics); Spectral methods; Statistical analysis; Statistical methods; Statistical models
Online access:	Full text

Description
Abstract:	Suppose one aims to identify only the top-[Formula Omitted] among a large collection of [Formula Omitted] items provided [Formula Omitted]-wise comparison data, where a set of [Formula Omitted] items in each data sample are ranked in order of individual preference. Natural questions that arise are as follows: 1) how one can reliably achieve the top-[Formula Omitted] rank aggregation task; and 2) how many [Formula Omitted]-wise samples one needs to achieve it. In this paper, we answer these two questions. First, we devise an algorithm that effectively converts [Formula Omitted]-wise samples into pairwise ones and employs a spectral method using the refined data. Second, we consider the Plackett–Luce (PL) model, a well-established statistical model, and characterize the minimal number of [Formula Omitted]-wise samples (i.e., sample complexity) required for reliable top-[Formula Omitted] ranking. It turns out to be inversely proportional to [Formula Omitted]. To characterize it, we derive a lower bound on the sample complexity and prove that our algorithm achieves the bound. Moreover, we conduct extensive numerical experiments to demonstrate that our algorithm not only attains the fundamental limit under the PL model but also provides robust ranking performance for a variety of applications that may not precisely fit the model. We corroborate our theoretical result using synthetic datasets, confirming that the sample complexity decreases at the rate of [Formula Omitted]. Also, we verify the applicability of our algorithm in practice, showing that it performs well on various real-world datasets.
ISSN:	1932-4553 1941-0484
DOI:	10.1109/JSTSP.2018.2834864
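
Illustrative sketch:	The abstract describes a two-step idea: convert multi-item ranked samples into pairwise comparisons, then apply a spectral method. The Python sketch below shows one way such a pipeline could look. It is not taken from the paper. It assumes "full breaking" (every ordered pair implied by a ranked sample is kept) and a Rank-Centrality-style random walk whose stationary distribution scores the items; the paper's actual conversion rule and spectral step may differ. The names `break_into_pairs`, `spectral_top_k`, and `n_iter` are hypothetical.

```python
import numpy as np


def break_into_pairs(ranking):
    """Full breaking (an assumption): list every (winner, loser) pair
    implied by one ranked sample, best item first."""
    return [(ranking[a], ranking[b])
            for a in range(len(ranking))
            for b in range(a + 1, len(ranking))]


def spectral_top_k(rankings, n_items, k, n_iter=1000):
    """Score items with a Rank-Centrality-style Markov chain and
    return the indices of the k highest-scoring items."""
    wins = np.zeros((n_items, n_items))  # wins[i, j] = times i beat j
    for ranking in rankings:
        for winner, loser in break_into_pairs(ranking):
            wins[winner, loser] += 1

    total = wins + wins.T  # comparisons observed between each pair
    # frac[i, j] = fraction of i-vs-j comparisons that j won
    frac = np.divide(wins.T, total, out=np.zeros_like(wins), where=total > 0)

    # Scale by the largest number of compared neighbours so rows sum to <= 1,
    # then put the leftover probability on the diagonal (self-loops).
    d_max = max(1, int((total > 0).sum(axis=1).max()))
    P = frac / d_max
    idx = np.arange(n_items)
    P[idx, idx] = 1.0 - P.sum(axis=1)

    # Power iteration: the walk drifts toward items that tend to win.
    pi = np.full(n_items, 1.0 / n_items)
    for _ in range(n_iter):
        pi = pi @ P
    return [int(i) for i in np.argsort(-pi)[:k]]


if __name__ == "__main__":
    # Four items, three-item ranked samples (toy data, not from the paper).
    samples = [[0, 1, 2], [1, 2, 3], [0, 2, 3], [0, 1, 3]]
    print(spectral_top_k(samples, n_items=4, k=2))
```

A fixed number of power-iteration steps keeps the sketch short; a practical version would more likely test for convergence or compute the leading left eigenvector directly.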