Statistical comparisons of non-deterministic IR systems using two dimensional variance


Bibliographic details
Published in: Information processing & management 2015-09, Vol.51 (5), p.677-694
Main authors: Jayasinghe, Gaya K., Webber, William, Sanderson, Mark, Dharmasena, Lasitha S., Culpepper, J. Shane
Format: Article
Language: English
Online access: Full text
Description
Summary:
• We propose methods to compare non-deterministic IR systems.
• We show pitfalls in using standard significance tests to compare such systems.
• We verify the applicability of proposed methods using simulations and a case study.
• We show how to compare a non-deterministic IR system for equivalent effectiveness.

Retrieval systems with non-deterministic output are widely used in information retrieval. Common sources of non-determinism include sampling, approximation algorithms, and interactive user input. The effectiveness of such systems differs not just across topics, but also across instances of the same system. This inherent variance presents a dilemma: what is the best way to measure the effectiveness of a non-deterministic IR system? Existing approaches to IR evaluation consider neither this problem nor its potential impact on statistical significance. In this paper, we explore how such variance can affect system comparisons, and propose an evaluation framework and methodologies capable of making this comparison. Using distributed information retrieval as a case study, we show that the approaches provide a consistent and reliable methodology to compare the effectiveness of a non-deterministic system with a deterministic or another non-deterministic system. In addition, we present a statistical best practice that can be used to safely show that a non-deterministic IR system has equivalent effectiveness to another IR system, and show how to avoid the common pitfall of treating a lack of significance as proof that two systems have equivalent effectiveness.
ISSN: 0306-4573, 1873-5371
DOI:10.1016/j.ipm.2015.06.005
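
The summary's closing point, that a non-significant difference is not evidence of equivalence, can be illustrated with a minimal sketch. This is not the method from the paper, which this record does not describe in detail. It assumes a simple two-level bootstrap (resample topics, then one system instance per topic) and a confidence-interval equivalence check against a margin chosen by the evaluator. The function names, the `margin` parameter, and the score layout are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_diff_ci(nondet, det, n_boot=2000, alpha=0.10, seed=0):
    """Percentile CI for the mean per-topic difference (nondet - det).

    nondet[t][i] is the score of instance i of the non-deterministic
    system on topic t; det[t] is the deterministic system's score.
    Each replicate resamples topics with replacement and then draws one
    instance per sampled topic, so both topic variance and instance
    variance contribute to the interval. (Assumed scheme, for
    illustration only.)
    """
    rng = random.Random(seed)
    n_topics = len(det)
    diffs = []
    for _ in range(n_boot):
        topics = [rng.randrange(n_topics) for _ in range(n_topics)]
        diffs.append(mean(rng.choice(nondet[t]) - det[t] for t in topics))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def equivalent(ci, margin):
    """Claim equivalence only if the whole CI lies within +/- margin."""
    lo, hi = ci
    return -margin <= lo and hi <= margin
```

Under this sketch, equivalence is claimed only when the whole interval falls inside the margin. A wide interval that merely contains zero, which is what a non-significant test result corresponds to, does not qualify.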