A new distance measure for non-identical data with application to image classification

Bibliographic details
Published in: Pattern Recognition, 2017-03, Vol. 63, p. 384-396
Main authors: Swaminathan, Muthukaruppan; Yadav, Pankaj Kumar; Piloto, Obdulio; Sjöblom, Tobias; Cheong, Ian
Format: Article
Language: English
Description
Summary: Distance measures are part and parcel of many computer vision algorithms. The underlying assumption in all existing distance measures is that feature elements are independent and identically distributed. However, in real-world settings, data generally originate from heterogeneous sources even if they do possess a common data-generating mechanism. Since these sources are not identically distributed by necessity, the assumption of identical distribution is inappropriate. Here, we use statistical analysis to show that feature elements of local image descriptors are indeed non-identically distributed. To test the effect of omitting the unified distribution assumption, we created a new distance measure called the Poisson-Binomial radius (PBR). PBR is a bin-to-bin distance which accounts for the dispersion of bin-to-bin information. PBR's performance was evaluated on twelve benchmark data sets covering six different classification and recognition applications: texture, material, leaf, scene, ear biometrics and category-level image classification. Results from these experiments demonstrate that PBR outperforms state-of-the-art distance measures for most of the data sets and achieves comparable performance on the rest, suggesting that accounting for different distributions in distance measures can improve performance in classification and recognition tasks.

• Empirical evidence is provided that real-world data is non-identically distributed.
• PBR, the first distance measure to account for non-identical data, is proposed.
• PBR was tested in 6 test applications using 12 benchmark data sets.
• PBR outperforms state-of-the-art measures for most data sets.
• Avoiding the identical distribution assumption can improve classification.
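
This record does not reproduce the PBR formula, so the sketch below is background only and should not be read as the authors' definition. It illustrates the distribution the measure is named after: a Poisson-binomial variable counts successes among independent Bernoulli trials whose success probabilities may differ, which is the "independent but not identically distributed" setting the summary describes. The function names and example probabilities are assumptions chosen for illustration.

```python
def poisson_binomial_pmf(probs):
    """Return the PMF of the number of successes among independent
    Bernoulli trials with (possibly different) success probabilities."""
    pmf = [1.0]  # with zero trials, P(0 successes) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def mean_and_variance(probs):
    """Closed-form mean and variance of a Poisson-binomial variable."""
    mean = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    return mean, variance


# Hypothetical per-trial probabilities, not taken from the paper.
example_probs = [0.2, 0.5, 0.9]
pmf = poisson_binomial_pmf(example_probs)
mean, variance = mean_and_variance(example_probs)
```

When all probabilities are equal, this reduces to the ordinary binomial distribution. Unequal probabilities change the dispersion, which is the kind of information the summary says PBR takes into account.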
ISSN:0031-3203
1873-5142
DOI:10.1016/j.patcog.2016.10.018