Metric Embedding Learning on Multi-Directional Projections

Image based instance recognition is a difficult problem, in some cases even for the human eye. While latest developments in computer vision—mostly driven by deep learning—have shown that high performance models for classification or categorization can be engineered, the problem of discriminating sim...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Algorithms 2020-06, Vol.13 (6), p.133
1. Verfasser:	Kertész, Gábor
Format:	Artikel
Sprache:	eng
Schlagworte:	Classification Computer vision deep metric learning Feature extraction Machine learning Mathematical models multi-directional image projections Multilayers Neural networks object matching object re-identification Object recognition one-shot learning Parameters Principal components analysis
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Image based instance recognition is a difficult problem, in some cases even for the human eye. While latest developments in computer vision—mostly driven by deep learning—have shown that high performance models for classification or categorization can be engineered, the problem of discriminating similar objects with a low number of samples remain challenging. Advances from multi-class classification are applied for object matching problems, as the feature extraction techniques are the same; nature-inspired multi-layered convolutional nets learn the representations, and the output of such a model maps them to a multidimensional encoding space. A metric based loss brings same instance embeddings close to each other. While these solutions achieve high classification performance, low efficiency is caused by memory cost of high parameter number, which is in a relationship with input image size. Upon shrinking the input, the model requires less trainable parameters, while performance decreases. This drawback is tackled by using compressed feature extraction, e.g., projections. In this paper, a multi-directional image projection transformation with fixed vector lengths (MDIPFL) is applied for one-shot recognition tasks, trained on Siamese and Triplet architectures. Results show, that MDIPFL based approach achieves decent performance, despite of the significantly lower number of parameters.
ISSN:	1999-4893 1999-4893
DOI:	10.3390/a13060133