Proxy Anchor Loss for Deep Metric Learning

Proxy-based losses learn a class representative (star) for each class. Data samples (circles) are pulled toward or pushed away from these class representatives (data-to-proxy). Pair-based losses pull similar data samples together and push dissimilar samples apart (data-to-data). Solid green lines indicate pull forces, while red dashed lines indicate push forces.
The pros and cons of both proxy-based and pair-based losses.
Differences between Proxy-NCA and Proxy-Anchor in handling proxies and embedding vectors during training. Proxies are colored black, and three other colors indicate distinct classes. The associations defined by the losses are represented by edges; thicker edges receive larger gradients.
Accuracy in Recall@1 versus training time on the Cars-196 dataset. Note that all methods were trained with a batch size of 150 on a single Titan Xp GPU. Proxy-Anchor loss achieves the highest accuracy and converges faster than the baselines in terms of both the number of epochs and the actual training time.
Comparison of training complexities
Recall@K (%) on the CUB-200-2011 and Cars-196 datasets. Superscripts denote embedding sizes and † indicates models using larger input images. Backbone networks are denoted by abbreviations: G (GoogLeNet), BN (Inception with batch normalization), R50 (ResNet-50).
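For concreteness, the Proxy-Anchor loss compared in these figures can be sketched in a few lines. This is a minimal pure-Python sketch of the formulation in the paper (with its default scaling α=32 and margin δ=0.1); the function name and the list-of-lists similarity interface are my own, and a real implementation would operate on tensors with autograd, as the official PyTorch code does:

```python
import math

def proxy_anchor_loss(sim, labels, num_classes, alpha=32.0, delta=0.1):
    """Proxy-Anchor loss sketch.

    sim[i][c]  -- cosine similarity between embedding i and the proxy of class c
    labels[i]  -- ground-truth class of embedding i
    """
    # Proxies that have at least one positive sample in the batch.
    positive_proxies = set(labels)

    pos_term = 0.0
    neg_term = 0.0
    for c in range(num_classes):
        # Positives are pulled toward their proxy: hard (low-similarity)
        # positives dominate the soft sum and get larger gradients.
        pos_sum = sum(math.exp(-alpha * (sim[i][c] - delta))
                      for i, y in enumerate(labels) if y == c)
        # Negatives are pushed away: hard (high-similarity) negatives dominate.
        neg_sum = sum(math.exp(alpha * (sim[i][c] + delta))
                      for i, y in enumerate(labels) if y != c)

        if c in positive_proxies:
            pos_term += math.log(1.0 + pos_sum)
        neg_term += math.log(1.0 + neg_sum)

    return pos_term / len(positive_proxies) + neg_term / num_classes
```

The log-sum-exp structure is what gives each proxy its "anchor" role: every sample in the batch interacts with every proxy, and the gradient magnitude per sample depends on its relative hardness, which is the thick-versus-thin edge distinction in the figure above.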
  • The official repository of the paper is available on GitHub. The code implementation is well-organized and uses the PyTorch metric learning library.
  • The paper is well-written and easy to read.
  • The paper claims that the Proxy-Anchor loss is robust against noisy labels and outliers. Yet this claim is neither supported nor refuted by any experiment in the paper.
  • The paper assumes a single proxy (class representative) per class. This assumption is valid for datasets with small intra-class variation, but large intra-class variation breaks it. The problem is most apparent with imbalanced datasets: should a minority class have the same number of class representatives as a majority class?




I write reviews on computer vision papers. Writing tips are welcomed.

Ahmed Taha