No Fuss Distance Metric Learning using Proxies

N=9 objects (circles) from three distinct classes; P=3 stars represent the learned proxies. A sample anchor (pink circle outlined in cyan) is assigned to a positive proxy (pink star) and to negative proxies (blue and green stars).
Proxy Loss
The margin-based triplet loss and neighborhood component analysis (NCA) as surrogates for the H function.
Normalized Mutual Information equation
Recall@1 as a function of training step on the Cars196 dataset. Proxy-NCA converges about three times faster than the baseline methods and reaches higher Recall@1 values.
Retrieval and Clustering Performance on the Cars196 dataset. Bold indicates best results
Retrieval and Clustering Performance on the CUB200 dataset.
Recall@1 results on the Stanford Product Dataset. Proxy-NCA has a 6 percentage point gap over the previous state of the art.
  • Learning proxies to relax hard mining complexity is the main contribution. It is a valuable and intuitive idea. It speeds up distance metric learning (DML) convergence and achieves state-of-the-art results.
  • Imbalanced datasets are disregarded in the paper (as in most DML papers, to be fair). With imbalanced data, learned proxies can be biased toward majority classes, i.e., a majority class gets multiple proxies while multiple minority classes are merged into a single proxy.
  • The proxy loss is similar to the center loss. Magnet loss, a generalization of center loss, is mentioned in the related work, but center loss is not. I wish the proxy loss had been compared to the center loss. Maybe it was omitted because center loss is used in conjunction with softmax loss, i.e., for classification purposes.
  • I am not sure how to extend this approach to person re-identification or face identification/verification problems. There are no *classes* in these problems, only a large number of identities; the Market-1501 person re-identification dataset has 1501 identities. Do we need to learn P=1501 proxies, one per person identity? How does that extrapolate to test (unseen) identities?
  • From my previous experience, using center loss for person reidentification learns an inferior embedding. So I am pessimistic about the proxy loss in similar problems.
  • Minor issue: I found the mathematical formulation in the paper a bit confusing and sometimes dispensable. I think the appendix and supplementary material are better places for details like the proxy loss bounds.
  • Minor technical issue: Non-official dataset splits (train/test) are used in the experiments section. This makes comparison against other published results, such as classification accuracies, challenging.
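
To make the proxy idea from the first comment concrete, here is a minimal sketch of a Proxy-NCA-style loss for a single anchor. It is my own illustration, not the authors' code. It assumes one proxy per class (static assignment by label), squared Euclidean distance, and L2-normalized embeddings and proxies. The function names are hypothetical, and in real training the proxies would be learnable parameters updated by gradient descent together with the embedding network.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def proxy_nca_loss(embedding, label, proxies):
    """NCA-style loss of one anchor against class proxies.

    proxies: list of vectors, one per class (assumption: static assignment,
             so proxies[label] is the positive proxy and all others are negatives).
    """
    x = l2_normalize(embedding)
    dists = [sq_dist(x, l2_normalize(p)) for p in proxies]
    pos = dists[label]
    neg_sum = sum(math.exp(-d) for i, d in enumerate(dists) if i != label)
    # -log( exp(-pos) / sum over negatives exp(-d) ) = pos + log(neg_sum).
    # The positive proxy is excluded from the denominator, so the value can be negative.
    return pos + math.log(neg_sum)
```

Note that the anchor is only ever compared against P proxies, never against other samples in the batch, which is why no triplet mining is needed.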




I write reviews on computer vision papers. Writing tips are welcomed.

Ahmed Taha
