No Fuss Distance Metric Learning using Proxies

N=9 objects (circles) from three distinct classes. The P=3 stars represent learned proxies. A sample anchor (pink circle outlined in cyan) is assigned to its positive proxy (pink star) and to the negative proxies (blue and green stars).
Proxy Loss
The margin-based triplet loss and neighborhood component analysis (NCA) are surrogates for the H function.
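To make the proxy idea concrete, here is a minimal NumPy sketch of an NCA-style loss computed against proxies instead of sampled data points. It is my own illustration, not the authors' code. It assumes one proxy per class (so the proxy index equals the class label), L2-normalized embeddings and proxies, and squared Euclidean distance. The function name `proxy_nca_loss` is hypothetical.

```python
import numpy as np

def proxy_nca_loss(embeddings, labels, proxies):
    """Mean NCA-style loss against proxies (illustrative sketch).

    Assumptions: proxies[c] is the single proxy of class c, and both
    embeddings and proxies are L2-normalized before measuring distance.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)

    # Squared Euclidean distance between every sample and every proxy: (N, P).
    dist = np.sum((x[:, None, :] - p[None, :, :]) ** 2, axis=2)

    rows = np.arange(len(labels))
    pos_dist = dist[rows, labels]  # distance to the positive proxy

    # Log-sum-exp over the negative proxies only (positive proxy masked out).
    neg_logits = -dist.copy()
    neg_logits[rows, labels] = -np.inf
    m = neg_logits.max(axis=1, keepdims=True)
    log_neg = m[:, 0] + np.log(np.exp(neg_logits - m).sum(axis=1))

    # -log( exp(-d_pos) / sum_neg exp(-d_neg) ) = d_pos + log sum_neg exp(-d_neg)
    return float(np.mean(pos_dist + log_neg))
```

Note that no triplet mining happens here: every sample is compared against all P proxies, which is what makes the sampling problem go away. In a real training loop the proxies would be learnable parameters updated together with the embedding network.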
Normalized Mutual Information equation
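For readers who want to compute the clustering metric themselves, below is a small NumPy sketch of normalized mutual information between cluster assignments and ground-truth classes. It normalizes the mutual information by the arithmetic mean of the two entropies; that choice is my assumption, since other normalizations exist and the paper's exact form may differ. The helper names `entropy` and `nmi` are hypothetical.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in nats) of a histogram of counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def nmi(clusters, classes):
    """NMI = I(clusters; classes) / mean(H(clusters), H(classes)).

    The arithmetic-mean normalization is an assumption of this sketch.
    """
    _, ci = np.unique(np.asarray(clusters), return_inverse=True)
    _, ki = np.unique(np.asarray(classes), return_inverse=True)

    # Contingency table: joint[i, j] = samples in cluster i with class j.
    joint = np.zeros((ci.max() + 1, ki.max() + 1))
    np.add.at(joint, (ci, ki), 1)

    h_clusters = entropy(joint.sum(axis=1))
    h_classes = entropy(joint.sum(axis=0))
    mutual_info = h_clusters + h_classes - entropy(joint.ravel())

    denom = 0.5 * (h_clusters + h_classes)
    return mutual_info / denom if denom > 0 else 1.0
```

The score is invariant to how cluster ids are named, so a clustering that matches the classes up to a relabeling still scores 1.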
Recall@1 as a function of training step on the Cars196 dataset. Proxy-NCA converges about three times faster than the baseline methods and reaches higher Recall@1 values.
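As a reminder of what Recall@1 measures, here is a short NumPy sketch: a query counts as a hit when its nearest neighbor, excluding itself, shares its label. It assumes every sample serves as both query and gallery item and uses squared Euclidean distance; the function name `recall_at_1` is hypothetical and the paper's evaluation code may differ in details.

```python
import numpy as np

def recall_at_1(embeddings, labels):
    """Fraction of samples whose nearest other sample has the same label."""
    x = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(x ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)

    # A sample must not retrieve itself.
    np.fill_diagonal(dist, np.inf)

    nearest = dist.argmin(axis=1)
    return float(np.mean(labels[nearest] == labels))
```

For example, `recall_at_1([[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1])` scores every query as a hit, because each point's closest neighbor belongs to its own class.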
Retrieval and Clustering Performance on the Cars196 dataset. Bold indicates best results
Retrieval and Clustering Performance on the CUB200 dataset.
Recall@1 results on the Stanford Online Products dataset. Proxy-NCA has a 6 percentage-point gap over the previous state of the art.
  • Learning proxies to relax hard mining complexity is the main contribution. It is a valuable and intuitive idea. It speeds up DML convergence and achieves state-of-the-art results.
  • Imbalanced datasets are disregarded in the paper, as in most DML papers, to be fair. For imbalanced data, learned proxies can be biased towards majority classes, i.e. a majority class gets multiple proxies while multiple minority classes get merged into a single proxy.
  • The proxy loss is similar to the center loss. Magnet loss, a generalization of center loss, is mentioned in the related work, but center loss is not. I wish the proxy loss were compared to the center loss. Maybe it is omitted because center loss is used in conjunction with softmax loss, i.e. for classification purposes.
  • I am not sure how to extend this approach to person re-identification or face identification/verification problems. These problems have no *classes*, only a large number of identities; the Market-1501 person re-identification dataset has 1501 identities. Do we need to learn P=1501 proxies, one per identity? How would that extrapolate to unseen test identities?
  • From my previous experience, using center loss for person reidentification learns an inferior embedding. So I am pessimistic about the proxy loss in similar problems.
  • Minor issue: I found the mathematical formulation in the paper a bit confusing and sometimes dispensable. I think the appendix or supplementary material would be a better place for details like the proxy loss bounds.
  • Minor technical issue: Non-official dataset splits (train/test) are used in the experiments section. This makes comparison against other published results, such as classification accuracies, challenging.



Ahmed Taha

I write reviews on computer vision papers. Writing tips are welcomed.