Mining on Manifolds: Metric Learning without Labels

Figure 1: A pretrained network embeds images into a manifold (feature space)
Figure 2: Identify hard positive and negative images using the Euclidean and manifold nearest neighbors.
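The mining idea in Figure 2 can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' released code. Manifold similarity is approximated by a diffusion (manifold-ranking) score on a symmetric kNN graph. The positive pool P^+(x^r) is taken as the manifold neighbors of the anchor that are not among its Euclidean neighbors NN^e(x^r). The negative pool P^−(x^r) is taken as the images outside the manifold neighborhood, ordered so that the Euclidean-closest (hardest) ones come first. The function names and the parameters `k_e`, `k_m`, `k`, `alpha`, and `gamma` are hypothetical choices for illustration.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances; the diagonal is inf so an image is never its own neighbor."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return d2

def manifold_similarity(X, k=5, alpha=0.99, gamma=3):
    """Diffusion similarity on a symmetric kNN graph (assumes L2-normalized rows of X)."""
    n = X.shape[0]
    nn = np.argsort(pairwise_sq_dists(X), axis=1)[:, :k]
    rows, cols = np.repeat(np.arange(n), k), nn.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.maximum((X @ X.T)[rows, cols], 0.0) ** gamma  # sparse kNN affinities
    W = np.maximum(W, W.T)                                            # symmetrize the graph
    d = W.sum(axis=1)
    d[d == 0] = 1.0                                                   # guard isolated nodes
    Wn = W / np.sqrt(np.outer(d, d))                                  # D^{-1/2} W D^{-1/2}
    # Row r holds the diffusion score of every image when x^r is the query.
    return np.linalg.inv(np.eye(n) - alpha * Wn)

def mine_pools(X, anchor, k_e=3, k_m=10, k=5):
    """Return (hard positives, negatives sorted hardest-first) for one anchor index."""
    d2 = pairwise_sq_dists(X)
    nn_e = set(np.argsort(d2[anchor])[:k_e].tolist())        # Euclidean neighbors NN^e
    scores = manifold_similarity(X, k=k)[anchor].copy()
    scores[anchor] = -np.inf                                   # exclude the anchor itself
    nn_m = set(np.argsort(-scores)[:k_m].tolist())           # manifold neighbors
    positives = sorted(nn_m - nn_e)                           # on the manifold, not Euclidean-close
    outside = [j for j in range(len(X)) if j != anchor and j not in nn_m]
    negatives = sorted(outside, key=lambda j: d2[anchor, j])  # Euclidean-closest first
    return positives, negatives
```

With `X` holding L2-normalized embeddings from the pretrained network, `mine_pools(X, anchor=0)` returns index lists for a single anchor. The dense matrix inverse is only practical for small collections; for larger ones a sparse iterative solver would be the natural substitute.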
Table 1: Recall@k and NMI on CUB-200-2011. All methods except Ours and cyclic match [30] use ground-truth labels during training.
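Recall@k in Table 1 is commonly computed as the fraction of query images whose k nearest neighbors (excluding the query itself) contain at least one image of the same class. A minimal NumPy sketch, assuming Euclidean distance on the embeddings and the k values 1, 2, 4, 8 as defaults (an assumption here, not read from the table); the function name `recall_at_k` is hypothetical:

```python
import numpy as np

def recall_at_k(X, labels, ks=(1, 2, 4, 8)):
    """Fraction of queries with at least one same-label image among their k nearest neighbors."""
    labels = np.asarray(labels)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                # the query never retrieves itself
    order = np.argsort(d2, axis=1)              # neighbors of each query, closest first
    hits = labels[order] == labels[:, None]     # hits[i, j]: j-th neighbor of i shares i's label
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}

X = np.array([[0.0], [0.1], [5.0], [5.1]])
print(recall_at_k(X, [0, 0, 1, 1], ks=(1,)))    # {1: 1.0}
print(recall_at_k(X, [0, 1, 0, 1], ks=(1, 2)))  # {1: 0.0, 2: 0.5}
```

Each k must stay below the number of images, since the query itself sits in the last column of `order`.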
Figure 3: Sample CUB-200-2011 anchor images (x^r), positive images from the proposed method (P^+(x^r)) and the baseline (NN^e_3(x^r)), and negative images from the proposed method (P^−(x^r)) and the baseline (X \ NN^e_3(x^r)). The baseline uses Euclidean nearest neighbors as positives and non-neighbors as negatives, where NN^e_3(x^r) denotes the three Euclidean nearest neighbors of the anchor image x^r. Ground-truth positives (negatives) are framed in green (red). Labels are used only for this qualitative evaluation, never during training.
  1. [Strength] The paper is well written and tackles an interesting problem: training a network on an unlabeled image collection. The authors released their implementation on GitHub.
  2. [Strength] The paper proposes a specific approach to identify useful anchor images: the stationary probability distribution (A). The authors provide an ablation study comparing this approach against randomly sampled anchors. Selecting anchors with the stationary probability distribution (A) significantly outperforms random sampling, as shown in the table below.
Impact of the choice of anchors and of the positive and negative pools on Recall@1 on CUB-200-2011 and mAP on Oxford5k (higher is better). On CUB-200-2011, all images are used as anchors; on Oxford5k, anchors are selected either at random or with the stationary probability distribution (A). The positive and negative pools are formed either by the Euclidean nearest-neighbor baseline (NN^e) or by the proposed hard-mining framework (P^+ and P^−).
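The anchor-selection idea can be sketched as a power iteration. This is an illustrative assumption, not the paper's exact procedure: the random walk runs on an unweighted directed kNN graph, a PageRank-style damping factor is added so the iteration always converges to a unique distribution, and an image becomes an anchor when its stationary probability is at least as large as that of each of its k Euclidean neighbors (a local maximum). The names `stationary_distribution`, `select_anchors`, `damping`, and `num_anchors` are hypothetical.

```python
import numpy as np

def stationary_distribution(X, k=5, damping=0.85, iters=1000, tol=1e-12):
    """Power iteration for the stationary distribution of a random walk on a directed kNN graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]                    # k Euclidean neighbors of each image
    P = np.zeros((n, n))
    P[np.repeat(np.arange(n), k), nn.ravel()] = 1.0 / k   # row-stochastic transition matrix
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Damping (teleportation) is an added assumption that guarantees a unique limit.
        new_pi = damping * (pi @ P) + (1.0 - damping) / n
        converged = np.abs(new_pi - pi).sum() < tol
        pi = new_pi
        if converged:
            break
    return pi, nn

def select_anchors(X, k=5, num_anchors=None):
    """Anchors = images whose stationary probability is a local maximum over their kNN."""
    pi, nn = stationary_distribution(X, k)
    peaks = np.flatnonzero(pi >= pi[nn].max(axis=1))
    peaks = peaks[np.argsort(-pi[peaks])]                 # most probable anchors first
    return peaks if num_anchors is None else peaks[:num_anchors]
```

Images that many others list as neighbors accumulate probability mass, so the local maxima tend to sit in dense regions of the embedding. Since the CUB-200-2011 setting uses every image as an anchor, a selection step like this matters mainly for the Oxford5k setting.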

I write reviews on computer vision papers. Writing tips are welcomed.

Ahmed Taha
