Retrieval with Deep Learning: A Ranking-Losses Survey Part 2

N-pairs formulation using cosine similarity for a single triplet (a, p, n). The cosine similarity equals 1 for identical (overlapping) embeddings, 0 for orthogonal embeddings, and −1 for opposite embeddings.
N-pairs formulation pairs every anchor f_a with a single positive f_p and every negative f_n in the mini-batch.
(Left) Triplet loss pairs every anchor (f) with a single positive (f+) and negative (f-) sample. (Right) N-pairs pairs the anchor with a single positive (f+) and all negatives in the mini-batch.
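To make the N-pairs formulation concrete, here is a minimal PyTorch sketch (my own illustration, not code from the original paper or this post). It assumes L2-normalized embeddings so the dot product equals cosine similarity, and it treats every other pair's positive in the mini-batch as a negative; the function and tensor names are placeholders.

```python
import torch
import torch.nn.functional as F

def n_pairs_loss(anchors, positives):
    """anchors, positives: (B, D) tensors; row i of `positives` is the positive
    for anchor i, and the other B-1 rows serve as its negatives."""
    anchors = F.normalize(anchors, dim=1)       # unit norm so dot product = cosine similarity
    positives = F.normalize(positives, dim=1)
    sim = anchors @ positives.t()               # (B, B) cosine-similarity matrix
    targets = torch.arange(anchors.size(0), device=anchors.device)
    # Cross-entropy with diagonal targets equals log(1 + sum_k exp(s_neg_k - s_pos)).
    return F.cross_entropy(sim, targets)
```

A typical call would be `loss = n_pairs_loss(embed(anchor_images), embed(positive_images))`, where `embed` is a hypothetical shared backbone; the cross-entropy trick is just a compact way to write the log-sum-exp over all in-batch negatives.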
Example of the feature embedding, visualized with t-SNE, for the Stanford Cars dataset, where the images of the Ford Ranger SuperCab (right) have a more diverse distribution than those of the Volvo C30 Hatchback (left). The conventional triplet loss has difficulty dealing with such unbalanced intra-class variation. The proposed angular loss addresses this issue by minimizing the scale-invariant angle at the negative point.
Without an angular constraint, the gradient at the negative point x_n may not be optimal: it is not perpendicular to the line [x_a, x_p], so there is no guarantee that x_n moves away from the class to which both the anchor x_a and the positive x_p belong.
Illustration of the angular constraint using a synthetic triplet. (a) Directly minimizing ∠n is unstable, as it would drag x_n closer to x_a. (b) A more stable alternative is to minimize ∠n′, defined by re-constructing the triangle [x_m, x_n, x_c], where x_c is the center between x_a and x_p.
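This angular constraint can be written as a hinge on the tangent of the angle at the negative point. Below is a small single-triplet sketch of that form (my own reading of the construction; the margin `alpha_deg` is an assumed value, not one from this post).

```python
import math
import torch

def angular_loss(x_a, x_p, x_n, alpha_deg=40.0):
    """x_a, x_p, x_n: (D,) anchor, positive, and negative embeddings."""
    x_c = (x_a + x_p) / 2.0                        # center of the anchor-positive pair
    tan_sq = math.tan(math.radians(alpha_deg)) ** 2
    # The angle at x_n stays below alpha when
    # ||x_a - x_p||^2 <= 4 * tan^2(alpha) * ||x_n - x_c||^2; penalize violations.
    return torch.relu((x_a - x_p).pow(2).sum() - 4.0 * tan_sq * (x_n - x_c).pow(2).sum())
```

Because the constraint compares squared distances through tan²(α), a global rescaling of the embedding leaves it unchanged, which is the scale-invariance mentioned in the caption.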
  • The sampling strategy used to train the triplet loss can lead to significant performance differences. Hard mining is efficient and converges faster, provided model collapse is avoided.
  • The nature of the training dataset is another important factor. For person re-identification or face clustering, we can assume each class forms a single cluster, i.e., a single mode with small intra-class variations. In contrast, retrieval datasets like CUB-200-2011 and Stanford Online Products exhibit large intra-class variations. Empirically, the hard triplet loss works better for person/face re-identification, while the N-pairs and angular losses work better on CUB-200-2011 and Stanford Online Products.
  • When approaching a new retrieval task and tuning the hyper-parameters (learning rate and batch_size) for a new training dataset, I found the semi-hard triplet loss to be the most stable; a minimal mining sketch follows this list. It does not achieve the best performance, but it is the least likely to degenerate.
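As a concrete reference for the stability remark above, here is a minimal batch semi-hard mining sketch (an illustration of the general recipe, not the exact implementation behind these observations): for each anchor-positive pair it keeps only negatives that lie farther than the positive, then takes the closest of those.

```python
import torch
import torch.nn.functional as F

def semi_hard_triplet_loss(embeddings, labels, margin=0.2):
    """embeddings: (B, D) float tensor, labels: (B,) integer class ids."""
    dist = torch.cdist(embeddings, embeddings)           # (B, B) pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)    # (B, B) same-class mask
    idx = torch.arange(len(labels), device=labels.device)
    losses = []
    for a in range(len(labels)):
        pos = idx[same[a] & (idx != a)]                  # positives for this anchor
        neg = idx[~same[a]]                              # negatives for this anchor
        if len(pos) == 0 or len(neg) == 0:
            continue
        for p in pos:
            d_ap = dist[a, p]
            semi = neg[dist[a, neg] > d_ap]              # negatives farther than the positive
            if len(semi) == 0:
                continue
            d_an = dist[a, semi].min()                   # closest semi-hard negative
            losses.append(F.relu(d_ap - d_an + margin))
    return torch.stack(losses).mean() if losses else embeddings.new_zeros(())
```

The loops are kept for clarity; a production version would vectorize the masks, but the selection rule (negatives farther than the positive, then the closest of those) is the part that keeps training from collapsing. The 0.2 margin is a common default, not a recommendation from this post.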
