Retrieval with Deep Learning: A Ranking-Losses Survey Part 2

This is the second part of a ranking-losses survey. The first part covers the contrastive and triplet losses. In this part, N-pairs and angular losses are presented.

[3] N-Pairs Loss

Both contrastive and triplet losses use the Euclidean distance to quantify the similarity between points. In addition, every anchor in the training mini-batch is paired with a single negative example. The N-pairs loss changes these two assumptions. First, it uses the cosine similarity to quantify the similarity between points. Thus, the N-pairs loss compares embeddings using the angle between two vectors rather than their norms. This change is modest, so the formulation for a single triplet (a,p,n) remains similar to the triplet loss, as follows

N-pairs formulation using cosine similarity for a single triplet (a,p,n). The cosine similarity equals 1 for embeddings pointing in the same direction, 0 for orthogonal embeddings, and -1 for opposite embeddings.
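For reference, here is a sketch of the single-triplet form in the caption's notation. It assumes the inner-product version given in [3], where f_a, f_p, and f_n are the embeddings of the anchor, positive, and negative:

```latex
\mathcal{L}(a, p, n) = \log\left(1 + \exp\left(f_a^\top f_n - f_a^\top f_p\right)\right)
```

The loss shrinks as the anchor-positive similarity grows past the anchor-negative similarity, which mirrors the triplet loss with the hinge replaced by a smooth log-exp.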

However, the core idea of N-pairs loss is pairing every anchor with a single positive and every negative in the batch as follows

N-pairs formulation pairs every anchor f_a with a single positive f_p and every negative f_n in the mini-batch.
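Extending the single-triplet form to several negatives, and again assuming the formulation in [3], the loss for one anchor can be sketched as below. The index i over the N-1 negatives is a notational assumption:

```latex
\mathcal{L}\left(a, p, \{n_i\}_{i=1}^{N-1}\right) = \log\left(1 + \sum_{i=1}^{N-1} \exp\left(f_a^\top f_{n_i} - f_a^\top f_p\right)\right)
```

With a single negative (N = 2), this reduces to the single-triplet form.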

For N-pairs, a training batch contains a single positive pair from each class. Thus, a mini-batch of size B has B/2 positive pairs, and every anchor is paired with (B-2) negatives, as shown in the next figure.
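The batch construction above can be sketched in NumPy. This is a minimal illustration, not the reference implementation: the function name `n_pairs_loss` and the (N, D) array layout are assumptions. It follows the arrangement in [3], where each anchor is compared against the positives of the other N-1 classes. Using the other anchors as negatives as well, as in the (B-2) count, would need a larger similarity matrix.

```python
import numpy as np

def n_pairs_loss(anchors, positives):
    """Mean N-pairs loss for N (anchor, positive) pairs, one pair per class.

    anchors, positives: float arrays of shape (N, D). Row i of `positives`
    is the positive for anchor i and a negative for every other anchor.
    """
    # sim[i, j] = f_a[i] . f_p[j]; the diagonal holds anchor-positive scores.
    sim = anchors @ positives.T
    # log(1 + sum_{j != i} exp(sim[i, j] - sim[i, i])) is the same as
    # logsumexp_j(sim[i, j]) - sim[i, i]; subtract the row max for stability.
    row_max = sim.max(axis=1, keepdims=True)
    log_sum_exp = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_sum_exp - np.diag(sim)))

# Usage: 4 classes, one (anchor, positive) pair each, 8-dimensional embeddings.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
positives = anchors + 0.1 * rng.normal(size=(4, 8))
loss = n_pairs_loss(anchors, positives)
```

Written this way, the loss is a softmax cross-entropy over each row of the similarity matrix, with the matching positive as the target.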

(Left) Triplet loss pairs every anchor (f) with a single positive (f+) and negative (f-) sample. (Right) N-pairs pairs the anchor with a single positive (f+) and all negatives in the mini-batch.

The intuition behind N-pairs is to leverage all negatives within a batch to guide the gradient update, which speeds up convergence.

The N-pairs loss is generally superior to triplet loss, but with a few caveats. Because only a single positive pair is allowed per class, the number of pairs in a training mini-batch is upper-bounded by the number of training classes. In contrast, the mini-batch sizes for triplet and contrastive losses are limited only by the GPU’s memory. In addition, the N-pairs loss learns an un-normalized embedding. This has two consequences: (1) the margin between different classes is defined using an angle theta, and (2) a regularizer that constrains the embedding space is required to avoid a degenerate embedding that grows to infinity.
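The regularizer in (2) can take several forms. One simple choice, assumed here purely for illustration, is an L2 penalty on the embedding norms that is added to the ranking loss. The function name and the default weight `reg_lambda` are hypothetical.

```python
import numpy as np

def l2_embedding_penalty(embeddings, reg_lambda=0.002):
    """Penalty proportional to the mean squared L2 norm of the embeddings.

    embeddings: float array of shape (B, D).
    reg_lambda: hypothetical weight; tune it for the dataset at hand.
    """
    squared_norms = np.sum(embeddings ** 2, axis=1)
    return float(reg_lambda * np.mean(squared_norms))

# Usage: add the penalty to the ranking loss computed on the same batch.
batch = np.array([[3.0, 4.0], [0.0, 0.0]])
penalty = l2_embedding_penalty(batch)
```

Without such a term, an un-normalized embedding can keep lowering an inner-product loss simply by scaling up its vectors.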

[4] Angular Loss

Angular loss tackles two limitations in triplet loss. First, triplet loss assumes a fixed margin m between different classes. A fixed margin is undesirable because different classes have different intra-class variations as shown in the next figure

Example of feature embedding computed by t-SNE for the Stanford car dataset, where the images of Ford Ranger SuperCab (right) have a more diverse distribution than Volvo C30 Hatchback (left). Conventional triplet loss has difficulty in dealing with such an unbalanced intra-class variation. The proposed angular loss addresses this issue by minimizing the scale-invariant angle at the negative point.

The second limitation is how triplet loss formulates the gradient of the negative point. The next figure shows why the direction for the negative gradient may not be optimal, i.e., no guarantees for moving away from the positive class’s center.

The negative point x_n gradient may not be optimal without the guarantee of moving away from the class which both the anchor x_a and positive sample x_p belong to; it is not perpendicular to line [x_a,x_p].
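To make this concrete, the sketch below computes the gradient of a triplet loss with respect to the negative point. It assumes the squared-Euclidean hinge form; the function name, the margin value, and the toy coordinates are illustrative.

```python
import numpy as np

def triplet_negative_gradient(x_a, x_p, x_n, margin=1.0):
    """Gradient of [||x_a-x_p||^2 - ||x_a-x_n||^2 + margin]_+ w.r.t. x_n."""
    loss = np.sum((x_a - x_p) ** 2) - np.sum((x_a - x_n) ** 2) + margin
    if loss <= 0:
        return np.zeros_like(x_n)  # hinge inactive: no gradient
    return 2.0 * (x_a - x_n)

x_a = np.array([0.0, 0.0])
x_p = np.array([2.0, 0.0])
x_n = np.array([1.0, 1.0])

grad = triplet_negative_gradient(x_a, x_p, x_n)
step = -grad                  # gradient descent moves x_n along -grad
x_c = (x_a + x_p) / 2.0       # center of the anchor-positive pair
away_from_center = x_n - x_c  # direction pointing straight away from x_c
```

In this toy triplet, the descent step for x_n points away from the anchor rather than straight away from x_c, and it does not change when x_p moves, as long as the hinge stays active.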

To tackle both limitations, the authors propose using the angle at n instead of the margin m and correcting the gradient at the negative point x_n. Instead of pushing points based on distance, the goal is to minimize the angle at n, i.e., make the triangle a-n-p pointy at n. The next figure illustrates how the angular loss formulation pushes the negative point x_n away from x_c, the center of the local cluster defined by x_a and x_p. In addition, the anchor x_a and the positive x_p are dragged towards each other.

Illustration of the angular constraint using a synthetic triplet. (a) Directly minimizing ∠n is unstable as it would drag x_n closer to x_a. (b) A more stable alternative is minimizing ∠n′, defined by re-constructing the triangle [x_m,x_n,x_c], where x_c is the center between x_a and x_p.
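In the caption's notation, and assuming the hinge form given in [4] with an upper bound α on the angle ∠n′ (a hyper-parameter), the constraint can be sketched as:

```latex
x_c = \frac{x_a + x_p}{2}, \qquad
\mathcal{L}_{ang}(a, p, n) = \left[\, \lVert x_a - x_p \rVert^2 - 4 \tan^2\!\alpha \, \lVert x_n - x_c \rVert^2 \,\right]_+
```

The first term pulls x_a and x_p together, and the second pushes x_n away from the center x_c.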

Compared to the original triplet loss, whose gradients depend on only two points (e.g., grad = x_a - x_n), the angular loss gradients are much more robust because they consider all three points simultaneously. Also, please note that, compared to distance-based metrics, manipulating the angle ∠n′ is not only rotation-invariant but also scale-invariant by nature.
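A minimal NumPy sketch of the angular loss and its gradients follows. It assumes the hinge form from [4]; the function names and the default `alpha_deg=45.0` are illustrative choices, not values taken from this post.

```python
import numpy as np

def angular_loss(x_a, x_p, x_n, alpha_deg=45.0):
    """Hinge form: [||x_a-x_p||^2 - 4 tan^2(alpha) ||x_n-x_c||^2]_+."""
    tan_sq = np.tan(np.deg2rad(alpha_deg)) ** 2
    x_c = (x_a + x_p) / 2.0  # center of the local cluster
    value = np.sum((x_a - x_p) ** 2) - 4.0 * tan_sq * np.sum((x_n - x_c) ** 2)
    return max(float(value), 0.0)

def angular_gradients(x_a, x_p, x_n, alpha_deg=45.0):
    """Gradients w.r.t. x_a, x_p, x_n, valid while the hinge is active."""
    tan_sq = np.tan(np.deg2rad(alpha_deg)) ** 2
    shared = x_a + x_p - 2.0 * x_n  # every gradient involves all three points
    grad_a = 2.0 * (x_a - x_p) - 2.0 * tan_sq * shared
    grad_p = 2.0 * (x_p - x_a) - 2.0 * tan_sq * shared
    grad_n = 4.0 * tan_sq * shared
    return grad_a, grad_p, grad_n

x_a = np.array([0.0, 0.0])
x_p = np.array([2.0, 0.0])
x_n = np.array([1.0, 0.5])

loss = angular_loss(x_a, x_p, x_n)
grad_a, grad_p, grad_n = angular_gradients(x_a, x_p, x_n)
step_n = -grad_n  # descent direction for the negative point
```

In this toy triplet, the descent step for x_n points along x_n - x_c, straight away from the cluster center. Scaling all three points by a factor s multiplies the loss by s squared, so whether the constraint is violated does not depend on scale.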

My Comments:

N-pairs and Angular loss are generally superior to vanilla Triplet loss. However, there are important parameters to consider when comparing these approaches.

  • The sampling strategy used to train triplet loss can lead to significant performance differences. Hard mining is efficient and converges faster, provided model collapse is avoided.
  • The nature of the training dataset is another important factor. When working with person re-identification or face clustering, we can assume that each class is represented by a single cluster, i.e., a single mode with small intra-class variations. Yet, some retrieval datasets like CUB-200-2011 and Stanford Online Products have a lot of intra-class variations. Empirically, hard triplet loss works better for person/face re-identification tasks, while N-pairs and Angular losses work better on the CUB-200-2011 and Stanford Online Products datasets.
  • When approaching a new retrieval task and tuning the hyper-parameters (learning rate and batch size) for a new training dataset, I found semi-hard triplet loss to be the most stable. It does not achieve the best performance, but it is the least likely to degenerate.
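For readers unfamiliar with the sampling strategies above, here is a minimal sketch of semi-hard negative selection. It assumes the common definition in which a semi-hard negative is farther from the anchor than the positive but still inside the margin. The function name, the squared-Euclidean distance, and the margin value are illustrative assumptions.

```python
import numpy as np

def semi_hard_negative(anchor, positive, candidates, margin=0.2):
    """Index of the closest semi-hard negative in `candidates`, or None.

    A candidate n is semi-hard when d(a, p) < d(a, n) < d(a, p) + margin,
    with d the squared Euclidean distance. candidates has shape (K, D).
    """
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((candidates - anchor) ** 2, axis=1)
    mask = (d_an > d_ap) & (d_an < d_ap + margin)
    if not mask.any():
        return None  # no semi-hard negative among the candidates
    idx = np.flatnonzero(mask)
    return int(idx[np.argmin(d_an[idx])])

anchor = np.array([0.0, 0.0])
positive = np.array([1.0, 0.0])
candidates = np.array([
    [0.5, 0.0],   # hard: closer to the anchor than the positive
    [1.05, 0.0],  # semi-hard: just beyond the positive, inside the margin
    [3.0, 0.0],   # easy: far beyond the margin
])
chosen = semi_hard_negative(anchor, positive, candidates)
```

Hard mining would instead pick the candidate closest to the anchor (the first row here), which yields larger gradients but is the setting more prone to the collapse mentioned above.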

If there are other ranking losses worth mentioning, please leave them in the comments. If the list gets big, I will write a follow-up.

Resources

[1] Dimensionality Reduction by Learning an Invariant Mapping

[2] Deep Metric Learning using triplet network

[3] Improved Deep Metric Learning with Multi-class N-pair Loss Objective

[4] Deep Metric Learning with Angular Loss

[5] TensorFlow retrieval baseline

I write reviews on computer vision papers. Writing tips are welcomed.
