Feature Embedding Regularizers: SVMax & VICReg

Figure 1: During training, a network N generates a feature embedding matrix E ∈ R^{b × d} for a mini-batch of size b.
Figure 2: Feature embeddings scattered over the 2D unit circle. In (a), the features are polarized across a single axis; the singular value of the principal (horizontal) axis is large, while the singular value of the secondary (vertical) axis is small. In (b), the features spread uniformly across both dimensions; both singular values are comparably large.
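To make Figure 2 concrete, here is a small NumPy sketch (my own illustration, not code from either paper). It places b unit-norm 2D embeddings on the circle as the rows of E and prints the singular values of E for the two configurations. The helper name `embedding_singular_values` is hypothetical.

```python
import numpy as np

def embedding_singular_values(angles):
    # One unit-norm 2D embedding per angle; the embeddings are the rows of E (b x 2).
    E = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    # Singular values of E, sorted in descending order.
    return np.linalg.svd(E, compute_uv=False)

b = 8
# (a) Polarized: every embedding sits at angle 0 or pi, i.e., on the horizontal axis.
polarized = embedding_singular_values(np.tile([0.0, np.pi], b // 2))
# (b) Spread: embeddings evenly spaced around the whole circle.
spread = embedding_singular_values(2 * np.pi * np.arange(b) / b)

print(polarized)  # ~[2.83, 0.00]: one large and one (near) zero singular value
print(spread)     # ~[2.00, 2.00]: both singular values equally large
```

In the polarized case all the "energy" of E lands on one axis (s_1 = √b), while evenly spread embeddings split it equally (s_1 = s_2 = √(b/2)).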
Figure 3: Vanilla SVMax formulation. L_r is the original loss function before using the SVMax regularizer, while s_μ is the mean singular value to be maximized.
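A minimal sketch of the objective in Figure 3, assuming the regularizer enters as L_r - λ·s_μ(E), where λ is a weighting hyperparameter (called `lam` here) and s_μ is the mean of the singular values of E with b ≥ d. The function names are hypothetical. NumPy has no autograd, so this only shows the arithmetic; in actual training you would compute the singular values with an autograd library (e.g., `torch.linalg.svdvals` in PyTorch) so gradients flow back into N.

```python
import numpy as np

def mean_singular_value(E):
    # s_mu: average singular value of the b x d embedding matrix E (d values when b >= d).
    return float(np.linalg.svd(E, compute_uv=False).mean())

def svmax_total_loss(task_loss, E, lam):
    # L_r - lam * s_mu: minimizing the total loss pushes s_mu up.
    return task_loss - lam * mean_singular_value(E)

rng = np.random.default_rng(0)
E = rng.normal(size=(32, 8))
E /= np.linalg.norm(E, axis=1, keepdims=True)  # unit-norm embeddings, as in Figure 2
print(svmax_total_loss(task_loss=0.7, E=E, lam=0.1))
```

Because s_μ is subtracted, any optimizer that lowers the total loss is simultaneously rewarded for spreading the embeddings across more dimensions.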
Figure 4: The mean singular value reaches its lower bound when all singular values equal zero except the first (largest) one. s^\ast(E) is the value of the largest singular value when all other singular values equal zero.
Figure 5: An upper bound on the mean singular value, established using the nuclear norm ||E||_* and the Frobenius norm ||E||_F.
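The two bounds can be reconstructed under stated assumptions (my derivation, not copied from the paper): every row of E is L2-normalized, as on the unit circle of Figure 2, and b ≥ d, so s_μ = ||E||_* / d. Unit rows give ||E||_F = √b, and for a matrix of rank at most d, ||E||_F ≤ ||E||_* ≤ √d·||E||_F. Hence √b/d ≤ s_μ ≤ √(b/d), where the lower end is Figure 4's case with s^\ast(E) = √b. The sketch below checks this numerically; `svmax_bounds` is a hypothetical helper name.

```python
import numpy as np

def mean_singular_value(E):
    # s_mu: average singular value of E (d values when b >= d).
    return float(np.linalg.svd(E, compute_uv=False).mean())

def svmax_bounds(b, d):
    # Assumes L2-normalized rows and b >= d, so ||E||_F = sqrt(b) and s_mu = ||E||_* / d.
    lower = np.sqrt(b) / d   # Figure 4: only s*(E) = sqrt(b) is non-zero
    upper = np.sqrt(b / d)   # Figure 5: all d singular values equal sqrt(b / d)
    return lower, upper

b, d = 128, 16
lower, upper = svmax_bounds(b, d)

collapsed = np.tile(np.eye(d)[0], (b, 1))   # every embedding is the same unit vector
uniform = np.tile(np.eye(d), (b // d, 1))   # embeddings spread evenly over all d axes
rng = np.random.default_rng(0)
random_E = rng.normal(size=(b, d))
random_E /= np.linalg.norm(random_E, axis=1, keepdims=True)

for name, E in [("collapsed", collapsed), ("uniform", uniform), ("random", random_E)]:
    # Prints: name, lower bound, s_mu, upper bound.
    print(name, round(lower, 3), round(mean_singular_value(E), 3), round(upper, 3))
```

The collapsed matrix sits exactly on the lower bound and the evenly spread one exactly on the upper bound; a random normalized E lands in between. Under these assumptions the upper bound grows with b, which matches Figure 6 plotting s_μ against the mini-batch size.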
Figure 6: The mean singular values of four different feature embedding (metric learning) networks. The X and Y-axes denote the mini-batch size b and the s_μ of the feature embedding of CUB-200’s test split. The feature embedding is learned using a contrastive loss with and without SVMax. The horizontal red line denotes the upper bound on s_μ.
Figure 7: Given a feature embedding matrix E ∈ R^{b × d}, VICReg computes a standard deviation vector S with d dimensions. The standard deviation serves as a metric for evaluating each dimension’s activity. A dimension with zero standard deviation is a collapsed dimension.
Figure 8: The variance term in VICReg computes the standard deviation (std) of each of the d dimensions of the feature embedding matrix E. Then, VICReg encourages each std to be at least γ through a hinge loss. ϵ is a small scalar that prevents numerical instabilities.
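A minimal NumPy sketch of Figures 7 and 8, assuming the variance term has the hinge form v(E) = (1/d) Σ_j max(0, γ - S_j) with S_j = √(Var(E_j) + ϵ). The defaults γ = 1 and ϵ = 1e-4 are assumptions (I believe they match the VICReg paper's settings), and the unbiased variance (ddof=1) mirrors PyTorch's default for `Tensor.var`. Both function names are hypothetical.

```python
import numpy as np

def vicreg_variance_term(E, gamma=1.0, eps=1e-4):
    # S: standard deviation of each of the d dimensions across the mini-batch (Figure 7).
    S = np.sqrt(E.var(axis=0, ddof=1) + eps)
    # Hinge on gamma - S: dimensions whose std is already >= gamma contribute nothing.
    loss = float(np.mean(np.maximum(0.0, gamma - S)))
    return loss, S

def collapsed_dimensions(E, tol=1e-6):
    # Indices of dimensions whose raw std is effectively zero.
    return np.flatnonzero(E.std(axis=0) < tol)

rng = np.random.default_rng(0)
E = rng.normal(size=(256, 4))
E[:, 2] = 0.5                    # force dimension 2 to collapse
loss, S = vicreg_variance_term(E)
print(collapsed_dimensions(E))   # [2]
print(round(loss, 3), S.round(2))
```

The collapsed dimension has S_j ≈ √ϵ and dominates the loss, while dimensions with std above γ are left alone; this is why the hinge only prevents collapse instead of forcing every std to equal γ.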
Table 1: Quantitative SVMax evaluation using self-supervised learning with an AlexNet backbone. We evaluate the pre-trained network N through ImageNet classification with a linear classifier on top of frozen convolutional layers. For every layer, the convolutional features are spatially resized until there are fewer than 10K dimensions left. A fully connected layer followed by softmax is trained on a 1000-way object classification task.
Table 2: Evaluation of the representations obtained with a ResNet-50 backbone pretrained with VICReg using: (1) linear classification on top of the frozen representations from ImageNet; (2) semi-supervised classification on top of the fine-tuned representations from 1% and 10% of ImageNet samples. We report Top-1 and Top-5 accuracies (in %). Top-3 best self-supervised methods are underlined.
Figure 9: Quantitative evaluation on Stanford CARS196. The X and Y-axes denote the learning rate lr and recall@1 performance, respectively.
  • Both SVMax and VICReg are well-written and well-motivated papers. Both regularizers are unsupervised and support various network architectures and tasks. Each paper delivers far more experiments than I can cover in this article. I highly recommend these papers for those interested in the feature embedding literature. PyTorch implementations are available for both SVMax and VICReg.
  • Compared to VICReg, the SVMax paper is easier to read because it focuses on a single idea. In contrast, VICReg presents multiple terms, and one of these terms is borrowed from another paper, Barlow Twins [4].
  • Compared to SVMax, VICReg delivers a ton of quantitative evaluations on recent benchmarks. FAIR has the GPUs :)
  • Regarding weight decay vs. feature embedding regularizers, both SVMax and VICReg regularize the output of a single layer. In contrast, weight decay is typically applied to all network weights (layers). Accordingly, I wish a paper would evaluate the impact of these feature embedding regularizers when applied to all layers. As mentioned previously, weight decay had a significant impact in [3], and I wonder if feature regularizers have a similar impact.



Ahmed Taha

I write reviews on computer vision papers. Writing tips are welcome.