Feature Embedding Regularizers: SVMax & VICReg

Figure 1: During training, a network N generates a feature embedding matrix E ∈ R^{b × d} for a mini-batch of size b.
Figure 2: Feature embeddings scattered over the 2D unit circle. In (a), the features are polarized across a single axis; the singular value of the principal (horizontal) axis is large, while the singular value of the secondary (vertical) axis is small. In (b), the features spread uniformly across both dimensions; both singular values are comparably large.
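To make Figure 2 concrete, here is a small NumPy sketch (the sample size and angle spread are arbitrary choices for illustration, not values from the paper): a batch polarized along one axis yields one dominant singular value, while a uniformly spread batch yields two comparable ones.

```python
import numpy as np

rng = np.random.default_rng(0)
b = 512  # mini-batch size (arbitrary for this sketch)

# (a) Polarized: all points cluster near a single axis (angle ~ 0).
angles_polarized = rng.normal(loc=0.0, scale=0.05, size=b)
E_polarized = np.stack([np.cos(angles_polarized),
                        np.sin(angles_polarized)], axis=1)

# (b) Uniform: points spread evenly over the whole unit circle.
angles_uniform = rng.uniform(0.0, 2 * np.pi, size=b)
E_uniform = np.stack([np.cos(angles_uniform),
                      np.sin(angles_uniform)], axis=1)

print(np.linalg.svd(E_polarized, compute_uv=False))  # e.g. [~22.6, ~1.1]
print(np.linalg.svd(E_uniform, compute_uv=False))    # e.g. [~16.1, ~15.9]
```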
Figure 3: Vanilla SVMax formulation. L_r is the original loss function before using the SVMax regularizer, while s_μ is the mean singular value to be maximized.
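Since s_μ is to be maximized, the vanilla formulation subtracts its scaled value from the original loss, L = L_r − λ·s_μ, with λ a balancing hyperparameter. Below is a minimal PyTorch sketch of this idea, assuming row-normalized (unit hypersphere) embeddings; the function names and the default λ are illustrative placeholders, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def mean_singular_value(embeddings: torch.Tensor) -> torch.Tensor:
    """s_mu of a mini-batch embedding matrix E with shape (b, d).

    SVMax assumes the rows of E lie on the unit hypersphere, so we
    L2-normalize them first. torch.linalg.svdvals is differentiable,
    which lets the regularizer back-propagate into the network N.
    """
    e = F.normalize(embeddings, dim=1)
    s = torch.linalg.svdvals(e)        # min(b, d) singular values
    return s.mean()

def svmax_loss(original_loss: torch.Tensor,
               embeddings: torch.Tensor,
               lam: float = 1.0) -> torch.Tensor:
    # Subtracting lam * s_mu rewards a larger mean singular value,
    # i.e., a more uniformly spread embedding (Figure 2b).
    return original_loss - lam * mean_singular_value(embeddings)
```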
Figure 4: A lower bound on the mean singular value is attained when all singular values equal zero except the first (largest) one; s^\ast(E) denotes the value of the largest singular value in that case.
Figure 5: An upper bound on the mean singular value, established using the nuclear norm ||E||_* and the Frobenius norm ||E||_F.
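Assuming the rows of E are L2-normalized, so ||E||_F = √b, and writing c = min(b, d) for the number of singular values, the two bounds above can be reconstructed from standard norm inequalities; this is a sketch under those assumptions, not a verbatim copy of the figures.

```latex
% Setup: rows of E on the unit hypersphere, c = min(b, d) singular values.
\|E\|_F = \sqrt{b}, \qquad
s_\mu = \frac{1}{c}\sum_{i=1}^{c} s_i = \frac{\|E\|_*}{c}.

% Figure 4 (lower bound): only the largest singular value is nonzero,
% so s^{\ast}(E) = \|E\|_F and
\frac{\sqrt{b}}{c} \;\le\; s_\mu.

% Figure 5 (upper bound): Cauchy--Schwarz gives
% \|E\|_* \le \sqrt{c}\,\|E\|_F, hence
s_\mu \;\le\; \frac{\|E\|_F}{\sqrt{c}} = \sqrt{\frac{b}{c}}.
```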
Figure 6: The mean singular values of four different feature embedding (metric learning) networks. The X- and Y-axes denote the mini-batch size b and the s_μ of the feature embedding of CUB-200’s test split, respectively. The feature embedding is learned using a contrastive loss, with and without SVMax. The horizontal red line denotes the upper bound on s_μ.
Figure 7: Given a feature embedding matrix E ∈ R^{b × d}, VICReg computes a standard deviation vector S with d dimensions. The standard deviation serves as a metric for evaluating each dimension’s activity: a dimension with zero standard deviation is a collapsed dimension.
Figure 8: The variance term in VICReg computes the standard deviation (std) of each of the d dimensions in the feature embedding matrix E. VICReg then encourages each std to reach γ. ϵ is a small scalar preventing numerical instabilities.
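Here is a minimal PyTorch sketch of this variance term, assuming an embedding matrix of shape (b, d). Following the VICReg paper, the term hinges at γ so that only dimensions whose std falls below the target, i.e., dimensions that are (close to) collapsed, are penalized; the function name and default values are illustrative.

```python
import torch

def vicreg_variance_term(embeddings: torch.Tensor,
                         gamma: float = 1.0,
                         eps: float = 1e-4) -> torch.Tensor:
    """Variance term of Figure 8 for an embedding matrix E of shape (b, d).

    gamma is the target standard deviation and eps is the small scalar
    preventing numerical instabilities mentioned in the caption.
    """
    std = torch.sqrt(embeddings.var(dim=0) + eps)   # per-dimension std, length d
    # Hinge loss: only penalize dimensions whose std is below gamma,
    # i.e., dimensions that are collapsing.
    return torch.relu(gamma - std).mean()
```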
Table 1: Quantitative SVMax evaluation using self-supervised learning with an AlexNet backbone. We evaluate the pre-trained network N through ImageNet classification with a linear classifier on top of frozen convolutional layers. For every layer, the convolutional features are spatially resized until there are fewer than 10K dimensions left. A fully connected layer followed by softmax is trained on a 1000-way object classification task.
Table 2: Evaluation of the representations obtained with a ResNet-50 backbone pretrained with VICReg using: (1) linear classification on top of the frozen representations from ImageNet; (2) semi-supervised classification on top of the fine-tuned representations from 1% and 10% of ImageNet samples. We report Top-1 and Top-5 accuracies (in %). Top-3 best self-supervised methods are underlined.
Figure 9: Quantitative evaluation on Stanford CARS196. The X- and Y-axes denote the learning rate lr and Recall@1 performance, respectively.
