A Generic Visualization Approach for Convolutional Neural Networks

An overview of the L2-CAF visualization approach.
L2-CAF mathematical formula. Using gradient descent, optimize the filter f that minimizes the constrained objective L.
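To make the optimization concrete, here is a minimal sketch, not the released code. It assumes a toy head (global average pooling followed by a linear layer), takes the objective L to be the squared distance between the unfiltered and the filtered outputs, and enforces a unit L2-norm on f by re-normalizing after every gradient step. The function name `l2caf_sketch`, the backtracking step size, and all shapes are my assumptions for illustration.

```python
import numpy as np

def l2caf_sketch(features, head_weights, steps=200, lr=1.0):
    """Toy L2-CAF-style optimization (illustrative, not the paper's code).

    features:     (C, H, W) activations of the last conv layer (assumed given).
    head_weights: (K, C) weights of a hypothetical head: average pool + linear.
    Returns a unit-L2-norm filter f of shape (H, W) and the loss history.
    """
    C, H, W = features.shape
    flat = features.reshape(C, H * W)
    # For this linear toy head, output(f) = B @ f, with B of shape (K, H*W).
    B = head_weights @ flat / (H * W)
    target = B @ np.ones(H * W)            # unfiltered network output

    def loss(f):
        r = B @ f - target
        return float(r @ r)

    f = np.ones(H * W) / np.sqrt(H * W)    # uniform start with ||f||_2 = 1
    history = [loss(f)]
    for _ in range(steps):
        grad = 2.0 * B.T @ (B @ f - target)
        step = lr
        while step > 1e-12:
            candidate = f - step * grad
            candidate /= np.linalg.norm(candidate)  # project back to ||f||_2 = 1
            if loss(candidate) <= history[-1]:
                f = candidate              # accept only non-increasing steps
                break
            step *= 0.5                    # otherwise halve the step and retry
        history.append(loss(f))
    return f.reshape(H, W), history

rng = np.random.default_rng(0)
attention, losses = l2caf_sketch(rng.random((8, 7, 7)), rng.standard_normal((5, 8)))
```

The returned `attention` plays the role of the attention map: large entries mark the spatial locations the toy head relies on. A real network is not linear in f, so in practice the gradient would come from automatic differentiation rather than the closed form used here.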
The class-specific L2-CAF formula. The constrained objective L maximizes the logit for a particular class while minimizing the logits for all other classes.
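A small sketch of such a class-specific objective, assuming the simplest reading of the caption: the loss is the sum of all other logits minus the target logit, so that minimizing it raises the target class and lowers the rest. The function name `class_specific_loss` and the equal weighting of the two terms are my assumptions, not necessarily the paper's exact formula.

```python
import numpy as np

def class_specific_loss(logits, target_class):
    """Sum of the non-target logits minus the target logit (lower is better)."""
    logits = np.asarray(logits, dtype=float)
    others = np.delete(logits, target_class)   # logits of all other classes
    return float(others.sum() - logits[target_class])

# Logits produced by the filtered network for a hypothetical 3-class problem.
example_logits = [2.0, -1.0, 0.5]
loss_for_class_0 = class_specific_loss(example_logits, 0)
loss_for_class_1 = class_specific_loss(example_logits, 1)
```

With these example logits, class 0 already dominates, so its loss is lower than the loss for class 1.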
Qualitative evaluation for alternative attention constraints. A softmax constraint yields a sparse attention map, while a Gaussian constraint assumes a single mode (one object). The L2-norm constraint supports multiple modes (multiple objects).
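The difference between the three constraints can be illustrated on a toy 7x7 map with two peaks. This is only a sketch under my own assumptions: the helper names `softmax_map`, `l2_map`, and `gaussian_map` are hypothetical, and the Gaussian is parameterized by a single mean and standard deviation.

```python
import numpy as np

def softmax_map(raw):
    """Softmax over all spatial locations; entries sum to 1."""
    z = np.exp(raw - raw.max())
    return z / z.sum()

def l2_map(raw):
    """Rescale the raw map to unit L2-norm."""
    return raw / np.linalg.norm(raw)

def gaussian_map(shape, mean, sigma):
    """Isotropic Gaussian bump: one mode by construction."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (ys - mean[0]) ** 2 + (xs - mean[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Two "objects" of slightly different strength.
raw = np.zeros((7, 7))
raw[1, 1] = 5.0
raw[5, 5] = 4.0

s, l2 = softmax_map(raw), l2_map(raw)
# Strength of the weaker peak relative to the stronger one.
softmax_ratio = s[5, 5] / s[1, 1]
l2_ratio = l2[5, 5] / l2[1, 1]
```

Softmax exponentiates the values, so the weaker peak is suppressed relative to the stronger one, whereas the L2-norm only rescales the map and keeps both peaks in proportion. The Gaussian map can only ever represent one peak.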
L2-CAF converging in slow motion. L2-CAF supports both classification and feature embedding networks. It can visualize multiple distinct objects in a single input image.
  • [S1] The paper is well-written and the code is released.
  • [S2] L2-CAF enables attention visualization for more complex inputs like 3D images and videos; these areas are rarely explored in visualization literature that focuses mostly on ImageNet and CUB-200.
  • [W1] The paper focuses on quantitative evaluation and provides limited qualitative evaluation. This is a weakness for a visualization paper, because qualitative evaluation can highlight the corner cases of L2-CAF.
  • [W2] Both L2-CAF and CAM report qualitative, but no quantitative, evaluation for videos. I am not sure, but there is probably a video dataset with object localization that can be used for quantitative evaluation.
  • [W3] L2-CAF is an iterative approach because it uses gradient descent, so it is slower than single-pass methods: Grad-CAM is 7 times faster than L2-CAF on GoogLeNet. Still, L2-CAF takes 0.3 seconds on GoogLeNet.

Ahmed Taha

I write reviews on computer vision papers. Writing tips are welcomed.