Bilinear CNN Models for Fine-grained Visual Recognition

Fine-grained Visual Recognition (FGVR)

FGVR is a classification task in which inter-category visual differences are small and can be overwhelmed by factors such as pose, viewpoint, or the location of the object in the image. For instance, the following image shows a California gull (left) and a Ringed-beak gull (right). The difference in beak pattern is the key to a correct classification, yet it is tiny compared to intra-category variations such as pose and illumination.

  1. FV-SIFT: Image SIFT features pooled using Fisher Vector descriptor
  2. FC-CNN[M]: M-Net CNN feature extractor followed by Fully Connected layer descriptor
  3. FC-CNN[D]: VGG-Net (Deep) CNN feature extractor followed by Fully Connected layer descriptor
  4. FV-CNN[D]: VGG-Net (Deep) CNN feature extractor followed by Fisher Vector descriptor
  5. B-CNN[D-M]: Bilinear Model with VGG-Net (Deep) and M-Net CNN feature extractors followed by summation pooling
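
To make the B-CNN entry in the list above more concrete, here is a minimal sketch of bilinear pooling in plain Python. It assumes each of the two feature extractors yields one feature vector per image location. The function names (`bilinear_pool`, `normalize`) and the toy inputs are hypothetical, and the signed square root followed by L2 normalization is an assumed post-processing step that this review does not describe.

```python
import math


def bilinear_pool(feats_a, feats_b):
    """Sum-pool the outer product of two sets of local features.

    feats_a: list of L vectors of length Ca (one per image location)
    feats_b: list of L vectors of length Cb (same L locations)
    Returns a flat list of length Ca * Cb.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both extractors must cover the same locations")
    ca, cb = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (ca * cb)
    # Accumulate the outer product fa * fb^T over all locations.
    for fa, fb in zip(feats_a, feats_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += fa[i] * fb[j]
    return pooled


def normalize(vec, eps=1e-12):
    """Assumed post-processing: signed square root, then L2 normalization."""
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in rooted))
    return [v / (norm + eps) for v in rooted]


# Toy example: 2 locations, extractor A has 2 channels, extractor B has 3.
feats_a = [[1.0, 2.0], [0.0, 1.0]]
feats_b = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]

descriptor = bilinear_pool(feats_a, feats_b)
print(descriptor)  # [1.0, 0.0, 2.0, 5.0, 1.0, 4.0]
print(normalize(descriptor))
```

Because the outer products are summed over locations, the descriptor does not depend on the order in which locations are visited, so reversing both input lists gives the same result.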
  • The paper is well organized, and the mathematical formulation is simple to understand given the required background.
  • Multiple interesting concepts are introduced.
  • I particularly like the detailed analysis of the experimental results and the in-depth comparison with previous work, even when the benchmark setup differs, e.g. comparing methods that use image part annotations with methods that do not.

Ahmed Taha