Hi Rajesh,

I prefer “bilinear is nothing but a fancier name to outer product right” over “bilinear pooling is nothing but a fancier name to outer product right”.

Pooling is the source of the orderless property, not the bilinear operation. Please excuse any mathematical slips in what follows.

Let's say you have two CNNs, A and B, whose outputs have dimensions WxHxM and WxHxN. The bilinear operation takes the outer product of the two feature vectors at each location, so its output has dimension WxHxMxN. This output is “order-ful”: it preserves spatial information in the WxH dimensions. After pooling across all image locations (WxH), it becomes MxN, which is finally flattened into MNx1. Pooling is where it becomes orderless.
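
Here is a minimal NumPy sketch of the shapes above, in case it helps. The sizes, the random feature maps, and the helper names (bilinear, pool) are my own assumptions for illustration, and I use sum pooling; this is not code from the paper. It also reorders the image locations to check that only the pooled vector ignores the order.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
W, H, M, N = 4, 5, 3, 2
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((W, H, M))  # stand-in for the output of CNN A
feat_b = rng.standard_normal((W, H, N))  # stand-in for the output of CNN B

def bilinear(a, b):
    # Outer product of the two feature vectors at every location: (W, H, M, N).
    return np.einsum("whm,whn->whmn", a, b)

def pool(x):
    # Sum over all locations (W, H), then flatten M x N into a vector of length M*N.
    return x.sum(axis=(0, 1)).reshape(-1)

bil = bilinear(feat_a, feat_b)
pooled = pool(bil)

# Reorder the locations (here: reverse them) identically in both feature maps.
perm = np.arange(W * H)[::-1]
shuf_a = feat_a.reshape(W * H, M)[perm].reshape(W, H, M)
shuf_b = feat_b.reshape(W * H, N)[perm].reshape(W, H, N)
bil_shuf = bilinear(shuf_a, shuf_b)
pooled_shuf = pool(bil_shuf)

print(bil.shape, pooled.shape)
print(np.allclose(bil, bil_shuf), np.allclose(pooled, pooled_shuf))
```

The bilinear output changes when the locations are reordered, while the pooled vector does not, which is the sense in which pooling, and not the outer product, is orderless.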

This pooling idea is in fact used by standard classification architectures, i.e., it applies to general classification problems. Check Table 1 in the “Densely Connected Convolutional Networks” paper; notice the final 7x7 global average pooling operation.

I write reviews of computer vision papers. Writing tips are welcome.
