Leveraging Unlabeled Data for Crowd Counting by Learning to Rank

Example of a crowd image
Training ranking and counting simultaneously. Within a single mini-batch, the counting loss is applied to images with count ground truth, and the ranking loss is applied to images without it.
Crowd density maps generated by the counting network pipeline
Estimating the average number of persons per pixel by average pooling over the crowd density map
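To make the pooling step concrete, here is a minimal sketch in plain Python. It assumes the density map is a 2-D grid whose values sum to the number of people in the image. The function names `average_density` and `estimated_count` are hypothetical and are not taken from the paper's code.

```python
def average_density(density_map):
    """Global average pooling: mean value over all pixels of a 2-D density map."""
    n_pixels = sum(len(row) for row in density_map)
    total = sum(sum(row) for row in density_map)
    return total / n_pixels


def estimated_count(density_map):
    """Recover a count estimate by scaling the per-pixel average by the pixel count."""
    n_pixels = sum(len(row) for row in density_map)
    return average_density(density_map) * n_pixels


# Toy 2x2 density map (assumed values) that integrates to exactly one person.
toy_map = [[0.0, 0.5],
           [0.25, 0.25]]
print(average_density(toy_map))  # average persons per pixel
print(estimated_count(toy_map))  # estimated number of persons
```

Working with the pooled average rather than the full map gives a single scalar per image, which is what a ranking comparison between two images needs.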
The loss function combines both counting and ranking terms
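As a rough illustration of how such a combined objective might look, the sketch below assumes a mean-squared-error counting term, a pairwise hinge ranking term (a crop of an image should not contain more people than the image itself), and a weight `lam` that balances the two. These forms and all names are my assumptions for illustration, not necessarily the paper's exact formulation.

```python
def counting_loss(pred_map, gt_map):
    """Mean squared error between predicted and ground-truth density maps."""
    squared = [(p - g) ** 2
               for pred_row, gt_row in zip(pred_map, gt_map)
               for p, g in zip(pred_row, gt_row)]
    return sum(squared) / len(squared)


def ranking_loss(count_crop, count_full, margin=0.0):
    """Hinge penalty when a crop is predicted to hold more people than its source image."""
    return max(0.0, count_crop - count_full + margin)


def combined_loss(labeled, ranked_pairs, lam=1.0):
    """Counting loss on labeled (pred, gt) maps plus weighted ranking loss on unlabeled pairs."""
    l_count = sum(counting_loss(pred, gt) for pred, gt in labeled)
    l_rank = sum(ranking_loss(crop, full) for crop, full in ranked_pairs)
    return l_count + lam * l_rank


# One labeled image and two unlabeled (crop_count, full_count) pairs, all assumed values.
labeled = [([[1.0, 2.0]], [[1.0, 0.0]])]
ranked_pairs = [(5.0, 3.0),   # violates the ordering, so it is penalized
                (3.0, 5.0)]   # respects the ordering, so it contributes nothing
print(combined_loss(labeled, ranked_pairs, lam=1.0))
```

Only the pair that violates the ordering contributes to the ranking term, so unlabeled images affect training only when the predicted counts contradict the containment constraint.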
Table 2 evaluates the different training paradigms
Quantitative evaluation on two labeled crowd counting datasets: UCF_CC_50 and ShanghaiTech
Qualitative results in terms of crowd density maps
  1. The paper is well organized and easy to understand. One minor style issue: the text size in Table 4 is inconsistent with the rest of the tables.
  2. The authors argue that image ranking is a “poorly-defined nature self-supervised task”. The network could decide to count anything, e.g. ‘hats’, ‘trees’, or ‘people’, all of which would agree with the ranking constraints. I agree, and I think tiny face detection would be a better complementary task. The recent interest in tiny face detection has produced labeled datasets, e.g. the WIDER FACE dataset, that can be leveraged.
  3. The authors state that detection-based approaches to counting fail in extremely dense crowd scenes because of occlusion and the low resolution of individual persons. I wish quantitative evidence had been provided to support this claim.




I write reviews on computer vision papers. Writing tips are welcome.

Ahmed Taha
