A Real-time Action Prediction Framework by Encoding Temporal Evolution for Assembly Tasks

  • Dataset partitioning is typically 60%, 10%, and 30% for training, validation, and testing. In this paper, the split is unusual: 40 videos (90%) for training, 2 (5%) for validation, and 2 (5%) for testing.
  • Background pixels cause a large bias during training. The authors address this by adding white noise, which improved accuracy; it is unclear how (or whether) the white noise parameters are tuned. Alternatives exist: the background-to-foreground pixel ratio could be used to assign different error weights to background and foreground pixels, or background pixels could be randomly sampled and assigned high error weights.
  • The baseline felt weak, especially for action prediction. The IKEA videos contain only five actions, and a trivial baseline that predicts the same action as the previous dynamic image achieves around 75%. This means switches between actions are relatively rare in these videos, so future prediction measured across the whole video is a weak metric. I wish accuracy had been normalized by the amount of action switching in each video. I am not certain about this point, but such normalization would have helped the evaluation.
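The re-weighting alternative mentioned above can be sketched as a per-pixel loss in which background and foreground pixels receive different error weights. This is a minimal sketch, not the paper's method (the paper adds white noise instead); the function name and the weight values are hypothetical, and the weights are derived from the background-to-foreground ratio only as one plausible choice.

```python
import numpy as np

def weighted_pixel_loss(pred, target, fg_mask, fg_weight=10.0, bg_weight=1.0):
    """Squared error with per-pixel weights.

    Hypothetical illustration: foreground pixels (fg_mask == True) get a
    higher error weight than background pixels, countering the bias from
    the large number of background pixels. The weight values here are
    placeholders; in practice they could be set from the background-to-
    foreground pixel ratio.
    """
    weights = np.where(fg_mask, fg_weight, bg_weight)
    per_pixel = weights * (pred - target) ** 2
    # Normalize by total weight so the loss scale is comparable
    # across images with different foreground sizes.
    return per_pixel.sum() / weights.sum()
```

A random-sampling variant would instead zero out the weights of most background pixels and keep a high weight on a small random subset, which achieves a similar rebalancing effect.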
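The persistence baseline described in the last bullet can be made concrete: predict that the next dynamic image has the same action label as the current one, and measure how often that is correct. This is a sketch of the trivial baseline I am proposing, not anything from the paper; the function name is my own.

```python
def persistence_accuracy(labels):
    """Accuracy of a trivial baseline that predicts the next frame's
    action to be the same as the current frame's action.

    With few action switches per video, this baseline scores high,
    which is why raw prediction accuracy is a weak metric here.
    """
    if len(labels) < 2:
        return 1.0
    correct = sum(cur == nxt for cur, nxt in zip(labels, labels[1:]))
    return correct / (len(labels) - 1)
```

For a label sequence like `[0, 0, 0, 1, 1]`, the baseline is wrong only at the single switch, so it scores 0.75; normalizing a model's accuracy by the switch rate (1 minus this baseline) would make the evaluation more informative.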


I write reviews on computer vision papers. Writing tips are welcome.

Ahmed Taha
