Ahmed Taha

384 Followers

Pinned

L2-CAF: A Neural Network Debugger

Every software engineer has used a debugger to debug their code. Yet, a neural network debugger… that’s news! This paper [1] proposes a debugger to debug and visualize attention in convolutional neural networks (CNNs). Before describing the CNN debugger, I want to highlight a few attributes of program debuggers (e.g., gdb)…

Machine Learning

5 min read


May 9

High Resolution Images and Efficient Transformers

ResNet and ViT models achieve competitive performance, but they are not the best. For instance, DenseNets achieve superior performance to ResNets. Yet, DenseNets are less popular than ResNets. So why are ResNets and ViT models so popular in the literature? One should not attribute their success to a single factor. Yet…

High Resolution Images

5 min read


Mar 27

Masked Autoencoders Are Scalable Vision Learners

Annotated data is a vital pillar of deep learning. Yet, annotated data is rare in certain applications (e.g., medical imaging and robotics). To reduce the need for annotations, self-supervised learning aims to pre-train deep networks on unannotated data to learn useful representations. Different self-supervised learning approaches propose different objectives to train…
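As a rough illustration of the core idea (a minimal sketch, not code from the paper), MAE-style pre-training hides a random subset of image patches and reconstructs them; the sketch below assumes patch embeddings are already computed, and all names are illustrative.

```python
import torch

def random_mask(patches, mask_ratio=0.75):
    """Keep a random subset of patch embeddings (MAE-style masking sketch).

    patches: (batch, num_patches, dim) tensor of patch embeddings.
    Returns the visible patches and the indices of the kept patches.
    """
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=patches.device)  # one random score per patch
    keep_idx = noise.argsort(dim=1)[:, :num_keep]    # keep the lowest-scored patches
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep_idx
```

The encoder then processes only the visible patches, while a lightweight decoder reconstructs the masked ones; the reconstruction error serves as the self-supervised objective.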

Deep Learning

7 min read


Feb 14

Rethinking Attention with Performers — Part II & Final

This article’s objective is to summarize the Performers [1] paper. The article highlights key details and documents some personal comments at the end. A previous article presents a hand-wavy understanding of Performers using a hashing analogy. Vanilla Transformers leverage self-attention layers defined as follows; this formula has quadratic space complexity…
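The formula dropped from the excerpt above is presumably the standard scaled dot-product self-attention, which in the usual notation reads

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right)V, \qquad Q, K, V \in \mathbb{R}^{L \times d}

Materializing the L \times L matrix QK^\top is what causes the quadratic space complexity in the sequence length L.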

Transformers

7 min read


Oct 11, 2022

Rethinking Attention with Performers — Part I

This article’s objective is to present a hand-wavy understanding of how Performers [1] work. Transformers dominate the deep-learning literature in 2022. Unfortunately, Transformers suffer from quadratic complexity in the self-attention layer. This has hindered Transformers on long input signals, i.e., large sequence length L. Large sequences are not critical in NLP applications since…

Machine Learning

7 min read


May 16, 2022

Understanding the Effective Receptive Field in Deep Convolutional Neural Networks

In deep networks, a receptive field — or field of view — is the region in the input space that affects the features of a particular layer, as shown in Fig. 1. The receptive field is important for understanding and diagnosing a network’s performance. …
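For intuition (this sketch is not from the article), the theoretical receptive field of stacked convolution/pooling layers follows a simple recurrence over kernel sizes and strides:

```python
def receptive_field(layers):
    """Theoretical receptive field of stacked conv/pool layers.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf, jump = 1, 1                 # receptive field and cumulative stride
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) * jump
        jump *= stride              # strides compound multiplicatively
    return rf

# Example: three 3x3 convolutions with stride 1 give a 7x7 receptive field.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
```

The underlying paper’s key observation is that the effective receptive field, i.e., the region that actually influences a feature in practice, is typically much smaller than this theoretical value.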

Machine Learning

5 min read


Apr 4, 2022

Understanding Transfer Learning for Medical Imaging

Transfer learning (a.k.a. ImageNet pre-training) is a common practice in deep learning where a pre-trained network is fine-tuned on a new dataset/task. This practice is implicitly justified by feature reuse, where features learned from ImageNet are beneficial to other datasets/tasks. This paper [1] evaluates this justification on medical imaging datasets. The…
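As a reminder of what this practice looks like in code (a minimal sketch, not the paper’s experimental setup), fine-tuning an ImageNet pre-trained backbone with torchvision could look like the following; the class count and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet pre-trained backbone (requires torchvision >= 0.13 for the
# weights enum; older versions use pretrained=True instead).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Replace the classification head for the new dataset/task.
num_classes = 5  # placeholder for the target task
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Fine-tune everything, or freeze the backbone and train only model.fc.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```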

Deep Learning

6 min read


Jan 24, 2022

Sharpness-Aware Minimization for Efficiently Improving Generalization

When training a deep network, picking the right optimizer has become an important design choice. Standard optimizers (e.g., SGD and Adam) seek a minimum on the loss curve. This minimum is sought without regard for the curvature, i.e., the second derivative of the loss curve. The curvature denotes the…
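Concretely, sharpness-aware minimization (leaving out the paper’s weight-decay term) replaces the plain loss with a min-max objective that favors flat minima:

\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} \; L(w + \epsilon)

In practice, the inner maximization is approximated with a single gradient step, \hat{\epsilon} = \rho \, \nabla L(w) / \|\nabla L(w)\|_2, and the weights are then updated using the gradient evaluated at w + \hat{\epsilon}.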

Optimization

4 min read


Dec 27, 2021

Feature Embedding Regularizers: SVMax & VICReg

What is more important, a deep network’s weights or its activations? Obviously, we can derive the network’s activations from its weights. Yet, deep networks are non-linear embedding functions; we want this non-linear embedding only. On top of this embedding, we either slap a linear classifier in a classification network or…

Deep Learning

7 min read


Nov 3, 2021

IIRC: Incremental Implicitly-Refined Classification

While training a deep network on multiple tasks jointly is easy, training on multiple tasks sequentially is challenging. This challenge is addressed in the literature under various names: Lifelong Learning, Incremental Learning, Continual Learning, and Never-Ending Learning. Across these formulations, the common problem is catastrophic forgetting, i.e., the network forgets older tasks. To…

Computer Vision

6 min read

Ahmed Taha
I write reviews on computer vision papers.
