Knowledge Evolution in Neural Networks

The optimization landscape of a deep network has many local minima. With a small dataset, the network is likely to fall into an inferior local minimum. Inside a local minimum, the gradient is zero, so gradient descent cannot move the network out. Knowledge evolution (KE) tackles this by splitting the network into two parts and randomly re-initializing one of them, which hopefully kicks the network out of the inferior minimum. KE then resumes gradient-descent training to evolve the knowledge inside the network.
Figure: KE splits a deep network into two parts: the fit-hypothesis (blue) and the reset-hypothesis (gray). When the network overfits, i.e., settles into an inferior local minimum, KE re-initializes the reset-hypothesis and resumes training with gradient descent, which evolves the knowledge inside the fit-hypothesis.
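To make the split concrete, here is a minimal PyTorch sketch of the idea (my own illustration, not the authors' released code). A fixed binary mask assigns every weight to either the fit-hypothesis or the reset-hypothesis, and re-initialization re-draws only the reset-hypothesis weights. The helper names, the split rate, and the re-initialization scale are all assumptions made for illustration.

```python
import torch

def make_split_masks(model, split_rate=0.8):
    """Sample one fixed binary mask per parameter tensor.

    Entries equal to 1 belong to the fit-hypothesis; entries equal to 0
    belong to the reset-hypothesis. split_rate (an assumed value) is the
    fraction of weights kept in the fit-hypothesis.
    """
    return {name: (torch.rand_like(p) < split_rate).float()
            for name, p in model.named_parameters()}

@torch.no_grad()
def reinit_reset_hypothesis(model, masks, init_std=0.01):
    """Keep the fit-hypothesis weights; re-draw the reset-hypothesis randomly."""
    for name, p in model.named_parameters():
        fresh = torch.randn_like(p) * init_std  # stand-in for the paper's init scheme
        p.copy_(masks[name] * p + (1.0 - masks[name]) * fresh)
```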
After training the network for e epochs, KE re-initializes the reset-hypothesis and repeats the process; each round of e epochs is called a generation, and KE trains successive generations iteratively.
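The outer loop over generations is then short. The sketch below reuses the helpers above; train, train_loader, and the hyperparameter values are placeholders standing in for an ordinary supervised training setup.

```python
import torchvision

model = torchvision.models.resnet18(num_classes=102)  # e.g., for Flower-102
masks = make_split_masks(model)       # the mask stays fixed across generations

num_generations, e = 10, 200          # assumed hyperparameter values

for g in range(num_generations):
    train(model, train_loader, epochs=e)       # ordinary gradient-descent training
    if g < num_generations - 1:
        reinit_reset_hypothesis(model, masks)  # kick off the next generation
```

Note that only the reset-hypothesis is perturbed between generations, so the knowledge accumulated in the fit-hypothesis survives every re-initialization.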
Figure: Classification performance on the Flower-102 (FLW) and CUB-200 (CUB) datasets with a randomly initialized ResNet18. The horizontal dashed lines denote a SOTA cross-entropy (CE) baseline [2]. The marked curves show KE's performance across generations.
Figure: Quantitative classification evaluation on CUB-200 with VGG11-bn. The x-axis denotes the number of generations. The fit-hypothesis (blue dotted curve) performs poorly at g = 1, but its performance improves as the number of generations increases.
Table: Quantitative evaluation of KE in terms of both the number of operations (G-Ops) and the number of parameters (in millions). Acc_g denotes the classification accuracy at the g-th generation, Δ_ops the relative reduction in the number of operations, and Δ_acc the absolute accuracy improvement over the dense baseline N_1.
