Residual Neural Network on CIFAR10
In my previous posts we have gone through
Residual Network (ResNet) is a Convolutional Neural Network (CNN) architecture that can support hundreds of convolutional layers. ResNet can add many layers while maintaining strong performance, whereas earlier architectures saw their effectiveness drop off with each additional layer.
ResNet proposed a solution to the “vanishing gradient” problem.
Neural networks train via backpropagation, which relies on gradient descent to find weights that minimize the loss function. As more layers are added, the repeated multiplication of their derivatives eventually makes the gradient vanishingly small, so additional layers stop improving performance and can even degrade it.
ResNet addresses this with "identity shortcut connections": skip connections that add a block's input directly to its output, so each block only has to learn a residual correction on top of the identity mapping. Early in training, blocks whose weights contribute little are effectively skipped, and the activations from earlier layers pass through unchanged.
This makes the network behave like one with only a few layers, which speeds up learning. As training progresses, the residual blocks start contributing and let the network explore more of the feature space.
Pre-training lets you leverage transfer learning — once the model has learned many objects, features, and textures on the huge ImageNet dataset, you can apply this learning to your own images and recognition problems.
torchvision.models includes the following ResNet implementations: ResNet-18, -34, -50, -101 and -152 (the number indicates how many layers the model has), as well as DenseNet-121, -161, -169 and -201.
There are two main types of blocks used in ResNet, depending mainly on whether the input and output dimensions are the same or different.
For example, to reduce the activation dimensions (HxW) by a factor of 2, you can use a 1x1 convolution with a stride of 2.
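The downsampling trick above can be checked in a couple of lines (a minimal sketch; the channel counts 64 and 128 are just illustrative):

```python
import torch
import torch.nn as nn

# A 1x1 convolution with stride 2 halves the spatial dimensions (H, W)
# while also letting us change the channel count to match the shortcut branch.
downsample = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=1, stride=2)

x = torch.randn(1, 64, 32, 32)
y = downsample(x)
print(y.shape)   # torch.Size([1, 128, 16, 16]) - spatial size halved, channels doubled
```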
The figure below shows what a residual block looks like and what is inside it.
Step 1: Prepare data set
Download the dataset and create PyTorch datasets to load the data.
There are a few important changes we’ll make while creating the PyTorch datasets:
Next create data loaders for retrieving images in batches. We’ll use a relatively large batch size of 400 to utilize a larger portion of the GPU RAM. You can try reducing the batch size & restarting the kernel if you face an “out of memory” error.
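A sketch of the data-loader setup (a random stand-in dataset is used here so the snippet runs without downloading CIFAR10; in the tutorial, `train_ds` is the dataset from Step 1):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 400  # large batch to use more GPU RAM; reduce this if you hit OOM

# Stand-in for the CIFAR10 training dataset: 2000 fake 3x32x32 images.
train_ds = TensorDataset(torch.randn(2000, 3, 32, 32),
                         torch.randint(0, 10, (2000,)))

train_dl = DataLoader(train_ds, batch_size, shuffle=True, pin_memory=True)

images, labels = next(iter(train_dl))
print(images.shape)  # torch.Size([400, 3, 32, 32])
```

`pin_memory=True` speeds up host-to-GPU transfers; adding `num_workers` lets batches be prepared in parallel.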
Step 2: Using GPU
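A couple of small helpers make the code device-agnostic, so it runs on a GPU when one is available and falls back to the CPU otherwise (the helper names below follow a common PyTorch tutorial convention; they are my sketch, not necessarily the author's exact code):

```python
import torch

def get_default_device():
    """Pick the GPU if available, else the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(data, device):
    """Move tensor(s) to the chosen device, recursing into lists/tuples."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

device = get_default_device()
x = to_device(torch.zeros(4, 3, 32, 32), device)
print(x.device)
```

Wrapping the data loaders so that each batch is moved with `to_device` as it is yielded keeps the training loop itself free of device bookkeeping.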
Step 3: Residual block
Here is a very simple residual block:
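A minimal version along these lines (a sketch of the idea, keeping the channel count fixed so input and output shapes match):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back onto the input."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)  # identity shortcut: add the input back

block = SimpleResidualBlock(3)
x = torch.randn(2, 3, 32, 32)
print(block(x).shape)  # torch.Size([2, 3, 32, 32]) - same shape as the input
```

Because the output has the same shape as the input, the `out + x` addition needs no projection; a block that changes the channel count or spatial size would add a 1x1 convolution on the shortcut, as discussed above.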
Here is our ResNet architecture, ResNet9:
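A sketch of a ResNet9-style network for CIFAR10 (my reconstruction of the commonly used variant, with conv-batchnorm-ReLU blocks and two residual stages; the author's exact code may differ):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, pool=False):
    """3x3 conv -> batch norm -> ReLU, with optional 2x2 max-pooling."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_ch),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet9(nn.Module):
    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)    # 32x32 -> 16x16
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)   # 16x16 -> 8x8
        self.conv4 = conv_block(256, 512, pool=True)   # 8x8 -> 4x4
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        self.classifier = nn.Sequential(nn.MaxPool2d(4),  # 4x4 -> 1x1
                                        nn.Flatten(),
                                        nn.Linear(512, num_classes))

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.res1(out) + out   # residual connection
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out   # residual connection
        return self.classifier(out)

model = ResNet9()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```

The two `res1`/`res2` stages are where the identity shortcuts live; everything else is a plain convolutional stack that progressively halves the spatial size while growing the channel count.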
Step 4: Training the model
Before we train the model, we're going to make a bunch of small but important improvements to our fit function:
To train our model, we'll use the Adam optimizer instead of SGD (stochastic gradient descent); Adam uses techniques like momentum and adaptive learning rates for faster training.
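A sketch of such a fit function, assuming the improvements are a one-cycle learning-rate schedule, weight decay, and gradient clipping (common choices in this kind of tutorial, but an assumption on my part):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit(epochs, max_lr, model, train_dl, valid_dl,
        weight_decay=1e-4, grad_clip=0.1, opt_func=torch.optim.Adam):
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # One-cycle schedule: ramp the learning rate up, then anneal it back down.
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_dl))

    history = []
    for epoch in range(epochs):
        model.train()
        for images, labels in train_dl:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            nn.utils.clip_grad_value_(model.parameters(), grad_clip)  # gradient clipping
            optimizer.step()
            optimizer.zero_grad()
            sched.step()  # the LR changes every batch, not every epoch

        # Validation accuracy at the end of each epoch.
        model.eval()
        with torch.no_grad():
            accs = [(model(x).argmax(dim=1) == y).float().mean()
                    for x, y in valid_dl]
        history.append({"val_acc": torch.stack(accs).mean().item(),
                        "last_lr": sched.get_last_lr()[0]})
    return history
```

Recording `last_lr` each epoch is what makes the learning-rate plot in Step 5 possible.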
Our model trained to over 90% accuracy in just 4 minutes!
Step 5: Accuracy plot
Plotting accuracy vs. the number of epochs.
Plotting loss vs. the number of epochs.
It's clear from the trend that our model isn't overfitting to the training data just yet. Finally, let's visualize how the learning rate changed over time, batch by batch, across all the epochs.
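A small plotting helper along these lines works for all three curves; it assumes `history` is a list of per-epoch dicts with keys such as `"val_acc"` and `"val_loss"` (my assumed logging format, so adapt the keys to your own training logs):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs headless
import matplotlib.pyplot as plt

def plot_metric(history, key, ylabel):
    """Plot one logged metric against the epoch number."""
    values = [h[key] for h in history]
    plt.figure()
    plt.plot(values, "-x")
    plt.xlabel("epoch")
    plt.ylabel(ylabel)
    plt.title(f"{ylabel} vs. no. of epochs")
    return values

history = [{"val_acc": 0.55}, {"val_acc": 0.78}, {"val_acc": 0.91}]  # example logs
print(plot_metric(history, "val_acc", "accuracy"))  # [0.55, 0.78, 0.91]
```

For the batch-by-batch learning-rate plot, log the scheduler's `get_last_lr()` after every optimizer step instead of once per epoch, then plot that list the same way.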
Credits & references: