AlexNet, VGG, GoogLeNet, ResNet

Assignment 1. Understanding the entire system

< Introduction >

AlexNet, VGG, GoogLeNet, and ResNet were developed in that order. As networks became deeper and more complicated, they evolved toward better performance. In this era, engineers were mainly asking how to stack layers well.

1. AlexNet (ILSVRC’12)

(1) Activation function : ReLU

(2) Normalization technique : LRN (Local Response Normalization)

(3) Data augmentation : flip, crop, color jittering

(4) Dropout : units are dropped with probability 0.5 (so outputs are scaled by 0.5 at test time); see the sketch below
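
These techniques map directly onto standard PyTorch modules. A minimal sketch (the LRN size 5 and dropout p = 0.5 follow the AlexNet paper; the layer shapes here are illustrative assumptions):

import torch.nn as nn

# First AlexNet-style stage: conv -> ReLU -> LRN (illustrative shapes)
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),                  # (1) large first conv
    nn.ReLU(inplace=True),                                       # (1) ReLU activation
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),  # (2) LRN
)

# Fully connected head with dropout, as in (4)
classifier = nn.Sequential(
    nn.Dropout(p=0.5),        # randomly zero half of the activations during training
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
)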

2. VGG (ILSVRC’14)

The VGG network needs little extra description because its structure is very simple: it uses only 3x3 convolutions with stride 1. Of the six variants in total, VGG16 and VGG19 are the most frequently used. The key idea, illustrated in the sketch below, is that stacking small 3x3 filters covers the same receptive field as a larger filter while using fewer parameters.
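
A minimal sketch of this idea (the channel count C is an arbitrary assumption): two stacked 3x3 convolutions see a 5x5 region, but use 2 x 9C^2 = 18C^2 weights instead of the 25C^2 a single 5x5 convolution would need.

import torch.nn as nn

C = 64  # assumed channel count
# Two 3x3, stride-1 convs: same 5x5 receptive field as one 5x5 conv, fewer parameters
vgg_style_block = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
)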

3. GoogLeNet (ILSVRC’14)

(1) The number of layers : 22

(2) 1x1 convolution : a useful technique for decreasing the number of channels, and with them the number of parameters

(3) Inception module (a minimal sketch follows this list)

(4) GAP (global average pooling)

(5) Auxiliary classifier
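
A minimal sketch of (2) and (3): an inception module whose 3x3 and 5x5 branches are preceded by 1x1 bottleneck convolutions. The branch widths here are illustrative assumptions; GoogLeNet's real blocks use different widths at each stage.

import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)   # plain 1x1 conv
        self.branch2 = nn.Sequential(                        # 1x1 bottleneck, then 3x3
            nn.Conv2d(in_ch, 96, kernel_size=1),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.branch3 = nn.Sequential(                        # 1x1 bottleneck, then 5x5
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch4 = nn.Sequential(                        # pool, then 1x1 projection
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # concatenate the four branches along the channel dimension
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)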

4. ResNet (ILSVRC’15)

(1) The number of layers : 152

(2) Residual blocks to prevent vanishing gradients (see the sketch after this list)

(3) Good initialization technique

(4) Batch Normalization

(5) Early stopping callback
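
A minimal residual-block sketch in the spirit of (2) and (4), with assumed channel counts: the identity shortcut x is added back onto the convolutional path, so gradients can flow through the addition unchanged.

import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)   # (4) Batch Normalization
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # identity shortcut: gradients flow through the addition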

< Code Analysis >

1. Import modules

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
# tensorboard
from tensorboardX import SummaryWriter
# numerical computation and random seeding
import numpy as np
import random
# progress bar for the training loop
from tqdm import tqdm
# access to file system
import os
# network composition
from network import VGG
from network import ResNet

2. Dataloader

# Apply data augmentation and preprocessing to the training set
transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4), # Random Crop: randomly crop part of the padded image and use it as augmented data
        transforms.RandomHorizontalFlip(), # Random Horizontal Flip: randomly mirror the image and use it as augmented data
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010]), # normalize with the CIFAR-10 per-channel mean and standard deviation
        ])

# Apply only preprocessing (no augmentation) to the test set
transform_test = transforms.Compose([
        transforms.ToTensor(), 
        transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023,0.1994,0.2010]),
        ]) 

# torchvision dataset : MNIST, Fashion-MNIST, KMNIST, EMNIST, FakeData, COCO, LSUN, ImageFolder, DatasetFolder, Imagenet-12, CIFAR, STL10, SVHN, SBU, Flickr, VOC, Cityscapes
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=200, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=200, shuffle=False)
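
As a quick sanity check (a sketch; the shapes follow from batch_size=200 and CIFAR-10's 32x32 RGB images):

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([200, 3, 32, 32])
print(labels.shape)  # torch.Size([200])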

3. Training and evaluation (optimizer, loss function)

def train(model, n_epoch, loader, optimizer, criterion, device="cpu"):
  model.train() # train mode
  for epoch in tqdm(range(n_epoch)):
    running_loss = 0.0
    # Usage for Dataloader
    for i, data in enumerate(loader, 0): 
      images, labels = data
      # device = "cpu" or "cuda"
      images = images.to(device) 
      labels = labels.to(device)
      optimizer.zero_grad() # reset accumulated gradients
      outputs = model(images)
      loss = criterion(input=outputs, target=labels) # compute the loss
      loss.backward() # backpropagation
      optimizer.step() # update weights and biases
      running_loss += loss.item()
    print('Epoch {}, loss = {:.3f}'.format(epoch, running_loss/len(loader)))
  print('Training Finished')

def evaluate(model, loader, device="cpu"):
  model.eval() # eval mode
  total=0
  correct=0
  with torch.no_grad(): # disable gradient tracking during evaluation
    for data in loader:
      images, labels = data
      images = images.to(device)
      labels = labels.to(device)
      outputs = model(images)
      _, predicted = torch.max(outputs.data, 1)
      total += labels.size(0)
      correct += (predicted==labels).sum().item()
  acc = 100*correct/total
  return acc

4. Network

(1) VGG
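
network.py itself is not shown in the post. A minimal sketch of what a CIFAR-10-scale VGG-style class could look like (the layer configuration below is an assumption for illustration, not the assignment's actual network):

import torch.nn as nn

class VGG(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 'M']  # assumed config
        layers, in_ch = [], 3
        for v in cfg:
            if v == 'M':
                layers.append(nn.MaxPool2d(2))
            else:
                layers += [nn.Conv2d(in_ch, v, 3, padding=1),
                           nn.BatchNorm2d(v),
                           nn.ReLU(inplace=True)]
                in_ch = v
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Linear(256 * 4 * 4, num_classes)  # 32 -> 4 after three pools

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))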

(2) ResNet
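
Likewise for ResNet, a minimal CIFAR-10-scale sketch reusing the BasicBlock sketch from the ResNet section above (depth and width are assumptions, not the assignment's actual network):

import torch.nn as nn
import torch.nn.functional as F

class ResNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(                       # initial conv before the residual blocks
            nn.Conv2d(3, 64, 3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        self.blocks = nn.Sequential(BasicBlock(64), BasicBlock(64))  # BasicBlock from the sketch above
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)  # global average pooling, as in GoogLeNet/ResNet
        return self.head(x)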

5. Main function

# fix all random seeds for reproducibility
def reset_seed(seed):
  torch.manual_seed(seed)
  np.random.seed(seed)
  random.seed(seed)

# VGG main function
reset_seed(0)
vgg_model = VGG().to("cuda")
criterion = nn.CrossEntropyLoss()
tb_log = SummaryWriter(log_dir=os.path.join('./', 'tensorboard')) # TensorBoard log writer (see section 6)
optimizer = optim.SGD(params=vgg_model.parameters(), lr=0.1, momentum=0.9)
train(model=vgg_model, n_epoch=10, loader=train_loader, optimizer=optimizer, criterion=criterion, device="cuda")
vgg_acc = evaluate(vgg_model, test_loader, device="cuda")
print('VGG Test accuracy: {:.2f}%'.format(vgg_acc))

# ResNet main function
reset_seed(0)
resnet_model = ResNet().to("cuda")
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(params=resnet_model.parameters(), lr=0.1, momentum=0.9)
train(model=resnet_model, n_epoch=10, loader=train_loader, optimizer=optimizer, criterion=criterion, device="cuda")
resnet_acc = evaluate(resnet_model, test_loader, device="cuda")
print('ResNet Test accuracy: {:.2f}%'.format(resnet_acc))

6. tensorboard, argparse, multi-GPU, logging, lr_scheduler
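
These extras are only listed in the post. A hedged sketch of how two of them could be wired in, reusing the tb_log writer, optimizer, criterion, vgg_model, and train_loader defined above (the tag names and scheduler settings are assumptions):

# lr_scheduler: decay the learning rate by 10x every 5 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(10):
    epoch_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to("cuda"), labels.to("cuda")
        optimizer.zero_grad()
        loss = criterion(vgg_model(images), labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    # tensorboard: log the per-epoch loss and the current learning rate
    tb_log.add_scalar('train/loss', epoch_loss / len(train_loader), epoch)
    tb_log.add_scalar('train/lr', scheduler.get_last_lr()[0], epoch)
    scheduler.step()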

