
Deep residual learning for image recognition; ResNet (2015)


ResNet

1) Overview

    - very deep networks using residual connections

    - 152 layers

    - ILSVRC 2015 winner

    - Comparing 20-layer and 56-layer 'plain' convolutional networks, the deeper network shows both higher training error and higher test error.

    - The paper tries to resolve this degradation problem of deeper networks, which is not caused by overfitting.

 

 

Residual Block

    - Use network layers to fit a residual mapping instead of directly trying to fit a desired underlying mapping

    - The plain approach tries to obtain the desired mapping H(x) directly by passing x through a stack of weight layers. A residual block instead makes the layers fit H(x) - x (the difference between output and input), so the weights are trained to learn that residual.

       If F(x) = H(x) - x, then H(x) = F(x) + x.

    - If the identity mapping is optimal, the solver can simply drive the weights of the multiple nonlinear layers toward 0 so that F(x) = 0.

    - No increase in computational cost (identity shortcuts add no extra parameters).

    - Makes deeper networks possible to optimize.

    - The dimensions of x and F(x) must be equal for the addition. If they are not, apply a linear projection W_s to x on the shortcut to match dimensions (a minimal PyTorch sketch of a residual block follows this list).

       (Using projection shortcuts everywhere instead of identity shortcuts roughly doubles the time complexity and model size, so identity shortcuts are preferred where possible.)
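Below is a minimal PyTorch sketch of such a residual block, assuming the conv-BN-ReLU ordering described in the paper; the class name `BasicBlock` and the 1x1-conv projection shortcut are illustrative choices, not the authors' exact code.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Basic residual block: two 3x3 convs learn F(x); the output is F(x) + x."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

        # Identity shortcut when shapes match; otherwise a 1x1-conv projection W_s.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first 3x3 conv
        out = self.bn2(self.conv2(out))           # second 3x3 conv -> F(x)
        out = out + self.shortcut(x)              # H(x) = F(x) + x
        return self.relu(out)
```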

 

Architecture

    - stack residual blocks

    - every residual block has two 3x3 conv layers

    - Periodically double the number of filters and downsample spatially using stride-2 convolutions; each downsampling halves the total activation volume (spatial dimensions /2 each, channels x2).

    - additional conv layer at the beginning (stem)

    - No hidden FC layers at the end: global average pooling after the last conv layer, followed by a single FC-1000 layer for the output classes.

    - For deeper networks, use a 'bottleneck' block (a 1x1 conv to reduce channels, a 3x3 conv, then a 1x1 conv to restore channels) to improve efficiency; see the sketch after this list.
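A sketch of the 1x1 -> 3x3 -> 1x1 bottleneck block, assuming the paper's x4 channel expansion; the class name and the shortcut handling are illustrative and follow the same conventions as the basic block above.

```python
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Bottleneck block: 1x1 conv reduces channels, 3x3 conv works on the
    reduced representation, 1x1 conv restores (expands) the channels."""

    expansion = 4  # output channels = mid_channels * 4

    def __init__(self, in_channels, mid_channels, stride=1):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

        # Projection shortcut only when the output shape differs from the input.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))    # 1x1: reduce channels
        out = self.relu(self.bn2(self.conv2(out)))  # 3x3: main computation
        out = self.bn3(self.conv3(out))             # 1x1: restore channels
        return self.relu(out + self.shortcut(x))    # add shortcut, final ReLU
```

Because the 3x3 conv sees fewer channels, a bottleneck block has roughly the cost of a two-layer basic block while allowing much deeper networks (ResNet-50/101/152).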

A practical concern for these deeper networks is training time, which is what the bottleneck design addresses.

 

Training ResNet

1) Hyperparameters

    - BN after every CONV layer

    - 3x3 conv layer

    - Weight initialization from He et al. (Kaiming initialization)

    - SGD + Momentum (0.9)

    - LR: 0.1, divided by 10 when the validation error stops decreasing

    - mini-batch size = 256

    - weight decay 0.00001 (1e-5)

    - no dropout used 

    - only one max-pooling layer (after the stem); elsewhere, stride-2 convolutions reduce the feature-map size instead of pooling. A training-setup sketch follows this list.
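A hedged sketch of this training setup in PyTorch; `ReduceLROnPlateau` stands in for "divide the LR by 10 when validation error stops decreasing", and the helpers in the usage comment (`train_one_epoch`, `evaluate`) are hypothetical placeholders.

```python
import torch

def make_optimizer_and_scheduler(model):
    # SGD + momentum 0.9, LR 0.1, weight decay 1e-5, as listed above
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-5)
    # Divide the LR by 10 (factor=0.1) when the monitored validation error plateaus
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.1, patience=5)
    return optimizer, scheduler

# Usage (mini-batch size 256, no dropout):
# optimizer, scheduler = make_optimizer_and_scheduler(resnet34)
# for epoch in range(num_epochs):
#     train_one_epoch(resnet34, train_loader, optimizer)
#     val_error = evaluate(resnet34, val_loader)
#     scheduler.step(val_error)
```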

 

2) Experiment

    - VGG-19 (as a reference network)

    - 34-layer residual net: a shortcut connection every 2 conv layers

    - 34-layer plain net (same depth, no shortcuts)

 

With residual connections, the deeper network converges faster and reaches lower error than the plain net of the same depth.

 

3) More than 1000 layers

    - The 1202-layer net has worse test error than the 110-layer net, although both reach similarly low training error.

    - The authors attribute this to overfitting: such an extremely deep network needs a larger dataset (or stronger regularization).

 

Ref.

cs231n.stanford.edu/slides/2020/lecture_9.pdf

blog.naver.com/PostView.nhn?blogId=laonple&logNo=220761052425&parentCategoryNo=&categoryNo=22&viewDate=&isShowPopularPosts=false&from=postView

arxiv.org/abs/1512.03385