- Only ~5 million parameters (about 12x fewer parameters than AlexNet)
- Efficient "inception" module
- NO FC layers
Inception module
1) Naive version
- design a local network topology (a network within a network) and then stack these modules on top of each other.
- apply parallel filter operations on the input from the previous layer
(multiple receptive field sizes for convolution: 1x1, 3x3, 5x5, plus a 3x3 pooling operation)
- concatenate all filter outputs together channel-wise
- This naive version is computationally very expensive: the pooling branch preserves the full input depth, so the concatenated output depth only grows from stage to stage. (see the sketch below)
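A minimal PyTorch sketch of the naive module (the channel counts here are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive Inception: parallel 1x1, 3x3, 5x5 convs plus a 3x3 max-pool,
    concatenated channel-wise. Output depth = sum of the conv branch depths
    + the full input depth contributed by the pooling branch."""
    def __init__(self, in_ch, c1, c3, c5):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, c3, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, c5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # all branches keep the same spatial size, so they can be concatenated
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
y = NaiveInception(192, 64, 128, 32)(x)
print(y.shape)  # torch.Size([1, 416, 28, 28]) -> 64+128+32+192; depth keeps growing
```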
NIN: Network in Network
- Instead of using a plain convolution filter, NIN extracts features with a small multi-layer perceptron.
- A filter is good at extracting features, but it is linear; the MLP adds non-linearity to the extracted features.
- GoogLeNet applied this concept. (a quick per-pixel check follows below)
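A quick numerical check of this idea (shapes are illustrative): a 1x1 convolution produces the same output as applying one shared linear layer to every pixel's channel vector.

```python
import torch
import torch.nn as nn

conv1x1 = nn.Conv2d(64, 16, kernel_size=1)            # 1x1 conv: 64 -> 16 channels
linear = nn.Linear(64, 16)                             # the same mapping as an FC layer
linear.weight.data = conv1x1.weight.data.view(16, 64)  # copy the conv weights
linear.bias.data = conv1x1.bias.data

x = torch.randn(2, 64, 7, 7)
out_conv = conv1x1(x)                                            # (2, 16, 7, 7)
out_fc = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)       # apply FC per pixel
print(torch.allclose(out_conv, out_fc, atol=1e-5))               # True
```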
2) Dimension reduction version
- Applying 1x1 convolutions before the expensive 3x3 and 5x5 convolutions preserves the spatial dimensions while reducing the depth. (They also include ReLU, which makes them dual-purpose: dimension reduction plus extra non-linearity. A sketch of the full module follows this list.)
- This can be interpreted as applying the same fully-connected layer to each input pixel.
- It allows increasing the number of units at each stage without an uncontrolled blow-up in computational complexity.
- It can abstract features at multiple scales simultaneously (via the parallel 3x3 and 5x5 convolutions).
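A sketch of the dimension-reduction module in PyTorch; the channel counts roughly follow the paper's inception (3a) block, but treat them as illustrative:

```python
import torch
import torch.nn as nn

def conv_relu(in_ch, out_ch, **kwargs):
    # every convolution in GoogLeNet is followed by ReLU
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, **kwargs), nn.ReLU(inplace=True))

class InceptionReduced(nn.Module):
    """Dimension-reduction Inception: 1x1 convs shrink the depth before the
    expensive 3x3/5x5 convs and after the pooling branch."""
    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, pool_proj):
        super().__init__()
        self.b1 = conv_relu(in_ch, c1, kernel_size=1)
        self.b3 = nn.Sequential(conv_relu(in_ch, c3r, kernel_size=1),
                                conv_relu(c3r, c3, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(conv_relu(in_ch, c5r, kernel_size=1),
                                conv_relu(c5r, c5, kernel_size=5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                conv_relu(in_ch, pool_proj, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# channel counts in the style of inception (3a): output depth 64+128+32+32 = 256
block = InceptionReduced(192, 64, 96, 128, 16, 32, 32)
print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```

The 1x1 reductions cut the depth seen by the 3x3 and 5x5 convolutions (192 -> 96 and 192 -> 16 here), which is where most of the multiply-adds are saved.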
Global average pooling
- Instead of using fully-connected layers at the end of the network, GoogLeNet used 'global average pooling'.
- Its output is a 1-dimensional vector obtained by averaging each whole feature map. (e.g. 7x7x1024 maps -> a 1024-dim vector)
(a 1-dimensional vector is what the softmax layer needs)
- Pooling needs no weights, so this method saves parameters and computing resources. (see the sketch below)
(an FC layer here would need 7 x 7 x 1024 x 1024 ≈ 51.3M weights)
- This also makes it easy to adapt and fine-tune GoogLeNet for other label sets.
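A sketch of the saving, using the 7x7x1024 example above (the FC alternative maps the flattened features to 1024 units, matching the ~51.3M figure):

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 1024, 7, 7)           # final 7x7x1024 feature maps

# global average pooling: average each map -> 1024-dim vector, zero weights
gap = nn.AdaptiveAvgPool2d(1)
vec = gap(feat).flatten(1)
print(vec.shape)                            # torch.Size([1, 1024])

# the FC alternative flattens to 7*7*1024 inputs and maps them to 1024 units
fc = nn.Linear(7 * 7 * 1024, 1024)
print(fc.weight.numel())                    # 51,380,224 -> the ~51.3M weights noted above
```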
Auxiliary classifier
- To mitigate the vanishing-gradient problem, two auxiliary classifiers are placed in the middle of the network.
- Each auxiliary classifier injects additional gradient into the lower layers. (AvgPool - 1x1 Conv - FC - FC - Softmax; a sketch follows below)
- They are used only during training and are removed at inference time.
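A hedged sketch of one auxiliary head with the AvgPool - 1x1 Conv - FC - FC - Softmax structure; the 512-channel input, 4x4 pooled size, 0.7 dropout, and 0.3 loss weight follow the paper's description of the head on inception (4a), but treat them as illustrative:

```python
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Auxiliary head attached to an intermediate Inception output;
    its loss injects extra gradient into the lower layers during training."""
    def __init__(self, in_ch, num_classes=1000):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)     # 14x14 -> 4x4
        self.conv = nn.Sequential(nn.Conv2d(in_ch, 128, kernel_size=1),
                                  nn.ReLU(inplace=True))
        self.fc1 = nn.Sequential(nn.Linear(128 * 4 * 4, 1024),
                                 nn.ReLU(inplace=True),
                                 nn.Dropout(0.7))
        self.fc2 = nn.Linear(1024, num_classes)               # softmax applied by the loss

    def forward(self, x):
        x = self.conv(self.pool(x)).flatten(1)
        return self.fc2(self.fc1(x))

# training only, e.g. total_loss = main_loss + 0.3 * aux_loss (weight from the paper)
aux = AuxClassifier(512)
print(aux(torch.randn(1, 512, 14, 14)).shape)   # torch.Size([1, 1000])
```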
Architecture
- All convolutions (including those inside the Inception modules) use ReLU activation.
- The network was designed with computational efficiency in mind.