
Going deeper with convolutions; GoogLeNet (2014)


GoogLeNet

1) Overview

    - deeper networks, with computational efficiency

    - ILSVRC 2014 winner

    - 22 layers

    - Only 5 million parameters (12x fewer parameters than AlexNet)

    - Efficient "inception" module

    - NO FC layers

 

Inception module

1) Naive version

    - design a local network topology (network within a network) and then stack these modules on top of each other.

    - apply parallel filter operations to the input from the previous layer

      (multiple receptive field sizes for convolution (1x1, 3x3, 5x5) and a pooling operation (3x3))

    - concatenate all filter outputs together channel-wise

    - This naive version is computationally very expensive: the pooling branch preserves the full input feature depth, so the output depth grows after every stacked module. (see the sketch below)
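A minimal PyTorch sketch of the naive module, assuming illustrative channel counts (192 input channels, as in the 28x28 stage of GoogLeNet):

```python
# Sketch of the naive Inception module; channel counts are illustrative assumptions.
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    def __init__(self, in_ch, ch1x1, ch3x3, ch5x5):
        super().__init__()
        # Parallel branches over the same input; padding keeps the spatial size equal.
        self.branch1 = nn.Conv2d(in_ch, ch1x1, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, ch3x3, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, ch5x5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Concatenate all branch outputs channel-wise.
        # The pooling branch keeps the full input depth (in_ch),
        # so the output depth grows with every stacked module.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.pool(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
y = NaiveInception(192, 64, 128, 32)(x)
print(y.shape)  # torch.Size([1, 416, 28, 28]) -> 64 + 128 + 32 + 192 channels
```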

NIN; Network in Network
- Instead of plain convolution filters, NIN uses a multi-layer perceptron to extract features.
- A filter can extract features, but it is a linear operation, so an MLP is used to capture non-linear features.
- GoogLeNet applied this concept through its 1x1 convolutions. (see the sketch below)
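As a quick illustration of the NIN view, the following sketch (shapes chosen arbitrarily) checks that a 1x1 convolution is the same linear map applied independently at every pixel; stacking 1x1 convolutions with ReLU therefore acts like a small per-pixel MLP:

```python
# A 1x1 convolution is equivalent to a fully-connected layer applied per pixel.
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 8, kernel_size=1)
fc = nn.Linear(16, 8)
fc.weight.data = conv.weight.data.view(8, 16)   # share the same parameters
fc.bias.data = conv.bias.data

x = torch.randn(1, 16, 7, 7)
out_conv = conv(x)
# Apply the same Linear layer at every spatial position, then restore NCHW layout.
out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(out_conv, out_fc, atol=1e-6))  # True
```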

 

2) Dimension reduction version

    - By applying 1x1 convolutions before the expensive 3x3 and 5x5 convolutions, the module preserves spatial dimensions while reducing channel depth. (The 1x1 convolutions also include ReLU, which makes them dual-purpose: dimension reduction plus extra non-linearity.)

    - A 1x1 convolution can be interpreted as applying the same fully-connected layer to each input pixel.

    - It allows increasing the number of units at each stage without an uncontrolled blow-up in computational complexity.

    - It can abstract features at different scales simultaneously. (use of 3x3 and 5x5 convolutions; see the sketch below)
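A PyTorch sketch of the dimension-reduction module; the channel counts follow the paper's inception(3a) settings but are otherwise illustrative:

```python
# Dimension-reduction Inception module: 1x1 convolutions (with ReLU) shrink the
# channel depth before the expensive 3x3/5x5 convolutions and after pooling.
import torch
import torch.nn as nn

def conv_relu(in_ch, out_ch, **kw):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, **kw), nn.ReLU(inplace=True))

class Inception(nn.Module):
    def __init__(self, in_ch, ch1x1, red3x3, ch3x3, red5x5, ch5x5, pool_proj):
        super().__init__()
        self.branch1 = conv_relu(in_ch, ch1x1, kernel_size=1)
        self.branch3 = nn.Sequential(
            conv_relu(in_ch, red3x3, kernel_size=1),
            conv_relu(red3x3, ch3x3, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            conv_relu(in_ch, red5x5, kernel_size=1),
            conv_relu(red5x5, ch5x5, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            conv_relu(in_ch, pool_proj, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

# Channel counts of inception(3a): 64 / 96->128 / 16->32 / pool->32
x = torch.randn(1, 192, 28, 28)
y = Inception(192, 64, 96, 128, 16, 32, 32)(x)
print(y.shape)  # torch.Size([1, 256, 28, 28]) -> 64 + 128 + 32 + 32 channels
```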

 

Global average pooling

    - Instead of using fully-connected layers at the end of the network, GoogLeNet uses global average pooling.

    - Its output is a 1-dimensional vector obtained by averaging each feature map. (e.g. 7x7x1024 maps -> a 1024-dimensional vector)

      (softmax needs a 1-dimensional vector as input)

    - Pooling needs no weights, so this method saves a large number of parameters and computation.

      (an FC layer here would need 7 x 7 x 1024 x 1024 ≈ 51.3M weights; see the sketch after this list)

    - This also makes it easy to adapt and fine-tune GoogLeNet for other label sets.
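A small sketch of the parameter saving, assuming the 7x7x1024 final feature maps mentioned above:

```python
# Global average pooling collapses 7x7x1024 feature maps to a 1024-d vector with
# zero parameters, versus tens of millions of weights for an FC layer.
import torch
import torch.nn as nn

features = torch.randn(1, 1024, 7, 7)           # final feature maps
gap = nn.AdaptiveAvgPool2d(1)                   # average each map to a single value
vec = gap(features).flatten(1)                  # shape: (1, 1024), ready for softmax
print(vec.shape)

fc = nn.Linear(7 * 7 * 1024, 1024)              # the FC alternative, for comparison
print(sum(p.numel() for p in fc.parameters()))  # ~51.4M (the ~51.3M weights plus biases)
```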

 

Auxiliary classifier

- To mitigate the vanishing gradient problem, two auxiliary classifiers are placed in the middle of the network.

- Each auxiliary classifier injects additional gradient into the lower layers. (AvgPool - 1x1 Conv - FC - FC - Softmax)

- Used only during training; the auxiliary heads are discarded at test time. (see the sketch below)
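A sketch of one auxiliary head following the paper's description (5x5 average pooling with stride 3, 128 1x1 filters, a 1024-unit FC layer with 70% dropout, and the classification FC layer); the 14x14x512 input shape corresponds to the inception(4a) output:

```python
# Auxiliary classifier head: AvgPool - 1x1 Conv - FC - FC - Softmax.
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    def __init__(self, in_ch=512, num_classes=1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.AvgPool2d(kernel_size=5, stride=3),      # 14x14 -> 4x4
            nn.Conv2d(in_ch, 128, kernel_size=1), nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 1024), nn.ReLU(inplace=True),
            nn.Dropout(0.7),
            nn.Linear(1024, num_classes),               # softmax applied by the loss
        )

    def forward(self, x):
        return self.head(x)

# During training the auxiliary losses are added to the main loss with a small
# weight (0.3 in the paper); at test time these heads are removed.
aux_logits = AuxClassifier()(torch.randn(1, 512, 14, 14))
print(aux_logits.shape)  # torch.Size([1, 1000])
```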

 

Architecture

    - All the convolutions (including those inside the Inception modules) use ReLU activation.

    - The network was designed with computational efficiency in mind.

    - SGD with momentum (0.9)

    - Learning rate decreased by 4% every 8 epochs. (see the sketch below)
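A rough sketch of this schedule in PyTorch; the initial learning rate, the placeholder model, and the number of epochs are assumptions, not values from the paper:

```python
# SGD with 0.9 momentum, learning rate multiplied by 0.96 (a 4% decrease) every 8 epochs.
import torch

model = torch.nn.Linear(10, 2)  # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.96)

for epoch in range(100):
    # ... run one training epoch (forward, loss, backward, optimizer.step()) ...
    scheduler.step()  # decay the learning rate on the 8-epoch schedule
```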


Ref.

blog.naver.com/PostView.nhn?blogId=laonple&logNo=220716782369&parentCategoryNo=&categoryNo=22&viewDate=&isShowPopularPosts=false&from=postView

bskyvision.com/539

arxiv.org/abs/1409.4842