- Only ~5 million parameters (about 12x fewer parameters than AlexNet)
- Efficient "inception" module
- NO FC layers
Inception module
1) Naive version
- design a local network topology (a network within a network) and then stack these modules on top of each other.
- apply parallel filter operations on the input from the previous layer
(multiple receptive field sizes for convolution: 1x1, 3x3, 5x5, plus a 3x3 pooling operation)
- concatenate all filter outputs together channel-wise
- This naive version is computationally very expensive: the pooling branch preserves the full input depth, so the concatenated output depth only grows from stage to stage. (see the sketch below)
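A minimal PyTorch sketch of the naive module (the channel counts here are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive Inception: parallel 1x1, 3x3, 5x5 convs plus a 3x3 max-pool,
    concatenated channel-wise. Output depth = sum of the conv branch depths
    + the full input depth contributed by the pooling branch."""
    def __init__(self, in_ch, c1, c3, c5):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, c3, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, c5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # all branches keep the same spatial size, so they can be concatenated
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
y = NaiveInception(192, 64, 128, 32)(x)
print(y.shape)  # torch.Size([1, 416, 28, 28]) -> 64+128+32+192; depth keeps growing
```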
NIN: Network in Network
- Instead of using a plain convolution filter, NIN extracts features with a small multi-layer perceptron.
- A filter is good at extracting features, but it is linear; the MLP adds non-linearity to the extracted features.
- GoogLeNet applied this concept. (a quick per-pixel check follows below)
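A quick numerical check of this idea (shapes are illustrative): a 1x1 convolution produces the same output as applying one shared linear layer to every pixel's channel vector.

```python
import torch
import torch.nn as nn

conv1x1 = nn.Conv2d(64, 16, kernel_size=1)            # 1x1 conv: 64 -> 16 channels
linear = nn.Linear(64, 16)                             # the same mapping as an FC layer
linear.weight.data = conv1x1.weight.data.view(16, 64)  # copy the conv weights
linear.bias.data = conv1x1.bias.data

x = torch.randn(2, 64, 7, 7)
out_conv = conv1x1(x)                                            # (2, 16, 7, 7)
out_fc = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)       # apply FC per pixel
print(torch.allclose(out_conv, out_fc, atol=1e-5))               # True
```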
2) Dimension reduction version
- Applying 1x1 convolutions before the expensive 3x3 and 5x5 convolutions preserves the spatial dimensions while reducing the depth. (They also include ReLU, which makes them dual-purpose: dimension reduction plus extra non-linearity. A sketch of the full module follows this list.)
- This can be interpreted as applying the same fully-connected layer to each input pixel.
- It allows increasing the number of units at each stage without an uncontrolled blow-up in computational complexity.
- It can abstract features at multiple scales simultaneously (via the parallel 3x3 and 5x5 convolutions).
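A sketch of the dimension-reduction module in PyTorch; the channel counts roughly follow the paper's inception (3a) block, but treat them as illustrative:

```python
import torch
import torch.nn as nn

def conv_relu(in_ch, out_ch, **kwargs):
    # every convolution in GoogLeNet is followed by ReLU
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, **kwargs), nn.ReLU(inplace=True))

class InceptionReduced(nn.Module):
    """Dimension-reduction Inception: 1x1 convs shrink the depth before the
    expensive 3x3/5x5 convs and after the pooling branch."""
    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, pool_proj):
        super().__init__()
        self.b1 = conv_relu(in_ch, c1, kernel_size=1)
        self.b3 = nn.Sequential(conv_relu(in_ch, c3r, kernel_size=1),
                                conv_relu(c3r, c3, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(conv_relu(in_ch, c5r, kernel_size=1),
                                conv_relu(c5r, c5, kernel_size=5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                conv_relu(in_ch, pool_proj, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# channel counts in the style of inception (3a): output depth 64+128+32+32 = 256
block = InceptionReduced(192, 64, 96, 128, 16, 32, 32)
print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```

The 1x1 reductions cut the depth seen by the 3x3 and 5x5 convolutions (192 -> 96 and 192 -> 16 here), which is where most of the multiply-adds are saved.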
Global average pooling
- Instead of using fully-connected layers at the end of the network, GoogLeNet used 'global average pooling'.
- Its output is a 1-dimensional vector obtained by averaging each whole feature map. (e.g. 7x7x1024 maps -> a 1024-dim vector)
(a 1-dimensional vector is what the softmax layer needs)
- Pooling needs no weights, so this method saves parameters and computing resources. (see the sketch below)
(an FC layer here would need 7 x 7 x 1024 x 1024 ≈ 51.3M weights)
- This also makes it easy to adapt and fine-tune GoogLeNet for other label sets.
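A sketch of the saving, using the 7x7x1024 example above (the FC alternative maps the flattened features to 1024 units, matching the ~51.3M figure):

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 1024, 7, 7)           # final 7x7x1024 feature maps

# global average pooling: average each map -> 1024-dim vector, zero weights
gap = nn.AdaptiveAvgPool2d(1)
vec = gap(feat).flatten(1)
print(vec.shape)                            # torch.Size([1, 1024])

# the FC alternative flattens to 7*7*1024 inputs and maps them to 1024 units
fc = nn.Linear(7 * 7 * 1024, 1024)
print(fc.weight.numel())                    # 51,380,224 -> the ~51.3M weights noted above
```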
Auxiliary classifier
- To mitigate the vanishing-gradient problem, two auxiliary classifiers are placed in the middle of the network.
- Each auxiliary classifier injects additional gradient into the lower layers. (AvgPool - 1x1 Conv - FC - FC - Softmax; a sketch follows below)
- They are used only during training and are removed at inference time.
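A hedged sketch of one auxiliary head with the AvgPool - 1x1 Conv - FC - FC - Softmax structure; the 512-channel input, 4x4 pooled size, 0.7 dropout, and 0.3 loss weight follow the paper's description of the head on inception (4a), but treat them as illustrative:

```python
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Auxiliary head attached to an intermediate Inception output;
    its loss injects extra gradient into the lower layers during training."""
    def __init__(self, in_ch, num_classes=1000):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)     # 14x14 -> 4x4
        self.conv = nn.Sequential(nn.Conv2d(in_ch, 128, kernel_size=1),
                                  nn.ReLU(inplace=True))
        self.fc1 = nn.Sequential(nn.Linear(128 * 4 * 4, 1024),
                                 nn.ReLU(inplace=True),
                                 nn.Dropout(0.7))
        self.fc2 = nn.Linear(1024, num_classes)               # softmax applied by the loss

    def forward(self, x):
        x = self.conv(self.pool(x)).flatten(1)
        return self.fc2(self.fc1(x))

# training only, e.g. total_loss = main_loss + 0.3 * aux_loss (weight from the paper)
aux = AuxClassifier(512)
print(aux(torch.randn(1, 512, 14, 14)).shape)   # torch.Size([1, 1000])
```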
Architecture
- All convolutions (including those inside the Inception modules) use ReLU activation.
- The network was designed with computational efficiency in mind.