BN was proposed by Sergey Ioffe and Christian Szegedy. Which one of the following papers was also authored by Christian Szegedy?
A. (DeepID2) Deep Learning Face Representation by Joint Identification-Verification
B. (Joint Bayesian) Bayesian Face Revisited: A Joint Formulation
C. Robust Multi-Resolution Pedestrian Detection in Traffic Scenes
D. RASL: Robust Alignment by Sparse and Low-rank Decomposition for Linearly Correlated Images
E. (GoogLeNet) Going Deeper with Convolutions
For each output channel k
G(k) is a corresponding subset of input channels
$$ G(k) \subset \{1,2,\dots,D\} $$ $$ y_{ijk} = \dfrac{x_{ijk}} {\left(\kappa + \alpha \sum_{m\in G(k)} x^2_{ijm}\right)^{\beta}} $$The additive constant in the denominator is written \(\kappa\) to avoid a clash with the channel index \(k\).
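The cross-channel rule above can be sketched in NumPy. This is a minimal illustration, not library code: the function name, the choice of a symmetric window of `n` channels for \(G(k)\), and the default constants are all assumptions for the example.

```python
import numpy as np

def cross_channel_normalize(x, kappa=2.0, alpha=1e-4, beta=0.75, n=5):
    """Normalize each channel by a sum of squares over a window of
    neighboring channels, following the formula above.

    x: array of shape (H, W, D) -- height, width, channels.
    G(k) is taken here as the n channels centered on k, clipped at
    the channel boundaries (an illustrative choice of G).
    """
    H, W, D = x.shape
    y = np.empty_like(x)
    half = n // 2
    for k in range(D):
        lo, hi = max(0, k - half), min(D, k + half + 1)  # G(k)
        denom = (kappa + alpha * np.sum(x[:, :, lo:hi] ** 2, axis=2)) ** beta
        y[:, :, k] = x[:, :, k] / denom
    return y
```

With `alpha=beta=kappa=1` and `n=3`, an all-ones input makes the denominator simply 1 plus the number of channels in the window, which is easy to check by hand.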
For each output channel k:
$$ n^2_{ijk} = \dfrac{1}{H'W'} \sum_{1\leq i' \leq H',\, 1\leq j' \leq W'} x_{i+i'-1-\lfloor(H'-1)/2\rfloor,\ j+j'-1-\lfloor(W'-1)/2\rfloor,\ k}^2$$ $$ y_{ijk} = \dfrac{x_{ijk}} {(1 + \alpha n^2_{ijk})^{\beta}} $$
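The spatial variant can be sketched the same way. Again a minimal illustration with assumed names and defaults; out-of-bounds positions are treated as zero, which is one common convention and an assumption here.

```python
import numpy as np

def spatial_normalize(x, alpha=1e-4, beta=0.75, Hp=3, Wp=3):
    """Per-channel spatial normalization, following the formula above:
    n2[i, j, k] averages x**2 over an Hp x Wp window centered at (i, j).
    Window positions falling outside the map contribute zero."""
    H, W, D = x.shape
    n2 = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            for ip in range(1, Hp + 1):
                for jp in range(1, Wp + 1):
                    ii = i + ip - 1 - (Hp - 1) // 2
                    jj = j + jp - 1 - (Wp - 1) // 2
                    if 0 <= ii < H and 0 <= jj < W:
                        n2[i, j] += x[ii, jj] ** 2  # all channels at once
    n2 /= (Hp * Wp)
    return x / (1.0 + alpha * n2) ** beta
```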
Two modes:
Problem: internal covariate shift
The change in the distribution of network activations caused by the change in network parameters during training
Idea: keep the distribution of the inputs to each nonlinearity more stable as training proceeds
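For reference, the standard BN transform realizes this idea by normalizing each feature map to zero mean and unit variance over the mini-batch, then applying a learnable scale and shift. A minimal sketch of that well-known forward pass (function name and signature are illustrative):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize the activations of one feature map.
    x: shape (N,) -- all activations of the map across the mini-batch
       (N = H * W * M in the notation used below).
    Returns y = gamma * x_hat + beta and the normalized x_hat."""
    mu = x.mean()
    var = x.var()
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, ~unit variance
    y = gamma * x_hat + beta               # learnable scale and shift
    return y, x_hat
```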
For one feature map:
$$ \small{N = H\times W\times M}$$ $$ \dfrac{\partial l}{\partial \gamma} = \sum_{i=1}^N \dfrac{\partial l}{\partial y_i} \cdot \hat{x}_i$$ $$ \dfrac{\partial l}{\partial \beta} = \sum_{i=1}^N \dfrac{\partial l}{\partial y_i} $$The parameters \(\gamma\) and \(\beta\) of the BN layer can be updated using the equations above.
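The two parameter gradients translate directly into code. A sketch (names are illustrative; `dl_dy` and `x_hat` hold all N activations of one feature map):

```python
import numpy as np

def batchnorm_param_grads(dl_dy, x_hat):
    """Gradients of the loss w.r.t. the BN parameters of one feature map,
    per the equations above:
      dl/dgamma = sum_i dl/dy_i * x_hat_i
      dl/dbeta  = sum_i dl/dy_i
    """
    dgamma = np.sum(dl_dy * x_hat)
    dbeta = np.sum(dl_dy)
    return dgamma, dbeta
```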
For one feature map k:
$$ \dfrac{\partial l}{\partial x_{ijkm}} = \sum_{i'j'm'} \dfrac{\partial l}{\partial y_{i'j'km'}}\dfrac{\partial y_{i'j'km'}}{\partial x_{ijkm}}$$ $$ \dfrac{\partial y_{i'j'km'}}{\partial x_{ijkm}} = \gamma_k \left(\left(\delta_{ii'}\delta_{jj'}\delta_{mm'}-\dfrac{\partial \mu_k}{\partial x_{ijkm}}\right)\dfrac{1}{\sqrt{\sigma_k^2+\epsilon}} - \dfrac{1}{2} (x_{i'j'km'} - \mu_k)(\sigma_k^2+\epsilon)^{-3/2}\dfrac{\partial \sigma_k^2}{\partial x_{ijkm}}\right)$$ $$ \dfrac{\partial \mu_k}{\partial x_{ijkm}} = \dfrac{1}{HWM} = \dfrac{1}{N}$$ $$ \dfrac{\partial \sigma^2_k}{\partial x_{ijkm}} = \dfrac{2}{N} (x_{ijkm} - \mu_k) $$The gradient can be back-propagated through the BN layer using the equations above (\(\delta\) denotes the Kronecker delta, from \(\partial(x_{i'j'km'}-\mu_k)/\partial x_{ijkm} = \delta_{ii'}\delta_{jj'}\delta_{mm'} - 1/N\)).
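Collecting the sums, the input gradient for one feature map can be sketched as follows (a vectorized reading of the equations above; names are illustrative):

```python
import numpy as np

def batchnorm_input_grad(dl_dy, x, gamma, eps=1e-5):
    """Back-propagate through BN for one feature map.
    x, dl_dy: shape (N,) with N = H * W * M.
    Implements
      dl/dx_j = sum_i dl/dy_i * gamma * [ (delta_ij - 1/N) / sqrt(var + eps)
                - 0.5 * (x_i - mu) * (var + eps)^(-3/2) * dvar/dx_j ]
    with dmu/dx_j = 1/N and dvar/dx_j = 2/N * (x_j - mu)."""
    N = x.size
    mu = x.mean()
    var = x.var()
    inv_std = 1.0 / np.sqrt(var + eps)
    dvar_dx = 2.0 / N * (x - mu)
    dl_dx = gamma * (dl_dy * inv_std                       # delta_ij term
                     - dl_dy.sum() / N * inv_std           # -dmu/dx term
                     - 0.5 * np.sum(dl_dy * (x - mu))
                       * (var + eps) ** -1.5 * dvar_dx)    # variance term
    return dl_dx
```

A quick sanity check: because BN subtracts the batch mean, the input gradients of one map always sum to zero, and the result matches a finite-difference gradient of the forward pass.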
Network: xd_net_12m
The picture below describes what kind of Normalization?
A. Cross-Channel Normalization
B. Spatial Normalization
C. Batch Normalization
D. Local Response Normalization
E. Mean-Variance Normalization
The number of parameters \(\gamma\) in a BN layer equals:
A. Batch size
B. The number of feature maps
C. The number of activations
| Network | lr | iter | pcadim | accuracy |
|---|---|---|---|---|
| (tile conv) sn01 bn | 0.01 | 150000 | 300 | 0.984000 |
| (tile conv) sn02 bn | 0.05 | 150000 | 400/500/700 | 0.991167 |
| (full conv) tn03 7x6x1024(3)->7x6x256(1)->512 bn | 0.03 | 150000 | 300 | 0.989000 |
| (full conv) tn01 7x6x1024(3)->7x6x256(1)->512 bn | 0.05 | 150000 | 400 | 0.992667 |
| (full conv) tn04 7x6x1024(3)->7x6x256(1)->512 bn | 0.1 | 150000 | 800/900 | 0.990167 |
| (full conv) np04 tn01 -> no bn | 0.05 | 150000 | 400 | 0.990500 |
| (full conv) np05 tn01 -> no bn | 0.01 | 150000 | 500/800 | 0.990000 |