Convolutions remain essential in neural networks for a variety of vision tasks. To advance neural convolutions, this study focuses on the Structured Receptive Field (SRF), which represents a convolution filter as a linear combination of widely acting, designed components. Although SRF can represent convolution filters with fewer components than the number of filter bins, N-Jet, the sole existing implementation of a component system, requires ten trainable parameters per filter to improve accuracy, even for $3 \times 3$ convolutions, which have only nine bins. Hence, we aim to formulate a new component system for SRF that can represent valid filters with fewer components. Our component system, named “OtX”, is based on the Principal Component Analysis of well-trained filter weights, on the premise that the extracted components will also be principal for neural convolution filters. In addition to proposing the component system, we develop a component scaling method to mitigate the large scale differences among the coefficients in a linear combination of OtX components. In the experimental section, we train image classification models on the CIFAR-100 dataset under the hyperparameters tuned for the original models with standard convolutions. For the NFNet-F0 classifier, OtX with six components performs 0.5% better than the standard convolution, 3.1% better than N-Jet with six components, and only 0.1% worse than N-Jet with ten components. In addition, OtX with nine components trains more stably than N-Jet and performs 0.5% better than the standard convolution for NFNet-F0. OtX is therefore a suitable replacement for standard convolutions: it performs at least comparably to N-Jet while offering better parameter efficiency and training stability.
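The general idea of an SRF built on PCA-derived components can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pca_components` and `compose_filter` are hypothetical, random weights stand in for well-trained filter weights, and the component scaling method mentioned above is omitted because the abstract does not specify it.

```python
import numpy as np

def pca_components(filters, k):
    """Return the top-k principal components of a set of 3x3 filters."""
    flat = filters.reshape(len(filters), -1)       # (n, 9)
    centered = flat - flat.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].reshape(k, 3, 3)

def compose_filter(coeffs, components):
    """Build one filter as a linear combination of the components."""
    return np.tensordot(coeffs, components, axes=1)  # (3, 3)

rng = np.random.default_rng(0)
# Assumption: random values stand in for well-trained filter weights.
trained = rng.normal(size=(1000, 3, 3))
components = pca_components(trained, k=6)

# Six coefficients are the trainable parameters for one 3x3 filter,
# instead of the nine bins of a standard convolution.
coeffs = rng.normal(size=6)
kernel = compose_filter(coeffs, components)
```

In a trainable layer under this sketch, `components` would stay fixed and only `coeffs` would be learned for each filter.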
               