
Randomly translational activation inspired by the input distributions of ReLU


Abstract Deep convolutional neural networks (CNNs) have achieved great success on many visual tasks (e.g., image classification), and non-linear activation plays a very important role in them. The input distribution of the non-linear activation is found to be approximately Gaussian, with most inputs concentrated near zero, which makes the learned CNN sensitive to small jitter in the activation input. Meanwhile, deep CNNs are prone to overfitting. To address these problems, we exploit the input distributions of the non-linear activation and propose a randomly translational non-linear activation for deep CNNs. In the training stage, the non-linear activation function is randomly translated by an offset sampled from a Gaussian distribution; in the test stage, the activation with zero offset is used. With the proposed method, the input distribution of the non-linear activation becomes more scattered, so the learned CNN is robust to small jitter in the activation input. The method can also be seen as a regularization of the non-linear activation that reduces overfitting. Compared to the original non-linear activation, it improves classification accuracy without increasing computation cost. Experimental results on CIFAR-10/CIFAR-100, SVHN, and ImageNet demonstrate the effectiveness of the proposed method; for example, with the VGG architecture the error rates on CIFAR-10 and CIFAR-100 are reduced by 0.55% and 1.61%, respectively. Even when noise is added to the input image, the proposed method still achieves much better classification accuracy on CIFAR-10/CIFAR-100.
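
The train/test behaviour described in the abstract can be illustrated with a short sketch. Below is a minimal PyTorch-style implementation that shifts a ReLU by a Gaussian-sampled offset during training and uses a zero offset at test time. The per-element sampling granularity and the hyperparameter `sigma` are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn


class RandomlyTranslatedReLU(nn.Module):
    """Minimal sketch of a randomly translated ReLU (assumptions noted above).

    Training: the activation is translated by an offset drawn from a
    zero-mean Gaussian, which scatters the effective input distribution.
    Test: the offset is zero, recovering the standard ReLU.
    """

    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma  # assumed tunable standard deviation of the offset

    def forward(self, x):
        if self.training:
            # Sample an offset for each activation and translate the ReLU.
            offset = torch.randn_like(x) * self.sigma
            return torch.relu(x + offset)
        # Inference: zero offset, i.e. plain ReLU.
        return torch.relu(x)


# Usage: drop-in replacement for nn.ReLU inside a CNN block.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    RandomlyTranslatedReLU(sigma=0.1),
)
out = block(torch.randn(8, 3, 32, 32))
```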

Keywords: input distributions; non-linear; proposed method; activation; linear activation

Journal Title: Neurocomputing
Year Published: 2018



