Everyone thought it was great to use differentiable, symmetric, non-linear activation functions in feed-forward neural networks, until Alex Krizhevsky found that Rectified Linear Units, despite being not entirely differentiable, not symmetric, and, most of all, piecewise linear, were computationally cheaper and worth the trade-off with their more sophisticated counterparts. Here are just a few thoughts on the properties of these activation functions, a potential explanation for why using ReLUs speeds up training, and possible ways of applying these insights for better learning strategies.
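To make the contrast concrete, here is a minimal sketch (in NumPy) of the ReLU and its gradient next to tanh: the ReLU costs a single comparison per element, its gradient is a flat 0 or 1, and it is not differentiable at zero, whereas tanh saturates and its gradient vanishes for large inputs.

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: piecewise linear and computationally cheap
    # (one comparison per element).
    return np.maximum(0.0, x)

def relu_grad(x):
    # The derivative is 0 for x < 0 and 1 for x > 0; ReLU is not
    # differentiable at x = 0, where a subgradient is used in practice.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x))             # gradient does not decay for large positive x
print(relu_grad(x))
print(1 - np.tanh(x) ** 2)  # tanh gradient: vanishes as |x| grows
```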
Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
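A sketch of what such a learned activation might look like, under an assumed parameterization (the abstract does not spell one out): a fixed ReLU plus a small number of learnable hinges, h(x) = max(0, x) + Σ_s a_s · max(0, −x + b_s), with the slopes a_s and offsets b_s trained per neuron by gradient descent along with the rest of the network. The class name and hinge count below are illustrative, not from the paper.

```python
import numpy as np

class AdaptivePiecewiseLinear:
    """Per-neuron learnable piecewise linear activation (illustrative sketch).

    Assumed form: h(x) = max(0, x) + sum_s a_s * max(0, -x + b_s),
    where a_s, b_s are parameters learned independently for each neuron.
    """

    def __init__(self, n_neurons, n_hinges=2, rng=None):
        rng = rng or np.random.default_rng(0)
        # One slope and one offset per hinge, per neuron.
        self.a = rng.normal(scale=0.1, size=(n_hinges, n_neurons))
        self.b = rng.normal(scale=0.1, size=(n_hinges, n_neurons))

    def forward(self, x):
        # x: array of shape (batch, n_neurons)
        out = np.maximum(0.0, x)
        for a_s, b_s in zip(self.a, self.b):
            out = out + a_s * np.maximum(0.0, -x + b_s)
        return out

apl = AdaptivePiecewiseLinear(n_neurons=3)
x = np.array([[-1.0, 0.0, 2.0]])
print(apl.forward(x).shape)  # same shape as the input
```

With all a_s set to zero the unit reduces exactly to a static ReLU, which is why architectures built from these units can at least match their ReLU counterparts before any adaptation happens.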