I wrote an article titled “Neural Network L2 Regularization using Python” in the September 2017 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2017/09/01/neural-network-l2.aspx.

You can think of a neural network as a complicated mathematical prediction equation. To compute the constants (called weights and biases) that determine the behavior of the equation, you use a set of training data. The training data has known, correct input and output values. You use an algorithm (most often back-propagation) to find values for the NN constants so that computed output values closely match the correct output values in the training data.

A challenge when training a NN is called over-fitting. If you train a network too well, you will get very low error (or, equivalently, high accuracy) on the training data. But when you apply your NN model to new, previously unseen data, your accuracy is very low.

There are several ways to try to limit NN over-fitting. One technique is called regularization. As it turns out, an over-fitted NN model often has weight constants that are very large in magnitude. Regularization keeps the values of the NN constants small.

There are two main forms of regularization, L1 and L2. L1 regularization penalizes the sum of the magnitudes of all the NN weights. L2 regularization penalizes the sum of the squared weights. My article explains exactly how L2 regularization works, and compares L1 and L2 regularization.
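The two penalty terms described above can be sketched in a few lines of code. This is a minimal illustration, not the implementation from the article; the names (w, grad, lr, lam) and the parameter values are my own assumptions.

```python
import numpy as np

def update_weights(w, grad, lr=0.01, lam=0.001, reg="L2"):
    """One gradient-descent step with an optional regularization penalty.

    L2 penalty = lam * sum(w^2); its gradient contribution is 2 * lam * w.
    L1 penalty = lam * sum(|w|); its (sub)gradient is lam * sign(w).
    """
    if reg == "L2":
        grad = grad + 2.0 * lam * w
    elif reg == "L1":
        grad = grad + lam * np.sign(w)
    return w - lr * grad

# With a zero base-loss gradient, the update shows the penalty's
# pure effect: each weight is pulled slightly toward zero.
w = np.array([0.5, -1.2, 3.0])
print(update_weights(w, np.zeros(3), reg="L2"))
print(update_weights(w, np.zeros(3), reg="L1"))
```

Note how the L2 contribution scales with the weight itself (big weights are penalized more), while the L1 contribution is a constant-size pull determined only by the weight's sign.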

From a developer’s point of view, L2 regularization generally (but not always) works a bit better than L1 regularization, and L2 is a tiny bit easier to implement than L1. But L1 sometimes (but not always) automatically prunes away irrelevant predictor variables by setting their associated weight constants to zero.
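The pruning behavior can be seen in a toy sketch. Here I use the soft-thresholding (proximal) form of an L1 step, a standard technique but not necessarily what the article uses; all names and values are illustrative assumptions.

```python
import numpy as np

def l2_step(w, lr=0.1, lam=0.5):
    # L2 shrinks multiplicatively: w is scaled toward zero
    # but never reaches exactly zero.
    return w - lr * 2.0 * lam * w

def l1_prox_step(w, lr=0.1, lam=0.5):
    # Soft-thresholding: weights smaller than the threshold
    # lr * lam are set to exactly zero.
    return np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

w_l2 = 0.04
for _ in range(100):
    w_l2 = l2_step(w_l2)   # tiny but still nonzero

w_l1 = l1_prox_step(0.04)  # below the 0.05 threshold -> exactly 0.0
print(w_l2, w_l1)
```

This is why L1 can act as an automatic feature selector: a predictor whose weight never grows past the threshold ends up with a weight of exactly zero and drops out of the model.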

As often with machine learning, working with regularization is part art and part science and part intuition and part experience.

*“A Perfect Fit” (1863) – Luis Ruiperez*