Pierre Baldi




Wednesday, February 7, 2024 - 2:00pm



510R Rowland Hall

AI today can pass the Turing test and is in the process of transforming science, technology, society, humans, and beyond.
Surprisingly, modern AI is built out of two very simple and old ideas, rebranded as deep learning: neural networks and
gradient descent learning. When a typical feed-forward neural network is trained by gradient descent, with an L2 regularizer
to avoid overly large synaptic weights, a strange phenomenon occurs: at the optimum, each neuron becomes "balanced"
in the sense that the L2 norm of its incoming synaptic weights becomes equal to the L2 norm of its outgoing synaptic weights. We develop a theory that explains this phenomenon and exposes its generality. Balance emerges with a variety of activation functions, a variety of regularizers including all Lp regularizers, and a variety of networks including recurrent networks. A simple local balancing algorithm can be applied to any neuron at any time, not just at the optimum. Most remarkably, stochastic iterated application of the local balancing algorithm always converges to a unique, globally balanced state.
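
The local balancing step can be illustrated with a minimal sketch. It is not the speaker's implementation, and it rests on several assumptions: ReLU hidden units, no biases, an L2 regularizer, and a small random network. For a ReLU neuron, multiplying the incoming weights by a factor lam > 0 and dividing the outgoing weights by lam leaves the network function unchanged. The neuron's contribution to the L2 cost, lam^2 * ||w_in||^2 + ||w_out||^2 / lam^2, is smallest at lam = sqrt(||w_out|| / ||w_in||), and at that value the two norms are equal. The names `balance_neuron`, `max_imbalance`, and `l2_cost` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(weights, x):
    """Feed-forward pass: ReLU on hidden layers, linear output layer (no biases)."""
    h = x
    for W in weights[:-1]:
        h = relu(W @ h)
    return weights[-1] @ h

def l2_cost(weights):
    """Sum of squared weights, i.e. the L2 regularizer without its coefficient."""
    return sum(float(np.sum(W ** 2)) for W in weights)

def balance_neuron(weights, layer, i):
    """Rescale hidden neuron i of the given hidden layer, in place.

    Row i of weights[layer] holds the incoming weights; column i of
    weights[layer + 1] holds the outgoing weights.
    """
    W_in, W_out = weights[layer], weights[layer + 1]
    in_norm = np.linalg.norm(W_in[i, :])
    out_norm = np.linalg.norm(W_out[:, i])
    if in_norm == 0.0 or out_norm == 0.0:
        return  # a disconnected neuron has no balancing factor; skip it
    lam = np.sqrt(out_norm / in_norm)
    W_in[i, :] *= lam   # incoming weights scaled by lam
    W_out[:, i] /= lam  # outgoing weights scaled by 1 / lam

def max_imbalance(weights):
    """Largest |incoming norm - outgoing norm| over all hidden neurons."""
    worst = 0.0
    for layer in range(len(weights) - 1):
        in_norms = np.linalg.norm(weights[layer], axis=1)
        out_norms = np.linalg.norm(weights[layer + 1], axis=0)
        worst = max(worst, float(np.max(np.abs(in_norms - out_norms))))
    return worst

# Assumed toy network: 3 inputs, two hidden layers of 4 ReLU units, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]
weights = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(len(sizes) - 1)]
x = rng.normal(size=sizes[0])

y_before = forward(weights, x)
cost_before = l2_cost(weights)
imbalance_before = max_imbalance(weights)

# Stochastic iterated balancing: pick a random hidden neuron and balance it.
for _ in range(2000):
    layer = int(rng.integers(0, len(weights) - 1))
    i = int(rng.integers(0, weights[layer].shape[0]))
    balance_neuron(weights, layer, i)

y_after = forward(weights, x)
cost_after = l2_cost(weights)
imbalance_after = max_imbalance(weights)

print("output unchanged:", np.allclose(y_before, y_after))
print("L2 cost before / after:", cost_before, cost_after)
print("max imbalance before / after:", imbalance_before, imbalance_after)
```

Two hidden layers are used on purpose: the weights between them are outgoing for one neuron and incoming for another, so balancing one neuron can unbalance its neighbors. That is why repeated application is needed in this sketch, whereas a single hidden layer would be balanced after one pass over its neurons.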