Venue: Webex
Abstract
Among the most surprising aspects of deep learning models is their highly overparametrized and non-convex nature. Both aspects are common to all deep learning models and have led to results that are unexpected from the point of view of classical learning theory and non-convex optimization. Current deep neural networks (DNNs) are composed of millions (or even billions) of connection weights, and the learning process seeks to minimize a non-convex loss function that measures the number of classification errors made by the DNN. The empirical evidence shows that these highly expressive neural network models can fit the training data via simple variants of algorithms originally designed for convex optimization. Moreover, even if the learning process is run with little control over its statistical complexity (e.g. regularisation, number of parameters, …), these models achieve unparalleled levels of prediction accuracy, contrary to what would be expected from the uniform convergence framework of classical statistical inference.
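The phenomenon described above can be reproduced in miniature. The sketch below (not from the talk; all sizes and hyperparameters are illustrative assumptions) trains a heavily overparametrized two-layer network with plain gradient descent on a standard differentiable surrogate of the classification error, and typically reaches zero training errors even on random labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: P random patterns with random +/-1 labels (hard to fit by chance).
P, N = 200, 20
X = rng.standard_normal((P, N))
y = rng.choice([-1.0, 1.0], size=P)

# Overparametrized two-layer net: many more weights (N*H + H) than patterns.
H = 1000
W1 = rng.standard_normal((N, H)) / np.sqrt(N)
w2 = rng.standard_normal(H) / np.sqrt(H)

def forward(X, W1, w2):
    h = np.tanh(X @ W1)          # hidden-layer activations
    return h, h @ w2             # real-valued output; its sign is the class

lr = 0.1
for step in range(5000):
    h, out = forward(X, W1, w2)
    # Gradient of the logistic surrogate loss  mean(log(1 + exp(-y*out))).
    grad_out = -y / (1.0 + np.exp(y * out)) / P
    grad_w2 = h.T @ grad_out
    grad_h = np.outer(grad_out, w2) * (1.0 - h**2)   # backprop through tanh
    grad_W1 = X.T @ grad_h
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1
    errors = int(np.sum(np.sign(out) != y))
    if errors == 0:
        break

print(f"training errors after {step + 1} steps: {errors}")
```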
In this talk, we will discuss the geometrical structure of the space of solutions (zero-error configurations) of overparametrized non-convex neural networks trained to classify patterns drawn from some natural distribution. Building on statistical physics techniques for the study of disordered systems, we analyze the geometric structure of the different minima and critical points of the error loss function as the number of parameters increases, and we relate this to learning performance. Of particular interest is the role of rare flat minima, which are both accessible to algorithms and have good generalisation properties, in contrast to the dominating minima, which are almost impossible to sample. We will show that the appearance of rare flat minima defines a phase boundary at which algorithms start to find solutions efficiently. A simple flatness probe is sketched below.
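One crude way to probe how flat a zero-error configuration is: measure how often a random perturbation of the weights of a given relative size still fits all the training data. The sketch below reuses W1, w2, X and y from the example above; the multiplicative Gaussian perturbation and the radii are illustrative assumptions, not the specific flatness measure discussed in the talk. Flat minima tolerate much larger perturbations than sharp, isolated ones.

```python
import numpy as np

def zero_error_fraction(W1, w2, X, y, radius, n_samples=200, seed=1):
    """Fraction of randomly perturbed weight configurations that still
    classify every training pattern correctly."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_samples):
        # Multiplicative Gaussian perturbation of relative size `radius`.
        W1p = W1 * (1.0 + radius * rng.standard_normal(W1.shape))
        w2p = w2 * (1.0 + radius * rng.standard_normal(w2.shape))
        out = np.tanh(X @ W1p) @ w2p
        hits += int(np.all(np.sign(out) == y))
    return hits / n_samples

# Larger radii at which the fraction stays close to 1 indicate a flatter minimum.
for r in (0.01, 0.05, 0.1, 0.2):
    print(r, zero_error_fraction(W1, w2, X, y, r))
```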
Short Bio
Riccardo Zecchina is a professor of theoretical physics at Bocconi University in Milan, where he holds a chair in Machine Learning.
He received his PhD in Theoretical Physics from the University of Turin, under Tullio Regge. He was then appointed research scientist and head of the Statistical Physics Group at the International Centre for Theoretical Physics in Trieste, and later full professor of Theoretical Physics at the Polytechnic University of Turin. In 2017 he moved to Bocconi University. He has been a long-term visiting scientist, on multiple occasions, at Microsoft Research (in Redmond and in Cambridge, MA) and at the Laboratory of Theoretical Physics and Statistical Models (LPTMS) of the University of Paris-Sud.
His research interests lie at the interface between statistical physics, computer science, and machine learning. His current activity is primarily focused on the out-of-equilibrium theory of learning algorithms in artificial and biological neural networks.
He holds an Advanced Grant from the European Research Council. In 2016, he was awarded (with M. Mézard and G. Parisi) the Lars Onsager Prize in Theoretical Statistical Physics by the American Physical Society, "For groundbreaking work applying spin glass ideas to ensembles of computational problems, yielding both new classes of efficient algorithms and new perspectives on phase transitions in their structure and complexity."