Master Thesis MSTR-2019-68

Bibliography: Holzmüller, David: Convergence Analysis of Neural Networks.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 68 (2019).
77 pages, English.

We prove that two-layer (Leaky)ReLU networks with one-dimensional input and output, trained using gradient descent on a least-squares loss with He et al. initialization, are not universally consistent. Specifically, we define a submanifold of all data distributions on which gradient descent fails to spread the nonlinearities across the data with high probability, i.e., it only finds a bad local minimum or valley of the optimization landscape. In these cases, the network found by gradient descent essentially only performs linear regression. We provide numerical evidence that this happens in practical situations and that stochastic gradient descent exhibits similar behavior. We relate the speed of convergence to such a local optimum to a stable linear system whose eigenvalues have different asymptotics. We also provide an upper bound on the learning rate based on this observation. While we mainly operate in the underparameterized regime like most consistency results for classical algorithms, our proof also applies to certain overparameterized cases that are not covered by recent results showing convergence of overparameterized neural nets to a global optimum.
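
For readers who want to experiment with the setting described in the abstract, the following is a minimal sketch and not the thesis's own code. It trains a two-layer (Leaky)ReLU network with one-dimensional input and output by full-batch gradient descent on a least-squares loss. The function names (init_he, forward, train), the width m, the learning rate, the step count, and the zero bias initialization are all illustrative assumptions; the weight scaling follows the usual He et al. rule of variance 2 / fan-in.

```python
import numpy as np


def init_he(m, rng):
    """He-style initialization (assumed variant: zero biases)."""
    a = rng.normal(0.0, np.sqrt(2.0), size=m)      # hidden weights, fan-in 1
    b = np.zeros(m)                                # hidden biases
    w = rng.normal(0.0, np.sqrt(2.0 / m), size=m)  # output weights, fan-in m
    c = 0.0                                        # output bias
    return a, b, w, c


def forward(params, x, alpha=0.0):
    """Return predictions, pre-activations and activations for inputs x."""
    a, b, w, c = params
    pre = np.outer(x, a) + b                       # shape (n, m)
    act = np.where(pre > 0, pre, alpha * pre)      # alpha=0: ReLU, alpha>0: LeakyReLU
    return act @ w + c, pre, act


def train(x, y, m=16, lr=1e-2, steps=2000, alpha=0.0, seed=0):
    """Full-batch gradient descent on the loss (1 / 2n) * sum((f(x_i) - y_i)^2)."""
    rng = np.random.default_rng(seed)
    a, b, w, c = init_he(m, rng)
    n = len(x)
    losses = []
    for _ in range(steps):
        pred, pre, act = forward((a, b, w, c), x, alpha)
        r = pred - y                               # residuals
        losses.append(0.5 * np.mean(r ** 2))
        slope = np.where(pre > 0, 1.0, alpha)      # derivative of the activation
        g_pre = r[:, None] * w[None, :] * slope    # backpropagated residuals, (n, m)
        g_w = act.T @ r / n
        g_c = r.mean()
        g_a = (g_pre * x[:, None]).mean(axis=0)
        g_b = g_pre.mean(axis=0)
        a, b, w, c = a - lr * g_a, b - lr * g_b, w - lr * g_w, c - lr * g_c
    return (a, b, w, c), losses
```

To probe the claim that the trained network may do little more than linear regression, one could compare the final training loss with the loss of an ordinary least-squares line, for example one obtained from np.polyfit(x, y, 1), on data of one's own choosing. The outcome depends on the data distribution, width, and seed, so no particular result is implied here.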

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Simulation Software Engineering
Supervisor(s): Pflüger, Prof. Dirk; Steinwart, Prof. Ingo
Entry date: February 19, 2020