|Holzmüller, David: Convergence Analysis of Neural Networks. |
Universität Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Master's Thesis No. 68 (2019).
77 pages, English.
We prove that two-layer (Leaky)ReLU networks with one-dimensional input and output, trained using gradient descent on a least-squares loss with He et al. initialization, are not universally consistent. Specifically, we define a submanifold of all data distributions on which gradient descent fails to spread the nonlinearities across the data with high probability, i.e., it only finds a bad local minimum or valley of the optimization landscape. In these cases, the network found by gradient descent essentially only performs linear regression. We provide numerical evidence that this happens in practical situations and that stochastic gradient descent exhibits similar behavior. We relate the speed of convergence to such a local optimum to a stable linear system whose eigenvalues have different asymptotics. We also derive an upper bound on the learning rate from this observation. While we mainly operate in the underparameterized regime, as do most consistency results for classical algorithms, our proof also applies to certain overparameterized cases that are not covered by recent results showing convergence of overparameterized neural networks to a global optimum.
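The training setup the abstract refers to can be sketched as follows. This is an illustrative reconstruction, not code from the thesis: the target function, network width, learning rate, and step count are made-up example values; only the architecture (two-layer ReLU, one-dimensional input and output), the least-squares loss, full-batch gradient descent, and He et al. initialization follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D regression data (hypothetical example, not from the thesis).
n = 64
x = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * x)  # a simple nonlinear target

# Two-layer ReLU network with He et al. initialization:
# weights ~ N(0, 2 / fan_in); biases start at zero.
width = 32
W1 = rng.normal(0.0, np.sqrt(2.0 / 1), size=(1, width))
b1 = np.zeros(width)
W2 = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, 1))
b2 = np.zeros(1)

lr = 1e-2  # assumed step size for this toy run
losses = []
for step in range(2000):
    # Forward pass.
    pre = x @ W1 + b1            # pre-activations, shape (n, width)
    hid = np.maximum(pre, 0.0)   # ReLU
    out = hid @ W2 + b2          # network output, shape (n, 1)

    # Least-squares loss.
    err = out - y
    losses.append(0.5 * np.mean(err ** 2))

    # Backward pass, full-batch gradient descent.
    d_out = err / n
    dW2 = hid.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hid = d_out @ W2.T
    d_pre = d_hid * (pre > 0)    # ReLU gradient
    dW1 = x.T @ d_pre
    db1 = d_pre.sum(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

In the distributions constructed in the thesis, runs of this kind end near a linear-regression solution because the ReLU kinks never move into the region covered by the data; whether that happens here depends on the (made-up) data and seed.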
|Department(s)||Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Simulation Software Engineering|
|Supervisors||Pflüger, Prof. Dirk; Steinwart, Prof. Ingo|
|Date of submission||February 19, 2020|