TITLE: How to escape saddle points efficiently?
Praneeth Netrapalli, Microsoft
ABSTRACT:
Non-convex optimization is ubiquitous in modern machine learning applications. Gradient-descent-based algorithms (such as gradient descent and Nesterov's accelerated gradient descent) are widely used in practice since they have been observed to perform well on these problems. From a theoretical standpoint, however, this is quite surprising, since these non-convex problems are teeming with highly suboptimal saddle points, and it is well known that gradient descent (and its variants) can get stuck at such saddle points. A key question in resolving this apparent paradox is whether gradient-descent-based algorithms can efficiently escape saddle points, where the Hessian has negative eigenvalues, and converge to points where the Hessian is positive semidefinite (such points are called second-order stationary points). We answer this question in the affirmative by showing the following: (1) Perturbed gradient descent converges to second-order stationary points in a number of iterations that depends only poly-logarithmically on the dimension (i.e., it is almost "dimension-free"); the convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. (2) A variant of Nesterov's accelerated gradient descent converges to second-order stationary points at a faster rate than perturbed gradient descent.
The key technical components behind these results are (1) an understanding of the geometry around saddle points and (2) the design of a potential function for Nesterov's accelerated gradient descent on non-convex problems.
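To make the first result concrete, here is a minimal sketch of a perturbed-gradient-descent-style procedure in NumPy. It is illustrative only: the function name perturbed_gradient_descent, the step size, perturbation radius, and iteration budget are assumptions made for this sketch, and it omits the function-decrease termination test used in the actual analysis; it simply adds a small random perturbation whenever the gradient is small, so the iterate can slide off a saddle point.

```python
import numpy as np

def perturbed_gradient_descent(grad, x0, eta=0.01, g_thresh=1e-3,
                               radius=1e-2, t_noise=20, max_iter=5_000,
                               seed=0):
    """Illustrative sketch (not the exact algorithm from the papers):
    plain gradient steps, plus a small random perturbation whenever the
    gradient is small (a possible saddle point) and no perturbation was
    added in the last t_noise iterations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    last_perturb = -t_noise
    for t in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh and t - last_perturb >= t_noise:
            # Sample uniformly from a ball of the given radius around x.
            direction = rng.normal(size=x.shape)
            direction /= np.linalg.norm(direction)
            x = x + radius * rng.random() ** (1.0 / x.size) * direction
            last_perturb = t
        else:
            x = x - eta * g  # ordinary gradient step
    return x

# Toy example: f(x, y) = x^2 - y^2 + y^4 / 4 has a strict saddle at the
# origin and local minima at (0, +/- sqrt(2)). Starting near the saddle,
# plain gradient descent barely moves; the perturbation lets the iterate
# escape toward a second-order stationary point.
grad_f = lambda v: np.array([2.0 * v[0], -2.0 * v[1] + v[1] ** 3])
print(perturbed_gradient_descent(grad_f, x0=[1e-8, 0.0]))
```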
Based on joint works with Chi Jin, Rong Ge, Sham M. Kakade and Mike Jordan.
BIO: Praneeth Netrapalli has been a researcher at Microsoft Research India, Bengaluru, since August 2016. Prior to this, he was a postdoctoral researcher at Microsoft Research New England in Cambridge, MA. He obtained his MS and PhD in ECE from UT Austin, where he was advised by Sujay Sanghavi. Before coming to UT, he worked in the Quantitative Analysis group at Goldman Sachs, Bangalore. He obtained his B.Tech in Electrical Engineering from IIT Bombay. His research focuses on designing efficient algorithms for machine learning problems, primarily via stochastic and non-convex optimization.