TITLE: Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD

Gauri Joshi, CMU

 

ABSTRACT:

Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays while waiting for the slowest learners (stragglers). Asynchronous methods can alleviate stragglers, but they introduce gradient staleness that can adversely affect convergence. In this work, we present the first theoretical characterization of the speed-up offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time). The novelty of our work is that our runtime analysis considers random straggler delays, which helps us design and compare distributed SGD algorithms that strike a balance between stragglers and staleness. We also present a new convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions, and a novel learning rate schedule to compensate for gradient staleness.
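To make the straggler/staleness trade-off concrete, the sketch below (not the speaker's actual algorithm) simulates asynchronous SGD on a toy least-squares problem: each arriving gradient may have been computed on parameters that are several iterations old, and the step size is damped by the staleness as a simple illustrative compensation heuristic. The objective, the random staleness model, and the 1/(1+tau) damping are all assumptions made for this example only.

```python
import numpy as np

# Illustrative only: asynchronous SGD on a toy least-squares problem, where the
# gradient applied at step t may have been computed from stale parameters and
# the learning rate is damped by the staleness tau (an assumed heuristic, not
# the schedule proposed in the talk).

rng = np.random.default_rng(0)
d, n = 10, 1000
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def stochastic_grad(w, batch_size=32):
    """Mini-batch gradient of the least-squares loss at parameters w."""
    idx = rng.integers(0, n, size=batch_size)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

w = np.zeros(d)
eta0 = 0.1           # base learning rate (assumed)
max_staleness = 5    # assumed bound on how old an applied gradient can be
param_history = [w.copy()]

for t in range(1, 501):
    # Simulate a straggler: the gradient arriving now was computed on the
    # parameters from tau iterations ago, with tau drawn at random.
    tau = int(rng.integers(0, min(max_staleness, t)))
    w_stale = param_history[-(tau + 1)]
    g = stochastic_grad(w_stale)

    # Staleness-aware step size: older gradients get a smaller step.
    eta = eta0 / (1 + tau)
    w = w - eta * g
    param_history.append(w.copy())

print("final error ||w - w*|| =", np.linalg.norm(w - w_true))
```

Setting max_staleness to 0 recovers ordinary (synchronous-style) SGD updates in this toy model, which makes it easy to compare how increasing staleness slows per-iteration error decay even though, in a real system, it would reduce time spent waiting for stragglers.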

 

BIO: Gauri Joshi has been an assistant professor in the ECE department at Carnegie Mellon University since September 2017. Prior to that, she worked as a Research Staff Member at the IBM T. J. Watson Research Center. Gauri completed her Ph.D. at MIT EECS in June 2016. She also received her B.Tech. and M.Tech. in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay in 2010. Her awards and honors include the Best Thesis Prize in Computer Science at MIT (2012), the Institute Gold Medal of IIT Bombay (2010), the Claude Shannon Research Assistantship (2015-16), and the Schlumberger Faculty for the Future fellowship (2011-2015).