Gradient Descent, Demystified

Abstract: Gradient descent (GD) is a fundamental optimization algorithm that sounds much scarier than it is. Many users of scikit-learn and similar libraries apply GD through these tools without really grokking what it is doing. Other, more engineering-oriented practitioners are put off entirely by its seeming complexity. I walk through a live-coding practicum (in a RISE Jupyter Notebook slideshow) in which I implement a basic gradient descent algorithm for linear and logistic regression, demonstrating the flexibility of the optimization technique and the decidedly un-scary code required to get a prototype up and running. I compare the results of the hand-coded algorithm to those generated by scikit-learn (and, for linear regression, to the closed-form normal equation) and show that they match.

The focus of this talk is the practicum of implementing one's own GD algorithm, though I review the most important mathematical and theoretical components of GD to ground the practicum for attendees. The mathematical review covers what gradients are, how they relate to derivatives, and how they enable iterative optimization over a parameter space. This talk does not include a formal derivation of, e.g., the loss functions used in linear and logistic regression, nor does it require mastery of calculus. Attendees will leave with a better understanding of iterative optimization and a template of their own for implementing GD in Python, should they feel this would enrich their understanding.
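To give a flavor of how little code is involved, here is a minimal sketch of the idea in NumPy. It is not the notebook from the talk: the function names (`gradient_descent`, `linear_grad`, `logistic_grad`), the synthetic data, the learning rate, and the iteration count are all assumptions chosen for illustration. The same update loop is reused for both models; only the gradient function changes.

```python
import numpy as np

def gradient_descent(X, y, grad_fn, lr=0.1, n_iters=5000):
    """Repeatedly step against the gradient, starting from all zeros."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= lr * grad_fn(X, y, theta)
    return theta

def linear_grad(X, y, theta):
    # Gradient of the halved mean squared error, (1/2m) * ||X @ theta - y||^2
    return X.T @ (X @ theta - y) / len(y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad(X, y, theta):
    # Gradient of the mean log loss; same shape as the linear case,
    # with the prediction passed through a sigmoid
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Synthetic data (illustrative only): an intercept column plus two features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_theta = np.array([1.0, 2.0, -3.0])

# Linear regression: compare GD against the closed-form normal equation
y = X @ true_theta + rng.normal(scale=0.1, size=200)
theta_gd = gradient_descent(X, y, linear_grad)
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(theta_gd, theta_ne, atol=1e-6))

# Logistic regression: same loop, different gradient
y_bin = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)
theta_log = gradient_descent(X, y_bin, logistic_grad)
accuracy = np.mean((sigmoid(X @ theta_log) > 0.5) == y_bin)
print(accuracy)
```

The fixed learning rate works here because the synthetic features are already on a similar scale; on real data one would likely standardize the features or tune `lr` first.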

Bio: Stu (Michael Stewart) is a machine learning engineer at Opendoor in San Francisco. Previously, he worked as a data scientist and engineer at Uber, and as an economic researcher at the Federal Reserve Bank of New York.

Open Data Science Conference