The Magic of Dimensionality Reduction

Abstract: Dimensionality reduction is one of the most crucial tools in a data scientist’s toolbox, and modern techniques can yield truly magical results. This talk looks at how we can take complex, messy, real-world datasets and simplify them in order to create beautiful visualizations, better understand the relationships between our data points, and discover some surprising insights. The talk aims to be accessible and practical; the audience should walk away with brand new techniques for visualizing and understanding their data.

Dimensionality reduction can seem like an intimidatingly complex subject, full of acronyms (MDR, LDA, GDA, LTSA, HLLE…) and lengthy equations. But in fact, anyone who’s ever drawn a picture (in other words, anyone at all) can understand dimensionality reduction. The first part of this talk focuses on providing an intuitive understanding of what dimensionality reduction is, and why it’s so crucial.

Next, we’ll tour through four of the most useful and widely used dimensionality reduction techniques: PCA, random projection, Isomap and t-SNE, accompanied by example implementations in Python. We’ll look at how each algorithm works and what its particular strengths and weaknesses are. We’ll also see how these different approaches fare when applied to well-known datasets like MNIST.
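As a rough illustration of what such a tour could look like, here is a minimal sketch that runs the four techniques named above and reduces each result to two dimensions. It is not taken from the talk. It assumes scikit-learn is installed, and it uses scikit-learn's small bundled digits dataset (8x8 images) as a stand-in for MNIST. The 300-sample subset, `n_neighbors=10` and `perplexity=30` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, Isomap
from sklearn.random_projection import GaussianRandomProjection

# Assumption: the bundled 8x8 digits dataset stands in for MNIST here.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]  # small subset so the slower methods stay quick

# One reducer per technique, each mapping 64 dimensions down to 2.
reducers = {
    "PCA": PCA(n_components=2),
    "Random projection": GaussianRandomProjection(n_components=2, random_state=0),
    "Isomap": Isomap(n_neighbors=10, n_components=2),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0),
}

# fit_transform learns the mapping and returns the 2D coordinates.
embeddings = {name: r.fit_transform(X) for name, r in reducers.items()}

for name, Z in embeddings.items():
    print(f"{name}: {X.shape} -> {Z.shape}")
```

Each entry in `embeddings` can be passed to a scatter plot coloured by `y` to compare how well the methods separate the digit classes.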

The second half of the talk will focus on dimensionality reduction’s magical powers, and on how it can help us answer questions as diverse as: Which is the strangest US state? Which two Simpsons characters are most similar? What movie should I watch tonight? Which of my customers is most likely to cancel next month? We answer these questions by combining modern dimensionality reduction and visualization techniques, again providing simple code implementations in Python.

Finally, we’ll look at how dimensionality reduction can be applied to linguistic and image data. We’ll explore how word embeddings can help us organize emojis or build equations from language (e.g. king - man + woman = queen), and how the tedium of Tinder can be automated with eigenfaces.
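To make the "equations from language" idea concrete, here is a toy sketch of the king - man + woman = queen arithmetic. It is not the talk's implementation. The three-dimensional vectors are hand-made for illustration (loosely "royalty", "maleness", "femaleness"), whereas real word embeddings are learned from text and have hundreds of dimensions. The `analogy` helper is a hypothetical name introduced here.

```python
import numpy as np

# Assumption: hand-made toy vectors, NOT trained embeddings.
# Dimensions are loosely (royalty, maleness, femaleness).
vectors = {
    "king":     np.array([0.9, 0.9, 0.1]),
    "queen":    np.array([0.9, 0.1, 0.9]),
    "man":      np.array([0.1, 0.9, 0.1]),
    "woman":    np.array([0.1, 0.1, 0.9]),
    "prince":   np.array([0.7, 0.9, 0.1]),
    "princess": np.array([0.7, 0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))
```

With trained embeddings the same arithmetic applies, but the nearest neighbour is found among the whole vocabulary rather than a handful of words.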

Bio: Alex Peattie is the co-founder and CTO of Peg, a technology platform helping multinational brands and agencies to find and work with top YouTubers. Peg is used by over 1,500 organisations worldwide, including Coca-Cola, L'Oreal and Google.

An experienced digital entrepreneur, Alex spent six years as a developer and consultant for the likes of Grubwithus, Huckberry, UNICEF and Nike, before joining coding bootcamp Makers Academy as senior coach, where he trained hundreds of junior developers. Alex was also a technical judge at this year's TechCrunch Disrupt conference.

Open Data Science Conference