A Hands-On Tutorial for Training Interpretable Variational Autoencoders Using siVAE

Abstract: 

Variational autoencoders (VAEs) are among the most widely used deep generative models, with applications in computer vision, language processing, and genomics, among other fields. VAEs are typically used to perform non-linear dimensionality reduction by mapping high-dimensional samples, such as images, into a low-dimensional latent space for visualization and other downstream analyses. A key limitation of VAEs is their lack of interpretability: until now, it has been challenging to identify the relationships, or attributions, between individual latent dimensions and the original input features of the samples. Making the latent dimensions learned by a VAE more interpretable will improve our understanding of what the latent space captures and help interpret visualizations of it.
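To make the mapping from a high-dimensional sample to a low-dimensional latent space concrete, here is a minimal sketch of the encoding step of a generic VAE. It is not siVAE code: the linear weights `w_mu` and `w_logvar` are random, hypothetical stand-ins for a trained encoder network, and the input is random data shaped like a flattened 28x28 image.

```python
import math
import random


def encode(x, w_mu, w_logvar, rng):
    """Map one high-dimensional sample x to a low-dimensional latent vector z.

    w_mu and w_logvar are hypothetical linear encoder weights with one row per
    latent dimension; a real VAE would use a trained neural network instead.
    """
    mu = [sum(w * xi for w, xi in zip(row, x)) for row in w_mu]
    logvar = [sum(w * xi for w, xi in zip(row, x)) for row in w_logvar]
    # Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    return mu, logvar, z


rng = random.Random(0)
input_dim, latent_dim = 784, 2  # 784 = 28 * 28 pixels, 2-D latent space

# Random stand-ins for an input image and for the encoder weights.
x = [rng.random() for _ in range(input_dim)]
w_mu = [[rng.gauss(0.0, 0.01) for _ in range(input_dim)]
        for _ in range(latent_dim)]
w_logvar = [[rng.gauss(0.0, 0.01) for _ in range(input_dim)]
            for _ in range(latent_dim)]

mu, logvar, z = encode(x, w_mu, w_logvar, rng)
print(len(x), "->", len(z))  # dimensionality before and after encoding
```

The two coordinates in `z` are what a scatter plot of sample embeddings would show; on their own they say nothing about which input features drive each coordinate, which is the interpretability gap described above.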

In this hands-on tutorial, we will introduce attendees to siVAE (scalable, interpretable VAE), a model that, during training, infers a set of factor loadings that explicitly map latent dimensions to the input features that define them. Using standard datasets from computer vision (MNIST, Fashion-MNIST, and CIFAR-10), we will walk attendees through training the siVAE model, visualizing the sample embeddings inferred by classic VAEs, and extracting and visualizing the features that contribute to individual latent dimensions. We will also teach attendees how to estimate and visualize feature awareness, a new metric that measures the overall importance of individual features for embedding a sample in the latent space. By the end of the tutorial, attendees will be able to train an siVAE model on their own datasets and to interpret and visualize the inferred latent dimensions.
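As a rough illustration of how factor loadings can be read once they are available, the sketch below ranks input features by the magnitude of their loading on each latent dimension. The loadings matrix, the feature names, and the helper `top_features` are all made up for illustration; they are not part of the siVAE package, and siVAE infers its loadings during training rather than taking them as given.

```python
def top_features(loadings, feature_names, k=2):
    """For each latent dimension, return the k features with the largest
    absolute loading."""
    result = []
    for row in loadings:
        ranked = sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)
        result.append([feature_names[j] for j in ranked[:k]])
    return result


# Hypothetical loadings: 2 latent dimensions x 4 input features (made-up values).
loadings = [
    [0.9, -0.1, 0.05, -0.7],
    [0.0, 0.8, -0.6, 0.1],
]
feature_names = ["pixel_0", "pixel_1", "pixel_2", "pixel_3"]

top = top_features(loadings, feature_names)
for dim, names in enumerate(top):
    print(f"latent dimension {dim}: {names}")
```

Ranking by absolute value treats strongly negative loadings as just as informative as strongly positive ones; whether that is the right choice depends on how the loadings are meant to be interpreted.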

Session Outline
Tentative schedule:
- Introduction to VAEs and siVAE (10 minutes)
- Hands-on introduction to basic TensorFlow commands (30 minutes)
- Hands-on training of siVAE using Google Colaboratory (20 minutes)
- Hands-on visualization of sample embeddings, factor loadings (interpretation), feature awareness (25 minutes)
- Wrap-up (5 minutes)

Bio: 

Yongin is a PhD candidate at UC Davis advised by Gerald Quon. His research is in computational biology, applying machine learning to answer questions in the field of biology. More specifically, his current research focuses on interpreting deep neural network architectures trained on genomics data to understand the underlying gene-to-gene relationships.