Data Harmonization for Generalizable Deep Learning Models: from Theory to Hands-on Tutorial

Abstract: Integrating data from multiple sources, with or without labels, is a fundamental problem in transfer learning, where a model must be trained on a source data distribution that differs from one or more target data distributions. For example, in healthcare, models must flexibly interoperate on large-scale medical data gathered across multiple hospitals, each with its own confounding biases. Domain adaptation enables this form of transfer learning by identifying deep feature representations that are invariant across domains (data sources), so that a trained model can generalize to unseen data distributions.
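
To make the idea of a domain-invariant representation concrete, the sketch below adds a simple alignment penalty to a task loss. It is an illustration only and is not scAlign's objective: the penalty is the squared distance between the mean feature vectors of two domains (a linear-kernel MMD estimate), and the names mean_discrepancy, total_loss, and alignment_weight are hypothetical, as are the toy embeddings.

```python
from statistics import fmean


def feature_means(features):
    """Per-dimension mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*features)]


def mean_discrepancy(source_features, target_features):
    """Squared Euclidean distance between the two domains' mean feature vectors.

    A deliberately simple alignment penalty (linear-kernel MMD estimate),
    assumed here for illustration; real models typically use richer penalties.
    """
    source_mean = feature_means(source_features)
    target_mean = feature_means(target_features)
    return sum((s - t) ** 2 for s, t in zip(source_mean, target_mean))


def total_loss(task_loss, source_features, target_features, alignment_weight=1.0):
    """Task loss on labeled data plus a weighted alignment penalty."""
    penalty = mean_discrepancy(source_features, target_features)
    return task_loss + alignment_weight * penalty


# Toy 2-D embeddings from two hypothetical hospitals.
hospital_a = [[0.0, 1.0], [2.0, 3.0]]
hospital_b = [[1.0, 2.0], [3.0, 6.0]]

penalty = mean_discrepancy(hospital_a, hospital_b)
loss = total_loss(0.5, hospital_a, hospital_b, alignment_weight=0.1)
```

Raising alignment_weight pushes the embeddings of the two domains closer together, usually at some cost to the task loss; lowering it does the opposite. This is the kind of trade-off that the loss-weight exploration in the hands-on session is meant to expose.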

In this workshop, we will teach attendees how to use domain adaptation for machine learning applications in computer vision for healthcare. More specifically, we will introduce scAlign, our recently developed domain adaptation approach that can integrate data from multiple sources in a fully unsupervised, semi-supervised, or fully supervised fashion. Participants will use healthcare datasets to explore how to tune model accuracy and generalizability. Finally, we will explore how domain adaptation can be extended to perform a novel form of differential feature extraction that identifies differences between features of data (e.g., images) from different sources.

Learning objectives:
- Introduction to Domain Adaptation.
  - 15-20 mins
    - Theory/motivation (Healthcare)
    - Go over applications in computer vision
    - Introduction to scAlign

- Overview of TensorFlow/Colaboratory. (Interactive)
  - 25 mins
    - Structure of workshop
    - Structure of a TensorFlow model: basics
    - Running an ML model in Colaboratory (example)

- Domain adaptation on melanoma data. (Hands-on)
  - 10 mins
    - Explain the problem: align data from different hospitals, patients, and diseases
    - Go over what the model should produce; show visualization.
  - 20 mins
    - Work through alignment of data
    - Explore changes to loss function weights.
    - Hyperparameters for DA/classifier

- Expand model structure

- Differential feature analysis with domain adaptation. (Hands-on)
  - 15 mins
    - Explain model structure and how to perform differential feature analysis.
    - Explore how features change with respect to disease.
    - Decode into both healthy and disease states
    - Compute the difference
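
The last two steps of the differential feature analysis (decode into both states, then compute the difference) can be sketched as follows. This is a minimal illustration under stated assumptions, not the workshop's model: it assumes one decoder per state operating on a shared embedding, and it uses tiny linear decoders with made-up weights in place of neural networks. The names decode, differential_features, healthy_decoder, and disease_decoder are hypothetical.

```python
def decode(embedding, weights, bias):
    """Linear decoder: maps a latent embedding back to feature space."""
    return [
        sum(w * z for w, z in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def differential_features(embedding, healthy_decoder, disease_decoder):
    """Decode one embedding into both states and return disease minus healthy."""
    healthy = decode(embedding, *healthy_decoder)
    disease = decode(embedding, *disease_decoder)
    return [d - h for d, h in zip(disease, healthy)]


# Hypothetical decoders (weights, bias): 2-D latent space, 3 output features.
healthy_decoder = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
disease_decoder = ([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [0.0, 0.0, 1.0])

# Shared, domain-invariant representation of a single sample (made-up values).
embedding = [2.0, 3.0]

delta = differential_features(embedding, healthy_decoder, disease_decoder)
```

In this toy setup, a zero entry in delta marks a feature that decodes identically in both states, while a nonzero entry marks a feature that the two decoders reconstruct differently for the same sample.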

Bio: Coming soon!