Abstract: Data science has many powerful tools for using observational data to create models that solve prediction and classification problems. Many of our real-world questions are causal, however, and it is well known that predictive models do not answer causal questions. A common attitude is that only controlled experiments can produce causal information, but there are rich fields of statistics and machine learning covering correct methods for doing causal inference and causal discovery from observational data alone. Such methods are increasingly being used in practice, as experimentation is often costly, slow, unethical, or impossible. I will survey current methods for using observational data to answer causal questions, briefly review the theory and assumptions behind those methods, and demonstrate their use with open-source software packages.
Bio: Dr. Kummerfeld is a Research Assistant Professor at the Institute for Health Informatics at the University of Minnesota. His primary research interest is in statistical and machine learning methods for discovering causal relationships, with a special focus on discovering causal latent variable models. His work includes (1) developing novel algorithms for discovering causal relationships and latent variables, (2) proving theorems about the properties of causal discovery and latent variable discovery algorithms, (3) performing benchmark simulation studies to evaluate features of the algorithms that are difficult or impossible to evaluate by other means, and (4) applying these novel algorithms to health data in order to inform the development of new treatments.