Abstract: Cython is an elegant middle ground between the ease of use of Python and the numeric efficiency of C. In this tutorial, we will cover the various elements of Cython from a practical perspective. We will start by converting common mathematical functions from Python to Cython, timing them at each step to identify which elements of Cython provide the best speed gains. After cythonizing some examples, we will discuss the difference between multithreading and multiprocessing, what the global interpreter lock (GIL) is, and how we can use Cython to release the GIL and enable more efficient parallel processing. We will end by coding the commonly used k-means algorithm in Cython in a manner that uses multithreading for parallel speed gains. Participants should be familiar with C to the extent of understanding what static typing is, and familiar enough with Python to have previously used the numpy package. A working C++ compiler is required for this tutorial; one comes natively with both Linux and Mac machines, but must be downloaded for a Windows machine. Please have the compiler working ahead of time, as it can take time to install.
Bio: Jacob is a fourth-year graduate student and IGERT big data fellow currently pursuing a Ph.D. in computer science at the University of Washington. His research involves connecting the three-dimensional structure of the genome with a variety of cellular phenomena. Recently, Jacob’s work has involved predicting the three-dimensional structure of the genome at high resolution given simple, cheap data sources, and investigating a deep tensor factorization approach to impute missing epigenetics experiments for the human genome. As a separate track of research, he has also recently worked on techniques for merging expert knowledge with data to efficiently learn the structure of Bayesian networks, sometimes even making large problems tractable. When not doing research, Jacob actively contributes to the open source community. He is a core developer of the popular scikit-learn machine learning package for Python, where he focuses on maintaining the tree-based models/ensembles and probabilistic models. Jacob has also written pomegranate, a Python package that extends scikit-learn to flexible probabilistic modeling and some probabilistic graphical models, with a focus on scalability through parallelized and out-of-core learning.