Pomegranate: Fast and Flexible Probabilistic Modeling in Python


Pomegranate is a Python package for probabilistic modeling that emphasizes both ease of use and speed. In keeping with the first emphasis, pomegranate has a convenient modular API for building complex models out of simple components and a simple sklearn-like API for using them. In keeping with the second emphasis, the computationally intensive parts of pomegranate are written in Cython, and all models support multithreaded parallelism and out-of-core computations. In this talk I will give an overview of the features in pomegranate, including missing value support and the new data generators, and demonstrate how they can yield more accurate models. I will also demonstrate how existing components can be combined with your favorite neural network package to build neural probabilistic models, such as neural HMMs.
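To make the "modular components with an sklearn-like API" idea concrete, here is a minimal pure-Python sketch of the build-then-fit pattern. The `Normal` class and its methods are illustrative inventions for this sketch, not pomegranate's actual API.

```python
import math

class Normal:
    """A univariate normal distribution with an sklearn-like interface.

    Illustrative only: a toy stand-in for a pomegranate-style
    distribution component, not the library's real implementation.
    """

    def __init__(self, mu=0.0, sigma=1.0):
        self.mu, self.sigma = mu, sigma

    def fit(self, X):
        # Maximum-likelihood estimates of the mean and standard deviation.
        n = len(X)
        self.mu = sum(X) / n
        self.sigma = math.sqrt(sum((x - self.mu) ** 2 for x in X) / n)
        return self

    def log_probability(self, x):
        # Log density of x under the fitted normal distribution.
        return (-0.5 * math.log(2 * math.pi * self.sigma ** 2)
                - (x - self.mu) ** 2 / (2 * self.sigma ** 2))

d = Normal().fit([1.0, 2.0, 3.0])
```

Because each component exposes the same small interface (`fit`, `log_probability`), larger models such as mixtures or HMMs can be built by composing these pieces.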

Session Outline
Part 1: What features does pomegranate have?
In this part you will get a high-level overview of the pomegranate package, including the features it provides and how to use them. You will also learn why these features are important in realistic scenarios and see what tricks are involved in implementing them efficiently.
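One feature mentioned above, missing value support, can be sketched simply: a common approach is to estimate parameters from the observed entries only. The helper below is a hypothetical pure-Python illustration of that idea, not pomegranate's internal implementation.

```python
import math

def fit_mean_std(column):
    """MLE mean and standard deviation of a column, skipping entries
    recorded as None or NaN. Illustrative sketch of one common way to
    handle missing values during fitting."""
    observed = [x for x in column if x is not None and not math.isnan(x)]
    n = len(observed)
    mu = sum(observed) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in observed) / n)
    return mu, sigma

# The two missing entries are ignored; the estimates use only 1, 2, and 3.
mu, sigma = fit_mean_std([1.0, float("nan"), 2.0, None, 3.0])
```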

Part 2: A real-world example
In this part we will apply probabilistic modeling to a real world example in order to ground what we've learned.

Part 3: A dive into Bayesian networks
In this part we will take a deeper dive into Bayesian networks, including the theory behind how they work and how they are implemented in pomegranate. We will also cover constraint graphs, a way of placing constraints on the process of learning the structure of a Bayesian network that can dramatically speed it up.
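As background for this part, recall that once a network's structure is known, each node's conditional probability table (CPT) can be estimated from data by counting. The snippet below is a minimal sketch of that counting step for a single parent-child edge, using hypothetical variable names; it is not pomegranate's API.

```python
from collections import Counter

def estimate_cpt(samples):
    """Estimate P(child | parent) from a list of (parent, child)
    observations by maximum likelihood (ratio of counts)."""
    joint = Counter(samples)                     # counts of (parent, child)
    marginal = Counter(p for p, _ in samples)    # counts of parent alone
    return {(p, c): joint[(p, c)] / marginal[p] for p, c in joint}

# Toy data: weather as the parent variable, grass state as the child.
samples = [("rain", "wet"), ("rain", "wet"), ("rain", "dry"),
           ("sun", "dry"), ("sun", "dry")]
cpt = estimate_cpt(samples)
```

Structure learning is the hard part: it searches over which edges (and hence which CPTs) to include, which is where constraint graphs help by pruning the search space.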

Part 4: Neural probabilistic models
Finally, you will learn how neural networks can be incorporated into probabilistic models using pomegranate. We will focus on the example of a neural hidden Markov model, which is similar to a standard hidden Markov model except that the hidden states are replaced by a single neural network.
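For orientation, here is a compact sketch of the forward algorithm that underlies likelihood computation in any HMM. In a neural HMM, the per-state emission probabilities would come from a neural network's output rather than the fixed tables used here. This is an illustrative implementation, not pomegranate's.

```python
def forward_likelihood(obs, states, start, trans, emit):
    """P(obs) under an HMM, summing over all hidden state paths
    via the forward recursion."""
    # Initialize with the start probabilities times the first emission.
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    # Propagate forward one observation at a time.
    for o in obs[1:]:
        alpha = {s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# A toy two-state HMM over the symbols "x" and "y".
states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
p = forward_likelihood(["x", "y"], states, start, trans, emit)
```

Swapping the `emit` lookup for a call into a neural network is the essential change a neural HMM makes; the recursion itself stays the same.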

Background Knowledge
Attendees should have a good understanding of Python and beginner-level knowledge of numpy, scikit-learn, and probability theory. Anyone who is familiar with probabilistic models such as HMMs or Bayesian networks, has trained a machine learning model using scikit-learn, or is conceptually familiar with building machine learning models in Python will be well suited for this talk.


Jacob Schreiber is a post-doc and IGERT big data fellow in the Genome Science department at the University of Washington. His research focuses on the application of machine learning methods, primarily deep learning, to the massive amount of data being generated in the field of genome science. His research projects have involved predicting the three-dimensional structure of the genome using convolutional neural networks and learning a latent representation of the human epigenome using deep tensor factorization. He also routinely contributes to the Python open source community as the core developer of pomegranate, a package for flexible probabilistic modeling, and apricot, a package for data summarization for machine learning, and in the past as a core developer for the scikit-learn project.