Abstract: pomegranate is a Python package for probabilistic modeling with a primary emphasis on ease of use and a secondary emphasis on speed. In keeping with the first emphasis, it has a consistent sklearn-like API for training a model and making predictions with it, and a convenient "lego API" that allows complex models to be built out of simple components without needing to think about how the math might work. In keeping with the second emphasis, the computationally intensive parts are written in efficient Cython code, and all models support both multithreaded parallelism and out-of-core computation for training on massive datasets. Currently, pomegranate allows you to use basic probability distributions to build general mixture models, naive Bayes classifiers, Markov chains, hidden Markov models, factor graphs, and Bayesian networks. In this talk I will show how to build models of increasing complexity using code examples, describe the types of phenomena each models well, and draw examples from "popular culture" while inadvertently proving how out of touch I am with today's youth. At each step I will showcase both pomegranate's speed and its flexibility through comparisons to other well-known packages such as numpy, scipy, and scikit-learn. Finally, I will show how simple it is to train a mixture of hidden Markov models in parallel using pomegranate.
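As a taste of the sklearn-like fit/predict workflow the abstract describes, here is a minimal sketch of fitting a two-component Gaussian mixture. Since pomegranate's class names have changed across versions, scikit-learn's GaussianMixture (one of the packages the talk compares against) stands in for the equivalent pomegranate model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated 1-D clusters of 100 points each.
X = np.concatenate([rng.normal(0, 1, size=(100, 1)),
                    rng.normal(10, 1, size=(100, 1))])

# The sklearn-style pattern: construct, fit, predict.
model = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = model.predict(X)
```

The same construct/fit/predict shape carries over to the more complex models the talk covers, such as hidden Markov models and Bayesian networks.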
Bio: Jacob Schreiber is a fifth-year Ph.D. student and NSF IGERT big data fellow in the Computer Science and Engineering department at the University of Washington. His primary research focus is the application of machine learning methods, primarily deep learning ones, to the massive amount of data being generated in the field of genome science. His research projects have involved using convolutional neural networks to predict the three-dimensional structure of the genome and using deep tensor factorization to learn a latent representation of the human epigenome. He routinely contributes to the Python open source community, currently as the core developer of the pomegranate package for flexible probabilistic modeling, and in the past as a developer for the scikit-learn project. Future projects include graduating.