
Abstract: Have you ever wondered how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. We will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks.
Session Outline
Part 1: Introduction (40 min)
- Networks of all kinds: biological, transportation.
- Representation of networks, NetworkX data structures
- Grammar of network graphics with `nxviz`.
Part 2: Hubs and Paths (40 min)
- Finding important nodes; applications
- Pathfinding algorithms and their applications
- Hands-on: implementing path-finding algorithms
- Visualize degree and betweenness centrality distributions.
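
To give a flavor of the hands-on portion of Part 2, here is a minimal sketch of breadth-first search for shortest paths, plus degree centrality. It is written in plain Python rather than with NetworkX so that it runs without extra packages. The toy graph, the node names, and the function names `shortest_path` and `degree_centrality` are assumptions made for illustration, not the tutorial's own material.

```python
from collections import deque

# Hypothetical toy friendship network stored as an adjacency dict
# (node -> set of neighbours). The names are illustrative only.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def shortest_path(graph, source, target):
    """Return one shortest path from source to target using BFS, or None."""
    parents = {source: None}  # also serves as the "visited" set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # Sorting makes the traversal order (and so the result) deterministic.
        for neighbour in sorted(graph[node]):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # target is not reachable from source


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree, n - 1.

    Assumes the graph has at least two nodes.
    """
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


print(shortest_path(graph, "alice", "erin"))
print(degree_centrality(graph))
```

In this toy graph, "dave" has the highest degree centrality because he is connected to three of the four other people.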
Part 3: Cliques, Triangles & Structures (60 min)
- Definition of cliques
- Triangles as the simplest complex clique; applications
- Using path-finding algorithms to find structures in a graph.
- Open triangles as the basis for recommender systems.
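
The open-triangle idea in Part 3 can be sketched as follows: if A knows B and B knows C, but A does not know C, then C is a candidate recommendation for A. This is a minimal plain-Python sketch under assumed names; the toy graph and the functions `open_triangles` and `recommend` are hypothetical, and ranking by number of shared neighbours is one simple choice among many.

```python
from itertools import combinations

# Hypothetical toy friendship network (node -> set of neighbours).
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def open_triangles(graph, node):
    """Pairs of node's neighbours that are not connected to each other."""
    return [
        (a, b)
        for a, b in combinations(sorted(graph[node]), 2)
        if b not in graph[a]
    ]


def recommend(graph, node):
    """Rank non-neighbours of node by how many neighbours they share with it."""
    scores = {}
    for neighbour in graph[node]:
        for candidate in graph[neighbour]:
            # Skip the node itself and anyone it is already connected to.
            if candidate != node and candidate not in graph[node]:
                scores[candidate] = scores.get(candidate, 0) + 1
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(open_triangles(graph, "alice"))
print(recommend(graph, "alice"))
```

Here "alice" sits at the centre of one open triangle (her friends "bob" and "carol" do not know each other), and "dave" is recommended to her because they share two friends.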
Part 4: Advanced Topics (60 min)
This part will be lecture-style, with topics chosen by the class. Participants may choose two from:
- Bipartite graphs.
- Statistical inference on graphs.
- Linear algebra and graph operations.
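
As a taste of the linear algebra topic, the sketch below builds the adjacency matrix of a small undirected graph and uses matrix powers to count walks. Entry (i, j) of the squared matrix counts walks of length 2 from i to j, and the trace of the cubed matrix divided by 6 counts triangles in a simple undirected graph. The example graph and the helper names `adjacency_matrix` and `matmul` are assumptions for illustration; in practice one would likely use NumPy instead of nested lists.

```python
def adjacency_matrix(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix for an undirected graph on nodes 0..n-1."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


def matmul(a, b):
    """Plain-Python matrix product of two square matrices (lists of lists)."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = adjacency_matrix(4, edges)

A2 = matmul(A, A)   # A2[i][j] = number of walks of length 2 from i to j
A3 = matmul(A2, A)  # A3[i][i] = number of closed walks of length 3 at i

# Each triangle is counted 6 times in the trace (3 start nodes x 2 directions).
n_triangles = sum(A3[i][i] for i in range(len(A))) // 6

print(A2)
print(n_triangles)
```

Note that the diagonal of the squared matrix recovers each node's degree, since the only walks of length 2 from a node back to itself go out to a neighbour and return.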
Background Knowledge
Participants should be comfortable operating in a Jupyter notebook environment. They should also be familiar with the Python programming language, in particular with writing loops and with defining and calling functions. Knowledge of NumPy, Pandas, and matplotlib will be useful, though by no means required, as the necessary knowledge will be introduced just-in-time.
Bio: Eric is an Investigator at the Novartis Institutes for Biomedical Research, where he solves biological problems using machine learning. He obtained his Doctor of Science (ScD) from the Department of Biological Engineering, MIT, and was an Insight Health Data Fellow in the summer of 2017. He has taught Network Analysis at a variety of data science venues, including PyCon USA, SciPy, PyData, and ODSC, and has also co-developed the Python Network Analysis curriculum on DataCamp. As an open-source contributor, he has made contributions to PyMC3, matplotlib, and bokeh. He has also led the development of the graph visualization package nxviz and the data cleaning package pyjanitor (a Python port of the R package janitor).