How Should We (Correctly) Compare Graphs?

Abstract: Graph representations of real-world phenomena are ubiquitous – from social and information networks, to technological, biological, chemical, and brain networks. Many graph mining tasks require a distance (or, conversely, a similarity) measure between graphs. Examples include clustering of graphs and anomaly detection, nearest neighbor and similarity search, pattern recognition, and transfer learning. Such tasks find applications in diverse areas including image processing, chemistry, and social network analysis, to name a few.

Intuitively, given two graphs, their distance is a score quantifying their structural differences. A highly desirable property for such a score is that it is a metric, i.e., it is non-negative, symmetric, positive-definite, and, crucially, satisfies the triangle inequality. Metrics exhibit significant computational advantages over non-metrics. For example, operations such as nearest-neighbor search, clustering, outlier detection, and diameter computation have known fast algorithms precisely when performed over objects embedded in a metric space.

Unfortunately, algorithms to compute several classic distances between graphs do not scale to large graphs; other distances do not satisfy all of the metric properties: non-negativity, positive definiteness, symmetry, and triangle inequality.

The purpose of this tutorial is to go over the recent and expanding literature of graph metric spaces, focusing specifically on tractable metrics. Furthermore, we also explain how to compute the distance between n graphs in a way that the resulting distance satisfy a generalization of the triangle inequality to n elements, and is still tractable.

Bio: José Bento completed his Ph.D. in Electrical Engineering at Stanford University where he worked with Professor Andrea Montanari on statistical inference and structural learning of graphical models. After his Ph.D., he moved to Disney Research, Boston lab, where he worked with Dr. Jonathan Yedidia on algorithms for distributed optimization, robotics, and computer vision. He is now with the Computer Science department at Boston College. His current research lies at the intersection of distributed algorithms and machine learning. In 2014 he received a Disney Inventor Award for his work on distributed optimization, which recently lead to an approved patent. In 2016 he was awarded a $10M NIH joint grant to study the emergence of antibiotic resistance and in 2017 a $2M NSF joint grant to study measures of distance between large graphs.