How Should We (Correctly) Compare Graphs?

Abstract: Graph representations of real-world phenomena are ubiquitous – from social and information networks, to technological, biological, chemical, and brain networks. Many graph mining tasks require a distance (or, conversely, a similarity) measure between graphs. Examples include clustering of graphs and anomaly detection, nearest neighbor and similarity search, pattern recognition, and transfer learning. Such tasks find applications in diverse areas including image processing, chemistry, and social network analysis, to name a few.

Intuitively, given two graphs, their distance is a score quantifying their structural differences. A highly desirable property for such a score is that it is a metric, i.e., it is non-negative, symmetric, positive-definite, and, crucially, satisfies the triangle inequality. Metrics exhibit significant computational advantages over non-metrics. For example, operations such as nearest-neighbor search, clustering, outlier detection, and diameter computation have known fast algorithms precisely when performed over objects embedded in a metric space.

Unfortunately, algorithms to compute several classic distances between graphs do not scale to large graphs; other distances do not satisfy all of the metric properties: non-negativity, positive definiteness, symmetry, and triangle inequality.

The purpose of this tutorial is to go over the recent and expanding literature of graph metric spaces, focusing specifically on tractable metrics. Furthermore, we also explain how to compute the distance between n graphs in a way that the resulting distance satisfy a generalization of the triangle inequality to n elements, and is still tractable.

Bio: Sam Safavi completed his Ph.D. in Electrical and Computer Engineering at Tufts University. He got his M.Sc. degree in Automated Systems and Control from University of Sheffield, and his B.Sc. degree in Electrical Engineering from University of Tehran. After his Ph.D., he worked with Prof. Bento as a post-doctoral researcher at the Computer Science department at Boston College, where he studied measures of distance between multiple graphs. Sam has recently joined Wise Systems as an applied research scientist. His research interests include distributed algorithms, machine learning and optimization.