Towards a Zero-One Law for Column Subset Selection

Abstract: 

There are a number of approximation algorithms for NP-hard versions of low rank approximation, such as finding a rank-k matrix minimizing the sum of absolute values of differences to a given n x n matrix A, or more generally finding a rank-k matrix which minimizes the sum of p-th powers of absolute values of differences to A. Many of these algorithms are linear time column subset selection algorithms, returning a subset of columns whose cost is no more than a poly(k) factor larger than the cost of the best rank-k matrix. The above error measures are special cases of the following general entrywise low rank approximation problem: given an arbitrary function f, find a rank-k matrix B which minimizes |A-B|_f = sum_{i,j} f(A_{i,j} - B_{i,j}). A natural question is: which functions f admit efficient approximation algorithms? Indeed, this is a central question of recent work studying generalized low rank models. In this work we give approximation algorithms for every function f which is approximately monotone and satisfies an approximate triangle inequality, and we show both of these conditions are necessary. Further, our algorithm is efficient if the function f admits an efficient approximate regression algorithm. Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show has structural properties very different from those of norms: for example, the lack of scale-invariance provably forces any column subset selection algorithm to use a factor sqrt{log n} more columns than are needed for p-norms. Nevertheless, we design the first efficient column subset selection algorithms for such error measures.
Joint work with Zhao Song and Peilin Zhong.
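To make the objective in the abstract concrete, here is a minimal sketch (assuming NumPy) of the entrywise cost sum_{i,j} f(A_{i,j} - B_{i,j}) instantiated with the Huber loss. The truncated-SVD baseline below is only an illustrative rank-k matrix; it optimizes Frobenius error, not the Huber objective studied in the talk, and all function names here are hypothetical.

```python
import numpy as np

def huber(x, delta=1.0):
    # Huber loss: quadratic near zero, linear in the tails.
    # Note it is not scale-invariant: huber(c*x) != |c|^p * huber(x) for any fixed p.
    ax = np.abs(x)
    return np.where(ax <= delta, 0.5 * x**2, delta * (ax - 0.5 * delta))

def entrywise_cost(A, B, f=huber):
    # Generalized entrywise low rank approximation cost: sum_{i,j} f(A_ij - B_ij)
    return f(A - B).sum()

def rank_k_svd(A, k):
    # Illustrative rank-k approximation via truncated SVD.
    # Optimal for squared (Frobenius) error, not for the Huber objective;
    # used here only to produce some rank-k matrix B to evaluate the cost on.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
B = rank_k_svd(A, 5)
cost = entrywise_cost(A, B)
```

Setting f(x) = |x| or f(x) = |x|^p recovers the error measures mentioned at the start of the abstract as special cases.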

Bio: 

David Woodruff has been an associate professor at Carnegie Mellon University since 2017. Prior to that he was at the IBM Almaden Research Center for ten years, which he joined after completing his Ph.D. in theoretical computer science at MIT. His research interests include data stream algorithms, machine learning, numerical linear algebra, sketching, and sparse recovery. He is the recipient of the 2014 Presburger Award and Best Paper Awards at STOC 2013 and PODS 2010. At IBM he was a member of the Academy of Technology and a Master Inventor.