Latent Variable Matrix Estimation via Collaborative Filtering

Abstract: Similarity based collaborative filtering for matrix estimation is a popular heuristic that has been used widely across industry in the previous decades to build recommendation systems, due to its simplicity and scalability. However, despite its popularity, there has been little theoretical foundation explaining its widespread success. In this talk, I will present theoretical guarantees for collaborative filtering under a nonparametric latent variable model, which arises from the natural property of "exchangeability", i.e. invariance under relabeling of the dataset. The analysis suggests that similarity based collaborative filtering can be viewed as kernel regression for latent variable models, where the features are not directly observed, and the kernel must be estimated from the data. In addition, while classical collaborative filtering typically requires a dense dataset, I will introduce an iterative collaborative filtering algorithm which compares larger radius neighborhoods of data to compute similarities. Our results show that this estimate converges even for very sparse datasets, which has implications towards sparse graphon estimation. The algorithms can be applied in a variety of settings, such as recommendations for online markets, analysis of social networks, or denoising crowdsourced labels.

Bio: Christina Lee Yu is a postdoc at Microsoft Research New England. She received her PhD and MS in Electrical Engineering and Computer Science from MIT in the Laboratory for Information and Decision Systems. She received her BS in Computer Science from California Institute of Technology. In fall of 2018, she will be starting as an assistant professor at Cornell University in the Operations Research and Information Engineering Department. She is a recipient of the MIT Jacobs Presidential Fellowship, the NSF Graduate Research Fellowship, and the Claude E. Shannon Research Assistantship. Her research focuses on designing and analyzing scalable algorithms for processing social data based on principles from statistical inference.

Open Data Science Conference