Latent Variable Matrix Estimation via Collaborative Filtering


Similarity-based collaborative filtering for matrix estimation is a popular heuristic that has been used widely across industry over the past few decades to build recommendation systems, owing to its simplicity and scalability. Despite this popularity, there has been little theoretical foundation explaining its widespread success. In this talk, I will present theoretical guarantees for collaborative filtering under a nonparametric latent variable model, which arises from the natural property of "exchangeability", i.e., invariance under relabeling of the dataset. The analysis suggests that similarity-based collaborative filtering can be viewed as kernel regression for latent variable models, where the features are not directly observed and the kernel must be estimated from the data. In addition, while classical collaborative filtering typically requires a dense dataset, I will introduce an iterative collaborative filtering algorithm that compares larger-radius neighborhoods of data to compute similarities. Our results show that this estimate converges even for very sparse datasets, which has implications for sparse graphon estimation. The algorithms can be applied in a variety of settings, such as recommendations for online markets, analysis of social networks, or denoising crowdsourced labels.
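To make the kernel-regression view concrete, here is a minimal sketch of the classical, one-hop form of similarity-based collaborative filtering, not the iterative algorithm presented in the talk. The function name `estimate_entry`, the Gaussian kernel, and the `bandwidth` and `min_overlap` parameters are all illustrative assumptions rather than details from the talk. Row distances are estimated from commonly observed entries, which stand in for the unobserved latent features.

```python
import numpy as np


def estimate_entry(ratings, observed, u, i, bandwidth=1.0, min_overlap=2):
    """Estimate ratings[u, i] as a kernel-weighted average over similar rows.

    ratings  : 2-D float array; values at unobserved positions are ignored.
    observed : 2-D boolean array of the same shape, True where a rating exists.
    All parameter names and defaults are hypothetical choices for this sketch.
    """
    weights, values = [], []
    for v in range(ratings.shape[0]):
        # Only rows that actually observed column i can contribute.
        if v == u or not observed[v, i]:
            continue
        # Columns rated by both rows, excluding the target column itself.
        common = observed[u] & observed[v]
        common[i] = False
        if common.sum() < min_overlap:
            continue
        # Empirical distance between rows: a proxy for latent-feature distance.
        dist = np.mean((ratings[u, common] - ratings[v, common]) ** 2)
        # Gaussian kernel weight (an assumed kernel choice).
        weights.append(np.exp(-dist / bandwidth))
        values.append(ratings[v, i])
    if not weights:
        return float("nan")  # no comparable neighbor observed column i
    return float(np.average(values, weights=weights))


# Toy data: row 0 is missing its rating for column 3.
ratings = np.array([
    [5.0, 4.0, 1.0, 0.0],  # last entry is a placeholder, not an observation
    [5.0, 4.0, 1.0, 1.0],
    [1.0, 2.0, 5.0, 5.0],
    [5.0, 5.0, 1.0, 1.0],
])
observed = np.ones_like(ratings, dtype=bool)
observed[0, 3] = False

estimate = estimate_entry(ratings, observed, u=0, i=3)
print(estimate)
```

In this toy example rows 1 and 3 closely match row 0 on the shared columns, so their ratings dominate the weighted average, while the dissimilar row 2 receives a near-zero weight. The sketch needs at least `min_overlap` shared entries per pair of rows, which is exactly the density requirement that the larger-radius neighborhood comparison described above is meant to relax.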


Christina Lee Yu is a postdoc at Microsoft Research New England. She received her PhD and MS in Electrical Engineering and Computer Science from MIT in the Laboratory for Information and Decision Systems. She received her BS in Computer Science from the California Institute of Technology. In the fall of 2018, she will be starting as an assistant professor at Cornell University in the Operations Research and Information Engineering Department. She is a recipient of the MIT Jacobs Presidential Fellowship, the NSF Graduate Research Fellowship, and the Claude E. Shannon Research Assistantship. Her research focuses on designing and analyzing scalable algorithms for processing social data based on principles from statistical inference.

Open Data Science




One Broadway
Cambridge, MA 02142
