Data Science Without Data Collection Using FedScale


Cloud computing has successfully brought the three "V"s of Big Data (volume, velocity, and variety) into data science, but collecting everything in the cloud is becoming increasingly infeasible. Today, we face a new set of challenges. A growing awareness of privacy among individual users and governing bodies is forcing platform providers to restrict the variety of data we can collect. Often, we cannot transfer data to the cloud at the velocity of its generation. Many cloud users suffer from sticker shock, buyer's remorse, or both as they try to keep up with the volume of data they must process. Making sense of data closer to its home is more appealing than ever.

Federated learning is a growing field that attempts to address this challenge by distributing learning and analytics tasks to end-user devices. Although theoretical federated learning research is growing exponentially to meet these challenges, we are far from putting those theories into practice. In this talk, I will introduce FedScale, a scalable and extensible open-source federated learning and analytics platform. It provides high-level APIs for implementing algorithms, a modular design for customizing implementations across diverse hardware and software backends, and the ability to deploy the same code at many scales. FedScale also includes a comprehensive benchmark that allows data scientists to evaluate their ideas in realistic, large-scale settings. I will highlight a few systems built successfully with FedScale and share insights from benchmarking recent algorithms with it.
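To make "distributing learning to end-user devices" concrete, here is a minimal, framework-free sketch of federated averaging (FedAvg), a widely used federated learning algorithm. It does not use FedScale's APIs, and it is not presented as the method covered in the talk. It assumes a simple linear regression model, noise-free synthetic data, and three hypothetical clients; the names `local_update` and `federated_averaging` are illustrative only. Each client trains on its own private data, and only model weights travel to the server, which averages them weighted by each client's sample count.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Hypothetical client step: a few rounds of gradient descent on
    least-squares linear regression, using only this client's data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w


def federated_averaging(clients, rounds=50, dim=2):
    """Hypothetical server loop: broadcast global weights, collect each
    client's locally trained weights, and average them by sample count."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in clients:  # raw X and y never leave the client
            updates.append(local_update(global_w, X, y))
            sizes.append(len(y))
        global_w = np.average(updates, axis=0, weights=sizes)
    return global_w


# Assumed setup: three synthetic clients of different sizes that share
# one true model, generated without noise.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (30, 50, 80):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))

w = federated_averaging(clients)
print(np.round(w, 3))
```

The server only ever sees weight vectors, which is the property that lets federated learning sidestep centralized data collection. Real deployments add concerns this sketch ignores, such as client sampling, stragglers, non-identically distributed data, and secure aggregation.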


Mosharaf Chowdhury is a Morris Wellman associate professor of CSE at the University of Michigan, Ann Arbor, where he leads the SymbioticLab. His work improves application performance and system efficiency of machine learning and big data workloads. He is also building software solutions to monitor and optimize the impact of machine learning systems on energy consumption and data privacy. His group developed Infiniswap, the first scalable software solution for memory disaggregation; Salus, the first software-only GPU sharing system for deep learning; FedScale, the largest federated learning benchmark and a scalable and extensible federated learning engine; and Zeus, the first optimizer for the tradeoff between GPU energy consumption and training performance in DNN training. In the past, Mosharaf did seminal work on coflows and virtual network embedding, and he was a co-creator of Apache Spark. He has received many individual awards and fellowships, thanks to his stellar students and collaborators. His work has received seven paper awards from top venues, including NSDI, OSDI, and ATC, and over 22,000 citations. Mosharaf received his Ph.D. from UC Berkeley in 2015.

Open Data Science
One Broadway
Cambridge, MA 02142