R Packages as Collaboration Tools

Abstract: Data science teams often struggle to maintain clean code libraries and to share code among members in a way that avoids duplicated effort and rework. In addition, when team members leave an organization, institutional knowledge and coding skills are frequently lost unless documentation is diligent and substantial effort goes into maintaining code repositories. This session will advocate for building internal R packages to combat these challenges.

R packages are excellent tools for managing important code and its documentation, creating sustainable, collaborative bodies of work. As an example of this in practice, the session will discuss the open source package uptasticsearch (developed at Uptake and available on CRAN). This package began as a set of scripts owned by one team member and grew into a package that allows the entire Uptake data science organization (and now the general public) to easily access data stored in Elasticsearch.

Some of the elements covered in this session will include:
- Why it’s worth spending the time to build R packages for internal use instead of maintaining an assortment of standalone scripts
- Where to start with R package building, and how to design a package to be sustainable and collaborative
- Techniques for converting complicated scripts into modular functions for packaging
- The importance of quality documentation, and how to create and maintain it
- Methods for making contribution and collaboration on packages welcoming and easy for all members of the team, not just the package creator

Attendees can expect to learn the essentials of building an R package out of existing scripts, including how to plan a package and group scripts thematically, a few ways to share R packages with internal audiences, and guidance for smooth collaboration on packages within teams.

Bringing software engineering best practices to data science workflows lets you spend less time managing code and more time doing science.

Bio: Stephanie is a data scientist at Uptake in Chicago, where she focuses on building supervised learning models to predict and diagnose mechanical failures. Prior to joining Uptake, she worked in data analytics in the public sector, including at the University of Chicago Urban Labs. Her academic background is in quantitative sociology and education research. In addition to her work at Uptake, Stephanie teaches undergraduate courses in the College of Science and Health at DePaul University.