
Abstract: Our customers, store locations, constituents, research subjects, patients, crime locations, traffic accidents, and events don’t exist in a vacuum: their geographic locations provide important information that can best be displayed visually. Often, we have datasets that give location data in various ways: ZIP code, census tract or block, street address, latitude/longitude, congressional district, etc. Combining these data and gaining critical insight into the relationships between them requires data munging skills that go beyond the basic analysis we use for traditional tabular data.
There are many free, publicly available datasets about environmental exposures, socioeconomic status, climate, public safety events, and more that are linked to a geographic point or area, and this wealth of information gives us the opportunity to enrich our proprietary data with greater insight.
Whether it's comparing the number of asthma-related hospital visits with air quality data or correlating socioeconomic data with commuting patterns, presenting geographic data in maps helps accelerate the transformation of raw data into actionable information. Maps are a well-known data idiom, ideal for presenting complex data in an approachable way to non-technical stakeholders like policymakers, executives, and the press.
In this hands-on workshop, we will use R to combine public data from various sources, find statistically interesting patterns, and display them in both static and dynamic, web-ready maps. This session will cover GeoJSON and shapefiles, munging Census Bureau data, geocoding street addresses, matching latitude/longitude points to the polygons that contain them, and data visualization principles.
Participants will leave this workshop with a publication-quality data product and the skills to apply what they've learned to data in their own field or area of interest. Participants should have R and RStudio installed and understand how to use R and R Markdown for basic data ingestion and analysis. Ideally, participants will install the following packages prior to the workshop: tidyverse, leaflet, jsonlite, ggplot2, maptools, sp, rgdal, rgeos, scales, tmap.
Bio: Joy Payton is a data scientist and data educator at the Children’s Hospital of Philadelphia (CHOP), where she helps biomedical researchers learn the reproducible computational methods that will speed time to science and improve the quality and quantity of research conducted at CHOP. A longtime open source evangelist, Joy develops and delivers data science instruction on topics related to R, Python, and Git to an audience that includes physicians, nurses, researchers, analysts, developers, and other staff. Her personal research interests include using natural language processing to identify linguistic differences in a neurodiverse population and using government open data portals to conduct citizen science that draws attention to issues affecting vulnerable groups. Joy holds a degree in philosophy and math from Agnes Scott College, a divinity degree from the Universidad Pontificia de Comillas (Madrid), and a master’s degree in data science from the City University of New York (CUNY).

Joy Payton
Title: Supervisor, Data Education | Children's Hospital of Philadelphia
