Visualization throughout the data science workflow: Why it’s useful, and how not to lie

Abstract: Scientists, data scientists included, want to believe that "the data speaks for itself," focusing on numerical methods and analyses over visualization and communication. However, data visualization is an essential tool in a data scientist's toolbox. Data visualization allows you to see patterns that would be invisible, or inefficiently visible, from looking at numbers alone. These patterns help make sense of data, facilitating good decision-making throughout the data science workflow.

Beyond exploratory data analysis, data visualization can help you build better statistical and machine learning models. Later on in your workflow, it can speed up your understanding of the resulting patterns in your data. And ultimately, there is no point in doing good science if you're not able to communicate the results, whether you are contributing to the foundation of scientific understanding or making insight actionable.

In this tutorial, we will go beyond extolling the importance of data visualization and walk through some accessible examples of where visualization serves its purpose in the data science workflow (thus visually demonstrating the importance of visualization!).

In Part I, we will explore several main, overlapping applications to which data visualization can bring efficiency or effectiveness: (1) data science process, (2) data science insight, and (3) data science communication.

In Part II, we will build on Part I by discussing some considerations that are often overlooked when choosing and designing visualizations, including (1) when data visualization does or does not aid interpretation, and (2) the importance of integrity in communication and design decision-making.

By the end of this talk, participants should leave with new ideas for integrating visualization into their data science workflows, and a better understanding of how to make visualization design decisions that facilitate clear and effective communication.

Bio: Lindsay is a data scientist at T4G Limited, where she works with clients to provide the analytics and visualizations required for data-driven business decisions. Her data journey began with an MA and PhD in Biogeochemistry, from Boston University and Brown University, respectively. With more than a decade of experience in research methods and the scientific process, she excels at asking incisive questions and using data to tell compelling stories, and is skilled at translating insight into impact.
Lindsay is passionate about teaching data skills to a variety of audiences, from academic and professional to amateur. Through this work, she has developed and taught workshops and online courses at the University of New Brunswick, and is a Data Carpentry instructor and Canada Learning Code chapter co-lead. When she’s not joyfully up to her knees in data, she can be found trail running, backpacking, doing yoga, cooking, or playing drums in a rock band.