Statistics for Data Science

Abstract: 

The emergence of data science as a discipline has affected businesses in many ways. One primary effect has been to elevate the role of data in decision-making, with statistical methods used to assess the ever-growing datasets that companies collect. This workshop reviews standard statistical techniques and touches on more advanced methods for dealing with noisy data and applying real-world constraints to analyses. It assumes a working knowledge of standard statistical methods and aims to connect theory to practice using real-world examples.

Lesson 1: Descriptive statistics and exploring data statistically

- (Re)familiarize yourself with basic descriptive statistics
- Use simple data exploration techniques to identify problems and limitations of a new dataset
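
As a rough illustration of the first-pass checks this lesson covers, the sketch below summarizes one numeric column using only Python's standard library and flags missing entries and likely outliers. The `describe` helper, the `days_to_fill` sample, and the 1.5 × IQR outlier rule are assumptions chosen for illustration, not material from the workshop.

```python
import statistics


def describe(values):
    """Summarize a numeric column; None marks a missing entry (hypothetical helper)."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    # Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": median,
        "stdev": statistics.stdev(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Made-up sample with one gap and one suspicious value.
days_to_fill = [30, 32, 28, 35, None, 31, 29, 33, 250]
summary = describe(days_to_fill)
print(summary)
```

Comparing the mean with the median in the summary is a quick way to see how strongly a single extreme value pulls on the data.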

Lesson 2: Statistical analyses

- Review statistical tests for comparing datasets and groups within them
- Assess correlations and other properties of the data with an eye toward modeling
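
One way to picture both bullets is the minimal sketch below: a permutation test for a difference in group means, followed by a Pearson correlation computed by hand. The permutation test is a stand-in chosen here because it needs only the standard library; the workshop may use other tests. All function names and data values are hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Made-up measurements for two groups.
group_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7]
group_b = [10.2, 10.9, 10.5, 11.1, 10.4, 10.8]
observed_diff, p_value = permutation_test(group_a, group_b)
print(f"difference in means: {observed_diff:.2f}, p-value: {p_value:.4f}")

# Made-up paired observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"Pearson r: {r:.3f}")
```

A strong correlation like the one in this toy data is the kind of signal that suggests a linear model may be worth fitting.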

Lesson 3: More advanced analyses and methods

- Linear modeling and interpreting its statistical outputs
- Stats -> ML: connections and methodologies
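
To make "statistical outputs" concrete, here is a minimal sketch of a one-predictor least-squares fit that reports the slope, intercept, R-squared, and the slope's standard error and t statistic. It uses only the standard library; the `simple_ols` helper and the data are hypothetical. The same squared-error objective is what a machine learning linear regression minimizes, which is one link between the two views.

```python
import math
import statistics


def simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    # Assumes the fit is not perfect, so the standard error is nonzero.
    slope_se = math.sqrt(ss_res / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / ss_tot,
        "slope_se": slope_se,
        "t_stat": slope / slope_se,
    }


# Made-up paired observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

A large t statistic relative to the slope's standard error indicates the slope is distinguishable from zero for this sample; with real data, a library such as statsmodels would report these quantities alongside confidence intervals.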


- Python
- Jupyter notebooks

GitHub repository: https://github.com/zirmite/odsc-2020-statistics
Setup doc: https://docs.google.com/document/d/1LjaQXflpNIKNOcbvXoB9AT-gq4mBgEryNE9rEFQdZd4/edit?usp=sharing
Doc for the eventual syllabus and links to other resources: https://docs.google.com/document/d/19PWp_GzrAa11bQ3E4Ge0kKyfQQNG4jPCRgzzSdJeVAY/edit?usp=sharing
https://drive.google.com/open?id=1KAP1nmMrSj6EE78XHVGSi4WLToR617TBNH8Mra6j5us

Bio: 

Andrew is a Ph.D. astrophysicist who made the switch from academia to data science (via the Insight Data Science program) in 2014. He was the first data scientist hired at Greenhouse Software, where he has worked on many internal data science projects and a few customer-facing, data-powered product features. Andrew lives in New Jersey with his wife and son.