Engineering For Data Science Part 1

Abstract: Practicing data scientists typically spend the bulk of their time developing models for a particular inference or prediction application, likely devoting substantially less time to the equally complex problems stemming from system infrastructure. We might loosely think of these two, often orthogonal, concerns as the modeling problem and the engineering problem. The typical data scientist is trained to solve the former, often in an extremely rigorous manner, but can wind up developing a series of ad hoc solutions to the latter.

This talk will discuss Docker as a tool for the data scientist, in particular in conjunction with the popular interactive programming platform Jupyter and the cloud computing platform Amazon Web Services (AWS). Using Docker, Jupyter, and AWS, the data scientist can take control of their environment configuration, prototype scalable data architectures, and easily clone their work to support replicability and communication. This talk will work toward developing a set of best practices for Engineering for Data Science.

Bio: Joshua Cook is a mathematician. He writes code in Bash, C, and Python and has done pure and applied computational work in geospatial predictive modeling, quantum mechanics, semantic search, and artificial intelligence. He also has ten years of experience teaching mathematics at the secondary and post-secondary levels. His research interests lie in high-performance computing, interactive computing, feature extraction, and reinforcement learning. He is always willing to discuss orthogonality or to explain why Fortran is the language of the future over a warm or cold beverage.