Deploying Large Spark Models and Scoring in Real Time at Scale

Abstract: I will answer these five questions:

1. How does one build a PySpark model and deploy it in a Scala pipeline with no code rewrite? Resolving one of the great fights between data scientists, who want to code in Python, and data engineers, who prefer the tried-and-tested type safety of the JVM.

2. How does one beat Spark context startup latency and serve Spark models in milliseconds to meet near-real-time business needs?

3. How does one build an ML model, package it, and deploy it across platforms in a completely vendor-neutral way, i.e. build the model on AWS and deploy it on GCP, or vice versa?

4. How does one leverage years of software engineering practice directly when building data science pipelines, without reinventing the wheel and the pain that comes with it?

5. How does one build a completely GDPR-compliant machine learning model with 0.88 area under the ROC curve?

Bio: A data engineer/data scientist and chief builder of AbundanceAI. Having worked in core distributed systems, high-frequency finance, and e-commerce, he loves solving problems with data and scale. He measures his success by the number of ML models he has pushed into production, and by the number of times he did not order pizza after trying out his latest culinary experiment. An avid long-distance runner and an aspiring chef.

https://medium.com/@subhojit20_27731/apache-spark-and-amazon-s3-gotchas-and-best-practices-a767242f3d98
https://hackernoon.com/no-you-dont-need-personal-data-for-personalization-de9222cff8e4

Open Data Science Conference