Immersive AI Session Schedule
We are delighted to announce our schedule, which currently lists roughly half of our sessions. Session times will be added in the coming week, along with more Workshops and Career Mentoring talks.
Training | Deep Learning | Open Source Data Science | Intermediate
Lesson 1: The Unreasonable Effectiveness of Deep Learning
Introduction to Neural Networks and Deep Learning
The Deep Learning Families and Libraries
Lesson 2: Essential Deep Learning Theory
The Cart Before the Horse: A Shallow Neural Network in TensorFlow 2.0
Learning with Artificial Neurons
TensorFlow Playground—Visualizing a Deep Net in Action
Lesson 3: Deep Learning with TensorFlow 2.0
Revisiting our Shallow Neural Network
Deep Nets in TensorFlow
What to Study Next, Depending on Your Interests
Abstract: Relatively obscure a few short years ago, Deep Learning is ubiquitous today across data-driven applications as diverse as machine vision, natural language processing, and super-human game-playing.
This Deep Learning primer brings the revolutionary machine-learning approach behind contemporary artificial intelligence to life with interactive demos featuring TensorFlow 2.0, the major, cutting-edge revision of the world’s most popular Deep Learning library.
To facilitate an intuitive understanding of Deep Learning’s artificial-neural-network foundations, the essential theory will be introduced visually and pragmatically. Paired with tips for overcoming common pitfalls and hands-on code run-throughs provided in straightforward Jupyter notebooks, this foundational knowledge empowers you to build powerful state-of-the-art Deep Learning models…more details
Training | Machine Learning | Open Source Data Science
Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech on your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, by using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data, and applying and evaluating a machine learning algorithm. The talk should give you all the necessary background to start using machine learning yourself…more details
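The input-output idea described above can be made concrete in a few lines. This is a minimal sketch, not material from the session: it "learns" a linear function mapping x to y from toy input-output pairs via least squares, then predicts on a new input (the data and the linear-model choice are assumptions for illustration).

```python
import numpy as np

# Toy training data: input-output pairs (x, y) where y is roughly 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.1, 4.9, 7.0])

# "Learn" the function mapping x to y by least-squares linear regression.
X_design = np.hstack([X, np.ones((len(X), 1))])  # add intercept column
coef, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
slope, intercept = coef

# Evaluate the learned function on an unseen input.
prediction = slope * 4.0 + intercept
```

In practice a library such as scikit-learn wraps this fit/predict pattern behind a uniform interface, which is what makes swapping algorithms easy.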
Training | Kickstarter | Beginner
Curious about Data Science? Self-taught on some aspects, but missing the big picture? Well, you’ve got to start somewhere and this session is the place to do it. This session will cover, at a layman’s level, some of the basic concepts of data science. In a conversational format, we will discuss: What are the differences between Big Data and Data Science, and why aren’t they the same thing? What distinguishes descriptive, predictive, and prescriptive analytics? What purpose do predictive models serve in a practical context? What kinds of models are there and what do they tell us? What is the difference between supervised and unsupervised learning? What are some common pitfalls that turn good ideas into bad science? During this session, attendees will learn the difference between k-nearest neighbors and k-means clustering, understand why we normalize data and guard against overfitting, and grasp the meaning of No Free Lunch…more details
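The k-nearest-neighbors vs. k-means distinction mentioned above is exactly the supervised vs. unsupervised split, and a tiny sketch makes it visible. This is an illustrative toy implementation (the data and function names are assumptions, not session material): k-NN needs labels to predict; k-means only groups unlabeled points.

```python
import numpy as np

# Supervised: 1-nearest-neighbor classification uses *labeled* points.
labeled_pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])

def knn_predict(query, pts, labels, k=1):
    dists = np.linalg.norm(pts - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(labels[nearest]).argmax()

# Unsupervised: k-means groups *unlabeled* points around centroids.
def kmeans(pts, k=2, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(
            np.linalg.norm(pts[:, None] - centroids[None], axis=2), axis=1)
        centroids = np.array([pts[assign == c].mean(axis=0) for c in range(k)])
    return assign, centroids
```

Note that `kmeans` never sees `labels`: it recovers the two clusters from geometry alone, while `knn_predict` cannot run without them.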
Training | Quant Finance | Open Source Data Science | Beginner – Intermediate
The rapid progress in machine learning (ML) and the massive increase in the availability and diversity of data has enabled novel approaches to quantitative investment. It has also increased the demand for the application of data science to develop both discretionary and algorithmic trading strategies.
In this workshop, we will cover popular use cases for ML in the investment industry, and how data science and ML fit into the workflow of developing a trading and investment strategy from the identification and combination of alpha factors to strategy backtesting and asset allocation.
We will see how a broad range of ML techniques can be used to extract tradeable signals. In particular, the rise of alternative data, i.e. sources beyond market and fundamental data, has created the need to apply deep learning for natural language processing and image classification. We will also take a look at how reinforcement learning can be used to train an agent interactively on market data.
The workshop uses Python and various standard data science and machine learning libraries like pandas, scikit-learn, gensim, spaCy as well as TensorFlow and Keras. The code examples will be presented using jupyter notebooks and are based on my book ‘Machine Learning for Algorithmic Trading’…more details
Workshop | Open Source Data Science | Beginner – Intermediate
As we use the internet for more and more things, knowing what a person needs, wants, and might want next is no longer just a nice-to-have feature for a website; it is almost a must-have. Today, recommendation systems suggest products, films/shows, music, friends/dates, etc. A recommender works by finding items similar to a given item or user. For that, each item first needs to be quantified and represented in a way that allows comparison against the others. This workshop will begin with the origins of recommendation systems, discuss how they are built, and where they are today. We will review the tools and techniques used to build a recommender. Attendees will understand what type of data is necessary, and they will get a sense of what makes an effective recommender. A good recommendation system may be the lowest-hanging fruit for most websites and companies: it is relatively easy to achieve and it provides immediate value to the consumer. We will discuss how to measure that value gain and the challenges in quantifying it. This workshop will review the two major approaches to creating recommendations: “content-based filtering” and “collaborative filtering”…more details
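The "quantify each item so it can be compared" step above is the heart of content-based filtering, and can be sketched in a few lines. This is an illustrative toy (the item names, feature vectors, and helper names are assumptions): each item becomes a feature vector, and similarity is measured with cosine similarity.

```python
import numpy as np

# Each item is quantified as a feature vector (e.g., genre weights).
items = {
    "film_a": np.array([1.0, 0.0, 1.0]),   # action, no romance, sci-fi
    "film_b": np.array([0.9, 0.1, 1.0]),   # very similar profile to film_a
    "film_c": np.array([0.0, 1.0, 0.0]),   # pure romance
}

def cosine(u, v):
    # Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(liked, items, n=1):
    # Rank all other items by similarity to the one the user liked.
    scores = {name: cosine(items[liked], vec)
              for name, vec in items.items() if name != liked}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Collaborative filtering uses the same similarity machinery but compares rows of a user-item interaction matrix instead of hand-built item features.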
Workshop | Data Science Management | Machine Learning | Intermediate
Bridging the gap from research and prototypes to production software continues to be a major challenge for most maturing data science teams. In this workshop we will discuss and demonstrate how various Data Science workflows and tasks are deployed into production.
We will center the discussion around three different use cases and workflows – machine learning models and pipelines, ETL/batch processing jobs, and real-time analytics. We will start by demonstrating how to quickly deploy a simple machine learning model using Flask and Docker and then extend this into a more holistic end-to-end machine learning pipeline with AWS Sagemaker. Next, we will use Apache Airflow and Kubernetes to deploy and monitor an automated batch processing pipeline. Finally, we will demonstrate how to capture, transform and analyze streaming data in real time using AWS Kinesis and AWS Lambda.
For each of these workflows, we will introduce a specific use case and follow with an exploration of the technologies used to deploy these at production scale. We will discuss the reasons for these choices – ultimately leading to some higher-level takeaways on matching system requirements to product requirements…more details
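The first demo's pattern (wrap a model's predict function in an HTTP endpoint, then containerize it) has a simple core shape. As a standard-library sketch of that shape only (the hard-coded "model", payload format, and helper names are assumptions; the workshop itself uses Flask), a prediction endpoint is just a function from a JSON request to a JSON response:

```python
import json
from io import BytesIO

# Stand-in "model": in a real deployment this would be a trained estimator
# loaded from disk; here it is a hard-coded linear scorer.
def predict(features):
    return sum(f * w for f, w in zip(features, [0.5, 2.0]))

# Minimal WSGI app exposing the model as a JSON endpoint -- the same shape
# a Flask route would have, using only the standard library.
def app(environ, start_response):
    body = environ["wsgi.input"].read(int(environ.get("CONTENT_LENGTH", 0)))
    features = json.loads(body)["features"]
    payload = json.dumps({"prediction": predict(features)}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [payload]

# Invoke the app directly, as a WSGI server (or test client) would.
def invoke(features):
    raw = json.dumps({"features": features}).encode()
    environ = {"wsgi.input": BytesIO(raw), "CONTENT_LENGTH": str(len(raw))}
    status = {}
    def start_response(s, h):
        status["code"] = s
    result = b"".join(app(environ, start_response))
    return status["code"], json.loads(result)
```

Docker then packages this app plus its model artifact into an image, so the same endpoint runs identically on a laptop, Kubernetes, or SageMaker.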
Workshop | Machine Learning | Beginner
This 1.5-hour session will survey the data science workflow, including different types of algorithms, and explore common language and misconceptions in the data science industry. The intention of this workshop is to give individuals with no data science expertise a framework for coming away with a theoretical understanding of the data science landscape and its terminology.
• Artificial Intelligence vs Machine Learning (ML) vs Deep Learning (DL)
• Data Science Workflow
• Machine Learning
• Neural Networks
• Common Applications of ML and Algorithms Used
• Popular Data Sets
• Popular Platforms and Demos
• Popular Machine Learning Frameworks for Software Engineers…more details
Training | Data Visualization | Open Source | Intermediate
Shiny is an R package that can be used to build interactive web pages with R. This might sound strange or scary, but you don’t need to have any web knowledge – it’s just R! If you’ve ever written an analysis in R and you want to make it interactive, you can use Shiny. If you’ve ever written a function or model that you want to share with others who don’t know how to use R, you can use Shiny. Shiny has many use cases, and this course will help you see how you can leverage it in your own work. In this workshop, you’ll learn how to take a Shiny app from start to finish – we’ll start by building a simple Shiny app to interactively visualize a dataset, and deploy it online to make it accessible to the world…more details
This interactive session will cover topics and provide hands-on examples and resources critical to landing your dream job in Data Science including:
Crafting your narrative and supporting it with a data science specific resumé, LinkedIn profile, project portfolio and git repo
Setting structure for the job search; key behaviors and simple tips to keep focus and make progress
Getting what you deserve; advice for networking, meetings, interviews and negotiations (as well as a review of data scientist salary data!)…more details
Workshop | Open Source Data Science | Machine Learning | Intermediate
Time series data differs from cross-sectional data in that it has temporal dependence, which can be leveraged to forecast future values of the series. Some of the most important and commonly used techniques for analyzing time series data and making forecasts based on it are those developed in the fields of statistics and machine learning. For this reason, time series statistical and machine learning models should be part of any data scientist’s toolkit.
This workshop teaches the application of two important classes of time series statistical models (the Autoregressive Integrated Moving Average model and the Vector Autoregressive model) and an important set of neural network-based algorithms (recurrent neural networks) to time series forecasting. Attendees will learn the mathematical formulation, the Python implementation, and the advantages and disadvantages of using these techniques in time series analysis. Jupyter notebooks with examples and sample code will be provided for attendees to follow along and experiment with these techniques…more details
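To make the "temporal dependence" idea concrete, here is a minimal sketch, not workshop material, of the simplest member of the ARIMA family: an AR(1) model fit by least squares on synthetic data with known dynamics (the synthetic series, seed, and function names are assumptions for illustration).

```python
import numpy as np

# Fit an AR(1) model  y_t = c + phi * y_{t-1} + noise  by least squares.
def fit_ar1(series):
    y_prev, y_curr = series[:-1], series[1:]
    X = np.column_stack([np.ones(len(y_prev)), y_prev])
    (c, phi), _, _, _ = np.linalg.lstsq(X, y_curr, rcond=None)
    return c, phi

# Forecast by iterating the fitted recurrence forward from the last value.
def forecast(series, c, phi, steps):
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds

# Synthetic series with known dynamics: y_t = 1 + 0.8 * y_{t-1} + noise.
rng = np.random.default_rng(42)
y = [0.0]
for _ in range(500):
    y.append(1.0 + 0.8 * y[-1] + rng.normal(scale=0.1))
y = np.array(y)

c, phi = fit_ar1(y)
preds = forecast(y, c, phi, 50)
```

Libraries such as statsmodels generalize this same idea to full ARIMA and VAR specifications; the long-horizon forecast decays toward the series mean c/(1-phi), one of the disadvantages the workshop discusses.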
Workshop | Data Visualization | Open Source Data Science | Beginner – Intermediate
The field of data visualization is undergoing rapid growth and is repeatedly recognized as a vital component of effective data science communication. Good data visuals help practitioners make sense of complex patterns, allow for enhanced collaboration among colleagues, and perhaps most importantly, facilitate conversations between specialists and stakeholders in making data-driven decisions. Leveraging off-the-shelf visualization tools can yield massive time savings, but some data stories require special formats and further customization. This workshop focuses on building bespoke visuals with lines of Python code. After a brief introduction to the topic, participants will learn how to generate and style custom visualizations using a variety of Python libraries. Essential plotting libraries like matplotlib and seaborn will be highlighted through a series of increasingly sophisticated examples. Attention will then shift toward building interactive visuals, including a demonstration of Plotly. Participants will gain technical experience with popular Python plotting libraries as well as practical insight about the data visualization thought process...more details
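The "bespoke visuals with lines of Python code" approach above can be previewed with a small matplotlib sketch. This is illustrative only, not workshop material (the data, styling choices, and output file name are assumptions): a styled, annotated figure built and saved entirely in code.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Build a small custom visual entirely in code: styled line, reference
# line, annotation, and labeled axes.
x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x), color="steelblue", linewidth=2, label="sin(x)")
ax.axhline(0, color="gray", linewidth=0.5)
ax.annotate("peak", xy=(np.pi / 2, 1.0), xytext=(2.5, 0.9),
            arrowprops=dict(arrowstyle="->"))
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend(frameon=False)
fig.savefig("custom_plot.png", dpi=100, bbox_inches="tight")
```

Every element here is programmable, which is exactly the customization advantage over off-the-shelf tools; seaborn and Plotly layer higher-level statistical and interactive idioms on top of this same object model.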
Training | Open Source Data Science | Intermediate
Whether in R, MATLAB, Stata, or Python, modern data analysis, for many researchers, requires some kind of programming. The preponderance of tools and specialized languages for data analysis suggests that general-purpose programming languages like C and Java do not readily address the needs of data scientists; something more is needed.
In this workshop, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for interactive data analysis. Pandas is a massive library, so we will focus on its core functionality, specifically, loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses…more details
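The four core operations named above (loading, filtering, grouping, and transforming) each map to one pandas idiom. A minimal sketch on toy data (the inline CSV and column names are assumptions; normally the load step would be `pd.read_csv("file.csv")`):

```python
import io
import pandas as pd

# Loading: read a small CSV (inlined here for self-containment).
csv = io.StringIO(
    "city,year,sales\n"
    "Boston,2018,100\nBoston,2019,140\n"
    "Austin,2018,80\nAustin,2019,90\n")
df = pd.read_csv(csv)

# Filtering: boolean indexing keeps only the rows we care about.
recent = df[df["year"] == 2019]

# Grouping: aggregate sales per city.
totals = df.groupby("city")["sales"].sum()

# Transforming: derive a column from a group-level statistic.
df["share_of_city"] = (
    df["sales"] / df.groupby("city")["sales"].transform("sum"))
```

`transform` is the common pitfall of the four: unlike `sum`, it returns a result aligned to the original rows, which is what allows the division in the last line.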
Training | Machine Learning | Data Science for Good | Beginner – Intermediate
Machine learning models are increasingly used to inform high-stakes decisions about people. Discrimination in machine learning becomes objectionable when it places certain privileged groups at systematic advantage and certain unprivileged groups at systematic disadvantage. We have developed the AI Fairness 360 (AIF360), a comprehensive Python package (https://github.com/ibm/aif360) that contains nine different algorithms, developed by the broader algorithmic fairness research community, to mitigate that unwanted bias. AIF360 also provides an interactive experience (http://aif360.mybluemix.net/data) as a gentle introduction to the capabilities of the toolkit for people unfamiliar with Python programming. Compared to existing open source efforts on AI fairness, AIF360 takes a step forward in that it focuses on bias mitigation (as well as bias checking), industrial usability, and software engineering. In our proposed hands-on tutorial, we will teach participants to use and contribute to AIF360 enabling them to become some of the first members of the community. Toward this goal, all participants in this tutorial will get to experience first-hand: 1) how to use the metrics provided in the toolkit to check fairness of an AI application, and 2) how to mitigate bias they discover. Our goal in creating a vibrant community, centered around the toolkit and its application, is to contribute to efforts to engender trust in AI and make the world more equitable for all…more details
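AIF360 provides fairness metrics like the one sketched below out of the box; as a library-free illustration of what such a metric computes (the toy data is an assumption), here is disparate impact calculated by hand:

```python
import numpy as np

# Toy binary predictions with a protected attribute (1 = privileged group).
privileged = np.array([1, 1, 1, 1, 0, 0, 0, 0])
favorable  = np.array([1, 1, 1, 0, 1, 0, 0, 0])  # model said "approve"

# Disparate impact: rate of favorable outcomes for the unprivileged group
# divided by the rate for the privileged group. A value of 1.0 means
# parity; the widely used "80% rule" flags values below 0.8.
rate_priv = favorable[privileged == 1].mean()
rate_unpriv = favorable[privileged == 0].mean()
disparate_impact = rate_unpriv / rate_priv
```

Checking a metric like this is step one of the tutorial; the toolkit's mitigation algorithms then adjust the data, the model, or the predictions to move such metrics back toward parity.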
Training | Data Visualization | Machine Learning | Beginner-Intermediate
If you’ve never heard of the “good, fast, cheap” dilemma, it goes something like this: You can have something good and fast, but it won’t be cheap. You can have something good and cheap, but it won’t be fast. You can have something fast and cheap, but it won’t be good. In short, you can pick two of the three but you can’t have all three.
If you’ve done a data science problem before, I can all but guarantee that you’ve run into missing data. How do we handle it? Well, we can avoid, ignore, or try to account for missing data. The problem is, none of these strategies are good, fast, *and* cheap.
We’ll start by visualizing missing data and identify the three different types of missing data, which will allow us to see how they affect whether we should avoid, ignore, or account for the missing data. We will walk through the advantages and disadvantages of each approach as well as how to visualize and implement each approach. We’ll wrap up with practical tips for working with missing data and recommendations for integrating it with your workflow! ... more details
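The avoid/ignore/account-for strategies above each have a one-line pandas counterpart. A minimal sketch on toy data (the DataFrame contents are assumptions; mean imputation stands in for the richer strategies the session covers):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 40, np.nan],
                   "income": [50.0, 62.0, np.nan, 80.0, 58.0]})

# Inspect: how much is missing, per column?
missing_counts = df.isna().sum()

# Avoid / ignore: drop every row containing a missing value.
complete_cases = df.dropna()

# Account for: fill gaps with a simple mean imputation
# (one of many strategies, each with its own trade-offs).
imputed = df.fillna(df.mean())
```

Note the "good, fast, cheap" trade-off in miniature: `dropna` is fast and cheap but discards data, while imputation keeps every row at the cost of distorting the column's variance.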
Training | Machine Learning | Beginner – Intermediate
Coming soon…more details
Getting data and tech solutions to work effectively, represent all voices, and do more good than harm is a human-intensive design process. This workshop will be focused on the best practices DataKind has developed in the past seven years of working on projects using data science in the service of humanity. Attendees will hear stories about a few of our failures, how we course corrected, and how a return to the core of our practice has always ensured we create analyses and algorithms that drive impact. Additionally, participants will be exposed to the ways doing data science in the social sector can differ from academia and industry. As conversations about tech and AI begin to focus on their threats to society, we’d also like to host a conversation that gives constructive steps forward, so that folks can balance between the two extremes of unfettered techno-positivism and grim dystopian hand-wringing and instead get to building wonderful new tools that help where it matters most. In this session, Jake Porway, Founder and Executive Director of DataKind, will hew closer to the implementation of human-centered design and the “human” story at the center of project implementation successes and failures…more details
Workshop | Open Source Data Science | Beginner – Intermediate
If you can write a basic model in Python’s scikit-learn library, you can make the leap to Bayesian inference with PyMC3, a user-friendly intro to probabilistic programming in Python! The only requisite background for this workshop is minimal familiarity with Python, preferably with some exposure to building a model in sklearn.
Probabilistic programming (PP) means building models whose building blocks are probability distributions. We can use PP to do Bayesian inference easily. Bayesian inference is a long-established method, but it is gaining prominence in data science because it is now easier than ever to use Python to do the math. Bayesian inference is a different statistical paradigm from the one many of us are used to, but it also lets us solve problems that are not otherwise tractable with classical methods. In this workshop, we’ll work through actual examples of models using PyMC3, including hierarchical models.
By the end of this presentation, you’ll know the following:
* What probabilistic programming is and why it’s necessary for Bayesian inference
* What Bayesian inference is, how it’s different from classical frequentist inference, and why it’s becoming so relevant for applied data science in the real world
* How to write your own Bayesian models in the Python library PyMC3, including metrics for judging how well the model is performing
* How to go about learning more about the topic of Bayesian inference and how to bring it to your current data science job…more details
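The workshop's models are built with PyMC3; as a dependency-free preview of the Bayesian updating idea itself (not workshop material), here is the classic coin-flip example solved in closed form with a conjugate Beta prior. PyMC3 would recover this same posterior by sampling.

```python
# Bayesian inference for a coin's heads probability p.
prior_a, prior_b = 1.0, 1.0          # flat Beta(1, 1) prior on p
heads, tails = 7, 3                  # observed data

# With a Beta prior and binomial data, the posterior is again a Beta
# distribution (conjugacy): Beta(a + heads, b + tails).
post_a, post_b = prior_a + heads, prior_b + tails

# Summaries of the posterior distribution over p.
posterior_mean = post_a / (post_a + post_b)
posterior_var = (post_a * post_b) / (
    (post_a + post_b) ** 2 * (post_a + post_b + 1))
```

Conjugate pairs like this are the rare tractable cases; PyMC3's value is that it delivers the same kind of posterior for models, such as hierarchical ones, where no closed form exists.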
Workshop | Open Source Data Science | Beginner-Intermediate
The rise of online social platforms has resulted in an explosion of written text in the form of blogs, posts, tweets, wiki pages, etc. This new wealth of data provides a unique opportunity to explore natural language in its many forms, both as a way of automatically extracting information from written text and as a way of artificially producing text that looks natural.
In this session we will introduce natural language processing from scratch. Each concept is introduced and explained through coding examples using nothing more than plain Python and numpy. In this way, attendees will learn in depth about the underlying concepts and techniques instead of just learning how to use a specific NLP library.
In particular, we will cover:
– One-Hot Encoding
– Bag of Words
– Document clustering
– Sentiment Analysis
– Word embeddings…more details
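In the plain-Python-and-numpy spirit of the session, the first two items on the list can be sketched in a few lines (the toy corpus and function names are assumptions for illustration):

```python
import numpy as np

docs = ["the cat sat", "the cat ran", "dogs ran fast"]

# Build the vocabulary: every distinct word gets an index.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# One-hot encoding: each word is a vector with a single 1.
def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

# Bag of words: a document is the sum of its words' counts.
def bag_of_words(doc):
    vec = np.zeros(len(vocab))
    for w in doc.split():
        vec[index[w]] += 1.0
    return vec
```

Once documents are vectors, dot products between them measure word overlap, which is the seed of the document clustering and sentiment analysis steps; word embeddings then replace the sparse one-hot vectors with dense learned ones.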
In order to do data science, one needs skills in math, statistics, and computer science. In this talk, I will show how, while these skills create the technical foundation for a data scientist’s work, other highly important transferable skills include critical thinking and problem solving, data visualization, and the ability to translate and present quantitative data as actionable business insights. Drawing on my work in healthcare, finance, and life insurance, I discuss how success is less about the field and more about the fundamental technical problem, scientific investigative skills, creativity, and the story...more details
Coming soon!…more details
Mentor Talk | Beginner
In 2019, every company is a technology company whether they like it or not. Regardless of industry or vertical, future successful organizations will be required to embrace the power of data science. From Robotic Process Automation to AI, prescriptive analytics to machine learning algorithms – data science is being incorporated into every aspect of business regardless of discipline or geography.
While this represents tremendous opportunity for data scientists, it also comes with challenges. Because the field is evolving so rapidly, data scientists will have to stay ahead of the curve in order to be successful. They will need to remain constantly aware of trends and understand how their skills map to what is in demand and what employers are paying for.
Today’s economic climate also requires data scientists to understand the business value of data science as companies develop unique, innovative and monetizable portfolios. These include but are not limited to areas like:
• new mobility models (autonomous vehicles, ride-sharing, Hyperloop)
• genetic engineering (CRISPR, immunotherapies)
• extraterrestrial travel (taking cargo and humans to the ISS, the Moon, Mars)
• robotics (supply chain to surgery)
• financial services (cryptoassets, blockchain)
• media and entertainment (AR/VR, mixed reality)
In my session, I will help attendees better understand career and learning paths for data science. I will start with socio-historical context, and then describe trending and future opportunities across a range of sectors and verticals. I will also share concrete guidance on how to stay on top of these trends using my three Future Career Tools – Voice, Antenna and Mesh…more details
Most empirical work involves data on units such as individuals, households and/or organizations of various types. In circumstances such as these, analysts must ensure that such data are used responsibly and ethically. In practical terms, this requires that the private interests of individual privacy and data confidentiality be balanced against the social benefits of access and use.
It is critical to address privacy and confidentiality issues if the full public value of data is to be realized. This presentation will highlight why the challenges need to be met; review the past, point out challenges with this approach in the new data world; briefly describe the current state of play from a legal, technical, and statistical perspective; and point to open questions that need to be addressed in the future…more details
Given the proliferation of options for education in data analytics and data science, it is not easy to choose the right program to help one achieve one’s goals. Credit vs. non-credit; degree vs. non-degree; online vs. face-to-face vs. hybrid; quick vs. protracted: these are all questions facing those who want to further their education. In this session, I will help you learn what questions to ask of different programs in order to determine the best fit for YOU.
Quantitative finance is a rich field in finance where advanced mathematical and statistical techniques are employed by both sell-side and buy-side institutions.
Machine learning techniques are now increasingly used by financial firms to generate profitable trading strategies and to automate various processes. Quants typically have a background in the hard sciences and mathematical finance. In addition to classical techniques like derivatives modeling and asset allocation, quants now need a good understanding of machine learning models and statistical methods. Experience with deep learning techniques for text and image processing is required for handling unstructured datasets (also called alternative data).
Data scientists transitioning into quant finance should develop a solid foundation in financial concepts and business knowledge. Data visualization and tools to explain how the models work under the hood are also crucial. Python is becoming the language of choice for scientific computing and machine learning. Data scientists should develop strong computing skills with focus on data analysis, storage and handling of unstructured datasets…more details
Data Science as a professional discipline is still quite young. As such, much of the collective effort thus far has been dedicated to codifying the technical approaches to building data science tools and products. The long-term success of the discipline, however, will be highly dependent on our ability to manage teams and build career paths. In this talk, Drew will discuss three core components of data science management: recruiting and hiring processes; project definition and execution; and people management and performance reviews. ..more details
Ever since the Harvard Business Review declared “Data Scientist” the sexiest job of the 21st century, there’s been a mad rush to build the coding and statistics skills to earn this title. Yet the focus on being a ‘scientist’ ignores the myriad other roles and skills that are important on every data science team. In this talk, I’ll take you through how I built a career that lets me leverage all my strengths, not just the most technical ones. Hopefully, by the end of the talk you’ll see that data science isn’t just for math and computer wizards, but also for subject matter experts, great communicators, and even product people, and you will walk away with steps to forge your own path moving forward…more details
Coming soon!…more details