Superior Training for Career Enhancing Skills
Workshop & Training List
Introduction to Machine Learning with Andreas Mueller, core developer of scikit-learn
The resurging interest in machine learning is due to multiple factors, including growing volumes and varieties of data and cheaper computational processing. Together these make it possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results on a very large scale. scikit-learn (http://scikit-learn.org/) has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry.
scikit-learn provides easy-to-use interfaces in Python to perform advanced analysis and build powerful predictive models.
Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.
This workshop will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross-validation and model selection, and how they map to programming concepts in scikit-learn. Andreas will demonstrate how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline. We will cover the trade-offs of learning on large datasets, and describe some techniques to handle larger-than-RAM and streaming data on a single machine.
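As a rough illustration of the ideas named in this description (not material from the workshop itself), the sketch below builds a small scikit-learn pipeline, scores it with cross-validation, and then trains a second model in mini-batches with `partial_fit`, which is one common way to handle data that does not fit in memory. The iris dataset, the choice of estimators, and the batch count are assumptions made only for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# A pipeline chains preprocessing and a model into one estimator, so the
# scaler is re-fit on the training part of every cross-validation fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipeline, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())

# Out-of-core style learning: feed the data in mini-batches via partial_fit,
# as if each batch had been read from disk or from a stream.
rng = np.random.RandomState(0)
order = rng.permutation(len(X))
classes = np.unique(y)  # partial_fit needs all class labels up front
streaming_model = SGDClassifier(random_state=0)
for batch in np.array_split(order, 10):
    streaming_model.partial_fit(X[batch], y[batch], classes=classes)
print("streaming model training accuracy:", streaming_model.score(X, y))
```

Putting the scaler inside the pipeline keeps information from the held-out fold from leaking into preprocessing, which is the usual reason to prefer a pipeline over scaling the full dataset once.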
Introduction to Deep Learning with Tensorflow Contributor and Kaggle Winner Dan Becker
Traditional machine learning applications depend on a researcher handcrafting features. Deep learning has become hugely important because it allows a deep network to learn features by itself. Deep learning modeling is mostly unsupervised and, by utilizing large-scale neural nets, allows computers to learn and “think” by themselves without the need for direct human intervention. Deep learning promises to revolutionize tasks such as image recognition, speech recognition, and other AI challenges.
Dan is the Technical Product Director at DataRobot. He has broad data science expertise, with consulting experience for 6 companies from the Fortune 100, a 2nd place finish in Kaggle’s $3 million Heritage Health Prize, and contributions to the Keras and Tensorflow libraries for deep learning. Dan has a PhD in Econometrics from the University of Virginia.
– Basic model set-up in Keras
– Multi-class classification
– Convolutional neural networks
– Debugging deep learning models
Deep Learning with H2O Open Platform with Jo-fai (Joe) Chow
H2O is a fast, scalable, open-source machine learning and deep learning platform. Using in-memory compression techniques, H2O can handle billions of data rows in memory, even on small compute clusters. The platform includes interfaces for R, Python, Scala, Java, JS and JSON, along with its interactive graphical Flow interface that makes it easier for non-engineers to stitch together complete analytic workflows. H2O was built alongside (and on top of) both Hadoop and Spark clusters and is deployed within minutes. It is a math and machine learning engine that brings distribution and parallelism to powerful algorithms that enable you to make better predictions and more accurate models faster.
Jo-fai (Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab as a data science evangelist promoting products via blogging and giving talks at meetups.
- Pipe Search
- Deep Water
Step by Step Bot Construction - Micheleen Harris
This tutorial will ramp up the attendee very quickly on the Microsoft Bot Framework, providing sample code upon which to base a bot experience. In this tutorial the attendee will build their own intelligent bot. This tutorial will help attendees decide if they want to make a bot to solve a repetitive task they have encountered or they know might be useful to others. User experience will be heavily emphasized to create the best bot experiences. Components will be laid out for the attendee.
Micheleen is a Data Scientist and trainer at Microsoft where she shares her Python, R and advanced analytics experience internally and externally. She has led or co-led workshops around data science and analytics concepts in Python and R, often utilizing Jupyter notebooks for interactive coding. Micheleen has developed a “Python for the Data Scientist” course delivered on Jupyter notebooks, has delivered it at Microsoft several times, and looks forward to its external release. She has also delivered courses utilizing Microsoft Azure and covering DocumentDB, Cognitive Services, the Bot Framework, as well as other components of the Cortana Intelligence Suite. She enjoys teaching/training and finding the most effective ways to teach data science and advanced analytics on any size dataset.
- Cognitive services overview
- What are Cognitive APIs
- Introduction for Bot Framework Part
- Learning objectives
- Bot Framework Overview
- What a bot is and is not
- The major components of the Bot Framework
- Deploying and working with channels
- Your arsenal or toolbox
- Developer’s Introduction and Building an intelligent bot with Bot Builder Node.js SDK
- Toolbox – Go over prereqs
- Setup project in VSCode (and set up debugger)
- Get code from course website with Git
- Update with Vision API key from Cognitive Services “My Account”
- Test with emulator
- Create more bots! Follow along or create your own
There are a few things you will need in order to take full advantage of the course:
Please bring a laptop with internet connectivity.
- Node.js with npm installed locally – get the latest at:
- Visual Studio Code [recommended] or equivalent code editing and debugging environment with IntelliSense.
- Bot Framework Emulator (Windows and Unix-compatible) installed locally – information and links at
- GitHub Account – a code repository and collaboration tool we’ll use
- Git Bash – included in git download
- [Recommended] Azure account – use the one you have, sign up for a free trial at https://azure.microsoft.com/en-us/free/, or, if you have an MSDN account and Azure as a benefit, link your Microsoft Account or Work/School Account to MSDN and activate the Azure benefit by following this guide
We will assume you already have the following background:
- Basic knowledge around using and navigating in a unix-style command line or terminal (for using Git Bash) (good basic guide at http://linuxcommand.org/lc3_learning_the_shell.php)
- Familiarity with Git and GitHub as tools for software development, versioning and collaboration. (great book on Git at https://git-scm.com/book/en/v2)
- Have read about debugging bots with VSCode in the docs at https://docs.botframework.com/en-us/node/builder/guides/debug-locally-with-vscode/
- If you are new to Node, here’s a good video tutorial series at https://www.youtube.com/playlist?list=PL6gx4Cwl9DGBMdkKFn3HasZnnAqVjzHn_
Modeling in R with Jared Lander, Chief Data Scientist of Lander Analytics and Author of R for Everyone
At one point the open source language R was considered the lingua franca of data science programming. As more languages compete for that title, R still has a very passionate following. In fact, one of the main strengths of R is its huge community, which provides open source user-contributed packages (CRAN), documentation and a very active user support group. R packages are collections of R functions and data that make it easy to immediately get access to the latest techniques and functionalities without needing to develop everything from scratch yourself.
Jared Lander is the Chief Data Scientist at Lander Analytics, Columbia Professor, Author of R for Everyone and Organizer of the World’s Largest R Meetup.
The linear model, and its extensions, forms the backbone of statistical analysis. In this course we cover Linear Regression using `lm`, Generalized Linear Models using `glm` and model assessment using `AIC`, `BIC` and other measures. The focus will be mainly on applied programming, though theoretical properties and derivations will be taught where appropriate. Attendees should already have a basic knowledge of linear models and have R and RStudio installed, along with the `UsingR`, `ggplot2` and `coefplot` packages.
Linear Models:
- Learn about the best fit line
- Understand the formula interface in R
- Understand the design matrix
- Fit Models with `lm`
- Visualize the coefficients with `coefplot`
- Make predictions on new data
Generalized Linear Models:
- Learn about Logistic Regression for classification
- Learn about Poisson Regression for count data
- Fit models with `glm`
- Visualize the coefficients with `coefplot`
- Model Assessment
- Compare models
- `AIC`, `BIC`
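The course itself is taught in R with `lm` and `glm`. Purely as an illustration of the concepts listed in this description (best fit line, design matrix, predictions on new data, `AIC` and `BIC`), here is a minimal NumPy sketch. The synthetic data, variable names, and the Gaussian log-likelihood used for the information criteria are assumptions made for the example, not course material.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 3x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=50)

# Design matrix: a column of ones for the intercept plus the predictor.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: coefficients minimizing the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Predictions on new data reuse the same design-matrix layout.
x_new = np.array([0.0, 5.0, 10.0])
y_new = np.column_stack([np.ones_like(x_new), x_new]) @ beta

# AIC and BIC from the Gaussian log-likelihood at the fitted coefficients.
n = len(y)
k = X.shape[1] + 1  # regression coefficients plus the error variance
rss = np.sum((y - X @ beta) ** 2)
log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
aic = 2 * k - 2 * log_lik
bic = k * np.log(n) - 2 * log_lik

print("intercept, slope:", intercept, slope)
print("AIC, BIC:", aic, bic)
```

Lower `AIC` or `BIC` indicates a better trade-off between fit and model size when comparing models fitted to the same data; `BIC` penalizes extra parameters more heavily once the sample has more than about seven observations.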
Deploying and Scaling Spark ML and Tensorflow AI Models with Chris Fregly, Research Scientist, Contributor, Author and Trainer
Apache Spark is becoming increasingly popular for big data science projects. Spark can handle large volumes of data significantly faster and more easily than other platforms, and it includes tools for real-time processing, machine learning, and interactive SQL. It is quickly being adopted by industry to achieve business objectives that need data and data science at scale.
Chris Fregly is a Research Scientist at Pipeline.IO. Chris is also the founder of the global Advanced Apache Spark Meetup and author of the upcoming book, Advanced Spark @ advancedspark.com. Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix. When Chris isn’t contributing to Spark and other open source projects, he’s creating book chapters, slides, and demos to share knowledge with his peers at meetups and conferences throughout the world.
In this workshop, we will train, deploy, and scale Spark ML and Tensorflow AI Models in a distributed, hybrid-cloud and on-premise production environment. We will use 100% open source tools including Tensorflow, Spark ML, Jupyter Notebook, Docker, Kubernetes, and NetflixOSS Microservices. This talk will discuss the trade-offs of mutable vs. immutable model deployments, on-the-fly JVM byte-code generation, global request batching, microservice circuit breakers, and dynamic cluster scaling, all from within a Jupyter notebook. All code and docker images are 100% open source and available from Github and DockerHub at http://pipeline.io.
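For readers unfamiliar with the circuit breaker pattern mentioned in this description, the following is a minimal, generic sketch in plain Python. It is not the workshop's code and does not use the NetflixOSS libraries; the class name, parameters, and the failing `flaky_model_server` function are all hypothetical and exist only to show the idea of failing fast with a fallback once a downstream service keeps erroring.

```python
import time


class CircuitBreaker:
    """Stop calling a failing service until a cool-down period has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, fallback=None):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback  # open: skip the call entirely
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0  # a success closes the circuit again
        return result


def flaky_model_server(features):
    # Hypothetical stand-in for a remote prediction service that is down.
    raise ConnectionError("model server unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
predictions = [
    breaker.call(flaky_model_server, [1.0], fallback="default")
    for _ in range(3)
]
print(predictions)
```

After two failures the breaker opens, so the third request returns the fallback without contacting the failing service at all, which keeps a slow or broken dependency from tying up callers.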
San Francisco Airport Marriott Waterfront