Our deep learning artificial neural networks have won numerous contests in pattern recognition and machine learning. They are now used billions of times per day by the world’s most valuable public companies. I will discuss the latest state-of-the-art results in numerous applications, and outline how AIs will transform every aspect of our civilisation, eventually colonise the universe, and make it intelligent.
Since age 15, the main goal of Jürgen Schmidhuber has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. He is often called the father of AI. His lab’s deep learning neural networks such as LSTM have revolutionized machine learning, are now on 3 billion smartphones, and are used billions of times per day, for Facebook’s automatic translation (2017), Google’s speech recognition (since 2015), Apple’s Siri & QuickType, Amazon’s Alexa, etc. He is the recipient of numerous awards, and president of the company NNAISENSE, which aims at building the first practical general-purpose AI.
Open Source Data Science | Data Science at Scale | Intermediate | Talks
I will answer these five simple questions:
1. How does one build a PySpark model and deploy it in a Scala pipeline with no code rewrite, settling the long-running fight between data scientists who want to code in Python and data engineers who prefer the tried and tested type safety of the JVM?
2. How does one beat the Spark context latency to serve Spark models in milliseconds for near-real-time business needs?
3. How does one build an ML model, zip it up, and deploy it across platforms in a completely vendor-neutral way, i.e. build your model on AWS and deploy it on GCP, or vice versa?
4. How does one leverage years of software engineering effort directly in data science pipelines, without reinventing the wheel and the pain?
5. How does one build a completely GDPR-compliant machine learning model with an area under the ROC curve of 0.88?
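The vendor-neutral "zip it up and deploy anywhere" idea in question 3 can be illustrated in miniature with nothing but the standard library: serialize a model's parameters to a plain-text format, bundle them into an archive, and score on the other side without the training stack. This is only a sketch of the concept; the toy logistic-regression weights and file names are invented for illustration, and a real pipeline would use a portable format such as MLeap bundles or PMML.

```python
import io
import json
import math
import zipfile

# Toy "model": logistic-regression weights stored as plain JSON, so the
# bundle can be read from any platform or language (vendor-neutral).
model = {"weights": [0.4, -1.2, 0.7], "intercept": 0.1}

# Package the model plus metadata into a single zip bundle.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model.json", json.dumps(model))
    zf.writestr("metadata.json", json.dumps({"format": "json", "version": 1}))

# "Deploy": reopen the bundle elsewhere and score without the training stack.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    loaded = json.loads(zf.read("model.json"))

def score(x, m):
    # Plain logistic regression: sigmoid of the linear combination.
    z = m["intercept"] + sum(w * xi for w, xi in zip(m["weights"], x))
    return 1.0 / (1.0 + math.exp(-z))

print(round(score([1.0, 0.5, 2.0], loaded), 4))  # → 0.7858
```

Because the bundle is just a zip of JSON files, the serving side needs no Spark, no Python training libraries, and no cloud-specific dependencies.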
A data engineer/data scientist and chief builder of AbundanceAI. Having worked in core distributed systems, high-frequency finance, and e-commerce, he loves solving problems with data and scale. He measures his success by the number of ML models he pushes into production and the number of times he did not order pizza after trying out his latest culinary experiment. He is an avid long-distance runner and an aspiring chef.
DataOps | Open Source Data Science | All | Talks
Data science professionals are often seen as “data artisans” who like to use their own brushes (R, Python, Python Notebooks) to create insights and actions. Most companies started their data science journey relying on a couple of data scientists. There is now a new demand to manage collaboration, teams, model lifecycle, deployment, model accuracy, and governance. Learn how best to embrace open source while intersecting with the maturity arc of analytics and keeping your data artisans engaged and innovative. Full Details.
Shawn Rogers is Senior Director of Analytic Strategy at TIBCO. Shawn is an internationally recognized thought leader, speaker, author and instructor on the topics of IoT, big data, analytics, cloud, data warehousing and social analytics. His latest book, Analytics: How to Win with Intelligence, was published in July 2017.
Shawn has more than 20 years of hands-on IT experience. Prior to joining TIBCO he was Director of Global Marketing and Channels at Quest Software and Chief Research Officer at Dell’s Information Management Group. Prior to joining Dell, Shawn was Vice President of Research for Business Intelligence and Analytics at Enterprise Management Associates, a leading analyst firm.
Machine Learning | Intermediate | Talks
A racing competition held in France asked players to design an autonomous toy car piloted by deep learning models. Following a line is a well-known problem in computer science, and there is no need for a neural network to perform such a task. However, I decided to assemble hardware components (Raspberry Pi, Arduino) and write code in Python to make an autonomous radio-controlled car using only a camera. The deep learning approach requires training examples to fit the weights of the architecture. One option is to generate real examples while manually piloting the car. This method can be described as shaping the model, which is not satisfying because it may produce unexpected biases. Instead, I decided to adopt a simulation technique to generate my training examples. As a result, the car is autonomous without ever having driven on the race track. I would like to share the general workflow with you, from scratch: from hardware assembly to racing in production, via model training.
Kevin is a data scientist at Cambridge Spark, a company providing data science training and consulting. Prior to that, he led the development of data products for the energy sector and worked in the telecommunications industry at Qualcomm. Kevin has delivered data science and machine learning training courses to clients from industries including finance, engineering and research, helping individuals leverage the latest techniques.
Deep Learning | Open Source Data Science | Intermediate-Advanced | Talks
A racing competition held in France asked players to design an autonomous toy car piloted by deep learning models. The deep learning approach requires training examples to fit the weights of the architecture. I decided to adopt a simulation technique to generate my training examples. As a result, the car is autonomous without ever having driven on the race track. I would like to share the general workflow with you, from scratch: from hardware assembly to racing in production, via model training. Full Details.
I am strongly interested in the creation of value from data, and we are currently experiencing one of the most satisfying mindset shifts of the past 10 years. Companies are starting to realize that data is not a useless expense but a real opportunity to assess their results, find insights in their process failures, reclaim their expertise, and perhaps evolve into a more sustainable business. I want to help those who believe in this potential by accelerating their transition towards a data-driven company. To address these new challenges, I focus on mastering every skill of a complete data geek: architecture expertise (data, applications, network), data science mastery (statistical learning, data visualisation, algorithmic theory), and customer and business understanding (model prediction consumption, business metrics, customer needs).
I have been working for about two years at OCTO Technology in the best Big Data team in France. I am an expert in the industry sector and work on several types of missions, ranging from predictive maintenance of production sites, to prediction of critical KPIs in video games, via real-time monitoring of manufacturing devices. Prior to joining OCTO, I worked as a researcher at Data61 (formerly known as NICTA), Australia's leading ICT research institute, applying machine learning to profile GUI users and provide the right amount of information to help them make decisions based on a machine learning prediction.
Machine Learning | Intermediate | Workshop
Abstract coming soon!
Jo-fai (Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab as a data science evangelist promoting products via blogging and giving talks at external events.
Joe has a background in water engineering. Before his data science journey, he was an EngD researcher at the STREAM Industrial Doctorate Centre working on machine learning techniques for drainage design optimization. Prior to that, he was an asset management consultant specializing in data mining and constrained optimization for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.
Open Source Data Science | Business Intelligence | Beginner-Intermediate | Talk
Mobile and smartphone penetration rates have been steadily increasing globally: smartphone penetration currently stands at 33.3% of the global population and is projected to increase to 37% by 2020. Better technological coverage and rapid advances in positioning technology have important implications for modeling dynamic human mobility patterns. With data from global positioning systems, we are not only able to work towards a remote-sensing method of creating a population count; these counts can also be used to create a spatiotemporally dynamic, real-time representation of mobility.
In this talk, we walk through our process for using cell phone GPS data to model 24-hour population foot traffic, including origin-destination flows and transit modes, at scale. We first present some of the methodological challenges of mode detection using passive GPS data, model validation using unlabelled data, and scaling our models to create urban data extracts that are flexible across time and space. These have implications for our ability to better understand and represent a range of scenarios such as transportation planning, site selection, and policy impact analysis.
We will demonstrate a use case of this data through creating a real-time mobility survey of New York City and analyzing the impact on infrastructure congestion. We use this survey to understand the potential impact of road closures.
Wenfei is a Spatial Data Scientist at CARTO and a PhD student in Urban Planning at Columbia University. She has a background in urban planning, economics, and design, and combines those skills in her work and research. Previously, Wenfei worked at the Civic Data Design Lab and Senseable City Lab at MIT, where she researched vacancies and underuse in residential developments in China through social media and crowd-sourced data, the impact of street infrastructure on ethnic minority communities in Los Angeles, citizen-driven pollution documentation in China, and informal transit in Nairobi.
Deep Learning | Open Source Data Science | Intermediate-Advanced | Talks
In the last year there have been a number of attempts to train deep CNNs on the ImageNet dataset in the shortest time possible, with the most recent attempt managing to do it in 15 minutes. All of these attempts took place on custom clusters that are out of the reach of most data scientists. In this talk we will present two platforms for running distributed deep learning in the cloud. The first is Batch AI, which uses the Azure Batch infrastructure to easily run deep learning jobs at scale across GPUs. The second is an open-source toolkit that allows data scientists to spin up clusters in a turn-key fashion; it utilises Kubernetes and Grafana for easy job scheduling and monitoring. Both use Docker containers, making it possible to run any deep learning framework on them. We will use these training platforms to train a ResNet network on the ImageNet dataset using each of the following frameworks: CNTK, TensorFlow (Horovod), PyTorch, MXNet and Chainer. Full Details.
Miguel González-Fierro is a Data Scientist at Microsoft UK, where his job consists of helping customers leverage their processes using Big Data and Machine Learning. Previously, he was CEO and founder of Samsamia Technologies, a company that created a visual search engine for fashion items allowing users to find products using images instead of words, and founder of the Robotics Society of Universidad Carlos III, which developed different projects related to UAVs, mobile robots, small humanoids competitions, and 3D printers. Miguel also worked as a robotics scientist at Universidad Carlos III of Madrid and King’s College London, where his research focused on learning from demonstration, reinforcement learning, computer vision, and dynamic control of humanoid robots. He holds a BSc and MSc in Electrical Engineering and an MSc and PhD in robotics.
Deep Learning | Machine Learning | Intermediate | Workshops
Do you want to know what your customers, users, contacts, or relatives really think? Find out by building your own sentiment analysis application.
In this workshop you will build a sentiment analysis application, step by step, using KNIME Analytics Platform. After an introduction to the most common techniques used for sentiment analysis and text mining, we will work in three groups, each one focusing on a different technique. Full Details.
Rosaria Silipo has been mining data, big and small, since her master's degree in 1992. She kept mining data throughout her doctoral and postdoctoral programs and most of her subsequent positions. So many years of experience and passion for data analytics, data visualization, data manipulation, reporting, business intelligence, and KNIME tools naturally led her to become a principal data scientist and an evangelist for data science at KNIME.
Kathrin Melcher is a Data Scientist at KNIME. She holds a master's degree in Mathematics from the University of Konstanz, Germany. She joined the Evangelism team at KNIME in 2017. She has a strong interest in data science, machine learning and algorithms, and enjoys teaching and sharing her knowledge of them.
Machine Learning | Big Data | Intermediate-Advanced | Workshops
Price optimization is becoming increasingly complex in practice, especially when dealing with complementary goods. When you or your competitors raise or lower prices, how does that affect demand for complementary goods? A good way of estimating demand in such a complex and interdependent environment is machine learning. In this talk, we are going to discuss what should be taken into account when training ML models for price recommendations. First, you need to collect information about the items your competitors sell and their prices; machine learning helps you identify those that are similar to the items you sell. Next, you train a model that predicts how a change in the price of one good affects another. After that, you build another machine learning model to predict the demand for every good. Finally, you build an elasticity model which takes additional business rules into account and generates price recommendations. We will discuss in detail pitfalls to avoid and tips to ensure a robust model. Full Details.
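As a rough illustration of the elasticity-model step, a constant price elasticity can be estimated as the slope of a log-log regression of demand on price. The weekly observations below are invented, and the closed-form least-squares slope stands in for whatever model the talk actually uses:

```python
import math

# Hypothetical weekly observations: (price, units_sold) for one product.
observations = [(10.0, 500), (11.0, 455), (12.0, 417), (13.0, 385), (14.0, 357)]

# Log-log regression: log(demand) = a + e * log(price); the slope e is the
# (constant) price elasticity of demand.
xs = [math.log(p) for p, _ in observations]
ys = [math.log(q) for _, q in observations]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
elasticity = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)

# Use the fitted model to predict demand at a candidate price.
intercept = my - elasticity * mx
def predict_demand(price):
    return math.exp(intercept + elasticity * math.log(price))

print(round(elasticity, 2))  # close to -1: this toy data is roughly q = 5000 / p
```

A production model would add cross-price terms for complementary goods (how good A's price shifts good B's demand) before feeding the estimates into the price-recommendation step.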
Deep Learning | Data Science at Scale | Intermediate | Talks
Deep learning has had a profound impact on conversational AI research. In recent years, models such as CNN intent extractors, DQN policy networks and LSTM language generators have been at the centre of spoken dialogue systems research. Despite the transformational potential of these methods, not all of these approaches are ready for production. For various reasons, these methods struggle to scale to complex, real-world conversational scenarios. In this talk, I will share the insights we gained from studying conversational AI in academia, and how these unique experiences teach us how to build real-world dialogue agents that can scale across multiple application domains and languages. Full Details.
Pei-Hao (Eddy) Su is a co-founder and Chief Scientist of PolyAI, a London-based startup looking to use the latest developments in NLP to create a general machine learning platform for deploying spoken dialogue systems. He holds a PhD from the Dialogue Systems group, University of Cambridge, where he worked under the supervision of Professor Steve Young. His research interests centre on applying deep learning, reinforcement learning and Bayesian approaches to dialogue management and reward estimation, with the aim of building systems that can learn directly from human interaction. He has given several invited talks in academia and industry, including at Apple, Microsoft, General Motors and DeepHack.Turing. He also gave a tutorial on deep learning for conversational AI at NAACL 2018, and received the best student paper award at ACL 2016.
Deep Learning | Computer-Vision | Beginner-Intermediate | Talks
Virtual assistants have never sounded more human and cars are becoming driverless, yet companies still have to deal with a massive amount of mail. From unsolicited mail and bills to registered mail, mail processing solutions are a necessity. In an effort to bring AI to mail processing, we will present a prototype we’ve developed for a client in the insurance industry. Using computer vision and deep learning techniques, it automatically processes typed and hand-written letters to send them to the correct department within the organization. Full Details.
He’s one of Dataiku’s top data scientists, but Alexandre Hubert began his career in a very different domain. After four years as a trader in the City, he realised that, with the huge amount of data out there, it was possible – and fun! – to solve problems using real-life data. Since becoming a data scientist, Alexandre has worked on a range of use cases, from creating models that predict fraud to building specific recommendation systems. He especially loves using deep learning with text or sports data.
Data Science at Scale | Beginner-Intermediate | Talk
CPL Online was formed in 2010 and specialises in bespoke digital services and products as well as e-learning training for the hospitality sector in the UK. In 2015, CPL Online introduced a series of added-value reporting and data-analysis functions for their clients thanks to the open source Big Data platform, HPCC Systems®. These developments have allowed their clients to gain a better understanding of their workforce and benefit from significant cost savings. Since then, as well as expanding those features, they have fully integrated their CRM, Visual Studio Team Services, and Accounts into the same Big Data platform to create analytics that run their business.
In this presentation David Dasher, Chief Technical Officer, CPL Online, will explore how they use data from multiple external sources to build a real-time P&L that shows not only sales per product to date but also profitability and direct costs apportioned per product and across the business. The presentation will focus on the challenges CPL faced and how they have gone from ‘struggling with SQL’ to ‘flourishing with HPCC Systems’. He will also explore the way in which CPL Online have built algorithms around the data, allowing them to process and analyse vast amounts of data in real time. David will show real-life examples of how this unique data is used to track user trends.
David Dasher is the Chief Technology Officer and Founder of CPL Online, the leading provider of e-learning and digital services to the UK’s hospitality sector, which has been part of CGA Group since 2018.
Quant Finance | Open Source Data Science | Intermediate | Talks
Machine learning and statistics show a healthy cross-pollination, where each field adopts the ideas of the other. In this talk, we will look at the ideas the two disciplines have developed, identify those that have already crossed the chasm, and those that are still grounded in one of the two fields. Some examples of such concepts are informative priors, neural networks, uncertainty, regularization, and hierarchical models. As I will show, probabilistic programming using PyMC3 allows us to do both. Full Details
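As a minimal taste of one idea named above, here is an informative prior in action: Beta-Binomial conjugate updating, the simplest special case of the Bayesian machinery that frameworks like PyMC3 generalize to arbitrary models. The prior strength and the observed data below are invented for illustration:

```python
import math

# Informative prior: we believe a coin is roughly fair -> Beta(10, 10).
# A larger alpha + beta encodes a stronger prior belief.
alpha, beta = 10.0, 10.0

# Observe 7 heads in 10 flips; conjugacy makes the update a simple addition.
heads, tails = 7, 3
a, b = alpha + heads, beta + tails

# Posterior mean and standard deviation of a Beta(a, b) distribution:
# the prior pulls the estimate toward 0.5, away from the raw 0.7.
mean = a / (a + b)
std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(round(mean, 3), round(std, 3))  # → 0.567 0.089
```

The posterior standard deviation is the "uncertainty" the abstract mentions: unlike a point estimate, the Bayesian answer quantifies how much the data have (or have not) overridden the prior.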
Thomas Wiecki is the VP of Data Science at Quantopian, where he uses probabilistic programming and machine learning to help build the world’s first crowdsourced hedge fund. Among other open source projects, he is involved in the development of PyMC3—a probabilistic programming framework written in Python. A recognized international speaker, Thomas has given talks at various conferences and meetups across the US, Europe, and Asia. He holds a PhD from Brown University.
Deep Learning | Data Science Research | Beginner | Workshops
Certainly, some of the most exciting research going on right now is in the area of deep learning. But how do we get started with hands-on practice, and how do we gain a basic understanding of what is going on within all of those deep learning layers? This lesson will help the beginner-level deep learner navigate this new landscape. I will explain both the design theory and the Keras implementation of some of today’s most widely used deep learning algorithms, including convolutional neural nets and recurrent neural nets. I will also discuss some of my own recent explorations with Keras, including a spin-off of style transfer. Full Details
Julia Lintern is a senior data scientist at Metis, where she co-teaches the data science bootcamp, develops curricula, and focuses on other special projects. Previously, Julia worked as a data scientist at JetBlue, where she used quantitative analysis and machine learning methods to provide continuous assessment of the aircraft fleet. Julia began her career as a structures engineer designing repairs for damaged aircraft. Julia holds an MA in applied math from Hunter College, where she focused on visualizations of various numerical methods including collocation and finite element methods and discovered a deep appreciation for the combination of mathematics and visualizations, leading her to data science as a natural extension of these ideas. She continues to collaborate on various projects; including her current work with the NYTimes data science team. During certain seasons of her career, she has also worked on creative side projects such as Lia Lintern, her own fashion label.
Data Visualization | Big Data | Beginner-Intermediate | Workshops
The vast majority of our data (Merrill Lynch puts the figure at roughly 90%) is *unstructured*. The first part of the talk focuses on getting a thorough understanding of what a language model is and how it works. In the second half of the talk, we’ll look at how we can practically use language models to understand unstructured data. We’ll explore classification, the canonical application of language models: they can help us identify spam, analyze sentiment, or perform unsupervised clustering. We’ll also cover predictive modeling, on tweets for example, and information retrieval. Finally, we’ll see how language models have been used extensively (for example in the legal sector) to extract targeted insights from enormous data sets. Full Details.
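To make the idea concrete, a bare-bones statistical language model can be built from bigram counts alone. This toy sketch (tiny invented corpus, no smoothing) is far simpler than anything production-grade, but it shows the core estimate P(next word | current word):

```python
from collections import defaultdict

# Tiny invented corpus; a real language model is trained on far more text.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# Count bigrams to estimate P(next_word | word) by relative frequency.
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1

def next_word_probs(word):
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_probs("sat"))  # every "sat" in the corpus is followed by "on"
```

The same relative-frequency machinery, scaled up and smoothed, is what lets language models score how "spam-like" or "positive" a document is for the classification tasks above.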
Deep Learning | Research | Intermediate | Talks
Prof. John D. Kelleher is the Academic Leader of the Information, Communication and Entertainment Research Institute at the Dublin Institute of Technology. His areas of expertise include machine learning, artificial intelligence, natural language processing, and spatial cognition. John has worked in a number of different academic and research focused institutes, including Dublin City University, Media Lab Europe, and DFKI (the German Centre for Artificial Intelligence Research). Currently, his research is supported by the Science Foundation Ireland ADAPT Research Centre (Grant Number 13/RC/2016). He is the co-author of Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015.
Kickstarter | Open Source Data Science | Beginner | Workshop
Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation, and how to visualize models with coefficient plots. Full Details.
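The cross-validation idea mentioned above can be sketched in a few lines. This toy version (synthetic data, a one-predictor linear model, plain Python rather than the R used in the course) just shows the mechanics of holding out each fold in turn:

```python
import random

# Synthetic data: y = 2x + 1 plus Gaussian noise. In practice this is real data.
random.seed(0)
data = [(x, 2 * x + 1 + random.gauss(0, 0.5)) for x in range(30)]

def fit_line(points):
    # Ordinary least squares for y = a + b*x.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / sum(
        (x - mx) ** 2 for x, _ in points
    )
    return my - b * mx, b

def cross_val_mse(data, k=5):
    # k-fold cross-validation: hold each fold out once, train on the rest,
    # and pool the held-out squared errors.
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        a, b = fit_line(train)
        errors.extend((y - (a + b * x)) ** 2 for x, y in test)
    return sum(errors) / len(errors)

print(round(cross_val_mse(data), 3))
```

Because every point is predicted by a model that never saw it, the pooled error estimates out-of-sample quality rather than training fit, which is exactly what in-sample measures cannot do.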
Data Visualization | Data Science Management | Intermediate | Workshops
In this talk, we give a brief overview of the options for writing dashboards in Python. We then show how to build a dashboard that allows users to upload a set of time-stamped tweets or sentences, performs sentiment analysis on the tweets, displays graphs of the evolution of sentiment over time, and gives back a CSV of the sentiment attached to each tweet. We will write the dashboard live during the talk, using Plotly Dash. By writing the dashboard live, we hope to give the audience a realistic flavour of what can be achieved and some insight into how to start. More advanced users will learn best practices around writing dashboards. Full Details.
Machine Learning | Open Source Data Science | Beginner-Intermediate | Talks
To enable a machine to learn on a particular dataset, a lot of manual effort and expert knowledge is required. Fortunately, in recent years more and more automatic approaches have been proposed to make the life of practitioners a lot easier. In the first part of my talk, I will provide a brief overview of traditional AutoML methods. In the second part, I will focus on recent state-of-the-art, open-source AutoML systems that won the last two AutoML competitions. The first is auto-sklearn, an automated machine learning toolkit and drop-in replacement for a scikit-learn estimator, which combines Bayesian optimization with meta-learning and ensembles. The second is POSH-auto-sklearn, which makes auto-sklearn even more efficient by incorporating cheaper fidelities via a multi-armed bandit approach. Full Details.
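The core loop that all these AutoML systems elaborate on can be sketched very simply: score candidate configurations on held-out data and keep the best. The toy task and the k-nearest-neighbour model below are invented stand-ins; auto-sklearn replaces this naive enumeration with Bayesian optimization, meta-learning and ensembling:

```python
import math
import random

random.seed(0)
# Invented 1-D regression task; we "AutoML" over the single hyperparameter
# k of a k-nearest-neighbour regressor.
train = [(x / 20, math.sin(x / 20) + random.gauss(0, 0.1)) for x in range(-60, 61)]
valid = [(x / 20 + 0.025, math.sin(x / 20 + 0.025)) for x in range(-50, 51)]

def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_mse(k):
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in valid) / len(valid)

# The search loop: evaluate candidate configurations, keep the best.
candidates = [1, 3, 5, 9, 15, 25, 51]
best_k = min(candidates, key=validation_mse)
print(best_k, round(validation_mse(best_k), 4))
```

Even this crude search captures the bias-variance trade-off an AutoML system navigates: tiny k overfits the noise, huge k oversmooths the curve, and the validation score arbitrates automatically.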
Deep Learning | Machine Learning | Intermediate | Talks
Recommender systems are widely used by e-commerce and services companies worldwide to provide the most relevant items to their users. Over the past few years, deep learning has demonstrated breakthrough advances in image recognition and natural language processing. Meanwhile, new approaches have been published which apply deep learning techniques to recommender systems, further expanding the use cases of neural networks. Some of these novel systems already display state-of-the-art performance and deliver high-quality recommendations. Compared to traditional models, deep learning solutions can provide a better understanding of users’ demands, items’ characteristics, and the historical interactions between them. In this talk, Oliver will discuss how some of these novel models can be implemented in the machine learning framework TensorFlow, starting from a collaborative filtering approach and extending that to more complex deep recommender systems. Full Details.
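For contrast with the deep models discussed, here is the traditional baseline they extend: collaborative filtering by matrix factorization, trained with stochastic gradient descent. The rating matrix and hyperparameters below are invented for illustration:

```python
import random

# Toy user-item ratings (1-5); 0 means unobserved.
ratings = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]
n_users, n_items, n_factors = len(ratings), len(ratings[0]), 2

random.seed(0)
# Latent factor vectors for users (P) and items (Q), randomly initialized.
P = [[random.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
Q = [[random.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]

def predict(u, i):
    # Predicted rating is the dot product of user and item factors.
    return sum(P[u][f] * Q[i][f] for f in range(n_factors))

# SGD on squared error over observed entries, with L2 regularization.
lr, reg = 0.01, 0.02
for _ in range(2000):
    for u in range(n_users):
        for i in range(n_items):
            if ratings[u][i] == 0:
                continue
            err = ratings[u][i] - predict(u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)

# Fill in a missing rating: user 0's predicted score for item 2.
print(round(predict(0, 2), 2))
```

Deep recommenders generalize exactly this setup: the dot product becomes a neural network over user, item, and context features, which is what allows them to model richer interactions than the linear factors here.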
Oliver Gindele is the head of Machine Learning at Datatonic. He studied Materials Science at ETH Zurich and moved to London to obtain his PhD in computational physics from UCL. Oliver is passionate about using computer models to solve real-world problems, which is why he joined Datatonic to create bespoke machine learning solutions. Working with clients in retail, finance and telecommunications, Oliver applies deep learning techniques to tackle some of the most challenging use cases in these industries.
Quant Finance | Python | Intermediate | Talks
In this talk, we introduce the topic of using Big Data and alternative data to trade financial markets. We discuss the general approach which can be used when faced with an unusual dataset in financial markets. We talk about some of the various sources of Big Data and alternative data, which could be relevant for understanding financial markets and we also give examples. Full Details.
Saeed Amen is the founder of Cuemacro. Over the past decade, Saeed Amen has developed systematic trading strategies at major investment banks including Lehman Brothers and Nomura. Independently, he is also a systematic FX trader, running a proprietary trading book trading liquid G10 FX, since 2013. He is also the author of Trading Thalesians: What the ancient world can teach us about trading today (Palgrave Macmillan). Through Cuemacro, he now consults and publishes research for clients in the area of systematic trading. His clients have included major quant funds and data companies such as RavenPack and TIM Group. He is also a co-founder of the Thalesians.
Open Source Data Science | Big Data | Intermediate-Advanced | Talks
Streaming analytics (or Fast Data) is becoming an increasingly popular subject in enterprise organizations, because customers want real-time experiences. In this talk, I’ll present a streaming analytics engine (‘Styx’) powered by Apache Flink. Kafka is used for the message bus and Cassandra for state management. The machine learning models are built with KNIME and Spark, exported to PMML format, and evaluated using the Openscoring.io library. Full Details.
Bas is a programmer, scientist, and IT manager. At ING, he works as Technology Lead in the global innovation center. His academic background is in Artificial Intelligence and Informatics. His research on reference architectures for big data solutions was published at the IEEE conference ICITST 2013. Bas has a background in software development, design and architecture, with a broad technical view from C++ to Prolog to Scala. He occasionally teaches programming courses and is a regular speaker at conferences and informal meetings, where he brings a mixture of market context, his own vision, business cases, architecture and source code in an enthusiastic way to his audience.
Machine Learning | Open Source Data Science | Intermediate | Talks
Algorithmic design and the data used for it may have profound societal implications across dimensions such as fair housing, economic opportunity, and discrimination. The intricate institutional and regulatory aspects of healthcare make it a particularly complex environment that requires an open framework and tools to codify clinical practice guidelines, prediction and decision support algorithms, and more, in order to avert the potential harms of data and algorithmic bias. The main goal of this session is to outline strategies for the design and prototyping of a set of digital tools and frameworks that can aid the understanding, assessment, measurement, and mitigation of the types of bias most commonly encountered in the data and algorithms used in the healthcare industry. Full Details.
Deep Learning | Beginner-Intermediate-Advanced | Keynote
Deep neural networks (DNNs) are reaching or even exceeding the human level on an increasing number of complex tasks. However, due to their complex non-linear structure, these models are usually applied in a black-box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. This lack of transparency can be a major drawback in practice. In my talk I will present a general technique, Layer-wise Relevance Propagation (LRP), for interpreting DNNs by explaining their predictions. I will demonstrate the effectiveness of LRP when applied to various data types (images, text, audio, video, EEG/fMRI signals) and neural architectures (ConvNets, LSTMs), and will summarize what we have learned so far by peering inside these black boxes.
Wojciech Samek is head of the Machine Learning Group at Fraunhofer Heinrich Hertz Institute, Berlin, Germany. He studied Computer Science at Humboldt University of Berlin, Germany, Heriot-Watt University, UK, and University of Edinburgh, UK, from 2004 to 2010 and received the Dr. rer. nat. degree (summa cum laude) from the Technical University of Berlin, Germany, in 2014.
In 2009, he was visiting researcher at NASA Ames Research Center, Mountain View, CA, and, in 2012 and 2013, he had several short-term research stays at ATR International, Kyoto, Japan. He was awarded scholarships from the European Union’s Erasmus Mundus programme, the German National Academic Foundation and the DFG Research Training Group GRK 1589/1. He is associated with the Berlin Big Data Center, is a member of the editorial board of Digital Signal Processing and PLOS ONE, and was organizer of several deep learning workshops.
He has authored more than 80 journal and conference papers, predominantly in the areas of deep learning, interpretable artificial intelligence, robust signal processing and computer vision.
Open Source Data Science | Data Science Research | Beginner-Intermediate | Talks
Advertisers have long known that by capturing our attention through pictures and words they can influence our decision-making. What is new, however, are the various ways in which this can now be done online, for example by manipulating our search results, through suggestive search engines, purchasing recommendations, social media and so on. Governments, corporations, and other institutions now have the capacity to target nudges at each individual. Using algorithms that operate on big data, nudges can be customized for individuals, and their effectiveness can be tracked and adjusted as the algorithm learns from feedback data tracking a user’s behavior. These technologies raise a host of new ethical questions, and in this talk I will examine the ethics of the nudging effects of AI systems on human behavior (e.g. the influence of recommendations). Full Details.
NLP | Machine Learning | Deep Learning | Beginner-Intermediate | Talks
In this talk, we will discover how linguistic intuition can be formalized and encoded to solve the problem of complex word identification and correction. We will investigate word formation, dive into language structures and learn about language modelling. Full Details.
Deep Learning | Intermediate-Advanced | Talk
In deep learning, input pipelines are responsible for a complex chain of actions that ultimately feed data into GPU memory: defining how files are read from storage, deserializing them into data structures, pre-processing on a CPU, and copying to the GPU. These pipelines bring together complex hardware systems—including cluster networks, peripheral interconnects, modern CPUs, and storage devices. In this talk, we present a new benchmark suite for evaluating and tuning input pipelines. Full Details.
Emily Watkins is a Solutions Engineer at Pure Storage and works with teams to achieve faster time to insight with highly-parallelized data pipelines. Prior to Pure Storage, she helped build monitoring tools that bring “real-time analytics” closer to real-time. The more data the better.
Data Science at Scale | Intermediate-Advanced | Talk
In this talk, we will first explain the business case for a peer detection model across a massive payment system and how we abstracted it to a network-based recommendation problem. Then, we will discuss multiple ways of extracting node representations from a graph, from a graph-neighborhood perspective to a network-embedding perspective. Next, we will demonstrate how we computed node similarity efficiently: we developed a Python package to accelerate top-n cosine similarity computation for sparse matrices, and we used Spark locality-sensitive hashing for dense matrices. Lastly, we will describe an active learning framework we built to improve peer detection via user feedback, and how we utilized Airflow to productionize the whole model pipeline. Full Details.
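The speaker's package is not reproduced here; as a generic sketch of the top-n cosine similarity idea on a sparse matrix, using scipy (the toy "node feature" data is invented):

```python
import numpy as np
from scipy import sparse

def top_n_cosine(mat, n=2):
    """For each row of a sparse matrix, return indices of the n most
    cosine-similar other rows."""
    norms = np.sqrt(mat.multiply(mat).sum(axis=1))          # row L2 norms
    normed = mat.multiply(1.0 / np.maximum(norms, 1e-12))   # row-normalise
    sims = (normed @ normed.T).toarray()                    # pairwise cosines
    np.fill_diagonal(sims, -np.inf)                         # exclude self-matches
    return np.argsort(-sims, axis=1)[:, :n]

# Toy data: 4 nodes, 5 sparse features
X = sparse.csr_matrix(np.array([
    [1., 0., 1., 0., 0.],
    [1., 0., 1., 0., 1.],
    [0., 1., 0., 1., 0.],
    [0., 1., 0., 1., 1.],
]))
print(top_n_cosine(X, n=1))
```

For genuinely large matrices one would keep the similarity computation sparse and blocked rather than calling `.toarray()`, which is the kind of acceleration the talk's package addresses.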
Zhe Sun is currently a senior data scientist in the ING Wholesale Banking Advanced Analytics team, where he has applied machine learning techniques to problems ranging from entity matching to large-scale payment transaction network analysis. Together with the team, he aims to change the way the bank operates via data-driven analytics and machine learning. He has 9 years of industry experience in data science and software engineering across a range of international companies in the Banking and Telecommunications sectors.
Machine Learning | Open Source Data Science | Beginner-Intermediate | Workshops
Anna Veronika Dorogush graduated from the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University and from Yandex School of Data Analysis. She used to work at ABBYY, Microsoft, Bing and Google, and has been working at Yandex since 2015, where she currently holds the position of the head of Machine Learning Systems group and is leading the efforts in development of the CatBoost library.
Open Source Data Science | Python | Beginner-Intermediate-Advanced | Workshops
Missing data is a widespread challenge for data analysts. Even in the most robust settings, malfunctioning instruments, malfunctioning researchers, and study subjects that drop out are all attributes of data collection in the real world. Knowing how to handle missing data, therefore, is such a crucial skill for data scientists that Wainer (2010) considers it one of the “six necessary tools” that researchers need to master in order to successfully tackle problems in their fields in this century. Full Details.
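Not from the workshop materials; just a small pandas sketch of one common first step, mean imputation with missingness indicators, on invented data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [34.0, np.nan, 29.0, 41.0],
    "income": [52.0, 48.0, np.nan, np.nan],
})

# Inspect the extent of missingness before choosing a strategy
print(df.isna().mean())   # fraction missing per column

# Simple mean imputation, keeping indicators of which values were filled,
# so downstream models can still "see" the missingness pattern
filled = df.fillna(df.mean())
was_missing = df.isna().add_suffix("_was_missing")
out = pd.concat([filled, was_missing], axis=1)
print(out)
```

Mean imputation is only the crudest option; the point of keeping the indicator columns is that whether a value was missing is often informative in itself, which more principled methods (e.g. multiple imputation) exploit.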
Alexandru Agachi is a co-founder of Empiric Capital, an algorithmic, data driven asset management firm headquartered in London. He is also a guest lecturer in big data and machine learning at Pierre et Marie Curie University in Paris, and is involved in neuro oncogenetic research, in particular applications of machine learning. After initial studies at LSE, he completed 4 graduate and postgraduate degrees and diplomas in technology and science, focusing on the thorium nuclear fuel cycle, surgical robotics, neuroanatomy and imagery, and biomedical innovation. He previously worked at UBP in hedge funds research, Deutsche Bank, the Kyoto University Research Reactor Institute, and conducted an investment consulting project for the CIO office at Investec. He was nominated as one of Forbes’ 30 Under 30 in Finance in 2018.
Open Source Data Science | Data Science at Scale | Intermediate | Talks
Hivemall significantly simplifies machine learning workflows such as feature engineering, algorithm implementation, and evaluation, because Hive enables us to access distributed storage using handy SQL-like queries (HiveQL). In this session, the speaker discusses which parts of modern, realistic machine learning and data science are painful, and how Hivemall is notably preferable to other implementations of machine learning algorithms. Full Details.
Takuya Kitazawa is a data science engineer at Treasure Data, Inc., a company developing a large-scale enterprise-grade customer data platform, and committer of Apache Hivemall, a scalable machine learning library for Apache Hive and Spark. He is interested in theory and practice of real-world data science and engineering, especially for recommender systems and scalable machine learning.
Machine Learning | Open Source Data Science | Beginner-Intermediate | Workshops
Creating an end-to-end machine learning flow and predicting financial purchases from imbalanced financial data using a weighted XGBoost code pattern is for anyone interested in using XGBoost and creating a Scikit-Learn-based machine learning pipeline from a real dataset with varied class imbalances. We will start with a dataset description and classification of the problem, and discuss XGBoost and scikit-learn. We will also explore the output, note the class imbalance issues, and discuss inference. We will also give pointers to other advanced techniques such as oversampling, undersampling and SMOTE algorithms. Full Details.
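Not the workshop's code; a tiny numpy sketch of how the weighting in a weighted XGBoost model is typically derived (`scale_pos_weight` is XGBoost's documented parameter name; the labels here are invented):

```python
import numpy as np

# Invented, heavily imbalanced binary labels: 95 negatives, 5 positives
y = np.array([0] * 95 + [1] * 5)

neg, pos = np.bincount(y)
scale_pos_weight = neg / pos   # common starting point: ratio of negatives to positives
print(scale_pos_weight)        # 19.0

# This ratio would then be handed to the booster, e.g.:
params = {"objective": "binary:logistic", "scale_pos_weight": scale_pos_weight}
```

Up-weighting the minority class this way is the simplest counterpart to the resampling techniques (oversampling, undersampling, SMOTE) mentioned above.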
Machine Learning | Research | Beginner-Intermediate | Talks
Recommendation systems are now widely used as a key to elevating customer experience and growing e-commerce businesses and industries. An effective recommendation system, especially for business, should not only recommend items a customer frequently purchased in the past, but should also identify items that were never purchased yet are likely to interest the customer. In large companies operating with tens of thousands of products and thousands of customers, category managers or BI tools can only provide recommendations for the top items and top customers.
In this talk we introduce an innovative approach to discovering substitute products by deriving product similarities from the corresponding association rules. The proposed method is computationally efficient and effective. We will present a case study to demonstrate the application of the proposed method in a B2B e-commerce business.
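The speaker's exact similarity measure is not public; as background, a generic sketch of the two standard association-rule statistics (confidence and lift) computed from an invented toy transaction list:

```python
from itertools import combinations
from collections import Counter

# Invented toy baskets
transactions = [
    {"coffee", "milk"}, {"coffee", "milk", "sugar"},
    {"tea", "milk"}, {"coffee", "sugar"}, {"tea", "sugar"},
]
n = len(transactions)

item_count = Counter(i for t in transactions for i in t)
pair_count = Counter(frozenset(p) for t in transactions
                     for p in combinations(sorted(t), 2))

def confidence(a, b):
    """P(b in basket | a in basket)."""
    return pair_count[frozenset((a, b))] / item_count[a]

def lift(a, b):
    """confidence(a -> b) relative to b's base rate; > 1 means positive association."""
    return confidence(a, b) / (item_count[b] / n)

print(confidence("coffee", "milk"))   # 2/3
print(lift("coffee", "milk"))
```

A similarity between two products can then be built from how alike their rule profiles are, which is the flavour of idea the talk develops at scale.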
Amir Meimand is Zilliant’s Director of R&D and a pricing scientist, designing and developing pricing solutions for customers and performing research in which he applies new methods to improve the current solutions as well as develop new tools. Prior to joining Zilliant, Amir helped design and develop a promotion planning and pricing platform for B2C retailers.
Amir holds a dual Ph.D. degree in Industrial Engineering and Operations Research from Pennsylvania State University. In his doctoral work, he applied operations research concepts to dynamic pricing and revenue management.
Machine Learning | Big Data | Intermediate | Talks
In this talk we will look at various architectural options for operationalizing machine learning models, including serverless architectures, ONNX, containerization and others. This presentation is intended for Software/Data Architects, Technical Leads, and Data Engineers/Scientists. It aims to build an understanding of the various operationalization options and how they can be applied in the context of machine-learning-based software development projects. Some general knowledge of data science and machine learning is desirable. Full Details.
Dr. Mufajjul Ali is a Cloud Solution Architect at Microsoft, specializing in Advanced Analytics and AI. He has a Doctorate from Southampton University and a Master’s from Birkbeck, University of London. Dr. Ali has over 15 years of industry and academic experience, and his expertise spans big data, machine learning and architectures.
Data Visualization | Open Source Data Science | Beginner | Workshops
This workshop, delivered by journalist and data visualization specialist Alan Rutter, will cover an audience-centered approach to visualizing data. It will introduce tried-and-tested techniques for communicating data-driven stories effectively to people from a broad range of backgrounds and deal with some of the common problems that practitioners encounter. It is suited to anyone who wants to create impact with the data they work with by turning it into compelling stories for other audiences – whether through printed materials, presentations, social media, or websites and apps. Full Details.
Alan Rutter is the co-founder of consultancy Clever Boxer. He first worked with infographics as a magazine journalist (Time Out, WIRED), before moving into technology roles (Condé Nast, Net-A-Porter) and then training and development (The Guardian, General Assembly). He has taught data visualisation techniques to thousands of students, and for organisations including the Home Office, Department of Health, Biotechnology and Biosciences Research Council, Capita, Novartis and Kings College London.
Deep Learning | Open Source Data Science | Beginner-Intermediate | Talks
Deep Learning’s arcane jargon and its intimidating equations often discourage software developers, who wrongly think that they’re “not smart enough.” In this session, we’ll explain the basic concepts of Neural Networks and Deep Learning. Then, through code-level demos based on Apache MXNet, we’ll demonstrate how to build, train and use models based on different types of networks: multi-layer perceptrons, convolutional neural networks and long short-term memory networks. Finally, we’ll share some optimization tips. Full Details.
Before joining Amazon Web Services, Julien served for 10 years as CTO/VP Engineering in top-tier web startups. Thus, he’s particularly interested in all things architecture, deployment, performance, scalability and data. As a Principal Technical Evangelist, Julien speaks very frequently at conferences and technical workshops, where he meets developers and enterprises to help them bring their ideas to life thanks to the Amazon Web Services infrastructure.
Machine Learning | Open Source Data Science | Intermediate | Workshop
Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We first cover penalized regression with the Elastic Net, and then turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages. Full Details.
Data Science Research | Intermediate | Talk
Reinforcement Learning (RL) is the study of how agents behave (or ought to behave) under regimes of rewards and punishments. It forms the core of Prescriptive Analytics, i.e. methods that try to identify optimal sequences of actions. It is also closely aligned with Game Theory, Operations Research, Planning and Machine Learning, with most advances coming from cross-pollination across these aligned fields (e.g. Deep Reinforcement Learning, Neuro-evolution). This talk will provide a unifying framework one could use to identify which algorithms are better suited to which environments, which research “black spots” have not been explored enough, and what new algorithms could be tried by mixing and matching. More specifically, we will explore algorithmic bias, method suitability and practical results along the following axes:
a) The agent’s sensory abilities (e.g. partial or full observability)
b) The type of action (e.g. discrete, continuous, textual)
c) The number of agents that populate an environment
d) The availability (and exploitation) of forward and inverse models
e) The use of global and local function approximators
f) Ways to explore and exploit available information.
Some of the themes explored in this talk have been discussed in Vodopivec, Tom, Spyridon Samothrakis, and Branko Ster. “On Monte Carlo tree search and reinforcement learning.” Journal of Artificial Intelligence Research 60 (2017): 881-936.
Dr Spyros Samothrakis is Assistant Director for the Institute for Analytics and Data Science at the University of Essex. Spyros’ research interests include reinforcement learning, neural networks and causality.
He obtained his PhD from the University of Essex (2014) and has published numerous papers on the topics above. He is currently working closely with industrial partners, helping bridge the gap between data science concepts and business applications.
Machine Learning | Data Science Research | Intermediate-Advanced | Talk
Under the natural assumption that the signal properties are related to the topology of the graph where they are supported, the emerging field of graph signal processing (GSP) aims at developing data science algorithms that fruitfully leverage this relational structure, and can make inferences about these relationships when they are only partially observed. After presenting the fundamentals of GSP and motivating the study of graph signals, we leverage these ideas to offer a fresh look at the problem of network topology inference from graph signal observations, also known as graph learning. Focus in this talk is placed on building a judicious network model of the data facilitating efficient signal representation, visualization, prediction, (nonlinear) dimensionality reduction, and (spectral) clustering. Throughout, we illustrate the developed methods and results on various application domains including computational biology, the economy, network neuroscience, and online social media.
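For readers new to GSP, the central object is the graph Laplacian; here is a small numpy sketch (graph and signals invented) of the Laplacian quadratic form used to quantify how smooth a signal is over a graph:

```python
import numpy as np

# Adjacency matrix of a 4-node path graph: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A   # combinatorial graph Laplacian: L = D - A

smooth = np.array([1.0, 1.1, 1.2, 1.3])    # varies slowly along edges
rough  = np.array([1.0, -1.0, 1.0, -1.0])  # flips sign across every edge

def smoothness(x, L):
    """Laplacian quadratic form x^T L x = sum over edges of (x_i - x_j)^2."""
    return float(x @ L @ x)

print(smoothness(smooth, L), smoothness(rough, L))
```

Smooth signals give a small quadratic form and rough ones a large one; the "signals are smooth on the true graph" assumption is what makes topology inference from signal observations tractable.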
Gonzalo Mateos earned the B.Sc. degree from Universidad de la Republica, Uruguay, in 2005, and the M.Sc. and Ph.D. degrees from the University of Minnesota, Twin Cities, in 2009 and 2011, all in electrical engineering. He joined the University of Rochester, Rochester, NY, in 2014, where he is currently an Assistant Professor with the Department of Electrical and Computer Engineering, as well as a member of the Goergen Institute for Data Science. During the 2013 academic year, he was a visiting scholar with the Computer Science Department at Carnegie Mellon University. From 2004 to 2006, he worked as a Systems Engineer at Asea Brown Boveri (ABB), Uruguay. His research interests lie in the areas of statistical learning from Big Data, network science, decentralized optimization, and graph signal processing, with applications in dynamic network health monitoring, social, power grid, and Big Data analytics. He currently serves as Associate Editor for the IEEE Transactions on Signal Processing, the EURASIP Journal on Advances in Signal Processing, and is a member of the IEEE SigPort Editorial Board. Dr. Mateos received the NSF CAREER Award in 2018, the 2017 IEEE Signal Processing Society Young Author Best Paper Award (as senior co-author), and the Best Paper Awards at ICASSP 2018, SSP Workshop 2016, and SPAWC 2012. His doctoral work has been recognized with the 2013 University of Minnesota’s Best Dissertation Award (Honorable Mention) across all Physical Sciences and Engineering areas.
Business Intelligence | Beginner | Talks
Measurements of economic activity are primarily represented in the form of time-series data. However, there are few tools to effectively analyze this sort of data using next-generation technology like machine learning and AI. Until now, economic researchers have used structural models and econometric models developed in the previous century. The performance of these models has been underwhelming, plagued by a lack of robustness and small sample sizes. Modern algorithms, such as machine learning, and new sources of data stand to completely change the face of economic modeling and forecasting. We have developed the world’s first AI-driven virtual data scientist, capable of understanding large-scale time-series data with minimal human effort. Full Details.
Dr. Darko Matovski is the CEO of causaLens. The company provides an automated machine learning solution for time-series predictions and serves prominent organisations including hedge funds and asset managers. Darko has also worked for cutting-edge hedge funds and research institutions, for example the National Physical Laboratory in London (where Alan Turing worked) and Man Group in London. Darko has a PhD in Machine Learning and an MBA.
Open Source Data Science | Intermediate | Talks
At ThriveHive we deal with small and medium-sized businesses with different levels of marketing sophistication and budgets. One of the major challenges for us and for the businesses we serve is determining the right budget for a particular industry targeting a particular region. This is a difficult problem for predictive modelling: even though we have years of spend data, it is unlikely that this “realized spend” is the same as the marketing budget. We need a way to incorporate uncertainty in the estimate of this spend and in the difference between the spend and a business’ actual budget. It is also important to allow this uncertainty to vary by industry and region. For this purpose, we used a hierarchical probabilistic modelling approach implemented in PyMC3. Full Details.
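The speakers' PyMC3 model itself is not shown here; as a plain-numpy illustration of the underlying idea (all numbers invented), a hierarchical model amounts to shrinking each noisy group estimate toward a shared mean, with the amount of shrinkage set by how uncertain the group is:

```python
import numpy as np

def partial_pool(group_mean, group_sem2, grand_mean, tau2):
    """Shrink a noisy group mean toward the grand mean.
    tau2: between-group variance; group_sem2: variance of the group's own mean."""
    w = tau2 / (tau2 + group_sem2)   # pooling weight: 1.0 = trust the group fully
    return w * group_mean + (1 - w) * grand_mean

rng = np.random.default_rng(42)
# Toy "realized spend" samples per industry, with very different group sizes
groups = {"restaurants": rng.normal(100, 20, size=50),   # large, stable group
          "dentists":    rng.normal(150, 30, size=4)}    # small, noisy group
grand_mean = np.mean(np.concatenate(list(groups.values())))

for name, spend in groups.items():
    sem2 = spend.var(ddof=1) / len(spend)   # variance of this group's mean
    est = partial_pool(spend.mean(), sem2, grand_mean, tau2=400.0)
    print(f"{name}: raw mean {spend.mean():.1f} -> pooled {est:.1f}")
```

In a real PyMC3 model the between-group variance (here the hard-coded `tau2`) is itself inferred from the data, so the shrinkage per industry/region comes out of the posterior rather than being set by hand.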
Benjamin is a Data Scientist at ThriveHive where he works with business stakeholders to create tools and models providing guidance to internal teams and business customers. Before ThriveHive he worked with the City of Boston’s analytics team on various modelling projects. He received his PhD in Policy Analysis from the RAND Corporation, with a focus on projects related to health, infrastructure and defense. He has experience with a wide array of methods and their applications across a diverse set of verticals.
Data Visualization | Kickstarter | Beginner | Talks
Forbes Magazine calls data storytelling ‘the essential data science skill everyone needs’. And with good reason – well-told data stories are change drivers within the modern organization. This one-hour session will cover the essential elements of a good data story, chart design, common design errors, Gestalt principles and more. Full Details.
Isaac is an Australian data scientist, company founder and TEDx speaker who lives, breathes and dreams data. Isaac travels the world teaching data visualization skills and in 2017, his “Art of Data Storytelling” speaking tour saw him visit Asia, North America, Europe and Australia. A passionate data science educator, Isaac previously lectured in analytics and statistical theory at the Australian National University. He has delivered his Data Storytelling course on site to forward thinking companies including Cisco, AIG and JPMorgan Chase. Isaac was a featured data visualization keynote presenter at the 2017 Strata Data Conference and makes regular appearances at data conferences.
Data Science for Good | Data Visualization | Beginner | Talk
The past few years have seen a proliferation of both machine learning and spatial data. Data science techniques, ranging from (spatial) data visualization to statistical modeling to both machine learning and deep learning can help us identify insights from spatial data which were previously not possible. Data science and spatial data make a formidable combination in our data-driven world and can help build spatially relevant models. The session will introduce real-life examples wherein data science techniques were used along with spatial data for addressing real-life problems and demonstrating how appropriate use of spatial data can inform decision making. Examples range from building robust data visualizations (after obtaining and cleaning free earth observation data) to building predictive models of habitat suitability. Full Details.
Minerva is a PhD graduate of Cambridge University, where she specialized in Tropical Ecology. In her PhD she focused on using machine learning models in conjunction with satellite data to predict the impact of degradation on forest structure and biodiversity in SE Asia.
Thanks to her PhD training, she is also a Data Scientist on the side. As part of her research, she carries out extensive data analysis, including spatial data analysis, for which she prefers a combination of free tools: R, QGIS and Python. She also holds an MPhil degree in Geography and Environment from Oxford University, where she honed her remote sensing and spatial data analysis skills.
In addition to spatial data analysis, Minerva is also proficient in statistical analysis, machine learning and data mining. On the basis of her MPhil and PhD work, Minerva has published several peer-reviewed papers, including machine learning papers in PLOS One. The details of her research can be found at: https://www.researchgate.net/profile/Minerva_Singh
Open Source Data Science | Machine Learning | Intermediate | Workshops
Daenerys or Jon Snow? JFK, ORD or ATL: do these codes look familiar? In this tutorial we build on the fundamentals of network science and look at various applications of network analysis to real-world datasets like the US Airport Dataset and the Game of Thrones character co-occurrence network, with a foray into algorithms and applications of network science. We will work through two case studies on these datasets. Participants should be comfortable with Python and the basic PyData stack (pandas, matplotlib). By the end of the tutorial everyone should be comfortable hacking on the NetworkX API, modelling data as networks and performing basic network analysis using Python.
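Not the tutorial's notebooks; as a taste of the NetworkX API, a minimal sketch on an invented character co-occurrence network (edge weights and names are made up):

```python
import networkx as nx

# Invented co-occurrence network: edges weighted by scenes shared
G = nx.Graph()
G.add_weighted_edges_from([
    ("Jon", "Daenerys", 5), ("Jon", "Sansa", 8),
    ("Sansa", "Arya", 6), ("Daenerys", "Tyrion", 7), ("Jon", "Tyrion", 3),
])

deg = nx.degree_centrality(G)    # fraction of other nodes each node touches
top = max(deg, key=deg.get)
print(top, round(deg[top], 2))   # the best-connected character in this toy graph

# Shortest path ignoring weights, as a simple notion of "closeness"
print(nx.shortest_path(G, "Arya", "Tyrion"))
```

The same few calls (build a graph from an edge list, compute a centrality, query paths) carry over directly to the airport dataset, with airports as nodes and routes as edges.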
Deep Learning | Machine Learning | Intermediate | Workshop
State-of-the-art approaches to Named Entity Recognition (NER) are purely data-driven, leveraging deep neural networks to identify named entity mentions—such as people, organizations, and locations—in lakes of text data. In this talk, I will present our latest research on NER and provide real-life examples of how we are applying these cutting-edge techniques to several languages, starting with a detailed description of our neural architecture for NER, which is based on a generic Long Short-Term Memory (LSTM) network. We will then look into the internal network activation values under different input conditions. Full Details.
Jos is the VP of Applied AI at Faktion, an AI engineering company that works for some of the biggest and most innovative companies in the world. Originally a mathematician, he has led Machine Learning and AI implementations all over the world. Over the last few years, Jos has led the team responsible for developing the NLP models that power Chatlayer.ai, the most popular enterprise-ready chatbot platform in Europe.
Data Science at Scale | Machine Learning | Computer Vision | Intermediate | Workshops
This workshop presents the challenges that industry faces in adopting AI, focusing specifically on how to scale it. We deep-dive into training large-scale models, covering both modeling and infrastructure aspects. While AI research has accelerated over the past years, wide adoption has been hampered by scaling challenges. I will present the challenges I have found during years of academic and industrial experience with machine learning and computer vision, diving deep into the scaling problems industry faces: scaling expertise, data, computation and algorithms. Full Details.
R | Open Source Data Science | Beginner-Intermediate | Workshops
R is a standard tool for predictive modeling. It allows you to use hundreds of predictive models and build really complex workflows. The workshop is a guided tour through the most important R packages, illustrated with working R examples. You will learn how to use R for predictive modeling, including feature selection, model building, validation, and deployment. Full Details.
Artur has over twenty years of experience in deep business analytics, Data Science, and Machine Learning projects. He has worked for various companies, from start-ups to international corporations, and in various roles: as an employee, a consultant, and a business owner. He spent over ten years working as a statistician in a commercial bank, during which time he received a Ph.D. in Mathematics and wrote several scientific papers. He currently runs his company QuantUp (http://quantup.pl), focused on delivering value to companies through Data Science, Machine Learning, software development and commercial training. He has led nearly one hundred real-world Data Science projects and delivered several thousand hours of commercial training in this field. He is a co-owner, Vice CEO and CSO of a Swedish bioinformatics company, MedicWave. Artur has long experience working with open-source software and promotes its use in business applications at numerous conferences. He is a fan of the R language and a co-author of a book on forecasting in R.
Data Science at Scale | Data Science Research | Intermediate-Advanced | Talk
Recommendation systems have wide-spread applications in both industry and academia. With the ever-increasing popularity of the Internet, the amount of information increases exponentially and users spend significant time and energy selecting relevant items. When a user wants to purchase a particular item, they go through various reviews of that item and make a number of comparisons. Finding relevant information in large-scale datasets is often a difficult and time-consuming process, and users prefer a system that automatically takes their interests into account and shows only relevant information. Recommendation systems are rapidly becoming a powerful technology in e-commerce and business analytics applications. These systems help users cope with information overload and find their desired content in a reasonable time by providing personalized recommendations. Moreover, recommendation systems help e-commerce businesses generate further revenue through increased sales and improved customer satisfaction. RSs are now a major topic in computer science, and considerable effort has been made over the last two decades to advance them. In this talk, I will provide a review of recent advances in the field of recommendation systems. I will also highlight various applications and the evaluation metrics often used to assess the performance of recommender algorithms.
Open Source Data Science | Data Visualization | Intermediate | Talk
Impending open banking regulations and increasing competition have forced large banks to go through perhaps the largest transformation program in decades. Data and innovative applications of Machine Learning lie at the heart of this transformation. There are many success stories and many more challenges; careful analysis of these dimensions holds the key to successful delivery of these huge transformation programs. Kanishka plans to discuss the broader premise, some challenges and some clever applications of ML to drive such changes.
Kanishka has extensive experience in using data science driven approaches to solve complex business problems. He has led data driven digital transformation projects for clients in Financial Services, Retail, Manufacturing, Telco and Government. Prior to joining Publicis.Sapient, Kanishka has worked in Management & Technology consulting, hedge funds, a technology start-up and spent ~10 years teaching Statistics at Oxford University. His academic research focussed on computationally efficient algorithms to analyse large scale genomic datasets.
Deep Learning | Data Visualization | Intermediate | Talks
In this talk we will walk you through the evolution of the neural network architectures we have been using for information extraction. We detail our experiments with simple recurrent neural networks, which show that even a simple recurrent classifier can outperform traditional approaches, increasing accuracy by approximately 10%. We have discovered that neural networks mainly boost recall: on 70% of the documents where a CRF classifier failed to find any information, the neural network not only finds the information, it also identifies it correctly. Full Details.
Pavel Shkadzko is a semantics engineer at Gini GmbH working on a neural-based solution to improve the quality of information extraction from financial documents. He has conducted research in natural language generation and automatic semantic role labelling using neural and statistical approaches. Pavel received a master’s degree in Language Science and Technology from Saarland University (Saarbrücken) in 2018.
Machine Learning | Deep Learning | Intermediate | Talks
The use of machine learning and data science to accelerate drug discovery has become an area of increased interest over recent years. There are many subproblems to solve and many approaches to solving these problems involve collecting just the data necessary to answer a very focused question. By combining some of the latest methods in domain adaptation, multitask learning and conditioning methods for deep neural networks we can create such a latent space containing the information necessary to solve problems ranging from standard classification of various conditions, prediction of compounds effective in rescuing a morphological change, identification of mechanism of action of compounds, prediction of various toxicity endpoints, and more. Full Details.
Kickstarter | Beginner | Training
Reinforcement Learning has recently made great progress in industry as one of the best techniques for sequential decision making and control policies.
DeepMind used RL to greatly reduce energy consumption in Google’s data centre. It has been used for text summarisation, autonomous driving, dialogue systems, media advertisement, and in finance by JPMorgan Chase. We are at the very beginning of the adoption of these algorithms, as systems are required to operate more and more autonomously.
In this workshop we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms.
We will use OpenAI Gym to try out our RL algorithms. OpenAI is a non-profit organisation committed to open-sourcing its research on Artificial Intelligence. To foster innovation, OpenAI created a virtual environment, OpenAI Gym, where it is easy to test Reinforcement Learning algorithms.
In particular, we will start with some popular techniques such as Multi-Armed Bandits, going through Markov Decision Processes and Dynamic Programming.
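As a taste of the fundamentals covered, here is a minimal, standard-library-only sketch (not workshop material) of an epsilon-greedy agent for a Gaussian multi-armed bandit; the reward means, step count and epsilon are illustrative choices:

```python
import random

def epsilon_greedy(true_means, steps=10000, eps=0.1, seed=42):
    """Run an epsilon-greedy agent on a Gaussian bandit with the given arm means."""
    random.seed(seed)
    n = len(true_means)
    counts = [0] * n          # pulls per arm
    values = [0.0] * n        # running estimate of each arm's mean reward
    for _ in range(steps):
        if random.random() < eps:
            a = random.randrange(n)                       # explore
        else:
            a = max(range(n), key=lambda i: values[i])    # exploit
        reward = random.gauss(true_means[a], 1.0)
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]     # incremental mean
    return values, counts
```

With enough steps, the agent concentrates its pulls on the best arm while its value estimates converge to the true means.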
Leonardo De Marchi holds a Master's in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks, like JustGiving. He now works as Lead Data Scientist at Badoo, the largest dating site, with over 340 million users.
Machine Learning | Open Source Data Science | Beginner | Training
The tidyverse is essential for any statistician or data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. In this tutorial, we’ll cover some of the core features of the tidyverse, such as dplyr (the workhorse of the tidyverse), string manipulation, linking directly to databases and the concept of tidy data. Full Details.
Dr. Colin Gillespie is Senior lecturer (Associate Professor) at Newcastle University, UK. His research interests are high performance statistical computing and Bayesian statistics. He is regularly employed as a consultant by Jumping Rivers and has been teaching R since 2005 at a variety of levels, ranging from beginners to advanced programming.
Kickstarter | Deep Learning | Intermediate | Training
Agenda:
• Why do we need to create our own models?
• Introduction to Deep Learning
• Lab: The “Hello World” of TensorFlow + Keras: Logistic Regression
• Convolutional Neural Networks: At last! Real Deep Learning
• Lab: Computer Vision with CNNs
• Beyond Computer Vision
Juliet is a Technical Evangelist at Microsoft, helping ISVs get the most out of the cloud. During the last 8 years at Microsoft she has focused on business apps while at the same time delivering talks about projects that combine things such as IoT with Cognitive Services and other Azure services.
Pablo Doval is Principal Data Architect and the General Manager of Plain Concepts in the UK. With a background in relational databases, data warehousing and traditional BI projects, he has spent the last few years architecting and building Big Data and Machine Learning projects for customers in different sectors, such as Healthcare, Digital Media, Retail and Industry.
Data Visualization | Open Source Data Science | Beginner-Intermediate | Training
When creating complex visualisations, interactivity can help communicate your core concepts. It allows the audience to familiarise themselves with the data, and makes understanding the data a step along the journey of understanding your visualisation.
However, creating interactive visualisations adds a layer of complexity to the data science workflow: during modelling and data exploration, interactivity is effectively achieved by re-running chunks of code with different parameters. Giving the reader the ability to achieve the same interactivity without having to change and re-run code therefore requires extra development from the data scientist.
In this workshop, we will go through Python libraries that make this extra development as frictionless as possible, and produce interactive visualisations with as little code as possible. We will also go through options for producing interactivity for the wider public, and what steps need to be taken to achieve resilient interactive graphs.
Libraries to be used include ipywidgets, plotly, and plotly dash. Full Details.
Dr Jan Freyberg is a data scientist at ASI. He has worked on data science projects in the private and public sector, and his experience ranges from geospatial to unstructured language data. He is an expert in building interactive tools for communicating complex models, and is active in developing open-source data science software.
Jan completed a PhD and a fellowship studying brain activity, vision and consciousness in autism at the University of Cambridge and King’s College London, where he taught statistics and programming at undergraduate and postgraduate level.
Deep Learning | Intermediate-Advanced | Workshops
This will be a hands-on workshop on Probabilistic Graphical Models (PGMs) using the pgmpy library. Attendees will learn the basics of PGMs with the open-source library pgmpy, to which we are contributors. We will also talk about Hidden Markov Models and showcase how thermostat control can be modelled. Generative models are also useful for measuring causality and are great alternatives to deep neural networks, which cannot solve such problems. In this workshop, attendees will learn the basics needed to understand Bayesian Networks, Markov Models and HMMs, including the advanced probability and other mathematical foundations needed to understand the topic. Full Details.
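To illustrate the kind of reasoning Bayesian Networks support (pgmpy automates this; the sketch below is a dependency-free, hand-rolled illustration with made-up probabilities), here is exact inference by enumeration on a tiny two-node network Rain → WetGrass:

```python
# Two-node Bayesian network: Rain -> WetGrass, with illustrative CPTs.
p_rain = 0.2                                 # prior P(Rain = true)
p_wet_given_rain = {True: 0.9, False: 0.1}   # P(WetGrass = true | Rain)

def posterior_rain_given_wet():
    """P(Rain = true | WetGrass = true), computed by Bayes' rule / enumeration."""
    joint_rain = p_rain * p_wet_given_rain[True]             # P(rain, wet)
    joint_no_rain = (1 - p_rain) * p_wet_given_rain[False]   # P(no rain, wet)
    return joint_rain / (joint_rain + joint_no_rain)
```

Observing wet grass raises the rain probability from the prior 0.2 to roughly 0.69; a library such as pgmpy performs the same computation on networks far too large to enumerate by hand.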
Harish Kashyap has a Master's in Electrical Engineering from Northeastern University, Boston. He received the Graduate Student Award, a research scholarship as part of which he worked at BBN Technologies on Bayesian Machine Learning algorithms as applied to Speech Recognition. He has more than 15 years of experience in the areas of Artificial Intelligence (AI), Digital Signal Processing and software development. He has several publications and patents filed in the areas of ML. He has led various Data Analytics projects across the US and Europe that led to organizational savings of $8M+. He built out the curriculum for the Machine Learning training platform refactored.ai, which is now part of SUNY Buffalo's graduate ML course. He is currently the founder of Mysuru Consulting Group (MCG.ai) and Diagram.AI, where he works on ML algorithms.
Kickstarter | Machine Learning | Beginner | Training
Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech for your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. Full Details.
Kickstarter | Data Visualization | Beginner | Training
One of the most popular frameworks today for web-based data visualizations is D3. In this training you will learn how to leverage the most common parts of the D3 framework to create data visualizations. This will include making selections, working with data, creating shapes, and more. Although the emphasis is on learning D3, some general visualization design principles will be touched upon briefly as well, to help you make better informed design decisions. Full Details.
Jan Willem Tulp is an independent Data Experience Designer from the Netherlands. With his company TULP interactive he creates custom data visualizations for a variety of clients. He has helped clients such as Google, the European Space Agency, Scientific American, Nature and the World Economic Forum by creating visualizations, both interactive and in print. His work has appeared in several books and magazines and he speaks regularly at international conferences.
Open Source Data Science | Deep Learning | Beginner-Intermediate-Advanced | Workshops
Gain insight into how to drive success in data science. Identify key points in the machine learning life cycle where executive oversight really matters. Learn effective methods to help your team deliver better predictive models, faster. You’ll leave this seminar able to identify business challenges well suited for machine learning, with fully defined predictive analytics projects your team can implement now to improve operational results. Full Details.
Machine Learning | Open Source Data Science | Beginner | Workshops
Mike is a senior machine learning engineer at Evolution AI, working on Evolution AI's NLP platform. He is probably most widely known in the machine learning community for a popular blog post about his escapades teaching a neural network to freestyle rap (https://bit.ly/2fsePbZ). He has been working in data science and machine learning for the last 5 years, for the likes of Ocado and Qubit Technology. His primary areas of expertise are NLP, probabilistic graphical models and recommender systems.
Quant Finance | Open Source Data Science | Intermediate | Workshops
This workshop illustrates the use of machine and deep learning algorithms for classification in the context of predicting stock market movements. The workshop shows that there are parallels between building self-driving cars and deploying automated algorithmic trading strategies. Full Details.
Yves has a Ph.D. in Mathematical Finance and is the founder and managing partner of The Python Quants GmbH. He is also the author of the books Python for Finance, Derivatives Analytics with Python and Listed Volatility & Variance Derivatives. He lectures for Data Science at htw saar University of Applied Sciences and for Computational Finance at the CQF Program and is the organizer of the Python for Quant Finance Meetup in London.
Open Source Data Science | Data Science at Scale | Intermediate | Workshops
StackNet is a computational, scalable and analytical framework mainly implemented in Java that resembles a feedforward neural network and uses Wolpert’s stacked generalization in multiple levels to improve the accuracy of predictions. StackNet will be demonstrated through practical examples. Full Details.
Marios Michailidis is a research data scientist at H2O.ai. He holds a BSc in Accounting and Finance from the University of Macedonia in Greece and an MSc in Risk Management from the University of Southampton. He has also nearly finished his PhD in machine learning at University College London (UCL), with a focus on ensemble modelling. He has worked in both the marketing and credit sectors in the UK market and has led many analytics projects on various themes, including acquisition, retention, recommenders, uplift, fraud detection, portfolio optimization and more.
He is the creator of KazAnova (http://www.kazanovaforanalytics.com/), a freeware GUI for credit scoring and data mining 100% made in Java, as well as the creator of the StackNet Meta-Modelling Framework (https://github.com/kaz-Anova/StackNet). In his spare time he loves competing in data science challenges, and was ranked 1st out of 500,000 members on the popular Kaggle.com data competition platform. Here (http://blog.kaggle.com/2016/02/10/profiling-top-kagglers-kazanova-new-1-in-the-world/) is a blog post about Marios being ranked at the top in Kaggle and sharing his knowledge, tricks and ideas.
Quant Finance | Machine Learning | Intermediate | Training
In this training session I will give an introduction to quantitative finance applications for data scientists with no prior knowledge of the field. I will start by discussing a few fundamental ideas such as the efficient market hypothesis and the capital asset pricing model. I will also introduce some concepts in investment strategies, including portfolio theory and smart beta investing. Many such strategies are based on fundamental factors revealed by academic research, such as value or profitability, which can yield excess returns when suitably applied. There is currently a lot of interest in using alternative data sets and machine learning tools to uncover further factors, making the field exciting for data scientists. Full Details.
Johannes is Data Analytics Associate Director at IHS Markit, a global information and intelligence provider. He technically manages multiple data science projects across various business lines including finance, the automotive industry and the energy sector. He has a keen interest in the full data science spectrum including mathematical statistics, machine learning, databases, distributed computing, and dynamic visualizations.
He holds a PhD in theoretical condensed matter physics from Imperial College and has been active in quantitative research for more than 10 years including research positions at Harvard University and the Max-Planck Institute in Germany. He has disseminated his research in more than 30 peer reviewed publications and over 50 talks.
Machine Learning | Open Source Data Science | Intermediate-Advanced | Training
With ever-better machine learning and big data software from the open source community, it is now easier than ever to build powerful predictive models to help people make decisions. On the other hand, even if one can predict the outcome accurately, it is not always trivial to make real-life decisions, especially when there are a large number of choices and trade-offs between multiple objectives. Bio-inspired optimization algorithms can be a great complement to predictive modelling techniques in such scenarios, allowing efficient exploration of the multi-dimensional choice space to find the optimal frontier of the key objectives. Decision makers can then focus on these objectives rather than worrying about choices at the execution level.
In this session we are going to walk through an example of a multi-objective optimization problem in the context of a promotion campaign, using the open-source package PyGMO (the Python Parallel Global Multiobjective Optimizer) from ESA. We will first briefly touch upon how to build a propensity model for such marketing activities. Then we will see how to optimize our promotion strategy with PyGMO, based on the predictions of the propensity model. We will also go a bit into the details of the various algorithms available in PyGMO, as well as how to handle constraints.
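The core idea PyGMO's multi-objective algorithms build on is Pareto dominance. A minimal, dependency-free sketch (not PyGMO's actual API) of extracting the non-dominated frontier from a set of candidate solutions, assuming both objectives are minimised:

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (minimisation convention)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The frontier is exactly the set of trade-offs a decision maker should consider; everything off the frontier is strictly worse in at least one objective at no gain in the others.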
Dr Jiahang Zhong is the leader of the data science team at Zopa, one of the UK's earliest fintech companies. He has broad experience of data science projects in credit risk, operational optimization and marketing, with keen interests in machine learning, optimization algorithms and big data technologies. Prior to Zopa, he worked as a PhD and Postdoctoral researcher on the Large Hadron Collider project at CERN, with a focus on data analysis, statistics and distributed computing.
Big Data | Intermediate | Training
These lectures cover pattern recognition and knowledge discovery, machine learning and statistics. They address how geometry and topology can uncover and empower the semantics of data. Key themes include: text mining; computationally linear-time hierarchical clustering, search and retrieval; and the Correspondence Analysis platform, which performs latent semantic factor-space mapping with accompanying hierarchical clustering. Full Details.
Fionn Murtagh is Professor of Data Science and was Professor of Computer Science, including Department Head, in many universities. Following his primary degrees in Mathematics and Engineering Science, and an MSc in Computer Science (in Information Retrieval) from Trinity College Dublin, his first position as Statistician/Programmer was in national-level (first- and second-level) education research. His PhD at Université P&M Curie, Paris 6, with Prof. Jean-Paul Benzécri, was in conjunction with the national geological research centre, BRGM. After an initial 4 years as a lecturer in computer science, he spent a period working on atomic reactor safety at the European Joint Research Centre in Ispra (VA), Italy. Working on the Hubble Space Telescope as a European Space Agency Senior Scientist, Fionn was based at the European Southern Observatory in Garching, Munich for 12 years. For 5 years, Fionn was a Director in Science Foundation Ireland, managing mathematics and computing, and nanotechnology, and introducing and growing all that is related to environmental science and renewable energy.
Fionn was Editor-in-Chief of the Computer Journal (British Computer Society) for more than 10 years, and is an Editorial Board member of many journals. With over 300 refereed articles and 30 books authored or edited, his fellowships and scholarly academies include: Fellow of: British Computer Society (FBCS), Institute of Mathematics and Its Applications (FIMA), International Association for Pattern Recognition (FIAPR), Royal Statistical Society (FRSS), Royal Society of Arts (FRSA). Elected Member: Royal Irish Academy (MRIA), Academia Europaea (MAE). Senior Member IEEE.
Machine Learning | Beginner-Intermediate | Workshops
This workshop will explore the application of Machine Learning and A.I. techniques as a Cyber Defence solution. Current cyber defence systems are clearly struggling with the volume and sophistication of cyber-attacks. This is fundamentally a big data and weak-signal problem. It is thus well suited to the application of machine learning technology to help identify, classify and manage cyber threats. Full Details.
Deep Learning | Beginner-Intermediate | Training
Deep Learning (DL) has become ubiquitous in everyday software applications and services. A solid understanding of DL's foundational principles is necessary for researchers and modern-day engineers alike to successfully adapt state-of-the-art DL research to business applications.
Researchers require a DL framework to quickly prototype and transform their ideas into models and Engineers need a framework that allows them to efficiently deploy these models to production without losing performance. We will show how to use Gluon APIs in Apache MXNet to quickly prototype models and also deploy them without losing performance in production using MXNet Model Server (MMS).
In this workshop, you will learn to apply Convolutional Neural Networks (CNNs), a class of DL techniques, to Computer Vision (CV), and Recurrent Neural Networks (RNNs) to Natural Language Processing (NLP) tasks, using Apache MXNet – two fields in which Deep Learning has achieved state-of-the-art results.
To learn to apply DL to CV problems, we will get hands-on by building a Facial Emotion Recognition (FER) model. We will also build a sentiment analysis model to understand the application of DL in NLP. As we build the models, we will learn about common practical limitations, pitfalls, best practices, and the tips and tricks used by practitioners. Finally, we will conclude the workshop by showing how to deploy using MMS for online/real-time inference, and how to use Apache Spark + MXNet for offline batch inference on large datasets.
Naveen is a Senior Software Engineer and a member of Amazon AI at AWS, working on Apache MXNet. He began his career building large-scale distributed systems and has spent the last 10+ years designing and developing them. He has delivered various tech talks at AMLC, Spark Summit and ApacheCon, and loves to share knowledge. His current focus is to make Deep Learning easily accessible to software developers without the need for a steep learning curve. In his spare time, he loves to read books, spend time with his family and watch his little girl grow.
Machine Learning | Open Source Data Science | Intermediate | Training
Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This training will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning. Apart from metrics for model evaluation, we will cover how to evaluate model complexity, how to tune parameters with grid search and randomized parameter search, and what their trade-offs are. We will also cover out-of-core text feature processing via feature hashing. Full Details.
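As a flavour of the pipeline and parameter-search machinery covered, here is a small sketch combining a scaler and an SVM in a `Pipeline` and tuning the SVM's `C` with `GridSearchCV`; the synthetic dataset and grid values are illustrative choices, not training material:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Pipeline: the scaler is re-fit inside each CV fold, which avoids
# leaking test-fold statistics into preprocessing
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Grid search over the SVM's regularisation parameter, addressed as
# <step name>__<parameter name>
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1, 10]}, cv=3)
grid.fit(X, y)
```

After fitting, `grid.best_params_` and `grid.best_score_` report the winning configuration and its mean cross-validated score, and `grid.predict` uses the refit best pipeline.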
Machine Learning | Open Source Data Science | Intermediate | Workshops
This lecture discusses the formulation of Vector Autoregressive (VAR) models, one of the most important classes of multivariate time series statistical models, alongside neural-network-based techniques, which have received a lot of attention in the data science community in the past few years. It demonstrates how both are implemented in practice and compares their advantages and disadvantages. Real-world applications, demonstrated using Python, are used throughout the lecture to illustrate these techniques. Full Details.
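For concreteness, a VAR(1) model posits y_t = A y_{t-1} + ε_t. A small NumPy sketch (synthetic data; the coefficient matrix and noise scale are made up for illustration) that simulates such a process and recovers A by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])          # true VAR(1) coefficient matrix
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.1, size=2)

# Least-squares regression of y_t on y_{t-1}: Y = X A^T, so lstsq
# recovers A^T and we transpose back
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
```

With a stationary process (spectral radius of A below 1) and enough observations, the estimate converges to the true coefficients; production libraries add lag selection, confidence intervals and diagnostics on top of this core regression.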
Kickstarter | Open Source | Beginner-Intermediate | Workshops
Survival/duration models are common ways to model the probability of failing/surviving at each period in your data set. Though they are common in certain fields of economics, econometrics and biology, they are less commonly applied in data science, despite often being the most appropriate approach to a problem. This workshop will start with a theoretical introduction to basic non-parametric, semi-parametric and parametric models such as Kaplan-Meier, Cox Proportional Hazards (with and without time-varying covariates), the Aalen additive model, and random survival forests. In the second part of the workshop, we will look at how we can apply these models in Python and R. Full Details.
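As a preview of the non-parametric starting point, the Kaplan-Meier estimator multiplies conditional survival probabilities at each observed event time. A dependency-free sketch (libraries such as lifelines in Python or survival in R do this, with confidence intervals, for you):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  observed durations
    events: 1 if the event (failure) was observed, 0 if censored
    Returns a list of (time, estimated survival probability) pairs.
    """
    distinct = sorted({t for t, e in zip(times, events) if e})
    curve, s = [], 1.0
    for t in distinct:
        at_risk = sum(1 for ti in times if ti >= t)                      # still under observation
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        s *= 1 - deaths / at_risk    # conditional survival at time t
        curve.append((t, s))
    return curve
```

Censored observations contribute to the risk set up to their censoring time without counting as failures, which is exactly what a naive failure-rate calculation gets wrong.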
Violeta has been working as a data scientist in the Data Innovation and Analytics department of ABN AMRO bank, located in Amsterdam, the Netherlands. In her daily job, she works on projects with different business lines, applying the latest machine learning and advanced analytics technologies and algorithms. Before that, she worked for about 1.5 years as a data science consultant at Accenture, the Netherlands. Violeta enjoyed helping clients solve their problems with the use of data and data science, but wanted to be able to develop more sophisticated tools, hence the switch.
Before her position at Accenture, she worked on her PhD, which she obtained from Erasmus University, Rotterdam, in the area of Applied Microeconometrics. In her research she used data to investigate the causal effect of negative experiences on human capital, education, problematic behaviour and crime.
Machine Learning | Beginner-Intermediate | Workshops
Target leakage is one of the most difficult problems in developing real-world machine learning models. Leakage occurs when the training data gets contaminated with information that will not be known at prediction time. In this talk, we will look through real-life examples of data leakage at different stages of the data science project lifecycle, and discuss various countermeasures and best practices for model validation. Full Details.
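One classic leakage pattern the talk's theme covers is computing preprocessing statistics on the full dataset before splitting. A toy sketch with made-up numbers, showing how the "leaky" statistic differs from the one actually available at training time:

```python
# Ten observations; the last two arrive only at prediction time.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, 120.0]
train, test = data[:8], data[8:]

leaky_mean = sum(data) / len(data)      # computed on train + test: leakage
proper_mean = sum(train) / len(train)   # computed on train only: deployable

# Centring the training data with leaky_mean bakes in knowledge of the
# extreme test values, so offline metrics look better than anything the
# deployed model, which has never seen the test rows, can achieve.
```

The same principle applies to any fitted transform (scaling, imputation, target encoding): fit it on the training fold only, exactly as a deployed model would have to.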
Yuriy Guts is a Machine Learning Engineer at DataRobot with over 10 years of industry experience in data science and software architecture. His primary interests are productionalizing data science, automated machine learning, time series and multiseries forecasting, and processing spoken and written language. He teaches AI and ML at UCU, and has led multiple data science and engineering teams in Ukraine.
Accelerate AI Keynote Bernard Marr
Accelerate AI Keynote Luciano Floridi
Data Science and Artificial Intelligence technologies are impacting industries transversally, dynamically transforming personalized marketing and sales, supply chain management forecasting, risk management, fraud detection and predictive maintenance.
The Pharmaceutical Industry is lagging behind the transformation, primarily because of its data structure and of regulatory, legal and privacy limitations.
Paradoxically, while Pharmaceutical Research and Development is the area where transformation is most acutely needed, because of the high R&D failure rate and ever-growing development cycle times, R&D is also the most refractory to change due to its highly siloed data.
The pervasive “AI Hype” that promises effortless transformation is faced here with the hard requirements of results interpretability and scientific reproducibility.
We present a Data Science strategy to transform Pharma R&D, taking the specific case study of Vaccines, the most impactful Public Health intervention after clean water.
The proposed Data Strategy is based on four pillars: i) Next Generation Data Management and Governance; ii) completely redefined, patient-centric Information Management Systems; iii) science-driven advanced analytics and machine learning; iv) organizational evolution through a Data Science competency framework.
The overall framework will be discussed, along with concrete examples of successful application.
Dr. Duccio Medini received his Ph.D. in Biophysics from the University of Perugia, Italy, and Northeastern University in Boston, MA. He serves as Head of Data Science and Clinical Systems for GSK Vaccines, and is an honorary member of the Cuban Immunology Society, a Fellow of the ISI Foundation, and an adjunct professor of Bioinformatics and Biostatistics.
Dr. Medini discovered the pangenome concept and contributed to the development of the first universal vaccine against serogroup B meningitis. He published more than 30 papers, book chapters and patents on the population genomics of bacteria and mathematical modeling of vaccine effects.
Dr Sybil Wong Vivian Chang discusses how the maturation of AI technologies is the prelude to an unforeseen step change in humankind's ability to understand and innovate. Leveraging human expertise to train AI can increase the pace of democratization of science by disrupting the way current research and development is performed and communicated. Full Details.
Almost every company in the financial technology sector has already started using AI to improve customer experience, gain better insights, reduce costs, prevent fraud and launch new business models. The opportunities are endless – but how can banks capture these in the best possible way and deliver on the AI promise? Jesper will talk about Nordea’s journey to accelerate AI across the organization and how the company is achieving 10x improvements through AI technologies. Full Details.
Jesper Nordström is Strategy and Innovation Manager at Nordea's AI & Machine Learning unit. His mission is to accelerate AI-powered business innovation across Nordea to create the leading intelligent bank. Prior to joining Nordea, Jesper worked as a Strategy Consultant within emerging technologies and as a Digital Innovation Lead. His academic background includes a Master's Degree in Human Computer Interaction from the Royal Institute of Technology in Sweden, as well as Business Management studies at Stockholm University and the Hong Kong University of Science & Technology.
90% of the information that exists in the world today has been created in the last two years. Smart companies will certainly exploit the power of the new generation of AI and machine learning tools to generate attributable revenue. But advanced deep learning AI instances will also give humans the power to address what have historically been intractable social and cultural problems. Chris Bishop will speak about how we can do both by smartly partnering with algorithms, bots, and machines. Full Details.
Christopher Bishop is passionate about the power of emerging technologies to deliver positive transformation at the intersection of business and culture.
Chris worked in the entertainment business for over 20 years performing in various touring rock bands as well as composing music for radio and TV commercials. He then spent 15 years at IBM in a variety of roles, working initially as a Web producer and business strategy consultant, then shifting into an executive communications role at Corporate Headquarters driving social media adoption and the use of virtual worlds.
Chris has written and spoken on the topic of AI, blockchain, augmented and virtual reality and robotics at various events and conferences. Last fall, he spoke at the ODSC event Accelerate AI, Europe 2018 in London. His talk was titled “Your brain is too small to manage your business” describing the increasing commoditization of AI.
He also recently co-authored a white paper with MIT Media Lab professor Sandy Pentland titled “Blockchain+AI+Human”, describing the business possibilities as well as the socio-cultural implications of connecting AI and blockchain.
Artificial Intelligence is impacting all areas of society, from healthcare and transportation to smart cities and energy. AI won’t be an industry, it will be part of every industry. Alison’s talk will introduce the hardware and software platform at the heart of this Intelligent Industrial Revolution. She’ll provide insights into how academia, enterprise, and startups are applying AI, as well as offer a glimpse into state-of-the-art research from worldwide labs. Full Details
After spending her first year with NVIDIA as a Deep Learning Solutions Architect, Alison is now responsible for NVIDIA’s Artificial Intelligence Developer Relations in the EMEA region. She is a mature graduate in Artificial Intelligence combining technical and theoretical computer science with a physics background & over 20 years of experience in international project management, entrepreneurial activities and the internet. She consults on a wide range of AI applications, including planetary defence with NASA, ESA & the SETI Institute and continues to manage the community of AI & Machine Learning researchers around the world, remaining knowledgeable in state of the art across all areas of research. She also travels, advises on & teaches NVIDIA’s GPU Computing platform, around the globe.
Dr. Edgar Meij is a team lead and senior data scientist at Bloomberg. He holds a PhD in computer science from the University of Amsterdam and has an extensive track record in artificial intelligence, information retrieval, natural language processing, and machine learning. Before joining Bloomberg he worked at Yahoo Labs on all aspects related to entities in the context of web search. At Bloomberg he leads the team that is responsible for leveraging knowledge graph technology to drive advanced financial insights.
Artificial Intelligence’s unfulfilled expectations are its own worst enemy; between citizens not understanding what it can do for them and developers not including them in the process, the technologies are lacking transparency and fairness. All parties need to come together to solve this problem – citizens, regulators, vendors, organizations, and entrepreneurs – for AI's sake and ours. Full Details.
Hugo mixes business strategy, technology and design thinking to drive true transformation.
Hugo has 18+ years of experience working across agencies, brands and consultancies, and is deeply involved with the startup and VC world. He is currently part of the Accenture Digital leadership team, where he brings together Interactive, Analytics and Digital platforms to help the Resources Industries evolve to the next stage of the connected economy. Hugo is especially keen on bringing to life Artificial Intelligence, Blockchain and IoT, as they will digitise the physical world and accelerate innovation, human experience and disruptive business models exponentially.
As more and more companies start to have a data science function, we are also learning the best ways to organize and drive the development of data-driven products. This talk will teach you some of the best practices in this area, based on experience from the field. You will hear about what works and what will most definitely lead to failure. Key points touched upon in this talk will include agile development, automation of data science, and the value of consumption in production.
We are entering a new industrial revolution, the data revolution. With it come changes that will make life easier for most but harder for some. We will automate many tasks by using big data analytics, and this will create new opportunities for all of mankind. I am a part of this revolution, and my role is to make sure the people I work for get the best possible experience they can have.