Schedule Guide for Pass Holders
Accelerate AI offers 2-Day and 4-Day passes.
The Accelerate AI session schedule below covers Tuesday, April 30th and Wednesday, May 1st. It is accessible to Accelerate AI 2-Day and 4-Day pass holders and all VIP pass holders.
The ODSC Talks/Workshop schedule covers Thursday, May 2nd and Friday, May 3rd. It is available to Accelerate AI 4-Day pass holders plus ODSC Silver, Gold, Platinum, Platinum Business, and all VIP pass holders.
Speakers and session times are subject to change. More sessions are added weekly.
East 2019 Preliminary Schedule
We are delighted to announce our East 2019 Preliminary Schedule, which lists ~45% of our sessions. Session times will be added in the coming week, in addition to more talks, workshops, and training sessions.
Accelerate AI | Keynote
Are you confused about what it takes to be a data scientist? Curious about how companies recruit, train, and manage analytics resources? You are not alone. Many employers, educators, and managers are struggling with these issues. In fact, tremendous resources are wasted by employers interviewing candidates who claim knowledge of data science but are not qualified for such positions. This presentation covers insights from the most comprehensive research effort to date on the data analytics profession, proposes a framework for standardizing roles in the industry, and presents methods for assessing skills.
We have been running an industry initiative, the Initiative for Analytics and Data Science Standards (IADSS), to support the development of standards for analytics role definitions, required skills, and career advancement paths. The initiative kicked off a research study including a detailed survey of analytics executives and professionals, in-depth interviews with industry leaders and academics, as well as an extensive literature review. We will present our initial findings from the research and provide case studies showing how widespread this confusion is and why it is important for the field, for practitioners, and for employers and educators to have clarity on this front…more details
Accelerate AI | Keynote | Cross Industry | Beginner-Intermediate-Advanced
Enterprise leaders across every industry are seeking better ways to understand their business. They need new ways to model the data that flows through their systems – the lifeblood of their information architecture. Currently, the models are used by human decision makers, or rigid algorithms are developed to make use of the models in an attempt to optimize one or more outcomes. But how can we get AI to drive the decision making? How do we AI-enable a fast-paced business operating in an ever-changing environment? How do we create smart, responsive businesses able to predict the organization’s needs in real time? We try to answer these questions and provide real-world examples of how to apply AI to businesses to create end-to-end intelligence across a variety of industries…more details
Babak Hodjat (https://en.wikipedia.org/wiki/Babak_Hodjat) is VP of Evolutionary AI at Cognizant, and former co-founder and CEO of Sentient, responsible for the core technology behind the world’s largest distributed artificial intelligence system. Babak was also the founder of the world’s first AI-driven hedge-fund, Sentient Investment Management. Babak is a serial entrepreneur, having started a number of Silicon Valley companies as main inventor and technologist. Prior to co-founding Sentient, Babak was senior director of engineering at Sybase iAnywhere, where he led mobile solutions engineering. Prior to Sybase, Babak was co-founder, CTO and board member of Dejima Inc. Babak is the primary inventor of Dejima’s patented, agent-oriented technology applied to intelligent interfaces for mobile and enterprise computing – the technology behind Apple’s Siri. Babak is a published scholar in the fields of Artificial Life, Agent-Oriented Software Engineering, and Distributed Artificial Intelligence, and has 31 granted or pending patents to his name. He is an expert in numerous fields of AI, including natural language processing, machine learning, genetic algorithms, distributed AI, and has founded multiple companies in these areas. Babak holds a PhD in Machine Intelligence from Kyushu University, in Fukuoka, Japan.
Accelerate AI | Keynote
Coming soon!…more details
Business Talk | Healthcare | Intermediate-Advanced
Commercial pharma/biotech is one of the most difficult industries for data science to take hold in and gain effectiveness. Due to FDA regulation, a lack of identifiable patient data, and a persistent reliance on a “traveling salesperson” approach, data science is still taking hold in this industry. This talk will discuss in depth the steps that companies in this space can take to make the most of their data science teams and of their data in general. These steps include standardizing internal data, utilizing third-party data in unique ways, staying the course during marketing and sales initiatives, and creating validation methods.
We will dive into these issues in the context of bringing the industry from “old school” sales and marketing techniques to one where machine learning can make tangible top- and bottom-line impacts. Through this lens we will identify the areas of opportunity that any organization should tackle first, and those areas which are often pitfalls (even though they may seem lucrative). Additionally, an ideal team make-up and timeline will be outlined so that these companies can level-set where they are and where they can improve their data science processes…more details
Adam Jenkins is a Data Science Lead at Biogen, where he works on optimizing commercial outcomes through marketing, patient outreach, and field force infrastructure using data science and predictive analytics. Biogen has been a leader in the treatment and research of neurological diseases for 40 years. Prior to becoming commercial lead, Adam was part of Biogen’s Digital Health team, where he worked on next-generation applications of wearables and neurological tests. Holding a PhD in genomics, he also teaches management skills for data science and big data initiatives at Boston College.
Business Talk | Cross Industry | Beginner-Intermediate
Keyence is not exactly a household name. For more than 45 years it has operated as one of the most important companies that no one has ever heard of. Keyence manufactures sensors – the devices used everywhere to collect data, from barcode readers to pressure sensors, from ultrasound to beams of light. These are the devices that generate the massive amounts of data driving our economy. For years, Keyence has confounded the investment world with its staggering growth, its string of hit products, and a market cap per employee that rivals the hottest startups.
The secret to this company’s success? Every single decision, from the largest to the smallest, is driven by data analytics. Information- and data-guided business practices mold and shape everything from who is hired to what products are launched, all the way down to the smallest detail.
But using that much data can be time consuming, or at minimum requires an army of data scientists and analysts to churn through the data and help make decisions. Or so people thought.
Join Keyence as they unlock the secrets that have led them to become the 6th largest firm on the Tokyo Stock Exchange, be ranked on Forbes’ “Most Innovative Companies” list every year since its inception, and maintain a market cap per employee that rivals the hottest startups. It’s a journey of data and business decisions meeting in the middle to make real change happen…more details
Rob is Keyence’s National Director of Artificial Intelligence. Having joined Keyence in 2004, he has had the opportunity to work hand in hand with the largest and most recognized companies of the data age.
Business Talk | Healthcare | Intermediate-Advanced
Politics aside, value-based care is the model that is transforming the practice and compensation of healthcare in the United States. Once laggards, payers and providers are increasingly becoming sophisticated enterprises when it comes to data and the implications for healthcare are staggering. What lies within that data has the power to cure disease, reduce readmissions, enable precision medicine, improve population health, detect fraud and reduce waste.
Take Flagler Hospital, a 335-bed hospital in St. Augustine, Florida. They don’t have a single data scientist on staff. Nonetheless, they have orchestrated one of the most successful deployments of artificial intelligence in healthcare — delivering cost savings of more than 30%, reducing the length of stay by days and reducing readmissions by a factor of more than 7X.
In this talk, Dr. Jennifer Kloke, VP of Product Innovation at Ayasdi, will walk through how healthcare institutions small and large can apply artificial intelligence in the pursuit of value-based care. She will discuss the strategy, implementation, and results seen to date, and go over how these advances are transforming the healthcare industry…more details
Jennifer Kloke is the VP of Product Innovation at Ayasdi. For the last three years, she has been responsible for automation and algorithm development at Ayasdi and has led many efforts to develop cutting-edge analysis techniques, analyzing a wide variety of data from diverse industries including finance, the public sector, and biotech. Jennifer and her team’s efforts landed Ayasdi spots on Fast Company’s Most Innovative Companies in Big Data, the AIconics, and consecutive appearances on the Forbes Fintech 50. Jennifer received her Ph.D. in Mathematics from Stanford University.
Business Talk | Cross Industry | Beginner-Intermediate-Advanced
The advent of the “citizen data scientist” has been both concerning and irritating to many so-called “real” data scientists at work in the business world today. Some argue that the prospect of new, less well-trained AI-builders has made established data scientists feel threatened, insecure, and paranoid. Others argue that using untrained modelers to build AI is a recipe for disaster. In this talk, Dr. Greg Michaelson will discuss the truth and the hype of the citizen data scientist and will propose a framework for utilizing both skilled data scientists and citizen data scientists in the same organization…more details
Greg Michaelson is the Chief Success Coordinator for DataRobot. Prior to that role, he led the data science practice at DataRobot, working with clients across the world to ensure their success using the DataRobot platform to solve their business problems. Prior to joining DataRobot, Greg led modeling teams at Travelers and Regions Financial, focusing on pricing and risk modeling. He earned his Ph.D. in applied statistics from the Culverhouse College of Business Administration at the University of Alabama. Greg lives in Charlotte, NC with his wife, four children, and their pet tarantula.
Business Talk | Cross Industry | Beginner
As machine learning becomes a core component of any forward-looking company, how can we engage the entire workforce to help with ML and automation initiatives? This talk will cover how, at Square, we have adopted a “machine learning mindset” by:
* Providing training to all employees (both technical and non-technical folks) on what ML is and how it works, including current applications and ethical considerations
* Conducting structured brainstorming sessions to elicit automation opportunities, where everyone can contribute what ML could mean for their team or their customers
* Implementing a subset of those ideas by partnering with infrastructure, operations, and product teams, resulting in improved risk management, more efficient internal operations, and novel customer-facing product features…more details
Marsal Gavalda is a senior R&D executive with deep expertise in speech, language, and machine learning technologies. Marsal currently leads the Commerce Platform Machine Learning team at Square, where he applies machine learning for economic empowerment and financial inclusion. Previously, Marsal headed the Machine Intelligence team at Yik Yak, where he developed natural language processing and machine learning services to analyze the content of messages, discover trends, and make recommendations at scale and across languages. Prior to that, Marsal served as the Director of Research at MindMeld (acquired by Cisco), where he applied the latest advances in speech recognition, language understanding, information retrieval, and machine learning to the MindMeld conversational and anticipatory computing platform. Marsal has also extensive experience in the customer interaction and speech analytics space, as he has served as VP and Chief of Research at Verint Systems and as VP of Research and Incubation at Nexidia (acquired by NICE), where he developed disruptive speech analytics solutions for the call center, intelligence, and media markets. Marsal holds a PhD in Language Technologies and a MS in Computational Linguistics, both from Carnegie Mellon University, and a BS in Computer Science from BarcelonaTech. Marsal is the author of over thirty technical and literary publications, thirteen issued patents, and is fluent in six languages. He is also a frequent speaker at academic and industry conferences and organizes, every summer, a science and humanities summit in Barcelona on topics as diverse as machine translation, music, or the neuroscience of free will.
Business Talk | Cross Industry | Intermediate
Data ethnography looks at data as a means to combat bias and to produce data that can intervene in larger civic and private networks and engagements. This presentation uses insights from machine learning analytics and design thinking to challenge those developing algorithms and data sets that are meant to be representative of diverse populations but rarely are. The talk is largely based on the concept that to illuminate bias within machine learning, the ‘removal of bias’ itself has to be manifested into a ‘thing’ that can teach or sway the algorithms. The idea aims to initiate a standard for equity and equality by centering collaboration in the creation of this data set. The application has effects in areas such as biometrics and accurate facial recognition, predictive analytics, and finance and credit issuance…more details
Nicole Alexander is an Adjunct Professor at New York University. She is also a Lecturer at Columbia University and an Advisory Council Subject Matter Expert at Gerson Lehrman Group. Over the past 16 years Nicole has held leadership roles across marketing and technology which included Vice President, Innovation Practice at Nielsen China, Vice President at Marketing Evolution, and Vice President, at Pointlogic.
Nicole holds a global executive M.B.A. jointly awarded by New York University Stern School of Business, HEC Paris and London School of Economics and Political Science. She earned a Bachelor’s degree in International Business from New York University.
Business Talk | Finance | Intermediate
In order to understand customer behaviors and provide better services, wealth management firms have started to invest heavily in data analytics. During this session, Meina Zhou will discuss how she has successfully helped different financial institutions implement in-house predictive analytics solutions to improve wealth management services. She will also address multiple use cases built at different stages of the customer journey, including customer acquisition, customer personalization, and customer retention. She will discuss both the analytics components and the challenges involved throughout the implementation process…more details
Meina is the Lead Data Scientist at Indellient. Her core expertise lies in the application of proven data science tools and techniques to conduct business analytics and predictive modeling. Meina has used her business acumen and data science skills to solve business problems. Meina is a thought leader in the data science world and is an active conference speaker. She enjoys public speaking and sharing innovative data science ideas with other people. Meina received her Master of Science in Data Science from New York University and her Bachelor of Arts in Mathematics and Economics from Agnes Scott College.
Business Talk | Healthcare | Intermediate-Advanced
Artificial intelligence is revolutionizing many industries, from manufacturing to automated driving, and the healthcare industry, though a relatively recent entrant, is no exception. But diagnostics and therapeutics for child behavioral conditions like autism, ADHD, and language disorders remain relatively behind. This is unfortunate because early detection of such disorders is proven to improve the prospects of affected children dramatically. It also happens to be a healthcare context with a significant unmet need for streamlining and scalability of services.
We discuss the potential of artificial intelligence to disrupt that domain and talk about specific artificial intelligence techniques, challenges, and outcomes of experiments that have been applied successfully in practice.
We present the challenges, solutions, insights, and validation results of our user-facing solution that screens for autism and ADHD by combining multiple predictors based on various media inputs while allowing for inconclusive determination on hard-to-screen subjects. We will talk about incorporating signal from parental questionnaires, expert analysis of short video uploads, as well as short doctor questionnaires. We will get into leveraging automatable, complementary signals like audio-video streams from interactive storytelling and gaming sessions on tablets and mobile phones. Finally, we will explore the potential of deep learning to help with this problem-setting.
This talk is intended to raise awareness of the potential benefit of applying AI to the field of healthcare in general and child behavioral diagnostics in particular, and share with the audience a concrete example of a context where proper application of artificial intelligence has yielded demonstrably superior results to the traditional standard of practice…more details
Halim is a high-tech innovator who spearheaded world-class data science projects at game-changing companies like eBay and Teradata. Formally educated in machine learning, his professional expertise spans information retrieval, natural language processing, and big data. Halim has a proven track record of applying state-of-the-art data science techniques across industry verticals such as eCommerce, web and mobile services, airlines, biopharma, and medical technology. He currently leads the AI department at Cognoa, a data-driven behavioral healthcare startup in Palo Alto.
Business Talk | Cross Industry | Intermediate
Many of the 1.5M nonprofits in the US face increasing pressure to achieve their missions due to fundraising inefficiencies, lack of access to talent pools, and budget constraints. This limits the social good that can be cultivated to improve our world.
Through this lens, we’ll explore how artificial intelligence has played a crucial role in expanding the capabilities of nonprofit organizations, allowing them to build more and deeper relationships with supporters without having to hire more people (or fire anyone). As practitioners of vertical AI products, we’ll show how we’ve utilized open source technologies including TensorFlow, NLTK, CRFsuite, SKlearn, and others to determine who is most likely to give a donation to an organization and learn from communications to mimic the cognitive functions of fundraisers to expand their reach.
The discussion will center around case studies involving the College of Charleston, where AI was used to increase their workforce throughput by 160%, and University of Delaware, where AI uncovered a $50M donor. Similar examples will be pulled from cancer research, human rights, and Alzheimer’s funding.
The talk is intended for beginner-to-intermediate attendees. The main takeaways will be an exploration of AI in SaaS products, how behavioral psychology affects AI adoption, and insight into the nonprofit industry…more details
Rich is the co-founder and CTO of Gravyty, the first company focused on applying artificial intelligence to the social good industry. Rich believes that technology and companies are at their best when they augment people and allow them to do things that were previously impossible in simple, cost-effective and elegant ways.
As a brain aneurysm survivor, Rich has a deeply rooted belief that technology should be used for the positive benefit of people. He strives to scale Gravyty to help nonprofits of all shapes and sizes gain the resources they need to achieve their missions. Prior to Gravyty, Rich was senior product manager at RelSci, a $120M business connection intelligence platform. He was also the head of portfolio analytics for CapitalIQ and led product development for their quantitative solutions team. In between, Rich has founded four technology-based companies and was the winner of the 2015-2016 Entrepreneurship Award from Babson College.
He has a BS in Economics and Information Technology from Rensselaer Polytechnic Institute and an MBA from Babson College.
David Woodruff is Associate Vice President and Chief Operating Officer for Resource Development at Massachusetts Institute of Technology (MIT) and he has served in this capacity since June 2012. There he oversees front-line and support operations of a development team that raises more than $500 million per year. MIT recently launched the public phase of its $6 billion comprehensive campaign, The MIT Campaign for a Better World.
David first worked at MIT between 1984 and 2002. His assignments included corporate fundraising and individual giving and he led the major gifts team in MIT’s successful $2 billion campaign in 1997-2004, Calculated Risks/Creative Revolutions. Between 2002 and 2008, David was Chief Development Officer at Harvard School of Public Health where he headed up initial planning for the School’s portion of a Harvard University campaign. From 2008 to 2012, David held the post of Executive Director and Chief Operating Officer for Development at Massachusetts General Hospital (MGH) where he oversaw overall development operations and guided the execution of the hospital’s successful $1.5 billion campaign, The Campaign for the Third Century of Medicine.
David received his bachelor’s degree from MIT and master’s degree from Stanford University. David also earned his MBA from Babson College. David has been a frequent presenter at conferences held by CASE, AFP and AHP and serves on a number of nonprofit boards. In 2017 he was the recipient of CASE’s Quarter Century Circle Award for fundraising service in higher education. David is also a Certified Fundraising Executive. David is currently president of the Massachusetts Chapter of the Association of Fundraising Professionals and began his two-year term in 2019. David serves on the newly created AI in Advancement Advisory Council.
Business Talk | Cross Industry | Beginner-Intermediate-Advanced
While humanoids and computer wizardry attract attention, how are normal businesses currently using artificial intelligence? Leading organizations are deepening their commitments to AI and are eager to scale AI. But many companies have discovered, often to their surprise, that it is easy to apply AI and get quick results. What is not so easy is building a system of AI applications along with associated data pipelines that interact and are reliable. I will share the results of research that combines a global survey with 3,076 respondents and in-depth interviews with 36 business executives. The research tells a story of measurable benefits from current AI initiatives, increased investments, and determined efforts to expand AI across the enterprise…more details
Sam Ransbotham is an associate professor of information systems and McKiernan Distinguished Fellow at Boston College. While everyone seems to be talking about artificial intelligence, Sam is curious about what businesses are really doing—How are normal businesses using AI now and how will they use AI later? To learn more, he serves as editor for the MIT Sloan Management Review initiative on Artificial Intelligence in Business. Recently, he co-authored a research report “Artificial Intelligence in Business Gets Real” based on a global survey of 3,076 executives and dozens of interviews. Sam earned a Bachelor’s degree in Chemical Engineering, an MBA, and a PhD all from the Georgia Institute of Technology.
Business Talk | Cross Industry | Beginner-Intermediate
Data scientists spend 30% of their time building shoddy infrastructure. Our data shows that many AI teams can accelerate their progress by at least 10x. Deep learning brings with it enormous amounts of data, complicated experiment results, and intense compute requirements. Decades of experience in moving code to production have yielded engineering best practices that have not yet found their place in deep learning teams. Breaking silos to foster trust, a transparent culture, and shared responsibility – we introduce DeepOps, deep learning ops: a set of methodologies, tools, and culture in which data engineers and scientists collaborate to build a faster and more reliable deep learning pipeline.
Surveying hundreds of AI companies, we’ve learned that adopting DeepOps practices helped them ship faster, with more confidence and improved customer experiences.
In this talk, Yuval Greenfield, Deep Learning Developer Relations at MissingLink.ai, will discuss:
* The DeepOps checklist – insights from leading AI teams and how to bring them to your team
* Increase productivity within data science teams
* Reduce time to market
* Increase deployment speed
* Increase visibility and transparency across teams…more details
Yuval Greenfield has been an engineer and data enthusiast for the past 13 years in the fields of military cybersecurity, computer vision medical diagnostics, gaming, 360 cameras, and deep-learning tools. He holds a B.Sc. in Physics and Mathematics from the Hebrew University of Jerusalem as part of the IDF Talpiot program. At MissingLink, Yuval is in charge of developer relations, using the MissingLink platform for deep learning research, building tutorials, marketing content, and technical presentations.
Business Talk | Cross Industry | Intermediate-Advanced
Machine learning and “AI” systems often fail in production in unexpected ways. This talk shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.
The covered topics include concept drift (identifying and correcting for model decay due to changes in the distribution of data in production), common pitfalls in A/B testing (like the primacy and novelty effects), offline versus online measurements, and systems that learn in production (such as adversarial learning use cases). This talk is intended for executives, technical leaders and product managers who want to learn from others’ mistakes how to best set up their teams & products for success…more details
David Talby has been building real-world big data analytics systems in healthcare, finance and e-commerce for over a decade. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Prior to joining the startup world, he was with Microsoft’s Bing group, where he led business operations for Bing Shopping in the US and Europe. Earlier, he worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.
Business Talk | Cross industry | Beginner-Intermediate-Advanced
Many events and conferences have gender and general diversity issues. These can be a product of a long history of predominantly male speakers – the Known Quantity Factor – or of an industry where there is little diversity at the top (where speakers often come from) or in the ranks.
An #allmalepanel can get you mocked, ridiculed and harassed on social media. There’s a “Congrats you have an all-male panel” Tumblr. Gender Avenger (@GenderAvenger) looks for events light on female speakers and creates social media-ready images with your event hashtag. The Gender Avenger community can bury your event’s social media stream in a flash.
Speaking at conferences is an important path to career and business success. Speaking at conferences gets you access to prospective customers, partners, employees and investors. Meanwhile, more diverse conferences are more interesting and better attended.
So, how do you create a pipeline for future success and a diverse conference? You likely already have a number of diverse speakers inside your own community. We help educate your existing community of women and diverse candidates on what it takes to be a speaker – how to source speaking opportunities, write abstracts, pitch, and then rock the gig itself. We also help them understand the benefits of public speaking through this hands-on workshop. (We can also create a special “event inside an event” that will increase your number of female speakers AND create a pipeline for the future.)
We talk to professional women about:
• The Speaker’s Paradise (why there are so many speaking opportunities today)
• The 5 “C’s” – the benefits of public speaking
• Rebranding their fear – getting over the nervousness of public speaking
• How to handle the “no” – getting turned down to speak…more details
Bobbie Carlton is a parallel founder and an award-winning speaker, marketing, PR and social media professional. Her marketing and PR firm, Carlton PR & Marketing is a boutique firm that supports tech startups and small to medium-sized companies. Her second company, Mass Innovation Nights (MIN), is a social media powered new product showcase which has launched more than 1200 new products which have received more than $3 billion in collective funding. Company number three is Innovation Women, an online speaker bureau for entrepreneurial and technical women. It helps connect event managers and awesome speakers who just happen to be women. Follow Bobbie on Twitter as @BobbieC @MassInno @WomenInno or @CarltonPRM.
Business Talk | Cross Industry | Intermediate
Artificial intelligence is the buzzword that was everywhere in the media in 2018. Although the subject is the focus of attention, it still raises many questions about how it impacts businesses and how they can better leverage AI. More specifically, Olivier Blais, an experienced data scientist, will demonstrate how to develop an agile data science community, as he has developed practice communities in various companies from different industries. He will also show why artificial intelligence is not as complex as everybody thinks and why you should start deploying artificial intelligence capabilities now.
During this presentation, Olivier will:
* Introduce key concepts such as artificial intelligence, machine learning and deep learning
* Present great use cases, in which machine learning can give companies an edge in enhanced productivity, cost reduction or revenue generation
* Demonstrate strategies to enable your company to become more data driven
* Detail useful techniques on how you can make it easy to deploy artificial intelligence in your company…more details
Olivier Blais is the Head of Data Science for Moov AI and an experienced Data Scientist with many years of business transformation experience using innovative approaches in different industries, such as financial services, technologies, aerospace, and consumer products. He is a strong believer that a good dose of data, mixed with a pinch of human instinct, can shed some light on what makes a business successful.
Business Talk | Cross industry | Beginner-Intermediate-Advanced
The Internet of Things (IoT) means that the number of devices connecting to the Internet will rise massively. This is already generating massive amounts of data. Spatial and temporal mobility patterns of things, and of societies as a whole, can be characterized based on the interactions we are able to capture from IoT sensors.
In this presentation, we will review what we can learn from human mobility patterns and how they can be used to optimize traffic, city planning, and tourist attractions. We will review the challenges associated with privacy and security regulation when analyzing mobility patterns. As an application, we will present a study on AIS data describing the locations of vessels traveling in Norwegian seas. We will close the presentation with an overview of the kinds of AI techniques we can apply to analyze mobility patterns…more details
Arturo Opsetmoen Amador is a senior data scientist working as a consultant for Capgemini. He specializes in the application of AI technologies to solve practical problems that have positive effects on our society. He has experience as a lead data scientist in Smart Digital, a division of Telenor Norway – Business, where he was in charge of bringing the Big Data service "Mobility Analytics", which provides insights into human mobility patterns, to the Norwegian market. He has a scientific background and holds a Ph.D. in physics from the Norwegian University of Science and Technology. His interests include Big Data, AI and ML technologies, and how their ethical implementation can improve our society.
Business Talk | Cross Industry | Beginner-Intermediate-Advanced
Data science managers (and senior leaders managing data science teams) need to think through many questions relating to how to best execute their data science efforts. For example, what is the most effective way to lead a data science project? How do I make sure my data science team does not expose my organization to issues relating to the misuse of data and/or algorithms? How do I validate the results provided by the data science team?
This workshop will provide a framework managers can use to help ensure a successful data science project. The focus of this framework is not on which specific algorithm a team should use, but rather, how to ensure that the data science effort is progressing effectively and efficiently.
Jeff Saltz is an Associate Professor at Syracuse University, where his research and consulting focus on helping organizations leverage data science and big data for competitive advantage. Specifically, his work identifies the key challenges, and potential solutions, relating to how to manage, coordinate and run data science / big data projects within and across teams. In order to stay connected to the real world, Jeff consults with clients ranging from professional football teams to Fortune 500 organizations. In his last full-time corporate role, at JPMorgan Chase, he reported to the firm’s Chief Information Officer and drove technology innovation across the organization. Saltz received his B.S. in computer science from Cornell University, an M.B.A. from The Wharton School at the University of Pennsylvania and a Ph.D. in Information Systems from NJIT.
Business Workshop | Finance | Advanced
In today’s financial services industry, competitors are rushing to enable the AI-driven enterprise by making strategic investments in AI and machine learning technology; financial institutions that do not invest risk losing their competitive edge. However, because everyday business processes and strategic decisions increasingly rely on AI and machine learning models, model risk must not be ignored and must be effectively managed. Model risk, defined as the risk of financial or reputational loss due to errors in the development, implementation, or use of models, can have severe consequences if left unchecked.
Therefore, AI and machine learning models require constant monitoring and effective validation. This is not only a regulatory requirement, but also sound business practice. In this session, Seph will present the cornerstones of effective modern model risk management in the age of AI and machine learning by first providing an overview of AI and machine learning in the financial services industry, summarizing the regulatory background and the machine learning model lifecycle, and then presenting the challenges and emerging best practices for the validation of models in an ever-changing world of AI and machine learning…more details
Business Talk | Cross Industry | Intermediate-Advanced
So you’re doing some machine learning. But have you really thought about what needs to happen once you put it into production? This is the challenge lurking behind every promising machine learning initiative: making it work in the real world.
Robbie Allen, author of the book Machine Learning in Practice and the CEO of two data-centered companies, breaks the challenge down into key production issues like:
On-Prem Deployment. If you’re not in the cloud, deployment on your own servers or hardware can be difficult.
Workflow Integration. When companies work with legacy and/or closed systems, connecting prediction APIs into products/workflows is hard.
Who Owns Deployment and Maintenance? Is DevOps responsible for maintenance and deployment of ML models into production? Is this a job for data scientists? Engineers? When data scientists build models, will they see the light of day?
Monitoring. Today’s tools lack the ability to measure model drift and identify when a model’s production data is no longer representative of training data.
Still Learning? Machine learning is not “learning” unless there is a continuous feedback loop of data from production; many ML solutions do not have an established method for doing this.
Changing It Up. Measuring and adjusting machine learning models is challenging enough; these tasks are even harder when dealing with changing models, parameters, data labels, and goals…more details
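The "Monitoring" issue above, detecting when production data drifts away from the training data, can be made concrete with a simple distribution check. The sketch below is a hypothetical illustration (not from the talk) using the population stability index (PSI), a common drift metric; the thresholds and function names are illustrative only.

```python
# Hypothetical sketch of drift monitoring via the population stability
# index (PSI). Compares a feature's production distribution against its
# training distribution; pure stdlib, illustrative thresholds.
import math
import random

def psi(train, prod, bins=10):
    """PSI between two one-dimensional samples.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 drift."""
    lo, hi = min(train), max(train)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def frequencies(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        n = len(sample)
        # Floor empty bins so the log term is defined.
        return [max(c / n, 1e-4) for c in counts]

    p = frequencies(train)
    q = frequencies(prod)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

random.seed(1)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
same = [random.gauss(0.0, 1.0) for _ in range(5000)]      # no drift
shifted = [random.gauss(1.0, 1.0) for _ in range(5000)]   # mean shifted

stable_psi = psi(train, same)      # expected to be small
drifted_psi = psi(train, shifted)  # expected to exceed the drift threshold
```

A check like this, run on each model input feature on a schedule, is one way to flag when production data is no longer representative of training data.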
Robbie Allen is the CEO of Infinia ML, a team of advanced machine learning experts focused on making business impact. The company helps Fortune 500 companies and cutting-edge startups reduce costs, increase efficiency, and achieve breakthroughs with data science. Infinia ML serves industries from manufacturing and healthcare to marketing and human resources. The company’s capabilities include natural language processing, recommendation engines, object detection, 3D image modeling, and anomaly detection.
Previously, Robbie founded and led Automated Insights, whose natural language generation software helps automate content production for The Associated Press, Yahoo!, and many others. Automated Insights was successfully acquired by Vista Equity Partners in 2015, and Robbie currently serves as the company’s Executive Chairman. Before starting Automated Insights, Robbie was a Distinguished Engineer at Cisco. Robbie has authored or coauthored eight software books, owns six patents, and has spoken at a variety of conferences including the O’Reilly AI Conference, Strata, SXSW, and the MIT Sloan CIO Symposium. He holds two Master’s degrees from MIT and is completing his Ph.D. in computer science at UNC-Chapel Hill.
Business Talk | Healthcare | Beginner-Intermediate
The latest AI advances have the potential to massively improve our health and well-being. However, most of the work is yet to be done. In this talk, we will explore the most important opportunities for AI in healthcare. For example, we will explore how AI can diagnose major life-threatening conditions even before those conditions emerge. We will talk about AI’s ability to recommend dramatically more effective and less harmful treatment plans based on its understanding of a patient’s medical history and current conditions. Finally, we will talk about AI’s role in making our healthcare system effective and affordable for everyone…more details
Alex Ermolaev, Director of AI at Change Healthcare, has developed and led a variety of AI projects over the last 20 years, including enterprise AI, NLP, AI platforms/tools, imaging and self-driving cars. Alex is one of the most frequent “AI in Healthcare” speakers in Silicon Valley. Change Healthcare is one of the largest healthcare technology companies in the world.
Business Talk | Cross Industry | Intermediate
Each day, AI and data science become less of a competitive advantage for businesses and more of a requirement for survival. Yet, many companies struggle to effectively leverage their data science and AI investments. New Vantage Partners reports that although 92% of businesses are increasing investment in AI, 77% continue to face challenges with adoption. What makes some companies reap the benefits of AI while others struggle to see return on investment? The difference is strategy.
Whether your teams are distributed or centralized, many or few, sophisticated or just starting, your company needs a unified data strategy. But, data science and AI bring new challenges to strategic planning. The nature of their output demands that data science and AI projects undergo a more risk-aware scoping process. To scale smoothly and evolve quickly, a data strategy needs to provide broad and specific standards. Finally, proposed projects need to be contextualized within the larger data ecosystem in order to amplify their impact.
This talk is designed for business leaders, data science managers, and decision makers who want to ensure the effectiveness of the AI and data science capabilities they are building. Attendees will leave equipped with the tools to:
Critically evaluate pitched projects and select the most strategic ones;
Build an effective, impactful, and high yield data science project portfolio;
Evolve your data science roadmap to quickly adapt to new opportunities…more details
Kerstin is passionate about bringing data science from the edge of business to the center of it. She has data science experience in all three sectors: for-profit, non-profit, and government. Currently, she is a Senior Data Scientist at Metis, where she develops and delivers curriculum to accelerate data science learning for teams. As Director of Data Science at GuideStar, she founded its data science team and brought machine learning to the largest nonprofit data warehouse. At Postmates she used her broad data science toolkit to support marketing, growth, finance, and fleet team needs. As a University of Chicago Data Science for Social Good Fellow she helped uncover early signals for delays in education. She holds graduate degrees in statistics, mathematical statistics, and mathematical computer science from Cornell University and the University of Illinois at Chicago. As an undergraduate she studied psychology and anthropology at Yale University.
Business Talk | Cross Industry | Advanced
The opportunities are endless in the global economy. However, monetizing data analytics in the global space is like a free fall, hoping to have a parachute when landing. To succeed, you need a solid data science strategy that can be deployed across multiple geographies, each with unique business risk factors.
LexisNexis® Risk Solutions is well-established in helping companies mitigate financial risk and has been quite successful across global markets. Prabhu Sadasivam, leader of Analytic Technology, discusses how LexisNexis established its analytic solutions with a global presence, how it has overcome unique geographical challenges, and lessons learned that continue to inform and improve their data science efforts.
The discussion topics and examples will include standardized analytic framework, data dogmatism, scaling solutions, managing market size, developing solutions with no defined target or data, attribute and model monitoring, cloud variability, agility, and data governance across geographies…more details
Business Talk | Cross Industry | Beginner-Intermediate-Advanced
Imagine time as currency. Finite. Fiat. Hard to earn. Harder to yield. In a world of infinite content, why would anyone use your service or app? The answer is to provide unobtrusive, transparent return on investment. The initial milestone for AI was connecting people to relevant content. The new, harder requirement is recommending the right content, at the right time, in the right medium, to the right person. Given the natural ceiling on digital screen time, AI has to transcend into real life. How do you extend your digital service to have real-life touchpoints? If you work in the social recommendations space (e.g., news, social networking), how do you prepare for this new age? We will walk through how to leverage AI to make the content glut tractable for a picky audience with finite time…more details
Saisi is a Product Manager in Facebook’s Artificial Intelligence org, leveraging artificial intelligence and machine learning at scale to personalize experiences for 2 billion Facebook users. He has first-hand experience driving business and consumer outcomes across the Facebook family, which includes Facebook, Instagram, and WhatsApp. At Facebook, he has also worked on providing Internet access to the underconnected and on Facebook’s eCommerce vertical.
Business Talk | Cross Industry | Intermediate
Based on her experience of building analytics teams from the ground up, Hillary will walk through the process of creating an analytics team.
We’ll begin by examining why analytics teams exist and how they are different from Data Science teams. Next, we’ll discuss possible structures for analytics teams, including embedded, independent, and hybrid structures.
We’ll talk about best practices for hiring a diverse and talented analytics team, including good interview questions and interview tools, such as CoderPad, to ensure that applicants have the necessary skill set.
Once the team is up and running, it needs to integrate with Product teams. Creating best practices around data creation and experimental design can make sure that your team is involved early, before problems surface.
Success can bring challenges, such as too many under-defined requests. Creating a ticketing system unique to your team can ensure that ad hoc requests can be handled in a systematic and efficient manner. This is key to scaling an analytics team.
There are many approaches to becoming the voice of data at a company. Building a data reporting ecosystem ensures that all internal clients have access to what they need when they need it. The talk will cover dashboarding, alert systems, and data newsletters. Finally, we’ll discuss promoting responsible data consumption through continuous training in statistics and tooling for all members of an organization…more details
Hillary is a Senior Curriculum Lead at DataCamp. She is an expert in creating a data-driven product and curriculum development culture, having built the Product Intelligence team at Knewton and the Data Science team at Codecademy. She enjoys explaining data science in a way that is understandable to people with PhDs in Math and BAs in English alike.
Business Talk | Finance | Healthcare | Intermediate
Deep Learning is an incredibly powerful technique, which has found uses in a wide range of applications such as image object detection, speech translation, natural language processing, and time series modeling. However, training deep neural network models requires a tremendous amount of time, training data, and compute resources. A technique called transfer learning allows data scientists to increase their productivity dramatically by sharing neural network architectures and model weights. Reusing a pre-trained model on a different but related task enables training of deep neural networks with comparatively less data. In this talk, you will learn the details of how transfer learning works and will see demonstrations in both the financial and healthcare domains. We will talk about specific use cases and lessons learned that are applicable to many other industry sectors…more details
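The core idea described above, keeping a pre-trained model's weights frozen and training only a small new layer on the target task, can be sketched in a few lines. This is a minimal, hypothetical toy (stdlib only, not from the talk): the "pretrained" extractor is a stand-in function, and all names and data are illustrative.

```python
# Toy sketch of transfer learning: a frozen "pretrained" feature
# extractor plus a small, newly trained head for the target task.
import math
import random

random.seed(0)

def pretrained_features(x):
    """Stand-in for a frozen pretrained network: maps a raw input to a
    fixed feature vector. In practice this would be, e.g., a CNN trained
    on a large source dataset; its weights are never updated here."""
    return [math.tanh(x), math.tanh(2 * x - 1), x * x]

def train_head(samples, epochs=200, lr=0.5):
    """Train only the new head (logistic regression) on top of the
    frozen features, so far less target-task data is needed."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            f = pretrained_features(x)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of log-loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = pretrained_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Tiny target-task dataset: label 1 when the input exceeds 0.5.
data = [(x / 10.0, 1 if x > 5 else 0) for x in range(11)]
w, b = train_head(data)
accuracy = sum(predict(w, b, x) == y for x, y in data) / len(data)
```

With a real framework, the same pattern amounts to loading a pre-trained network, freezing its layers, and fitting only a new output layer on the smaller target dataset.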
Anjali is a Senior Data Scientist at IBM aligned to insurance and financial services industry. She has worked across healthcare, financial services and telecommunications industries. Her expertise in applying cutting-edge technology to analyze structured and unstructured data has helped her clients convert data into actionable business insights. Her early career in software engineering focused on managing complex projects with strict deadlines (having delivered multiple technology solutions). Prior to joining IBM, she has delivered 80+ lectures as Assistant Professor in Health Information Management. She has a Ph.D. in Biomedical Informatics and Applied Statistics, Master’s and Bachelor’s degrees in Computer Science.
Steve is a Data Science Solutions Architect on the IBM Analytics team covering Healthcare, State/Local Government. Steve works with clients to understand their big data and analytics goals and helps design data driven solutions to fit their needs. When not engaged with clients, Steve can be found keeping up with all the latest breakthroughs in data science. Steve achieved the “Kaggle Competitions Expert” designation by his high performance in Kaggle Machine Learning Competitions and received his Bachelor of Science in Computer Science and Applied Mathematics from the University at Albany.
Business Talk | Finance | Intermediate
Given the growing interest in NLP among investors, this session will demystify common NLP terms and provide an overview of general steps in NLP. NLP can be used to quantify the sentiment of earnings calls. We will discuss how sector-level sentiment trends are generated, providing insights around inflection points and accelerations, stock-level sentiment changes and forward returns as well as language complexity in earnings calls…more details
Frank is a Senior Director and a key member of S&P Global Market Intelligence’s Quantamental Research group. His primary focus is to conduct systematic alpha research on global equities, with publications on natural language processing, newly discovered stock selection anomalies, event-driven strategies, and industry-specific signals. Frank has master’s degrees in Financial Engineering from UCLA Anderson and in Finance from Boston College Carroll, and undergraduate degrees in Computer Science and Economics from the University of California, Davis.
Business Talk | Cross Industry | Beginner-Intermediate-Advanced
As data scientists, we invest much of our time on the business problem, the data, the statistics, the algorithm and the model. But we can’t afford to overlook one very important component: the customer! A great AI/ML model with a poorly designed user experience is ultimately going to fail. The world’s best data products are born from a perfect blend of data science and an amazing user experience.
Design thinking is a methodology for creative problem solving developed at the Stanford University d.school. The methodology is used by world class design firms like IDEO and many of the world’s leading brands like Apple, Google, Samsung and GE.
Michael Radwin, VP of Data Science at Intuit, will offer a recipe for how to apply design thinking to the development of AI/ML products. Your team will learn how to get deep customer empathy & fall in love with the customer’s problem (not the team’s solution). Next, you will learn to go broad to go narrow, focusing on what matters most to customers. Finally, learn how to get customers involved in the development process by running rapid experiments and quick prototypes.
These lessons of blending data science & design thinking can be applied to products that leverage supervised and unsupervised machine learning models, as well as “old-school” AI expert systems.
Case study examples will include:
Mint users lose $250 million in overdraft fees every year. Using the data from Mint’s 10 million users, we applied a machine learning algorithm that predicts if you are likely, within three days, to have an overdraft. Mint alerts you in time, with helpful suggestions on how to avoid the exorbitant Non-Sufficient-Funds fee.
Business or personal? Mobile mileage tracking for QuickBooks Self-Employed: an ML model plus UX delivers automatic categorization of individual trips, making it easy to accurately rack up potential tax deductions.
Americans spend 7 billion hours every year filing taxes. TurboTax’s Tax Knowledge Engine uses advanced AI to translate the 80,000+ pages of US tax requirements and instructions into a software oracle that can explain computations to DIY tax filers, giving them greater confidence in the calculations in their returns…more details
Michael Radwin is a Vice President of Data Science at Intuit, responsible for leading a team dedicated to using artificial intelligence and machine learning models for security, anti-fraud and risk. Prior to Intuit, Radwin was VP Engineering of Anchor Intelligence, which used machine learning ensemble methods to fight online advertising fraud. He also served as Director of Engineering at Yahoo!, where he built ad-targeting and personalization algorithms with neural networks and naïve Bayesian classifiers, and scaled web platform technologies Apache and PHP. Radwin holds an ScB in Computer Science from Brown University.
Business Talk | Cross Industry | Intermediate-Advanced
The proliferation of sensor technologies has resulted in more connected machines than ever before. This change is resulting in huge quantities of sensor data becoming available for analysis. Machine learning algorithms have resulted in a mixed track record of success with these data sources. This talk will give an overview of the state of machine learning as applied to IoT and industrial equipment. It will discuss some of the challenges with current approaches, exciting theoretical advancements and some “lessons learned” from the field.
What do we mean by IoT?
What is failure prediction and prognostics?
What is the value of IoT?
Differences between physics-based and data-driven approaches to IoT
What are the challenges of applying data-driven approaches to IoT?
How can recent advances in machine learning help with the unique challenges of IoT?
Real-case study that illustrates the application of deep-learning, gradient boosting, transfer learning and other machine learning techniques for IoT applications
What are the opportunities for future enhancements and exciting research in this area?
Adam McElhinney is currently the Head of Data Science at Uptake Technologies, where he leads a team of 75 data scientists building cutting-edge industrial data analytics tools. He is also an Adjunct Professor in the Computer Science and Mathematics departments at Illinois Institute of Technology, and has filed 18 patents for his research in machine learning, the internet of things (IoT), software engineering, and big data technology. Adam was recognized by the Illinois Technology Association as the 2018 Technologist of the Year.
Business Talk | Cross Industry | Intermediate
If we look back at the major developments in data over the last decade, we often think about advances in data storage, in AI, and in machine learning. While these are significant, over the last several years, a series of quiet revolutions have been driving equally big changes. A new class of data tools are closing the technology gap between the world’s leading companies and everyone else, making data science far more accessible to a much wider range of companies. This is opening up new opportunities for companies to serve and sell to an emerging “middle class” of data consumers.
As these technologies spread, we’ve collectively become remarkably skilled at processing, modeling, analyzing, and learning from data. And yet, despite being massively more informed than ever before, we haven’t seen comparable improvements in outcomes.
Why? Because we’re rarely in the room where these decisions happen. We often hand off our work, and tell executives “you take it from here.” Our tools and analyses, no matter how powerful or clever, will always be restricted by how they’re wielded – and who wields them.
In this talk, I’ll outline the three technological changes that are driving this quiet transformation of data science. I’ll show how any company, regardless of stage or resources, can and should take advantage of these changes. I’ll discuss the problem technology hasn’t yet solved – how to put data science in the rooms where decisions are made. Finally, I’ll survey the opportunities we have to address this challenge…more details
Benn Stancil is a cofounder and Chief Analyst at Mode, a company building collaborative tools for analysts. Benn is responsible for overseeing Mode’s internal analytics efforts. Benn is also an active contributor to the data science community, frequently helping data science teams build their technology stacks and establish data-driven cultures within their companies. In addition, Benn provides strategic oversight and guidance to Mode’s product direction as a member of the product leadership team.
Prior to Mode, Benn was a senior analyst at Microsoft and Yammer, where he helped lead product analytics. Benn also worked as an economic analyst at the Carnegie Endowment for International Peace, a think tank in Washington, DC.
Business Talk | Cross Industry | Beginner-Intermediate
Globally, demand for AI skills continues to outpace supply resulting in an arms race to hire and retain scarce talent. In the last 18 months alone, the number of open positions on LinkedIn increased by 2x for AI roles and 10x for ML Engineer roles. In response, companies are adapting by looking internally to build proprietary talent pipelines through skills development and retraining programs. This presentation will share insight, best practices, and lessons learned from company leaders who have built consensus for and ultimately launched AI skills development programs at growing startups as well as large Fortune 500 organizations…more details
Gautam is CEO & co-founder of Springboard, a rapidly-growing workforce development company focused on digital economy skills like AI & Machine Learning, Data Science, CyberSecurity, and UX Design. Springboard has trained over 200K professionals, helping digital economy aspirants get job-ready with 1:1 mentorship from industry experts. The company also collaborates with Fortune 500 companies and startups as a training and talent partner.
Gautam spent the first decade of his career working on technology, data, and strategy at InMobi, Bain & Company, and Capital One. He holds an MBA from the Wharton School, and studied engineering at IIT Delhi.
Business Talk | Cross Industry | Beginner-Intermediate-Advanced
The amounts of data in digital investigations are ever increasing and new approaches are needed for finding the relevant items amongst the noise. For too long, the focus on digital investigation software has been on parsing and extracting any possible piece of data and displaying it to the user. But, with the increasing amount of data, the focus needs to be on showing only the most relevant items.
Machine learning techniques can help identify which items the user should see first and therefore save them time. This talk will outline how these techniques can be used to rank documents, executables, and other files found during a digital investigation…more details
Mario Vuksan is the Co-Founder and Chief Executive Officer at ReversingLabs Corporation. Mr. Vuksan served as a Director of Research and Knowledgebase Services at Bit9 Inc. He also served as Program Manager and Consulting Engineer at Groove Networks (acquired by Microsoft), working on Web based solutions, P2P management, and integration servers. Before Groove Networks, Mr. Vuksan developed one of the first Web 2.0 applications at 1414c, a spin-off from PictureTel. He is a regular presenter at RSA, Black Hat, Defcon, Caro Workshop, Virus Bulletin, CEIC, FSISAC, and AVAR Conferences, and has also authored numerous texts on security. He supports AMTSO, IEEE Malware Working Group and CTA, and holds a BA from Swarthmore College and an MA from Boston University.
Carl founded Basis Technology in 1995 to help American companies enter Asian markets. In 1999, the company shipped its first products for website internationalization, enabling Lycos and Google to become the first search engines capable of cataloging the web in both Asian and European languages. In 2003, the company shipped its first Arabic analyzer and began development of a comprehensive text analytics platform.
Today, Basis Technology is recognized as the leading provider of components for information retrieval, entity extraction, and entity resolution in many languages. Carl has been directly involved with the company’s activities in support of national security missions, and works closely with analysts in the U.S. intelligence community. Prior to starting Basis Technology, Carl worked as an independent consultant in Boston, New York and Tokyo to international clients in finance and knowledge management. Carl spent eight years on the research staff of the MIT Laboratory for Computer Science. He is an active contributor to several non-profit organizations, including the Free Software Foundation, the MIT Alumni Fund, and the Unicode Consortium.
Business Talk | Cross Industry | Beginner-Intermediate-Advanced
Have you ever been asked to create a model or derive insights out of data that is inaccurate, missing, or unreliable? As data experts, we know that no matter how good our model is, the results will be unreliable if the source data is unreliable. As the saying goes: junk in, junk out.
This talk will explore the approaches that Wayfair’s Business Intelligence Team has used to improve the quality of source data, and in turn increase the accuracy of reporting, models, and insights.
Yard Arrival Date – Wayfair has increased data capture for Yard Arrival Date from under 10% to over 99%. Yard Arrival Date is the date that an SPO (Stock Purchase Order) arrives at our warehouse yard. Knowing the exact date an order arrived in the yard is essential to building predictive models of when future orders will arrive. We found that capturing Yard Arrival Date required staff to perform extra work they did not see the value in. There was also no accountability: no one was checking whether staff entered the data, and it didn’t impact their work, so there was no incentive to do this seemingly needless work. By explaining the usefulness of this data, working with the warehouse staff to improve the data capture process, and creating a public dashboard that drove accountability, we raised Yard Arrival Date entry compliance from less than 10% to over 99%.
Kaitlin Andryauskas is a Business Intelligence Manager at Wayfair supporting Wayfair’s global supply chain. She approaches her work by identifying the problem that will unlock the largest potential, and then finding the data that will provide the required insights. She is passionate about solving complex problems requiring both analytical skill and business acumen, as well as using data to solve pressing social issues. Kaitlin has an undergraduate degree in Sociology from The University of Texas at Austin, a Master’s in Business Analytics from Bentley University, and 24 credits towards a Master’s in Education at Johns Hopkins University. Kaitlin is a former high school history teacher and Teach for America Baltimore alum.
Business Talk | Cross Industry | Beginner-Intermediate-Advanced
How does customer experience/digital marketing know what customers are saying through human chat, bot chat, surveys, or social media? Why are customers not satisfied, or not moving to the next action? The first step is to deeply analyze customer conversations. A new generation of AI technology makes this possible, extracting the ideas contained in text to summarize, organize, and display them for analysis…more details
Ben is the CEO and Founder of Gamalon. He was previously the co-founder and CEO of Lyric Semiconductor, which built the first microprocessor architectures for statistical machine learning, growing out of Ben’s PhD at MIT. Lyric was acquired by Analog Devices, and Lyric’s technology is deployed in leading smartphones and consumer electronics, medical devices, wireless base stations, and automobiles. He has authored over 120 patents and academic publications, and his work has been featured in the Wall Street Journal, New York Times, EE Times, Scientific American, Wired, TechCrunch, and other media.
Ben has been an Intel Student Fellow, Kavli Foundation/National Academy of Sciences Fellow, served on the DARPA Information Science and Technology (ISAT) steering committee, and has held research appointments at MIT, Hewlett Packard, Mitsubishi, and the Santa Fe Institute. He also co-founded Design That Matters, a not-for-profit that for the past decade has helped solve engineering and design problems in underserved communities and has saved thousands of infant lives by developing low-cost, easy-to-use medical technology such as infant incubators, UV therapy, pulse oximeters, and IV drip systems that have been fielded in 20 countries.
Regina Barzilay is a professor in the Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. Her research interests are in natural language processing. Currently, Prof. Barzilay is focused on bringing the power of machine learning to oncology. In collaboration with physicians and her students, she is devising deep learning models that utilize imaging, free text, and structured data to identify trends that affect early diagnosis, treatment, and disease prevention. Prof. Barzilay is poised to play a leading role in creating new models that advance the capacity of computers to harness the power of human language data.
Regina Barzilay is a recipient of various awards including an NSF Career Award, the MIT Technology Review TR-35 Award, Microsoft Faculty Fellowship and several Best Paper Awards in top NLP conferences. In 2017, she received a MacArthur fellowship, an ACL fellowship and an AAAI fellowship.
Prof. Barzilay received her MS and BS from Ben-Gurion University of the Negev. Regina Barzilay received her PhD in Computer Science from Columbia University, and spent a year as a postdoc at Cornell University.
Workshop | AI for Engineers | Big Data | Beginner-Intermediate
Throughout the history of computing, humans have had to interact with machines in abstract and complex ways, from punch cards to machine code to the command line to graphical user interfaces. Machines still force us to communicate with them on their own terms. However, with the commercialization of voice-enabled devices like the Amazon Echo, the time has finally come when we can communicate with machines in a more natural way, using our voice. Voice as an interface for communicating with devices is going to be very prevalent in the coming years. This workshop focuses on jumpstarting developers who are curious about building skills for Amazon Alexa. Topics covered include Conversational UX, Interaction Schema, Entity Resolution, Dialog Management, Command Line Utilities for Skill Management, and the Alexa Skills Kit SDK. By the end of the workshop, attendees will walk out with a fully functional Alexa skill that they can test in the simulator or deploy to their Amazon Echo devices…more details
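As a toy illustration of the dialog-management pattern an Alexa skill implements, the sketch below routes incoming intents to handler functions. It mimics, but does not use, the Alexa Skills Kit SDK; all handler, intent, and slot names are made up for illustration.

```python
# Minimal sketch of intent dispatch, the core pattern behind an Alexa skill.
# This mimics (but does not use) the Alexa Skills Kit SDK; names are illustrative.

def launch_handler(slots):
    return "Welcome to the demo skill. Ask me for a greeting."

def greeting_intent_handler(slots):
    name = slots.get("name", "friend")
    return f"Hello, {name}!"

def fallback_handler(slots):
    return "Sorry, I didn't catch that."

# Route each incoming intent name to a handler, much as the SDK's request
# dispatcher does for registered handler classes.
HANDLERS = {
    "LaunchRequest": launch_handler,
    "GreetingIntent": greeting_intent_handler,
}

def handle_request(request):
    intent = request.get("intent", "")
    slots = request.get("slots", {})
    handler = HANDLERS.get(intent, fallback_handler)
    return {"outputSpeech": handler(slots)}

response = handle_request({"intent": "GreetingIntent", "slots": {"name": "Ada"}})
```

Unrecognized intents fall through to the fallback handler, the same structure the SDK uses for its `AMAZON.FallbackIntent`.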
Workshops | Open Source Data Science | Intermediate
Probabilistic programming is sometimes referred to as “modeling for hackers”, and has recently been picking up steam with a flurry of releases including Stan, PyMC3, Edward, Pyro, and TensorFlow Probability.
As these and similar systems have improved in performance and usability, they have unfortunately also become more complex and difficult to contribute to. This is related to a more general phenomenon, the “two language problem”, in which performance-critical domains like scientific computing involve both a high-level language for users and a high-performance language in which developers implement algorithms. This establishes a kind of wall between the two groups, and has a harmful effect on performance, productivity, and pedagogy.
In probabilistic programming, this effect is even stronger, and it’s increasingly common to see three languages: one for writing models, a second for data manipulation, model assessment, etc., and a third for implementing inference algorithms.
In this workshop, we’ll see how the Julia programming language can help to solve this problem, and we’ll explore the basic ideas in Soss, a new probabilistic programming language written entirely in Julia. Soss allows a high-level representation of the kinds of models often written in PyMC3 or Stan, and offers a way to programmatically specify and apply model transformations like approximations or reparameterizations…more details
Dr. Chad Scherrer has been actively developing and using probabilistic programming systems since 2010, and served as technical lead for the language evaluation team in DARPA’s Probabilistic Programming for Advancing Machine Learning (“PPAML”) program. Much of his blog is devoted to describing Bayesian concepts using PyMC3, while his current Soss.jl project aims to improve execution performance by directly manipulating source code for models expressed in the Julia Programming Language.
Chad is a Senior Data Scientist at Metis Seattle, where he teaches the Data Science Bootcamp.
Talk | Open Source Data Science | Beginner-Intermediate
As IT operations become more agile and complex, the need to enhance operational efficiency and intelligence grows with them. Monitoring applications and Kubernetes clusters with Prometheus has become quite common, yet identifying relevant metrics and thresholds for your setup is getting harder.
In this talk, Marcel will show the tooling used to collect and store metrics gathered by Prometheus for the long term, and then analyze them at scale using Spark. This includes extracting trends and seasonality, as well as forecasting expected values for a given metric. Finally, he will integrate the predicted metrics back into the Prometheus monitoring and alerting stack to enable dynamic thresholding and anomaly detection…more details
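Marcel's actual pipeline runs on Prometheus and Spark; as a minimal self-contained illustration of what "dynamic thresholding" means, the sketch below flags points of a synthetic metric series that fall outside a rolling mean ± k·σ band. All parameters and the series itself are illustrative.

```python
import numpy as np

# Sketch of dynamic thresholding: compare each point of a metric series
# against a baseline fitted to a trailing window, and flag points outside
# mean +/- k * std. In the talk's stack the series would come from
# Prometheus and the fitting would run at scale in Spark.

rng = np.random.default_rng(0)
series = 100 + rng.normal(0, 2, 200)   # steady metric with noise
series[150] = 130                      # injected anomaly (e.g., a latency spike)

window = 30   # trailing baseline window
k = 4.0       # band width in standard deviations

anomalies = []
for t in range(window, len(series)):
    hist = series[t - window:t]
    mean, std = hist.mean(), hist.std()
    if abs(series[t] - mean) > k * std:
        anomalies.append(t)
```

The threshold adapts to the recent behavior of each metric instead of being a fixed number, which is the property that makes this approach attractive for heterogeneous Kubernetes workloads.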
Tutorial | AI for Engineers | Big Data | Beginner-Intermediate
Large volumes of data generated from human interaction – with software systems that support daily applications in areas such as commerce, law, and entertainment – have given rise to what is commonly referred to as the “Big Data Problem”. Graph data is of growing importance in this context. Applications that rely on graph data include the semantic web (i.e., RDF), bioinformatics, finance and trade, and social networks among others. Graphs naturally model complicated structures, such as protein interaction networks, product purchasing, business transactions, relationships and interactions in social or computer networks, and web page connections. The size and complexity of these graphs raise significant data management and data analysis challenges.
The first part of this talk will discuss how graph data can be processed in parallel using different graph computation models, and then discuss the quantitative performance of seven existing systems. Since these systems typically support static graph data, we will show how to extend some of them to support dynamic graphs, i.e., graph data that changes over time. Finally, we will discuss how to extract and process graph data in various applications…more details
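As a small, self-contained example of the kind of graph computation these systems parallelize over billions of edges, here is PageRank by power iteration on a toy directed graph (the talk surveys full systems; this is only the sequential kernel):

```python
import numpy as np

# PageRank by power iteration on a tiny directed graph. Parallel graph
# systems distribute exactly this kind of per-edge message passing.

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]  # (src, dst)
n = 4
damping = 0.85

out_deg = np.zeros(n)
for s, _ in edges:
    out_deg[s] += 1

rank = np.full(n, 1.0 / n)
for _ in range(50):
    # each node keeps a base share, plus damped rank flowing along edges
    new = np.full(n, (1 - damping) / n)
    for s, d in edges:
        new[d] += damping * rank[s] / out_deg[s]
    rank = new
```

Node 2 receives links from three of the four nodes, so it ends up with the highest rank; the total rank mass stays at 1 across iterations.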
Talks | Data Science Management | Beginner-Intermediate-Advanced
Did you know that manufacturing processes generate about a third of all data today? With the recent rise of digital manufacturing, this percentage will likely continue to increase in the coming years. Industry 4.0 could be viewed as the next phase in digital manufacturing, and it is driven by several main technologies: (i) the Internet of Things (IoT) and astonishing amounts of generated data; (ii) physical connectivity and integration; (iii) Big Data analytics enabled by computational power; (iv) augmented-reality systems; and (v) advanced robotics. Industry 4.0 promises higher productivity, rapid innovation, reduced costs, and improved customer satisfaction. However, adoption has been very slow. The main reason is probably that many manufacturing companies still lack the foundational technology infrastructure that must be in place before they can fully exploit Industry 4.0.
The talk will offer practical advice on how to collect and effectively organize enormous amounts of industrial data from various siloed sources into a cloud data lake, and then unleash the full power of advanced analytics to benefit manufacturing companies. The speaker will also cover a few use cases where machine learning and AI helped digital manufacturing organizations. The first is predictive maintenance, where IoT-equipped machinery is remotely monitored to predict failures early, diagnose the root cause of faults, and predict the equipment’s remaining useful life. The second is supply chain management, where real-time machine learning is used to track the location of assets in transit, predict when shipments will arrive, and provide end-to-end visibility of goods throughout the supply chain…more details
Talk | Machine Learning | Intermediate-Advanced
To effectively develop predictive models with machine learning, it is often necessary to partition data. Sometimes the data is partitioned to facilitate testing, and sometimes the partitioning is required to analyze very large data volumes. The authors introduce a novel technique for both types of partitioning, rooted in Latin Square experimental design theory, that provides major advantages: it allows analysts to obtain new measures of uncertainty surrounding record-level predictions, provides for new forms of automatic ensemble creation, introduces a new strategy for deliberately overfitting models that participate in an ensemble (with the overfitting eliminated by the ensemble averaging), and supports the partitioning of very large databases into optimally overlapping subsamples. The partitioning plans are also applicable to partitioning data by columns rather than rows; thus, we might partition data into many thousands of subsets of overlapping predictors while simultaneously partitioning the data by rows.
For K-fold cross-validation, the most obvious novelty is leaving out multiple parts for testing in every fold, instead of the classical “leave out just one part”. Each part of the data is also left out for testing in multiple folds, resulting in multiple “test” predictions for every record, which supports a measure of the prediction variance. Examples of several variations of the new scheme applied to real data are presented…more details
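The authors' exact Latin Square construction is their own; the sketch below only illustrates the general idea of holding out multiple parts per fold so that every record receives several out-of-sample predictions, whose spread yields a record-level variance. The "model" is a trivial mean predictor, and the wrapped two-part hold-out is an illustrative stand-in for the authors' design.

```python
import numpy as np

# Illustration of multi-part hold-out cross-validation: each fold leaves
# out two parts for testing, so each record is predicted out-of-sample
# twice, giving a per-record variance across predictions.

rng = np.random.default_rng(42)
n, n_folds = 120, 6
y = rng.normal(10, 3, n)
part = rng.integers(0, n_folds, n)      # assign each record to a part

preds = [[] for _ in range(n)]
for fold in range(n_folds):
    # hold out this part and the next one (wrapping) for testing
    test_parts = {fold, (fold + 1) % n_folds}
    train_mask = ~np.isin(part, list(test_parts))
    model = y[train_mask].mean()        # "train" a trivial mean predictor
    for i in np.where(~train_mask)[0]:  # "test" on held-out records
        preds[i].append(model)

n_preds = np.array([len(p) for p in preds])
pred_var = np.array([np.var(p) for p in preds])  # record-level uncertainty
```

Because each part is held out in exactly two folds, every record gets exactly two out-of-sample predictions; the classical K-fold scheme would give only one and no per-record variance.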
Pieter Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley [2008- ], Co-Founder of covariant.ai [2017- ], Co-Founder of Gradescope [2014- ], Advisor to OpenAI, Founding Faculty Partner AI@TheHouse, Advisor to many AI/Robotics start-ups. He works in machine learning and robotics. In particular his research focuses on making robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn (meta-learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, organizing laundry, locomotion, and vision-based robotic manipulation. He has won numerous awards, including best paper awards at ICML, NIPS and ICRA, early career awards from NSF, Darpa, ONR, AFOSR, Sloan, TR35, IEEE, and the Presidential Early Career Award for Scientists and Engineers (PECASE). Pieter’s work is frequently featured in the popular press, including New York Times, BBC, Bloomberg, Wall Street Journal, Wired, Forbes, Tech Review, NPR, Rolling Stone.
Talk | Deep Learning | AI for Engineers | Beginner-Intermediate
Over the past decade, we have seen an almost unbelievable rise in the power and influence of data analysis and the broader data community, touching an incredibly broad swath of business and society. We have seen data-driven decision making go from fringe to mainstream to the default, and companies at the forefront of data science, like Google and Facebook, have moved towards the top of the S&P 500. Recently, we have had to confront the negative consequences that a myopic focus on metrics and models has had on society, and a spirit of optimism and potential around machine learning and artificial intelligence has given way to a sense of foreboding and even fear at the consequences of what we do next. I would like to talk about how we go forward from here with a sense of humility about the limits of our ability to understand both the inner workings of our deep learning models and the complexity of the world at large, informed by the perspectives of Berkshire Hathaway, Michel de Montaigne, and the Houston Astros…more details
Josh Wills is an engineer on Slack’s Search, Learning, and Intelligence Team helping to build the company’s production search and machine learning infrastructure. He’s a recovering manager, having most recently built and led Slack’s data engineering team and before that the data science and engineering teams at Cloudera and Google. Josh is a member of the Apache Software Foundation, the founder of the Apache Crunch project, and a co-author of O’Reilly’s Advanced Analytics with Spark. In May of 2012, he tweeted a pithy definition of a data scientist as someone who is better at statistics than any software engineer and better at software engineering than any statistician, and his Twitter mentions have never been the same.
Tutorial | Deep Learning | Research Bridge | Beginner-Intermediate
Over the last few years, convolutional neural networks (CNN) have risen in popularity, especially in the area of computer vision. Many mobile applications running on smartphones and wearable devices would potentially benefit from the new opportunities enabled by deep learning techniques.
This workshop explains how to practically bring the power of convolutional neural networks and deep learning to memory- and power-constrained devices like smartphones. You will learn various strategies to circumvent obstacles and build mobile-friendly shallow CNN architectures that significantly reduce the memory footprint and are therefore easier to store on a smartphone. The workshop also dives into how to use a family of model compression techniques to prune the network size for live image processing, enabling you to build a CNN version optimized for inference on mobile devices.
Following a step by step example of building an iOS deep learning app, we will discuss tips and tricks, speed and accuracy trade-offs, and benchmarks on different hardware to demonstrate how to get started developing your own deep learning application suitable for deployment on storage- and power-constrained mobile devices…more details
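One widely used member of this family of compression techniques is magnitude-based weight pruning. The sketch below (not the workshop's actual code; layer shape and sparsity target are illustrative) zeroes out the smallest-magnitude weights of a dense layer until a target sparsity is reached:

```python
import numpy as np

# Magnitude-based weight pruning: zero out the smallest `sparsity`
# fraction of a layer's weights by absolute value. The resulting sparse
# matrix compresses well, shrinking the model stored on the device.

def prune(weights, sparsity):
    """Return a copy of `weights` with the smallest `sparsity` fraction
    of entries (by absolute value) set to zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(1)
w = rng.normal(0, 1, (64, 64))          # stand-in for a dense layer's weights
w_sparse = prune(w, sparsity=0.9)
achieved = (w_sparse == 0).mean()       # fraction of weights now zero
```

In practice pruning is interleaved with fine-tuning so accuracy recovers after each pruning step; this sketch shows only the pruning operation itself.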
Anirudh is the Head of AI & Research at Aira (a visual interpreter for the blind), and was previously at Microsoft AI & Research, where he founded Seeing AI – a talking camera app for the blind community. He is also the co-author of the upcoming book ‘Practical Deep Learning for Cloud and Mobile’. He brings over a decade of production-oriented applied research experience on petabyte-scale datasets, with features shipped to about a billion people. He has been prototyping ideas using computer vision and deep learning techniques for augmented reality, speech, productivity, and accessibility. Some of his recent work, which IEEE has called ‘life changing’, has been honored by CES, the FCC, Cannes Lions, and the American Council of the Blind, showcased at events by the White House, the House of Lords, and the World Economic Forum, featured on Netflix and National Geographic, and applauded by world leaders including Justin Trudeau and Theresa May.
Workshops | Open Source Data Science | Beginner-Intermediate-Advanced
In this workshop, we introduce a new data analysis tool that enables predictions in an Excel-like environment **without** any prior knowledge of machine learning, statistics, or data science. This seemingly magical ability is a direct consequence of viewing prediction as estimating missing values or correcting errors within observations. More precisely, this boils down to estimating a structured “tensor” from noisy, missing observations. We will show an intuitive, simple, and scalable approach for estimating the tensor, and provide a collection of case studies using the actual tool…more details
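As a hedged illustration of the underlying estimation idea (not the tool itself), the sketch below fills in the missing entries of a noisy low-rank matrix: zero-fill the observations, rescale by the observation rate, and truncate the SVD to the assumed rank.

```python
import numpy as np

# Matrix completion by spectral truncation: a toy instance of estimating
# a structured object from noisy, partially observed entries. A rank-1
# matrix stands in for the structured "tensor" in the abstract.

rng = np.random.default_rng(7)
n = 100
u, v = rng.normal(size=n), rng.normal(size=n)
truth = np.outer(u, v)                      # low-rank ground truth

observed = rng.random((n, n)) < 0.8         # observe ~80% of entries
noisy = np.where(observed, truth + rng.normal(0, 0.1, (n, n)), 0.0)

# Rescaling the zero-filled matrix by the observation rate makes it an
# unbiased estimate of the truth; the rank-1 SVD denoises it.
p = observed.mean()
U, s, Vt = np.linalg.svd(noisy / p, full_matrices=False)
estimate = s[0] * np.outer(U[:, 0], Vt[0])

corr = np.corrcoef(estimate.ravel(), truth.ravel())[0, 1]
```

The estimate recovers even the entries that were never observed, which is exactly how "prediction as missing-value estimation" works in the spreadsheet setting the workshop describes.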
Christina Lee Yu is an Assistant Professor at Cornell University in Operations Research and Information Engineering. Prior to Cornell, she was a postdoc at Microsoft Research New England. She received her PhD in 2017 and MS in 2013 in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in the Laboratory for Information and Decision Systems. She received her BS in Computer Science from California Institute of Technology in 2011. She received honorable mention for the 2018 INFORMS Dantzig Dissertation Award. Her research focuses on designing and analyzing scalable algorithms for processing social data based on principles from statistical inference.
Devavrat Shah is a Professor in the department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. His current research interests are at the interface of statistical inference and social data processing. His work has been recognized through prize paper awards in machine learning, operations research, and computer science, as well as career prizes including the 2010 Erlang Prize from the INFORMS Applied Probability Society and the 2008 ACM Sigmetrics Rising Star Award. He is a distinguished young alumnus of his alma mater, IIT Bombay.
Panel | Research Bridge | Space AI | Beginner-Intermediate-Advanced
AI is everywhere. In our phones, in our homes, where we work, where we play. The “new industrial revolution” in AI and machine learning, fueled by the “new oil” – massive amounts of data – has spread globally with alarming speed. But AI has rarely left Earth’s orbit, and it is non-existent in deep-space spacecraft. The most recent Mars rover runs on 15-year-old CPU technology. Recently launched spacecraft lack the ability to navigate themselves, requiring constant communication with ground control. Meanwhile, back on Earth, humans are already riding in self-driving vehicles, and earthlings routinely run billion-parameter neural networks at real-time speeds.
Disruptive change is on its way, and it’s happening at the intersection of space and artificial intelligence. Our panelists are ushering in a new golden age of space exploration, one that is embracing cutting edge AI at all layers of the technology stack. In this session, you will hear from experts from industry, government, and academia – and they will talk on a broad range of topics – from new AI hardware that can survive the harsh conditions of space, to new deep learning algorithms that can locate new galaxies and dark matter, to robotic virtual assistants that monitor astronauts and keep them company on deep space missions.
Get ready. AI is soon going where no AI has gone before!…more details
Roberto Carlino is an Aerospace Engineer at NASA Ames, currently working as a hardware/flight software engineer for the Astrobee Free-flyers project.
Before that, he was working at the Science Processing Operation Center (SPOC) for the mission Transiting Exoplanet Survey Satellite (TESS) (follow-on mission of the Kepler Space Telescope), which is searching for exoplanets around the closest and brightest stars to our Sun.
Roberto started at NASA Ames around 4 years ago, working on small flight projects and mission proposals.
He earned his Bachelor’s degree and Master of Science in Aerospace Engineering at the University of Naples Federico II, in Italy, and Delft University of Technology, in the Netherlands. After that, he earned a second Master’s degree in ‘Space Systems and Services’ from the University of Rome, La Sapienza.
Jiwei Liu is a data scientist at NVIDIA working on NVIDIA AI infrastructure, including the RAPIDS data science framework. Jiwei received a PhD in electrical and computer engineering from the University of Pittsburgh. He has five years’ experience in data science, predictive modeling, machine learning, and GPU programming. Jiwei is a Kaggle Grandmaster, ranked in the top 40 worldwide.
Dr. Ian Troxel is the Founder and CEO of Troxel Aerospace Industries, Inc., a firm developing autonomous fault mitigation software and performing radiation testing services for migrating commercial technology to space systems. He earned a master’s and a doctorate in Electrical and Computer Engineering from the University of Florida with a focus on aircraft and spacecraft onboard processing, mission assurance for big-data movement and computation (before it was called cloud computing), processor-in-memory architectures, and optical networking technologies. Dr. Troxel served as the Principal Engineer for Processor and Memory systems at SEAKR Engineering, where he developed system-level fault mitigation strategies and designed next-generation processor and data management systems for several high-end space missions. He was selected for the NRO Technology Fellowship program in 2009-2010 to shape the development of new technologies for image processing. Dr. Troxel formed Troxel Aerospace in 2015 to focus on product development, which has seen significant growth since 2017 through a NASA SBIR Phase 2 award selection and several commercial engagements.
Workshops | Deep Learning | Beginner-Intermediate
In this hands-on MATLAB workshop, you will explore deep learning techniques for data types such as images, text, and time-series data. You’ll use an online MATLAB instance to perform the following tasks:
1. Train deep neural networks on GPUs in the cloud
2. Access and explore pretrained models
3. Build a CNN to solve an image classification problem
4. Use LSTM networks to solve a time-series and text analytics problem…more details
Jianghao Wang is a Data Scientist at MathWorks. In her role, Jianghao supports deep learning research and teaching in academia. Before joining MathWorks, Jianghao obtained her Ph.D. in Statistical Climatology from the University of Southern California and B.S. in Applied Mathematics from Nankai University.
Pitambar Dayal works on deep learning and computer vision applications in technical marketing. Prior to joining MathWorks, he worked on creating technological healthcare solutions for developing countries and researching the diagnosis and treatment of ischemic stroke patients. Pitambar holds a B.S. in biomedical engineering from the New Jersey Institute of Technology.
Talks | Data Science Management | Machine Learning | Beginner-Intermediate-Advanced
I’ve watched lots of companies attempt to deploy machine learning — some succeed wildly and some fail spectacularly. One constant is that machine learning teams have a hard time setting goals and setting expectations. This talk will give some examples of how teams fail and recommendations for everyone from executives to researchers to make their machine learning projects work better…more details
Lukas Biewald is a co-founder and CEO of Weights and Biases, which builds performance and visualization tools for machine learning teams and practitioners. Lukas also founded Figure Eight (formerly CrowdFlower), a human-in-the-loop platform that transforms unstructured text, image, audio, and video data into customized, high-quality training data, which he co-founded in December 2007 with Chris Van Pelt. Prior to co-founding Weights and Biases and CrowdFlower, Biewald was a Senior Scientist and Manager within the Ranking and Management Team at Powerset, a natural language search technology company later acquired by Microsoft. From 2005 to 2006, Lukas also led the Search Relevance Team for Yahoo! Japan.
Talks | Data Visualization | Open Source Data Science
Forbes Magazine calls “data communicators” the most essential part of the modern data team. But how do we become better data communicators? How do we identify the most important insights in our business data and communicate them in a compelling way?
This talk will cover four essential keys to data storytelling. The talk will also cover:
* Using color to focus attention in data stories
* Chart selection and design
* Common chart design errors
* The Gestalt principles of visual perception and how they can be used to tell better stories with data…more details
Isaac is an Australian data scientist, company founder and TEDx speaker who lives, breathes and dreams data. Isaac travels the world teaching data visualization skills and in 2017, his “Art of Data Storytelling” speaking tour saw him visit Asia, North America, Europe and Australia. A passionate data science educator, Isaac previously lectured in analytics and statistical theory at the Australian National University. He has delivered his Data Storytelling course on site to forward thinking companies including Cisco, AIG and JPMorgan Chase. Isaac was a featured data visualization keynote presenter at the 2017 Strata Data Conference and makes regular appearances at data conferences.
Workshops | Data Science Management | Beginner-Intermediate-Advanced
Panjiva maps the network of global trade with unprecedented scope and granularity. Using over one billion shipping records sourced from governments around the world, we perform large-scale entity extraction and entity resolution, identifying trading relationships among millions of companies and thousands of ports, located across the world. We have developed a powerful platform facilitating search, analysis, and visualization of this network as well as a data feed integrated into S&P Global’s Xpressfeed platform.
Placing supply chain networks in the familiar setting of a world map lends context that facilitates understanding of stories contained in the data. The web browser is an apt setting for creating interactive maps and geospatial applications due to the powerful inbuilt visualization and computation tools and the rich ecosystem of compatible open source libraries. Through the process of developing interactive web maps, we will learn about open source technologies for analyzing and interacting with data and explore patterns in the flow of goods between geographic areas in the global supply chain graph.
Robert Christie is a Front End Engineer at Panjiva, a division of S&P Global Market Intelligence. He specializes in interactive data visualization and cartography for the web and has a background in statistics and spatial analysis. Much of Robert’s work has been in the domain of transportation, mobility, and logistics. He is passionate about the role of visualization in increasing the comprehensibility and observability of machine learning driven decision making. Robert received a B.A. from the McGill School of Environment and a Masters from the University of Toronto School of Information.
Workshops | Research Bridge | Machine Learning | Intermediate-Advanced
Applications such as climate science, intelligent transportation, aerospace control, and sports analytics apply machine learning for large-scale spatiotemporal data. This data is often nonlinear, high-dimensional, and demonstrates complex spatial and temporal correlation. Existing machine learning models cannot handle complex spatiotemporal dependency structures. We’ll explain how to design machine learning models to learn from large-scale spatiotemporal data, especially for dealing with non-Euclidean geometry, long-term dependencies, and logical and physical constraints. We’ll showcase the application of these models to problems such as long-term forecasting for transportation, long-range trajectories synthesis for sports analytics, and combating ground effect in quadcopter landing for aerospace control…more details
Dr. Rose Yu is an Assistant Professor at Northeastern University Khoury College of Computer Sciences. Previously, she was a postdoctoral researcher in the Department of Computing and Mathematical Sciences at Caltech. She earned her Ph.D. in Computer Science at the University of Southern California and was a visiting researcher at Stanford University.
Her research focuses on machine learning for large-scale spatiotemporal data and its applications, especially in the emerging field of computational sustainability. She has over a dozen publications in leading machine learning and data mining conferences and several patents. She is the recipient of the USC Best Dissertation Award, “MIT Rising Stars in EECS”, and the Annenberg Fellowship.
Workshop | Machine Learning | Intermediate
There are tens of billions of online profiles today, each associated with some identity, on diverse platforms including social networks, online marketplaces, dating sites and financial institutions. Every platform needs to understand, validate and verify these identities.
The landscape of identity challenges, available data, and machine-learning technology have evolved over the years. However, identity still remains a notoriously hard problem. While we’ve made a lot of progress in academia and industry, there still are several unsolved problems. In this session, we will talk through three core, interconnected problems: (1) identity authentication/validation; (2) identity matching; (3) identity verification. We will discuss our work on effectively using machine learning technology to solve these problems, along with an analysis of popular techniques used on different platforms…more details
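As a toy illustration of the identity-matching problem (not Airbnb's actual method), the sketch below scores pairs of profile records by token overlap. Production systems combine many such signals in a learned model; the fields, records, and heuristic here are illustrative.

```python
# Toy identity matching: score two profile records by Jaccard similarity
# of their field tokens. High overlap suggests the same underlying
# identity; real systems feed many such signals into an ML model.

def tokens(record):
    return {t for field in record.values() for t in field.lower().split()}

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

a = {"name": "Jane A. Doe", "city": "San Francisco"}
b = {"name": "jane doe", "city": "San Francisco CA"}
c = {"name": "John Smith", "city": "Boston"}

match_ab = jaccard(a, b)   # high overlap -> likely the same identity
match_ac = jaccard(a, c)   # no overlap   -> likely different identities
```

Even this crude score separates the matching pair from the non-matching one; the hard cases the session discusses are the ones where surface overlap and true identity disagree.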
Liren Peng is a Software Engineer on the Trust team at Airbnb. He is responsible for the architecture and development of user identity verification systems. He also works on the utilization of third-party data and vendor integration. Prior to Airbnb, Liren worked at Trooly, a startup that built machine-learning-based trust models using both social media data and proprietary data to assess the trustworthiness of individuals. He received a B.S. from Carnegie Mellon University and an M.Sc. from Stanford University, focusing on data analytics.
Sukhada Palkar is a software engineer at Airbnb working on the various challenges of trusting digital identities. She enjoys working at the intersection of open ended problem solving, software engineering and machine learning. She has a background in applying machine learning for text and speech systems, and more recently identity and risk analytics.
Before Airbnb, Sukhada was an early member of the core Amazon Alexa natural language team and part of Trooly, a startup in the digital identity verification space that was acquired by Airbnb. Sukhada has an M.S. in speech and language technologies from Carnegie Mellon.
Talks | DS for Good | Open Source Data Science | Intermediate
For clinical prediction problems, short free-text fields often hold valuable information. However, feature engineering from non-standardized fields can be difficult without manual curation. Word embedding approaches such as word2vec (Mikolov et al. 2013) or GloVe (Pennington et al. 2014) represent a mechanism for unsupervised and data-driven feature engineering for free text, but suffer from a lack of the interpretability necessary for applications in the clinical domain. Previous feature engineering approaches for short clinical text have relied on bag-of-words techniques or on mapping concept unique identifiers from the Unified Medical Language System (UMLS) (Bodenreider 2004) to create features, while other studies have used raw word embeddings. Combining information from pre-existing clinical ontologies in the UMLS with data-driven word embeddings to create interpretable features from short free text could improve performance for clinical prediction problems. We combined word embeddings generated with the Global Vectors (GloVe) method (Pennington et al. 2014) with clinical ontologies, using an approach based on category word lists and the Bhattacharyya distance to map embedding dimensions to interpretable categories (Senel et al. 2017). We applied the approach to generate features from emergency department chief complaints, the principal reason for visit, and predicted clinical orders placed during the visit. We compared functions for combining multiple words in a single chief complaint, variations on word lists and categories generated from distinct UMLS vocabularies, and the use of interpretable features versus raw concept identifiers and raw word embeddings. We provide an automated and unsupervised framework for combining a priori knowledge and data-driven approaches for feature engineering from short free text. This approach can be generalized to other clinical free text and to prediction problems beyond clinical orders…more details
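As a simplified, self-contained illustration of the mapping idea (this is not the authors' pipeline: toy 4-dimensional vectors and two-category word lists stand in for GloVe embeddings and UMLS-derived categories), the sketch below scores a chief complaint against category word lists by cosine similarity:

```python
import numpy as np

# Toy version of mapping short free text to interpretable categories:
# average the word vectors of a chief complaint, then score it against
# each category's word list by cosine similarity. Vectors and word
# lists are illustrative stand-ins for GloVe and UMLS-derived lists.

emb = {
    "chest":   np.array([0.9, 0.1, 0.0, 0.0]),
    "pain":    np.array([0.8, 0.2, 0.1, 0.0]),
    "cardiac": np.array([0.9, 0.0, 0.1, 0.0]),
    "cough":   np.array([0.0, 0.9, 0.1, 0.0]),
    "fever":   np.array([0.1, 0.8, 0.2, 0.0]),
    "flu":     np.array([0.0, 0.9, 0.2, 0.0]),
}
categories = {"cardiovascular": ["cardiac"], "infectious": ["flu", "fever"]}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def score(text, word_list):
    # centroid of the complaint's word vectors vs. each category word;
    # the best-matching category word gives the category score
    vecs = [emb[w] for w in text.split() if w in emb]
    centroid = np.mean(vecs, axis=0)
    return max(cos(centroid, emb[w]) for w in word_list)

complaint = "chest pain"
scores = {c: score(complaint, words) for c, words in categories.items()}
best = max(scores, key=scores.get)
```

The output is a vector of named category scores rather than an opaque embedding, which is the interpretability property the abstract argues clinical applications need.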
Haley Hunter-Zinck is a health science specialist at the VA Boston Healthcare System. She has a Ph.D. in computational biology from Cornell University and transitioned to medical informatics during a postdoc in Porto Alegre, Brazil working with Brazilian public hospitals and a fellowship at VA Boston. She applies and develops machine learning techniques and visualization tools to improve hospital patient flow with a focus on the emergency department.
Talks | Machine Learning | Deep Learning | Beginner-Intermediate-Advanced
Recent generative models are adept at solving image domain transfer problems, i.e., they learn to transform an image based on a set of training examples. These models are usually based on generative adversarial networks (GANs) and can be supervised or unsupervised as well as unimodal or multimodal. I will present a number of our recent methods in this space that can be used to translate, for instance, a label map to a realistic street image, a daytime street image to a nighttime street image, a dog to different cat breeds, and many more…more details
Jan is VP of Learning and Perception Research at NVIDIA, where he leads the Learning & Perception Research team. He is working predominantly on computer vision problems (from low-level vision through geometric vision to high-level vision), as well as machine learning problems (including generative models and efficient deep learning). Before joining NVIDIA in 2013, Jan was a tenured faculty member at University College London. He holds a BSc in Computer Science from the University of Erlangen-Nürnberg (1999), an MMath from the University of Waterloo (1999), received his PhD from the Max-Planck-Institut für Informatik (2003), and worked as a post-doctoral researcher at the Massachusetts Institute of Technology (2003-2006).
Talk | AI for Engineers | Open Source Data Science | Beginner
In the last ten years, there have been a number of advancements in the study of Hamiltonian Monte Carlo algorithms that have enabled effective Bayesian statistical computation for much more complicated models than were previously feasible. These algorithmic advancements have been accompanied by a number of open source probabilistic programming packages that make them accessible to programmers and statisticians. PyMC3 is one such package written in Python and supported by NumFOCUS. This workshop will give an introduction to probabilistic programming with PyMC3. No preexisting knowledge of Bayesian statistics is necessary; a working knowledge of Python will be helpful…more details
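For a flavor of the Bayesian computation the workshop automates, here is a minimal editorial sketch (not workshop material) of a random-walk Metropolis sampler, a simpler cousin of the Hamiltonian Monte Carlo methods PyMC3 implements, estimating a coin's heads probability from 7 heads in 10 flips under a flat prior. All names and numbers are illustrative; PyMC3 would express the same model in a few declarative lines.

```python
import math
import random

def log_posterior(p, k, n):
    """Unnormalized log posterior for a coin's heads probability p with a
    flat Beta(1, 1) prior, after observing k heads in n flips."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def metropolis(k, n, steps=20_000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    p, logp = 0.5, log_posterior(0.5, k, n)
    samples = []
    for _ in range(steps):
        prop = p + rng.gauss(0.0, step_size)           # random-walk proposal
        logp_prop = log_posterior(prop, k, n)
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            p, logp = prop, logp_prop                  # accept
        samples.append(p)
    return samples[steps // 2:]                        # discard burn-in

draws = metropolis(k=7, n=10)
post_mean = sum(draws) / len(draws)
print(round(post_mean, 2))  # close to the exact Beta(8, 4) mean of 8/12 ≈ 0.67
```

HMC improves on this random walk by using gradients of the log posterior to propose distant, high-acceptance moves, which is what makes the complicated models mentioned above feasible.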
Austin Rochford is Principal Data Scientist and Director of Monetate Labs. He is a founding member of Monetate Labs, where he does research and development for machine learning-driven marketing products. He is a recovering mathematician, a passionate Bayesian, and a PyMC3 developer. His writing is available online at http://austinrochford.com/
Talks | Data Science Management
As businesses rush to adopt and operationalize data-intensive applications, the myriad organizational and technical gaps have led to the emergence of a variety of misguided concepts, narratives, and approaches. In this talk, Peter investigates a few of these, including the concept of “citizen data science” and the “open source head fake” from cloud vendors, and argues for a holistic treatment of code+data for enterprise machine learning...more details
Peter has a B.A. in Physics from Cornell University, and has been developing commercial scientific computing and visualization software for over 15 years. He has software design and development experience across a broad variety of areas, including 3D graphics, geophysics, financial risk modeling, large data simulation and visualization, and medical imaging. Peter’s interests in the fundamentals of vector computing and interactive, large-scale visualization led him to co-found Continuum Analytics. As CTO, Peter is the technology visionary and leads the product engineering team for the Anaconda platform as well as open source projects including Bokeh and Blaze. As a creator of the PyData conference, he also devotes time and energy to growing the Python data community by advocating, teaching, and speaking about Python at conferences worldwide.
Talks | Machine Learning | Beginner-Intermediate-Advanced
Statistical estimation typically assumes access to uncensored and independent observations. In practice, data is commonly censored due to measurement errors, legal restrictions, and data collection or sharing practices. Moreover, observations are commonly collected on a network, a spatial or a temporal domain and may be intricately correlated. We present recent work on statistical estimation from censored and dependent data. We first present a framework for statistical learning under truncated samples. Truncation is a strong type of censoring, which occurs when samples falling outside of some set S are not observed, and their count in proportion to the observed samples is also not observed. We then provide a statistical learning framework for samples that are weakly dependent…more details
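To see why truncation breaks standard estimation, consider this small editorial simulation (not from the talk): samples from a normal distribution that fall outside the set S are silently dropped, and even their count is unobserved, so the naive sample mean is badly biased. The threshold and sizes below are illustrative.

```python
import random

random.seed(0)

# True distribution: N(0, 1). Truncation set S = [0.5, inf): samples below
# the threshold are never observed, and we do not even learn how many
# were discarded.
population = [random.gauss(0.0, 1.0) for _ in range(100_000)]
observed = [x for x in population if x >= 0.5]

true_mean = sum(population) / len(population)
naive_mean = sum(observed) / len(observed)

print(round(true_mean, 2), round(naive_mean, 2))
# The naive mean of the truncated data is biased far upward; consistent
# estimation requires explicitly modeling the truncation set S.
```

Recovering the true parameters from `observed` alone is exactly the kind of problem the statistical learning framework above addresses.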
Constantinos Daskalakis is a professor of computer science and electrical engineering at MIT. He holds a diploma in electrical and computer engineering from the National Technical University of Athens, and a Ph.D. in electrical engineering and computer sciences from UC-Berkeley. His research interests lie in theoretical computer science and its interface with economics, probability, learning and statistics. He has been honored with the 2007 Microsoft Graduate Research Fellowship, the 2008 ACM Doctoral Dissertation Award, the Game Theory and Computer Science (Kalai) Prize from the Game Theory Society, the 2010 Sloan Fellowship in Computer Science, the 2011 SIAM Outstanding Paper Prize, the 2011 Ruth and Joel Spira Award for Distinguished Teaching, the 2012 Microsoft Research Faculty Fellowship, the 2015 Research and Development Award by the Vatican Giuseppe Sciacca Foundation, the 2017 Google Faculty Research Award, the 2018 Simons Investigator Award, and the 2018 Rolf Nevanlinna Prize from the International Mathematical Union. He is also a recipient of Best Paper awards at the ACM Conference on Economics and Computation in 2006 and in 2013.
Workshops | Machine Learning | Open Source Data Science
Pomegranate is a Python package for probabilistic modeling that emphasizes both ease of use and speed. In keeping with the first emphasis, pomegranate has a simple sklearn-like API for training models and performing inference, and a convenient “lego API” that allows complex models to be specified from simple components. In keeping with the second emphasis, the computationally intensive parts of pomegranate are written in efficient Cython code; all models support multithreaded parallelism and out-of-core computation, and some models support GPU calculations. Currently, pomegranate allows you to build general mixture models, naive Bayes classifiers, Markov chains, hidden Markov models, factor graphs, and Bayesian networks, as well as combinations such as a mixture of hidden Markov models. In this talk I will show how to build models of increasing complexity with code examples, including a naive Bayes classifier that uses different probability distributions for different features, and a Bayesian network given data and a set of constraints. When necessary, I will draw examples from “popular culture” and inadvertently prove how out of touch I am with today’s youth...more details
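To illustrate the idea of a naive Bayes classifier with different distributions per feature (the concept pomegranate makes convenient, sketched here in plain Python rather than pomegranate's actual API), here is a tiny editorial example with one Gaussian feature and one Bernoulli feature. The class names and parameters are invented.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2 * math.pi)))

def bernoulli_logpmf(x, p):
    return math.log(p if x == 1 else 1.0 - p)

class MixedNaiveBayes:
    """Naive Bayes where feature 0 is Gaussian and feature 1 is Bernoulli.
    Per-class parameters: (mu, sigma, p, prior)."""
    def __init__(self, params):
        self.params = params  # {label: (mu, sigma, p, prior)}

    def predict(self, x):
        def score(label):
            mu, sigma, p, prior = self.params[label]
            return (math.log(prior)
                    + gaussian_logpdf(x[0], mu, sigma)   # continuous feature
                    + bernoulli_logpmf(x[1], p))         # binary feature
        return max(self.params, key=score)

# Class A: low continuous values, flag usually off; class B: the opposite.
model = MixedNaiveBayes({
    "A": (0.0, 1.0, 0.1, 0.5),
    "B": (3.0, 1.0, 0.9, 0.5),
})
print(model.predict([0.2, 0]))  # "A"
print(model.predict([2.8, 1]))  # "B"
```

Pomegranate expresses the same pattern declaratively, fitting the per-feature distributions from data instead of hand-set parameters.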
Jacob Schreiber is a fifth year Ph.D. student and NSF IGERT big data fellow in the Computer Science and Engineering department at the University of Washington. His primary research focus is on the application of machine learning methods, primarily deep learning ones, to the massive amount of data being generated in the field of genome science. His research projects have involved using convolutional neural networks to predict the three dimensional structure of the genome and using deep tensor factorization to learn a latent representation of the human epigenome. He routinely contributes to the Python open source community, currently as the core developer of the pomegranate package for flexible probabilistic modeling, and in the past as a developer for the scikit-learn project. Future projects include graduating.
Talks | Deep Learning | Data Visualization | Intermediate
Methodologies for text analysis are improving, and deep learning, topic modeling and novel lexical parsing techniques are allowing practitioners to create powerful, useful and, importantly, interpretable models of language.
These techniques allow us to understand and summarize documents. In this talk, I’ll discuss some of the problems facing traditional industries and how these solutions can be applied.
I’ll start with my experience at the New York Times, where I built systems to help journalists cluster reader responses, find tips more quickly, and find related articles. Then I’ll focus on work I’m currently doing at NASA, where we are building systems to summarize literature and recommend datasets, collaborators, and papers to new scientists.
Topics covered: Deep learning, Bayesian topic modeling, lexical parsing.
Perks: Live demo!…more details
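As a taste of the "related articles" problem mentioned above, here is a minimal editorial sketch (not the speaker's system) of TF-IDF weighting with cosine similarity, the classical baseline the deep-learning approaches in this talk build on. The toy corpus is invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors for a small corpus of token lists."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * idf[t] for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    denom = norm(u) * norm(v)
    return dot / denom if denom else 0.0

docs = [
    "nasa launches new mars rover mission".split(),
    "mars rover sends back images from mission".split(),
    "city council approves new budget plan".split(),
]
vecs = tfidf_vectors(docs)
# Most related article to doc 0, excluding itself:
related = max((i for i in range(len(docs)) if i != 0),
              key=lambda i: cosine(vecs[0], vecs[i]))
print(related)  # 1: the other Mars-rover story
```

Topic models and neural embeddings replace these sparse counts with dense, semantically aware representations, but the retrieval loop looks much the same.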
Alex has worked as a data scientist at The New York Times since July 2014. His work has primarily involved text modeling for newsroom, product, and advertising stakeholders to create advanced recommendation engines, perform automated information retrieval for journalists, and sell premium ads. His work has been written about or featured in The New York Times, The Wall Street Journal, and Columbia Journalism Review, on NPR, and at conferences, and he has earned a Master’s in Data Science and a Master’s in Journalism from Columbia University.
Tutorials | Open Source Data Science | Big Data | Beginner
Resampling methods like the bootstrap are becoming increasingly common in modern data science, and for good reason: the bootstrap is incredibly powerful. Unlike t-statistics, the bootstrap doesn’t depend on a normality assumption or require any arcane formulas, and you’re no longer limited to working with well-understood metrics like means. One can easily build tools that compute confidence intervals for an arbitrary metric. What’s the standard error of a median? Who cares! I used the bootstrap.
In this talk we’ll explore what types of data give the bootstrap trouble. Then we’ll discuss how to identify these problems in the wild and how to deal with the problematic data. We will explore simulated data and share the code so you can run the simulations yourself. However, this isn’t just a theoretical problem: we’ll also explore real Firefox data and discuss how Firefox’s data science team handles this data when analyzing experiments.
At the end of this session you’ll leave with a firm understanding of the bootstrap. Even better, you’ll understand how to spot potential issues in your data and avoid false confidence in your results…more details
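The "arbitrary metric" point above can be sketched in a few lines. This is an editorial illustration (not the session's code) of a percentile bootstrap confidence interval for a median on skewed data; the sample sizes and distribution are invented.

```python
import random
import statistics

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic:
    resample with replacement, recompute the statistic, take quantiles."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

random.seed(1)
sample = [random.expovariate(1.0) for _ in range(200)]  # skewed data
lo, hi = bootstrap_ci(sample, statistics.median)
print(round(lo, 2), round(hi, 2))  # 95% CI for the median, no formula needed
```

Swapping `statistics.median` for any other function of the data gives a confidence interval for that metric with no new math, which is exactly the convenience (and, for some data types, the trap) this session examines.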
Ryan Harter is a Senior Staff Data Scientist with Mozilla working on Firefox. He has years of experience solving business problems in the technology and energy industries, both as a data scientist and as a data engineer. Ryan shares practical advice for applying data science as a mentor and on his blog.
Saptarshi Guha is a Senior Staff Data Scientist with Mozilla working across domains at Firefox from marketing and software quality to product development. He has been at Firefox for seven years and witnessed the data team grow from ‘two guys and a dog’ to a sophisticated collaboration between product, data engineering and data science.
Workshops | Open Source Data Science | Data Visualization | Beginner-Intermediate-Advanced
The tidyverse in R has traditionally been focused on data ingestion, manipulation, and visualization. The tidymodels packages apply the same design principles to modeling to create packages with high usability that produce results in predictable formats and structures. This workshop is a concise overview of the system and is illustrated with examples. Remote servers are available for users who cannot install software locally. Materials and preparation instructions can be found at https://github.com/topepo/odsc_2019…more details