Hearing is Believing: Generating Realistic Speech with Deep Learning

Abstract: 

Over the past few years, speech synthesis, or text-to-speech (TTS), has seen rapid advances thanks to deep learning. As anyone who owns a voice assistant will know, artificial voices are becoming more and more natural and convincing. The good news is you can recreate this impressive technology yourself, using high-quality open-source tools.

In this workshop, we'll learn all about TTS and create a custom speech synthesis system from scratch. We'll take a look at the development of TTS systems up to the present day, investigate the challenges that researchers are still grappling with, and walk through an end-to-end example of creating a deep learning-based TTS system, including data preparation, training, inference and evaluation. This workshop doesn't require any prior knowledge of TTS or deep learning.

Session Outline:
1. Problem and development of TTS systems (20 mins): What exactly does it mean to synthesize speech convincingly, and what approaches exist to tackle the problem? We'll look at the journey from concatenative to parametric to neural TTS systems. We'll also discuss some of the competing criteria for a TTS system, including naturalness, robustness, clarity, expressiveness and efficiency.

2. Components of a modern TTS system (20 mins): We'll break down the composition and workflow of recent, deep learning-based TTS systems. We'll look at the role of the vocoder, attention mechanism, speaker encoder and more.

3. How to build your own custom TTS (40 mins): Finally, we'll walk through creating a custom, state-of-the-art TTS system from scratch. It could be modeled on your own voice, that of a celebrity, or anyone at all! We'll see how to create a dataset with forced alignment tools, how to select a suitable model and hyperparameters, and how to ensure fast and stable training.

Background Knowledge:
No background knowledge is needed, although some basic knowledge of machine learning (e.g. test vs. validation sets, gradient descent, neural networks) and Python will be helpful.

Bio: 

Alex Peattie is the co-founder and CTO of Peg, a technology platform helping multinational brands and agencies to find and work with top YouTubers. Peg is used by over 1500 organisations worldwide including Coca-Cola, L'Oreal and Google.

An experienced digital entrepreneur, Alex spent six years as a developer and consultant for the likes of Grubwithus, Huckberry, UNICEF and Nike, before joining coding bootcamp Makers Academy as senior coach, where he trained hundreds of junior developers. Alex was also a technical judge at this year's TechCrunch Disrupt conference.

Open Data Science
One Broadway
Cambridge, MA 02142
info@odsc.com
