Abstract: A speech signal may be degraded by background noise, making the speech content hard to understand. Speech enhancement and separation systems can be used to reduce the noise level and improve the quality and intelligibility of a speech signal. Visual cues, such as mouth movements and facial expressions, can benefit these systems and provide a substantial performance improvement.
In this session, participants will be introduced to recent advances in audio-visual speech enhancement and separation, which have a variety of applications, including:
* hearing assistive devices;
* videoconference systems;
* noise reduction in live videos.
The objective of this session is to provide participants with the theoretical background needed to understand and build state-of-the-art systems for audio-visual speech enhancement and separation.
MODULE 1 - Introduction to deep learning and audio-visual speech corpora.
Participants will familiarise themselves with the concept of deep learning with a specific focus on audio-visual speech enhancement and separation. In addition, some characteristics of common audio-visual speech corpora will be explained.
MODULE 2 - Audio-visual speech enhancement and separation systems.
Participants will learn the main components of state-of-the-art approaches for audio-visual speech enhancement and separation.
MODULE 3 - Examples and demos.
Participants will learn about recent advances in audio-visual speech enhancement and separation systems and experience demos of these systems.
Prerequisites: Basic knowledge of signal processing and deep learning.
Bio: Prof. Zheng-Hua Tan is a Professor of Machine Learning and Speech Processing, a Co-Head of the Centre for Acoustic Signal Processing Research (CASPR), and Machine Learning Research Group Leader in the Department of Electronic Systems at Aalborg University, Denmark.
He was a Visiting Scientist/Professor at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, USA; an Associate Professor in the Department of Electronic Engineering at Shanghai Jiao Tong University, China; and a postdoctoral fellow at the AI Spoken Language Lab in the Department of Computer Science at KAIST, Korea.
He received the B.S. and M.S. degrees in electrical engineering from Hunan University, China, in 1990 and 1996, respectively, and the Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, China, in 1999.
His research interests include machine learning, deep learning, pattern recognition, speech and speaker recognition, noise-robust speech processing, multimodal signal processing, and social robotics. He has over 200 publications and edited the book Automatic Speech Recognition on Mobile Devices and over Communication Networks (Springer, 2008).
He is the elected Chair of the IEEE Machine Learning for Signal Processing Technical Committee and an Associate Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing. He has served as an Editorial Board Member/Associate Editor for several journals, including Computer Speech and Language and Digital Signal Processing. He was a Lead Guest Editor of the IEEE Journal of Selected Topics in Signal Processing and a Guest Editor of several journals, including Neurocomputing. He was the General Chair of the IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP 2018), Aalborg, Denmark, and a Program Co-Chair of the IEEE Workshop on Spoken Language Technology (SLT 2016), San Diego, California, USA. He has served as a Chair, Program Co-Chair, Area and Session Chair, and Tutorial Speaker at many international conferences. He is a Senior Member of the IEEE.