Automatic Speech Recognition: a Paradigm Change in Motion


Since its advent more than 40 years ago, robust and high-performing automatic speech recognition (ASR) has followed a statistical approach based on Bayes' decision rule. For decades, state-of-the-art ASR systems combined standard signal processing for feature extraction, hidden Markov models (HMMs), complex data-driven acoustic and language models, and advanced search concepts based on dynamic programming. This classical approach to automatic speech recognition was not challenged significantly until recently. Even when artificial neural networks began to boost ASR performance considerably, the general architecture of state-of-the-art systems remained largely unchanged. Instead, deep learning concepts were gradually integrated into the components of the common ASR architecture, leading to neural feature processing, acoustic modeling via hybrid deep neural network (DNN)/HMMs, and neural language modeling, improving ASR performance by more than 50% relative within the last 10 years. Today, the hybrid DNN/HMM approach, combined with recurrent long short-term memory (LSTM) neural network language modeling, marks the state of the art on many ASR tasks, covering a wide range of training set sizes. However, more and more alternative approaches are now emerging, moving gradually towards so-called end-to-end modeling. These novel end-to-end approaches replace explicit modeling of model properties and dedicated search space organisation with more implicit, integrated neural-network-based representations, while also introducing new overall architectures. Corresponding approaches show promising results, especially on large training sets. In this presentation, a contrastive overview of ASR architectures and modeling will be given, covering variations of both classical HMM-based and end-to-end modeling approaches.
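For readers unfamiliar with the statistical formulation the abstract refers to, the Bayes decision rule for ASR can be sketched as follows (this is the standard textbook formulation, not material taken from the talk itself): given an acoustic feature sequence X, the recognizer outputs the word sequence W that maximizes the posterior probability, which Bayes' theorem factors into the acoustic model and the language model of the classical architecture:

```latex
\hat{W} \;=\; \operatorname*{argmax}_{W} \, p(W \mid X)
        \;=\; \operatorname*{argmax}_{W} \, \underbrace{p(X \mid W)}_{\text{acoustic model}} \;\cdot\; \underbrace{p(W)}_{\text{language model}}
```

The dynamic-programming search mentioned above carries out this argmax over the space of word sequences; end-to-end approaches instead model p(W | X) directly with a single neural network.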


Ralf Schlüter serves as Academic Director and Senior Lecturer in the Computer Science Department at RWTH Aachen University, Germany. He leads the Automatic Speech Recognition Group at the Lehrstuhl Informatik 6: Human Language Technology and Pattern Recognition. He studied Physics at RWTH Aachen University and the University of Edinburgh, UK, and received the Diplom degree in Physics and the Dr. rer. nat. degree in Computer Science, and completed his habilitation in Computer Science, all at RWTH Aachen University. His research interests cover automatic speech recognition and machine learning in general, discriminative training, neural network modeling, information theory, stochastic modeling, and speech signal analysis.
