A great deal of data is sequential: think speech, text, DNA, stock prices, financial transactions, and customer action histories.
Our best-performing methods for modelling sequence data use deep neural networks, usually either modified recurrent neural networks (RNNs) or attention-based Transformer blocks. Although a tremendous amount of research progress has recently been made in sequence modelling (particularly in application to NLP problems), these models can seem a bit arcane, with inner workings that are mysterious and hard to understand intuitively.
This tutorial will start from the basics and build up gradually, to give you an understanding of the inner mechanics of modern sequence models. I will try to address questions like:
- Why do we need new architectures for handling sequences at all, when you could presumably just use something simpler, like standard feed-forward networks?
- RNNs seem good for modelling sequences, but how do they actually process sequential information and represent contextual knowledge?
- How do gating mechanisms in RNNs (like GRUs and LSTM units) help them remember information over longer spans of time?
- Transformers are the hot new neural architecture for sequence modelling. How can they do such a good job without any recurrence or convolution operations? What is self-attention, exactly?
By the end of the tutorial, I hope that attendees will understand the computations happening under the hood, and that diagrams like the following will make much more sense 🙂
A basic single-layer recurrent neural network (“unrolled through time”):
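To make the unrolled picture concrete, here is a minimal sketch of the forward pass of a single-layer RNN. It assumes a plain Elman-style cell with a tanh nonlinearity, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the names `rnn_forward`, `W_xh`, `W_hh` and `b_h` are hypothetical, and the weights are random rather than trained.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a single-layer Elman-style RNN over a sequence.

    xs is a list of input vectors, one per time step. The same weights are
    reused at every step; writing this loop out step by step is what
    "unrolling through time" means. Returns one hidden state per step.
    """
    hidden_dim = len(W_hh)
    h = [0.0] * hidden_dim  # initial hidden state h_0 = 0
    states = []
    for x_t in xs:
        # pre-activation: W_xh x_t + W_hh h_{t-1} + b_h
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x_t), matvec(W_hh, h), b_h)]
        h = [math.tanh(z) for z in pre]  # new hidden state h_t
        states.append(h)
    return states

random.seed(0)
T, input_dim, hidden_dim = 5, 3, 4
xs = [[random.gauss(0, 1) for _ in range(input_dim)] for _ in range(T)]
W_xh = [[random.gauss(0, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
W_hh = [[random.gauss(0, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
b_h = [0.0] * hidden_dim

hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(len(hs), len(hs[-1]))  # 5 4
```

Because `h` is carried from one iteration to the next, the final hidden state depends on every earlier input: that carried state is the network's running summary of context.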
A diagram of one of the fancier types of RNNs used in production sequence-to-sequence systems (this is a bidirectional encoder-decoder RNN with attention):
And the mechanisms behind self-attention-based models:
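As a rough illustration of what such diagrams depict, here is a sketch of single-head scaled dot-product self-attention, the core operation in Transformer blocks. It deliberately leaves out multiple heads, masking, positional encodings and the feed-forward sublayer; the names `self_attention`, `W_q`, `W_k` and `W_v` are hypothetical, and the projection weights are random rather than learned.

```python
import math
import random

def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (no masking).

    X has one row per sequence position. Each position gets a query, key and
    value; its output is a weighted average of all the values, with weights
    softmax(q . k / sqrt(d_k)). There is no recurrence: every position is
    computed from the whole sequence at once.
    """
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
        for q_row in Q
    ]
    weights = [softmax(row) for row in scores]  # one attention distribution per position
    return matmul(weights, V), weights

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(1)
T, d_model, d_k = 4, 6, 3
X = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(T)]
W_q, W_k, W_v = rand_matrix(d_model, d_k), rand_matrix(d_model, d_k), rand_matrix(d_model, d_k)

out, weights = self_attention(X, W_q, W_k, W_v)
print(len(out), len(out[0]))  # 4 3
```

Reversing the rows of `X` simply reverses the rows of the output, since nothing in this computation knows about order; this is why Transformers add positional information to their inputs, a step omitted here.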
Use Cases of Sequence Modelling
Sequential data pops up absolutely everywhere, and I will talk about some particularly popular use cases for sequence modelling, including:
- Language models (prediction of the next word given a seed string, as you see in keyboard apps on mobile phones)
- Machine translation (automatic translation between different languages)
- Computational biology, for example in the functional modelling of DNA and protein sequences (predicting which regions of biological sequences are functional and what that function could be)
The main goals of this tutorial are to provide an overview of popular sequence-based problems, impart an intuition for how the most commonly-used sequence models work under the hood, and show that quite similar architectures are used to solve sequence-based problems across many domains.
I hope you decide to attend the talk, “Sequence Modelling with Deep Learning”! 🙂
Originally posted on OpenDataScience.com