In October 2016, Google DeepMind released WaveNet, which represented a revolution in the quality of audio synthesis. Instead of using traditional vocoding or spectrogram-inversion techniques, WaveNet generates an audio waveform directly, sample by sample, with a deep neural network. Unfortunately, WaveNet was impractical to use in production: its sample-by-sample generation meant it took minutes to produce a single second of audio. Recently, DeepMind introduced a technique whereby a trained WaveNet is used to train a student network, built on inverse autoregressive flows, that can produce audio in parallel. The resulting student model generates audio just as good as that of the original WaveNet, but about 3,000 times faster.
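To make the speed difference concrete, here is a minimal toy sketch (not DeepMind's actual models) contrasting the two sampling styles: an autoregressive model must run a sequential loop in which each sample depends on the previously *generated* sample, while an inverse autoregressive flow transforms a noise vector into output where each sample depends only on earlier *noise* values, all of which are known up front, so the whole output can be computed in one vectorized pass. The linear dependence and constant scale below are illustrative assumptions, not the WaveNet parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

def autoregressive_sample(T, coef=0.5):
    """WaveNet-style sampling: a sequential loop of T steps.

    Each x[t] depends on the generated x[t-1], so the loop
    cannot be parallelized across time steps.
    """
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = coef * x[t - 1] + rng.normal()
    return x

def iaf_sample(z, coef=0.5):
    """Inverse-autoregressive-flow sampling: one parallel pass.

    Each output x[t] = mu_t(z[<t]) + sigma_t(z[<t]) * z[t], where
    mu and sigma depend only on earlier *noise* values, which are
    all available at once, so the transform is fully vectorized.
    (Toy choices here: a linear shift and a constant scale.)
    """
    mu = np.concatenate([[0.0], coef * z[:-1]])  # shift from earlier noise
    sigma = np.ones_like(z)                      # constant scale for the toy
    return mu + sigma * z                        # no sequential loop

x_slow = autoregressive_sample(8)   # T sequential steps
x_fast = iaf_sample(rng.normal(size=8))  # one vectorized step
```

In the real system, the student's mu and sigma come from a WaveNet-like network applied to the noise, and the student is trained to match the teacher's output distribution (probability density distillation); the toy above only illustrates why the flow's sampling parallelizes.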
This exciting technique could also be applied to autoregressive models of other modalities, eliminating one of their main disadvantages (slow, sequential sampling) compared with generative adversarial networks and variational autoencoders, the other two prominent families of deep generative models.