
Abstract: Recently, OpenAI showcased its latest text-to-image model, DALL-E 2. It generates photorealistic images from text, including unusual prompts, e.g. "an astronaut riding a horse". Soon after, it was superseded by Google's Imagen as the state-of-the-art model. The two models have one thing in common: they both use diffusion models as their core algorithm, which is the topic of our workshop.
In this workshop, we will first go through the evolution of generative models, i.e. GANs and autoregressive transformers (used in DALL-E), before delving into diffusion models. What you'll learn in this workshop:
- The principles of training diffusion models (a minimal sketch follows this list)
- Image super-resolution
- Classifier-free guidance for text conditioning
- Architecture of DALL-E 2 and Imagen
- CLIP guidance and a hands-on session using DiscoDiffusion
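
To give a flavour of the first and third bullet points, below is a minimal PyTorch sketch of a single diffusion training step with classifier-free guidance dropout. The tiny denoiser, noise schedule and hyper-parameters are illustrative assumptions, not the code used in the workshop.

```python
# Minimal sketch (illustrative only): one DDPM-style training step with
# classifier-free guidance dropout. Model, schedule and sizes are assumptions.
import torch
import torch.nn as nn

T = 1000                                   # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class TinyDenoiser(nn.Module):
    """Stand-in for a U-Net; predicts the noise that was added to the image.
    (A real U-Net would also embed the timestep t.)"""
    def __init__(self, cond_dim=16):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)
        self.cond_proj = nn.Linear(cond_dim, 3)

    def forward(self, x_t, t, cond):
        # Broadcast the (possibly zeroed) text embedding over spatial dims.
        c = self.cond_proj(cond)[:, :, None, None]
        return self.net(x_t + c)

model = TinyDenoiser()
optim = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(x0, text_emb, p_uncond=0.1):
    """One denoising step: noise the image, then train the model to predict the noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward (noising) process

    # Classifier-free guidance: randomly drop the text conditioning so the same
    # network also learns the unconditional score.
    keep = (torch.rand(b, 1) > p_uncond).float()
    cond = text_emb * keep

    pred = model(x_t, t, cond)
    loss = nn.functional.mse_loss(pred, noise)              # predict the added noise
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

# Toy usage with random tensors standing in for real images / text embeddings.
loss = train_step(torch.randn(4, 3, 32, 32), torch.randn(4, 16))
```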
Background Knowledge:
PyTorch
Bio: Soon-Yau Cheong is the founder of Sooner.ai, an AI consulting and training company specialising in image/video generation and manipulation. Past projects include face swapping, portrait cartoonisation, virtual shoe try-on, etc. He is well-versed in generative AI techniques, including GANs, autoregressive transformers and diffusion models. He authored the book “Hands-on Image Generation with TensorFlow”, which has been well received for its hands-on approach to making difficult mathematical theory easy to understand. Soon-Yau is also currently doing a PhD in AI digital media creation at the University of Surrey.

Soon Yau Cheong, PhD
Founder | Author of Hands-on Image Generation with TensorFlow | Sooner.ai
