How Multimodal LLMs (like Google’s Gemini) learn to generate images!
Hello all! Sharing my new YouTube video about Multimodal LLMs and how they generate images. I go over concepts like VQ-VAE and image tokens, and how these neural networks turn the image generation problem into a language generation problem. Link ab…
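
For anyone who wants the gist of "image tokens" in code, here is a minimal toy sketch of the VQ-VAE quantization step. It assumes an encoder has already turned each image patch into a small latent vector; the 4-entry `CODEBOOK`, the 2-D latents, and the helper names `quantize` and `dequantize` are all made up for illustration and are not taken from the video or from Gemini.

```python
# Toy illustration of VQ-VAE-style image tokenization.
# Everything here (CODEBOOK values, function names, latent size) is a
# hypothetical example, not any real model's configuration.

# A "learned" codebook: each entry is a vector, its list index is the token id.
CODEBOOK = [
    (0.0, 0.0),  # token 0
    (1.0, 0.0),  # token 1
    (0.0, 1.0),  # token 2
    (1.0, 1.0),  # token 3
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents):
    """Replace each latent vector with the id of its nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda k: squared_distance(z, CODEBOOK[k]))
        for z in latents
    ]


def dequantize(tokens):
    """Look the token ids back up; a decoder would turn these into pixels."""
    return [CODEBOOK[t] for t in tokens]


# Pretend encoder output for a 2x2 grid of patches, flattened in raster order.
latents = [(0.1, -0.2), (0.9, 0.2), (0.2, 0.8), (1.1, 0.9)]

tokens = quantize(latents)
print(tokens)              # [0, 1, 2, 3]
print(dequantize(tokens))  # the codebook vectors a decoder would receive
```

Once an image is a flat list of integer ids like this, a language model can be trained to predict the next id just as it predicts the next word, and the decoder maps the predicted ids back to an image.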