How Multimodal LLMs (like Google’s Gemini) learn to generate images!

Hello all! Sharing my new YT video about Multimodal LLMs and how they generate images. I go over concepts like VQ-VAE and image tokens, and how these neural networks convert the image generation problem into a language generation problem. Link above for those interested.
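
For anyone who wants a quick feel for the "image tokens" idea before watching, here is a minimal sketch of the vector-quantization step in a VQ-VAE. It is not the code from the video or from Gemini; the tiny 2-D codebook and the names `quantize` and `dequantize` are made up purely for illustration. A real model learns the codebook and uses an encoder/decoder network around this step.

```python
import numpy as np

def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook entry."""
    # latents: (N, D), codebook: (K, D) -> pairwise squared distances: (N, K)
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize(tokens, codebook):
    """Look the token ids back up to get the quantized latent vectors."""
    return codebook[tokens]

# Hypothetical toy codebook with 4 entries (a real one is learned and much larger).
codebook = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# Pretend these are encoder outputs for 4 image patches.
latents = np.array([[0.1, 0.9], [0.9, 0.2], [0.05, 0.1], [0.8, 0.95]])

tokens = quantize(latents, codebook)      # discrete "image tokens"
recon = dequantize(tokens, codebook)      # what a decoder would receive
print(tokens.tolist())                    # [1, 2, 0, 3]
```

Once an image is a sequence of integer ids like this, a language model can be trained to predict those ids the same way it predicts text tokens, which is the "image generation as language generation" framing.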

If you like it, a thumbs up on the YT page would be super appreciated as it helps the channel grow! Thanks, hope you enjoy it!

submitted by /u/AvvYaa