Hello all! I'm sharing my new YouTube video on multimodal LLMs and how they generate images. I go over concepts like VQ-VAE and image tokens, and how these neural networks turn the image generation problem into a language generation problem. The link is above for those interested. If you like it, a thumbs up on the YouTube page would be much appreciated, as it helps the channel grow! Thanks, and I hope you enjoy it!
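P.S. For anyone who wants a quick feel for the image-token idea before watching, here is a minimal toy sketch of the quantization step in a VQ-VAE: each patch embedding is swapped for the index of its nearest codebook vector, so an image becomes a sequence of integers that a language model could predict. It is not code from the video. The codebook, the embeddings, and the function names `quantize` and `decode` are made up for illustration, and a real model learns the codebook and uses an encoder and decoder network around this step.

```python
# Toy sketch of VQ-VAE-style quantization (illustrative assumption, not a trained model).
# Each patch embedding is replaced by the index of its nearest codebook vector,
# which gives a sequence of discrete "image tokens".

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(patch_embeddings, codebook):
    """Return one codebook index (image token) per patch embedding."""
    tokens = []
    for emb in patch_embeddings:
        distances = [squared_distance(emb, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens

def decode(tokens, codebook):
    """Map image tokens back to their codebook vectors."""
    return [codebook[t] for t in tokens]

# Hypothetical 4-entry codebook and 4 patch embeddings (made-up values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_embeddings = [[0.1, 0.2], [0.9, 0.1], [0.8, 0.9], [0.2, 0.7]]

image_tokens = quantize(patch_embeddings, codebook)
print(image_tokens)  # [0, 1, 3, 2]
```

Once the image is a list of integers like this, generating an image looks a lot like generating text: predict the next token, then map the tokens back to vectors with `decode` and pass them to a decoder.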