/u/jaketocake

AI — weekly megathread!

News provided by aibrews.com Google introduced Gemini – a family of multimodal models built from the ground up for multimodality, capable of reasoning seamlessly across text, images, video, audio, and code. It comes in Ultra, Pro, and Nano sizes, suit…

AI — weekly megathread!

News provided by aibrews.com Meta AI introduced a suite of AI language translation models that preserve expression and improve streaming [Details | GitHub]: SeamlessExpressive enables the transfer of tones, emotional expression and vocal styles in sp…

AI — weekly megathread!

News provided by aibrews.com Stability AI released Stable Video Diffusion, a latent video diffusion model for high-resolution text-to-video and image-to-video generation. [Details | Paper]. Microsoft Research released Orca 2 (7 billion and 13 billion…

AI — weekly megathread!

News provided by aibrews.com Meta AI introduces: Emu Video: new text-to-video model that leverages Meta’s Emu image generation model and can respond to text-only, image-only or combined text & image inputs to generate high quality video [Details]…

AI — weekly megathread!

News provided by aibrews.com Luma AI introduced Genie, a generative 3D foundation model in research preview. It’s free during research preview via Discord [Details]. Nous Research released Obsidian, the world's first 3B multi-modal model family pr…

AI — weekly megathread!

News provided by aibrews.com Twelve Labs announced video-language foundation model Pegasus-1 (80B) along with a new suite of Video-to-Text APIs. Pegasus-1 integrates visual, audio, and speech information to generate more holistic text from vi…

AI — weekly megathread!

News provided by aibrews.com Adept open-sources Fuyu-8B – a multimodal model designed from the ground up for digital agents, so it can support arbitrary image resolutions, answer questions about graphs and diagrams, answer UI-based questions …

AI — weekly megathread!

News provided by aibrews.com Google DeepMind introduced RT-X: a generalist AI model to help advance how robots can learn new skills. To train it, DeepMind together with 33 academic labs developed Open X-Embodiment, a massive open dataset that…