This week in AI – all the Major AI developments in a nutshell
This week in AI – all the Major AI developments in a nutshell

This week in AI – all the Major AI developments in a nutshell

  1. Google launches Ultra 1.0, its largest and most capable AI model, in its ChatGPT-like assistant which has now been rebranded as Gemini (earlier called Bard). Gemini Advanced is available, in 150 countries, as a premium plan for $19.99/month, starting with a two-month trial at no cost. Google is also rolling out Android and iOS apps for Gemini [Details].
  2. Alibaba Group released Qwen1.5 series, open-sourcing models of 6 sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B. Qwen1.5-72B outperforms Llama2-70B across all benchmarks. The Qwen1.5 series is available on Ollama and LMStudio. Additionally, API on together.ai [Details | Hugging Face].
  3. NVIDIA released Canary 1B, a multilingual model for speech-to-text recognition and translation. Canary transcribes speech in English, Spanish, German, and French and also generates text with punctuation and capitalization. It supports bi-directional translation, between English and three other supported languages. Canary outperforms similarly-sized Whisper-large-v3, and SeamlessM4T-Medium-v1 on both transcription and translation tasks and achieves the first place on HuggingFace Open ASR leaderboard with an average word error rate of 6.67%, outperforming all other open source models [Details].
  4. Researchers released Lag-Llama, the first open-source foundation model for time series forecasting [Details].
  5. LAION released BUD-E, an open-source conversational and empathic AI Voice Assistant that uses natural voices, empathy & emotional intelligence and can handle multi-speaker conversations [Details].
  6. MetaVoice released MetaVoice-1B, a 1.2B parameter base model trained on 100K hours of speech, for TTS (text-to-speech). It supports emotional speech in English and voice cloning. MetaVoice-1B has been released under the Apache 2.0 license [Details].
  7. Bria AI released RMBG v1.4, an an open-source background removal model trained on fully licensed images [Details].
  8. Researchers introduce InteractiveVideo, a user-centric framework for video generation that is designed for dynamic interaction, allowing users to instruct the generative model during the generation process [Details |GitHub ].
  9. Microsoft announced a redesigned look for its Copilot AI search and chatbot experience on the web (formerly known as Bing Chat), new built-in AI image creation and editing functionality, and Deucalion, a fine tuned model that makes Balanced mode for Copilot richer and faster [Details].
  10. Roblox introduced AI-powered real-time chat translations in 16 languages [Details].
  11. Hugging Face launched Assistants feature on HuggingChat. Assistants are custom chatbots similar to OpenAI’s GPTs that can be built for free using open source LLMs like Mistral, Llama and others [Link].
  12. DeepSeek AI released DeepSeekMath 7B model, a 7B open-source model that approaches the mathematical reasoning capability of GPT-4. DeepSeekMath-Base is initialized with DeepSeek-Coder-Base-v1.5 7B [Details].
  13. Microsoft is launching several collaborations with news organizations to adopt generative AI [Details].
  14. LG Electronics signed a partnership with Korean generative AI startup Upstage to develop small language models (SLMs) for LG’s on-device AI features and AI services on LG notebooks [Details].
  15. Stability AI released SVD 1.1, an updated model of Stable Video Diffusion model, optimized to generate short AI videos with better motion and more consistency [Details | Hugging Face] .
  16. OpenAI and Meta announced to label AI generated images [Details].
  17. Google saves your conversations with Gemini for years by default [Details].

Source: AI Brews - you can subscribe the newsletter here. it's free to join, sent only once a week with bite-sized news, learning resources and selected tools. Thanks.

submitted by /u/wyem
[link] [comments]