This week in AI – all the Major AI developments in a nutshell

  1. Meta AI introduces V-JEPA (Video Joint Embedding Predictive Architecture), a method for teaching machines to understand and model the physical world by watching videos. Meta AI releases a collection of V-JEPA vision models trained with a feature prediction objective using self-supervised learning. The models are able to understand and predict what is going on in a video, even with limited information [Details | GitHub].
  2. OpenAI introduces Sora, a text-to-video model that can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions [Details + sample videos | Report].
  3. Google announces its next-generation model, Gemini 1.5, which uses a new Mixture-of-Experts (MoE) architecture. The first Gemini 1.5 model being released for early testing is Gemini 1.5 Pro with a context window of up to 1 million tokens, which is the longest context window of any large-scale foundation model yet. 1.5 Pro can perform sophisticated understanding and reasoning tasks for different modalities, including video, and it performs at a similar level to 1.0 Ultra [Details | Tech Report].
  4. Reka introduced Reka Flash, a new 21B multimodal and multilingual model trained entirely from scratch that is competitive with Gemini Pro & GPT-3.5 on key language & vision benchmarks. Reka also presented a compact variant, Reka Edge, a smaller and more efficient model (7B) suitable for local and on-device deployment. Both models are in public beta and available in Reka Playground [Details].
  5. Cohere For AI released Aya, a new open-source, massively multilingual LLM & dataset to help support under-represented languages. Aya outperforms existing open-source models and covers 101 different languages – more than double the number covered by previous models [Details].
  6. BAAI released Bunny, a family of lightweight but powerful multimodal models. The Bunny-3B model, built upon SigLIP and Phi-2, outperforms state-of-the-art MLLMs, not only in comparison with models of similar size but also against larger MLLMs (7B), and even achieves performance on par with LLaVA-13B [Details].
  7. Amazon introduced a text-to-speech (TTS) model called BASE TTS (Big Adaptive Streamable TTS with Emergent abilities). BASE TTS is the largest TTS model to date, trained on 100K hours of public domain speech data, and exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally [Details | Paper].
  8. Stability AI released Stable Cascade in research preview, a new text-to-image model that is exceptionally easy to train and finetune on consumer hardware due to its three-stage architecture. Stable Cascade can also generate image variations and image-to-image generations. In addition to providing checkpoints and inference scripts, Stability AI has also released scripts for finetuning, ControlNet, and LoRA training [Details].
  9. Researchers from UC Berkeley released Large World Model (LWM), an open-source general-purpose large-context multimodal autoregressive model, trained from LLaMA-2, that can perform language, image, and video understanding and generation. LWM can answer questions about a 1-hour-long YouTube video even when GPT-4V and Gemini Pro both fail, and can retrieve facts across a 1M context with high accuracy [Details].
  10. GitHub opens applications for the next cohort of the GitHub Accelerator program, with a focus on funding the people and projects that are building AI-based solutions under an open source license [Details].
  11. NVIDIA released Chat with RTX, a locally running (Windows PCs with specific NVIDIA GPUs) AI assistant that integrates with your file system and lets you chat with your notes, documents, and videos using open source models [Details].
  12. OpenAI is testing memory with ChatGPT, enabling it to remember things you discuss across all chats. ChatGPT's memories evolve with your interactions and aren't linked to specific conversations. It is being rolled out to a small portion of ChatGPT free and Plus users this week [Details].
  13. BCG X released AgentKit, a LangChain-based starter kit (NextJS, FastAPI) to build constrained agent applications [Details | GitHub].
  14. ElevenLabs' Speech to Speech feature, launched in November, for voice transformation with control over emotions and delivery, is now multilingual and available in 29 languages [Link].
  15. Apple introduced Keyframer, an LLM-powered animation prototyping tool that can generate animations from static images (SVGs). Users can iterate on their design by adding prompts and editing LLM-generated CSS animation code or properties [Paper].
  16. ElevenLabs launched a payout program for voice actors to earn rewards every time their voice clone is used [Details].
  17. Azure OpenAI Service announced the Assistants API, new models for finetuning, a new text-to-speech model, and a new generation of embeddings models with lower pricing [Details].
  18. Brilliant Labs, the developer of AI glasses, launched Frame, the world’s first glasses featuring an integrated AI assistant, Noa. Powered by an integrated multimodal generative AI system capable of running GPT4, Stability AI, and the Whisper AI model simultaneously, Noa performs real-world visual processing, novel image generation, and real-time speech recognition and translation [Details].
  19. Nous Research released the Nous Hermes 2 Llama-2 70B model, trained on the Nous Hermes 2 dataset with over 1,000,000 entries of primarily synthetic data [Details].
  20. OpenAI, in partnership with Microsoft Threat Intelligence, has disrupted five state-affiliated actors that sought to use AI services in support of malicious cyber activities [Details].
  21. Perplexity partners with Vercel, opening AI search to developer apps [Details].
  22. Researchers show that LLM agents can autonomously hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback. The agent does not need to know the vulnerability beforehand [Paper].
  23. The FCC makes AI-generated voices in unsolicited robocalls illegal [Link].
  24. Slack adds AI-powered search and summarization to the platform for enterprise plans [Details].

Source: AI Brews. You can subscribe to the newsletter here. It's free to join and sent only once a week with bite-sized news, learning resources, and selected tools. Thanks.

submitted by /u/wyem