This week in AI – all the Major AI developments in a nutshell

  1. Meta released:

    1. The Meta Llama 3 family of large language models in 8B and 70B sizes. The training dataset (15T+ tokens) is seven times larger than the one used for Llama 2. Meta believes these are the best open-source models of their class. The models achieve substantially lower false refusal rates, greater diversity in responses, and improved capabilities. The upcoming Llama 3 400B model (still in training) is competitive with the GPT-4/Claude 3 Opus class of models. Meta also released new Llama trust & safety tools featuring Llama Guard 2, Code Shield and CyberSec Eval 2 (a minimal loading sketch follows this list) [Details | Model Card | Getting Started].
    2. Meta AI: an intelligent assistant that integrates Llama 3 into a ChatGPT-like interface. It can be accessed on the web without login and is also available in search across Facebook, Instagram, WhatsApp and Messenger [Details].
    3. Imagine Flash: Meta AI’s Imagine feature now creates images from text in real time. An image appears as you start typing and updates with every few letters typed [Paper].
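
For readers who want to try the open weights, here is a minimal loading sketch using the Hugging Face transformers library. The model ID is the public Llama 3 8B Instruct repo on the Hub; the prompt, hardware assumptions, and generation settings are illustrative only, not part of Meta's announcement.

```python
# Minimal sketch: running Llama 3 8B Instruct via Hugging Face transformers.
# Assumes you have accepted the model's license on the Hub and have a GPU;
# the prompt and generation settings below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize this week's AI news in one line."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
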
  2. Reka AI introduced Reka Core, a multimodal, multilingual language model trained from scratch, with a 128K-token context window. It has powerful contextualized understanding of images, videos, and audio. Core is competitive with GPT-4V and Claude 3 Opus and surpasses Gemini Ultra on video tasks [Details].

  3. Stability AI announced Stable Assistant, a friendly chatbot powered by Stability AI’s text and image generation technology, featuring Stable Diffusion 3 and Stable LM 2 12B. Stable Diffusion 3 and Stable Diffusion 3 Turbo are now available on the Stability AI Developer Platform API; model weights will be available soon. Stability AI has partnered with Fireworks AI to deliver Stable Diffusion 3 and Stable Diffusion 3 Turbo [Details].

  4. Mistral AI shared the details of its latest model, Mixtral 8x22B, released under Apache 2.0. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering cost efficiency for its size. It is natively capable of function calling (see the sketch below), has a 64K-token context window, and is fluent in English, French, Italian, German, and Spanish. The instruct version of Mixtral 8x22B has also now been released [Details].
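
Since function calling is native to the model, here is a minimal sketch of what a tool call looks like. It assumes the weights are served behind an OpenAI-compatible endpoint (for example via vLLM); the server URL, model name, and the get_weather tool are placeholders, not from the announcement.

```python
# Sketch: native function calling with Mixtral 8x22B served behind an
# OpenAI-compatible endpoint (e.g. vLLM). The URL, model name and the
# get_weather tool are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x22B-Instruct-v0.1",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# The model answers with a structured tool call instead of free text
print(response.choices[0].message.tool_calls)
```
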

  5. Microsoft Research introduced VASA, a framework for generating lifelike talking faces of virtual characters. From a single static image and an accompanying speech audio clip, VASA generates video with highly realistic lip-audio synchronization, lifelike facial expressions, and natural head movements, all in real time. This is only a research demonstration; there is no product or API release planned [Details].

  6. Wayve introduced LINGO-2, a driving model that links vision, language, and action to explain and determine driving behavior. It can both generate real-time driving commentary and control a car. LINGO-2 is the first closed-loop vision-language-action driving model (VLAM) tested on public roads [Details].

  7. Hugging Face researchers released Idefics2, a general open multimodal model that takes arbitrary sequences of text and images as input and generates text responses. It is built on top of two pre-trained models, Mistral-7B-v0.1 and siglip-so400m-patch14-384. Idefics2 shows strong performance for a model of its size (8B parameters) compared to other open multimodal models and is often competitive with closed-source systems [Details].

  8. Microsoft Research released SAMMO, a new open-source tool that streamlines the optimization of prompts [Details].

  9. Tencent released InstantMesh, an open-source framework for efficient 3D mesh generation from a single image. InstantMesh can create diverse 3D assets within 10 seconds [Hugging Face Demo | GitHub].

  10. Poe, the AI chat app by Quora, has added multi-bot chat. It lets you chat with multiple models in a single thread, compare their responses, and discover optimal combinations of models for various tasks. For instance, you can do in-depth analysis with Gemini 1.5 Pro’s 1M-token context window, mention the Web Search bot in the conversation to pull in up-to-date information about the topic, and then bring in a specialized writing bot to complete a writing task using the context from all of the previous bots [Details].

  11. OpenAI announced new features and improvements to the Assistants API. This includes an improved retrieval tool, file_search, which can ingest up to 10,000 files per assistant. It works with the new vector store objects, which handle automated file parsing, chunking, and embedding (see the sketch below) [Details].
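
A minimal sketch of the new flow with the OpenAI Python SDK: create a vector store, upload files into it, and attach it to an assistant with the file_search tool. The store name and file name are placeholders.

```python
# Sketch: the new file_search retrieval flow in the Assistants API
# (OpenAI Python SDK; the store name and file name are placeholders).
from openai import OpenAI

client = OpenAI()

# Vector stores handle file parsing, chunking and embedding automatically
vector_store = client.beta.vector_stores.create(name="docs")
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open("manual.pdf", "rb")],
)

# Attach the store to an assistant via the file_search tool
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    instructions="Answer questions using the attached documents.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```
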

  12. Zyphra released Zamba, a novel 7B-parameter foundation model (Mamba blocks with a global shared attention layer; see the conceptual sketch below). It outperforms LLaMA-2 7B and OLMo-7B on multiple benchmarks despite requiring less than half the training data. Zamba-7B was developed by a team of seven, on 128 H100 GPUs, in 30 days. All checkpoints are released open source (Apache 2.0) [Details].
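
The shared-attention idea can be pictured with a conceptual sketch: a stack of Mamba-style blocks where one attention layer's weights are reused at regular intervals, so attention adds almost no extra parameters. The stand-in block and all sizes below are illustrative assumptions, not Zyphra's implementation.

```python
# Conceptual sketch of Zamba-style parameter sharing: many sequence-mixing
# blocks, but a SINGLE attention layer whose weights are reused every few
# blocks. MambaBlockStub is a placeholder, not a real Mamba block.
import torch.nn as nn

class MambaBlockStub(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model)
        )
    def forward(self, x):
        return x + self.mlp(self.norm(x))

class SharedAttentionStack(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_blocks=12, share_every=4):
        super().__init__()
        self.blocks = nn.ModuleList(MambaBlockStub(d_model) for _ in range(n_blocks))
        # One attention module -> one set of weights, applied repeatedly
        self.shared_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.share_every = share_every
    def forward(self, x):
        for i, block in enumerate(self.blocks):
            x = block(x)
            if (i + 1) % self.share_every == 0:
                # Same attention weights every time it is invoked
                attn_out, _ = self.shared_attn(x, x, x, need_weights=False)
                x = x + attn_out
        return x
```
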

  13. Cohere announced the private beta of Cohere Compass, a new foundation embedding model that allows indexing and searching over multi-aspect data such as emails, invoices, CVs, support tickets, log messages, and tabular data [Details].

  14. OpenAI released a new Batch API for cost-effective bulk processing of asynchronous tasks like summarization, translation, and image classification. Users can upload a file of bulk requests, receive results within 24 hours, and pay 50% less than standard API prices (see the sketch below) [Link].
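
A minimal sketch of the flow with the OpenAI Python SDK: upload a .jsonl file of requests, create the batch, then poll for the output file. The file name is a placeholder; the per-line request format follows the Batch API docs.

```python
# Sketch: bulk processing with the OpenAI Batch API (Python SDK).
# "requests.jsonl" is a placeholder file where each line is one
# /v1/chat/completions request in the Batch API's JSONL format.
from openai import OpenAI

client = OpenAI()

batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Later: poll until the batch finishes, then download the results file
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    print(client.files.content(batch.output_file_id).text)
```
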

  15. Blackmagic Design announced DaVinci Resolve 19, a major new update that adds AI tools for motion tracking and color grading [Details].

  16. xAI announced Grok-1.5V, its first multimodal model that can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs. Grok-1.5V will be available soon to early testers and existing Grok users [Details].

  17. Alibaba Cloud released CodeQwen1.5-7B, a specialized code LLM built upon the Qwen1.5 language model. CodeQwen1.5-7B has been pretrained on around 3 trillion tokens of code-related data and supports long-context understanding and generation with a context length of 64K tokens [Details].

  18. Adobe to add AI video generators Sora, Runway, Pika to Premiere Pro [Details].

  19. AI Inference now available in Supabase Edge Functions [Details].

  20. Amazon Music follows Spotify with an AI playlist generator of its own, Maestro [Details].

  21. Allen Institute for AI released an updated version of their 7-billion-parameter Open Language Model, OLMo 1.7-7B, and an updated version of the dataset, Dolma 1.7 [Details].

  22. Nothing plans to bring ChatGPT to its earbuds and phones [Details].

  23. Google launched Gemini Code Assist, its latest challenger to GitHub’s Copilot [Details].

Source: AI Brews - links removed from this post due to auto-delete, but they are present in the newsletter. It's free to join, sent only once a week with bite-sized news, learning resources and selected tools. Thanks!
