- Cohere introduced Rerank 3, a new foundation model purpose-built for efficient enterprise search and Retrieval Augmented Generation (RAG) systems. It enables search over multi-aspect and semi-structured data like emails, invoices, JSON documents, code, and tables in 100+ languages (see the code sketch after the list) [Details].
- Google DeepMind used deep reinforcement learning (deep RL) to train humanoid robots to play a simplified one-versus-one soccer game. The agents learnt by trial and error and could cope with unexpected interference in the real world. They could walk, turn, kick, and get up faster than with manually programmed skills on this type of robot, and they combined these movements to score goals, anticipate ball movements, and block opponent shots - demonstrating a basic understanding of the game [Details].
- Hugging Face researchers released Parler TTS, a fully open-source, Apache 2.0 licensed text-to-speech model focused on maximum controllability. Through a text prompt describing the desired voice, you can control pitch, speaking speed, gender, background noise level, emotion, and more (see the code sketch after the list) [Details | Demo].
- Mistral AI released Mixtral 8×22B, a sparse Mixture-of-Experts model with 141B total parameters (about 39B active per token) and a 65k-token context length - Apache 2.0 license [Link | Hugging Face].
- Google:
- The input modalities for Gemini 1.5 Pro have now expanded to include audio (speech) understanding in both the Gemini API and Google AI Studio. You can upload an audio recording of a lecture, for example, and Gemini 1.5 Pro can turn it into a quiz with an answer key (see the code sketch after the list). Additionally, Gemini 1.5 Pro can now reason across both image (frames) and audio (speech) for videos uploaded in Google AI Studio [Details].
- Gemini 1.5 Pro is now available in 180+ countries via the Gemini API in public preview [Details].
- Two new variants of the Gemma family of lightweight, open models: CodeGemma for code completion, code generation, and instruction following, and RecurrentGemma, an efficiency-optimized architecture for research experimentation [Details + Hugging Face blog].
- Google Vids, a new AI-powered video creation app for work with real-time collaboration, was announced. It can generate a storyboard that you can easily edit; after you choose a style, it pieces together a first draft with suggested scenes from stock videos and images, plus background music and voiceover. Vids is being released to Workspace Labs in June [Details].
- Vertex AI Agent Builder launched. It lets developers easily build and deploy enterprise-ready gen AI experiences using natural language or a code-first approach [Details].
- New Gemini-powered security updates for Chronicle and Workspace announced [Details].
- Gemini 1.0 Pro added to Android Studio as an AI coding assistant [Details].
- Cohere released Command R+, a RAG-optimized multilingual model designed to tackle enterprise-grade workloads. It supports multi-step tool use, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks (see the code sketch after the list). Command R+ is available on HuggingChat [Details].
- Archetype AI introduced Newton, a physical AI foundational model that is capable of perceiving, understanding and reasoning about the world. It fuses real-time sensor data – such as from radars, cameras, accelerometers, temperature sensors, and more – with natural language, so you can ask open-ended questions about the world around you [Details].
- Intercom launched Fin AI Copilot, a personal AI assistant for customer service agents. It uses RAG and semantic search to generate answers for support agents from internal knowledge bases, public URLs, etc. Fin AI Copilot retains the context of a conversation, so the agent can ask it follow-up questions later [Details].
- Meta AI released the Open-Vocabulary Embodied Question Answering (OpenEQA) framework, a new benchmark that measures an AI agent’s understanding of physical spaces via questions like “Where did I leave my badge?” [Details].
- OpenAI’s new GPT-4 Turbo model, with improved capabilities in writing, math, logical reasoning, and coding, is now available to paid ChatGPT users and generally available via the API. Vision requests can now also use JSON mode and function calling (see the code sketch after the list) [Details].
- Poe introduced a new way for model developers and bot creators to generate revenue on the Poe platform. Creators can now set a per-message price for their bots and generate revenue every time a user messages their bot [Details].
- Oracle Financial Services introduced Oracle Financial Services Compliance Agent that helps banks mitigate anti-money-laundering risks [Details].
- Apple researchers presented Ferret-UI, a new multimodal large language model (MLLM) tailored for enhanced understanding of mobile UI screens. Ferret-UI can perform referring tasks (e.g., widget classification, icon recognition, OCR) with flexible input formats (point, box, scribble) and grounding tasks (e.g., find widget, find icon, find text, widget listing) on mobile UI screens [Paper].
- Stability AI released Stable LM 2 12B, a pair of 12-billion-parameter language models (a base model and an instruction-tuned variant) trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch [Details].
- Anthropic announced the Build with Claude contest, running from April 9th to April 16th, 2024. The top 5 winners will win $1,000 in API credits [Details].
- Meta AI introduced the next generation of the Meta Training and Inference Accelerator (MTIA), its family of custom chips designed for Meta’s AI workloads. The new MTIA chip delivers 3x the performance of the first-generation chip across four key model evaluations [Details].
- Pika Labs and ElevenLabs are launching a 72-hour AI short film competition, FilmFAST, from April 12-14 [Details].
- Intel introduced the Gaudi 3 AI accelerator, claiming on average 50% better inference and 40% better power efficiency than the Nvidia H100, at a lower cost [Details].
- Stability AI released Cos Stable Diffusion XL 1.0 and Cos Stable Diffusion XL 1.0 Edit, fine-tuned SDXL models that can produce images with the full color range [Hugging Face | Unofficial Demo].
- Replit announced Code Repair, a low-latency code repair AI agent that fixes code automatically without prompting and outperforms GPT-4 and Claude 3 Opus. Replit also announced early access to a new AI-powered Replit Teams product [Details].
- Meta confirmed that its Llama 3 open-source LLM is coming within the next month [Details].
- Apple researchers have developed an AI system called ReALM (Reference Resolution As Language Modeling) that can ‘see’ and understand screen context [Details | Paper].
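A minimal sketch of what a Rerank 3 call might look like with the Cohere Python SDK, referenced in the Cohere item above. The model name `rerank-english-v3.0` and the toy documents are assumptions based on the launch announcement, not verified snippets.

```python
import cohere

co = cohere.Client(api_key="YOUR_API_KEY")  # placeholder key

# Toy candidates standing in for emails/invoices returned by a first-stage retriever.
docs = [
    "Invoice #1042: 3 units of Model X, total $1,450, due 2024-05-01.",
    "Email: shipment for order 1042 is delayed until next Tuesday.",
    "Invoice #0998: 1 unit of Model Y, total $320, paid in full.",
]

# Rerank the candidates against the query; model name per the Rerank 3 launch.
results = co.rerank(
    model="rerank-english-v3.0",
    query="Which invoice is still unpaid?",
    documents=docs,
    top_n=2,
)

for r in results.results:
    print(r.index, round(r.relevance_score, 3), docs[r.index])
```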
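A short usage sketch for Parler TTS, following the pattern in the project's README at release; the checkpoint name `parler-tts/parler_tts_mini_v0.1` reflects the initial release and may have changed since.

```python
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

# Initial checkpoint name at release (assumption; newer checkpoints may exist).
repo = "parler-tts/parler_tts_mini_v0.1"
model = ParlerTTSForConditionalGeneration.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

# The text description controls the voice: pitch, speed, gender, noise, emotion, etc.
description = (
    "A female speaker with a slightly low-pitched voice speaks slowly and calmly, "
    "with very clear audio and no background noise."
)
prompt = "Hey, how are you doing today?"

input_ids = tokenizer(description, return_tensors="pt").input_ids
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate a waveform and write it to disk.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio, model.config.sampling_rate)
```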
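A sketch of the Gemini 1.5 Pro audio flow described in the Google item, using the `google-generativeai` Python SDK's File API; the file name `lecture.mp3` is illustrative and the model alias `gemini-1.5-pro-latest` is an assumption based on the preview docs at the time.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use your own Gemini API key

# Upload the recording through the File API, then reason over it with Gemini 1.5 Pro.
lecture = genai.upload_file(path="lecture.mp3")  # illustrative local file

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content(
    [lecture, "Turn this lecture into a 5-question quiz with an answer key."]
)
print(response.text)
```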
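A rough sketch of the multi-step tool use loop mentioned in the Command R+ item. The tool schema, the local `run_tool` dispatcher, and parameter names such as `force_single_step` follow my recollection of Cohere's v1 chat API docs around the launch and should be treated as assumptions to verify.

```python
import cohere

co = cohere.Client(api_key="YOUR_API_KEY")  # placeholder key

# A single toy tool; the schema follows Cohere's parameter_definitions format.
tools = [{
    "name": "get_weather",
    "description": "Returns the current weather for a city.",
    "parameter_definitions": {
        "city": {"description": "City name", "type": "str", "required": True},
    },
}]

def run_tool(name, params):
    # Hypothetical local dispatcher standing in for real tool implementations.
    if name == "get_weather":
        return [{"city": params["city"], "forecast": "sunny, 21C"}]
    return [{"error": f"unknown tool {name}"}]

# Multi-step loop: keep answering tool calls until the model produces a final reply.
res = co.chat(
    model="command-r-plus",
    message="Is it warmer in Paris or in Oslo right now?",
    tools=tools,
    force_single_step=False,
)
while res.tool_calls:
    tool_results = [
        {"call": c, "outputs": run_tool(c.name, c.parameters)} for c in res.tool_calls
    ]
    res = co.chat(
        model="command-r-plus",
        message="",
        chat_history=res.chat_history,
        tools=tools,
        tool_results=tool_results,
        force_single_step=False,
    )
print(res.text)
```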
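Finally, a sketch of the GPT-4 Turbo change noted above: vision requests can now be combined with JSON mode. The image URL is a placeholder, and the `gpt-4-turbo` alias is the one pointing at the new release.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4-turbo",
    # JSON mode now works together with image input on the new model.
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": 'List the objects in this image as JSON: {"objects": [...]}'},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```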
Source: AI Brews - Links removed from this post due to auto-delete, but they are present in the newsletter. It's free to join, sent only once a week with bite-sized news, learning resources, and selected tools. Thanks!