This week in AI – all the major AI developments in a nutshell

Microsoft introduced:
1. Copilot+ PCs, a new category of Windows PCs designed for AI, with built-in AI hardware and support for AI features across the operating system. Users can locate and recall previously viewed content using Recall, generate and refine AI images in near real-time directly on the device using Cocreator, and use Live Captions with live translations to get real-time English captions for any audio across all apps [Details].
2. Copilot AI agents: businesses and developers will be able to build AI-powered Copilots that work like virtual employees and perform tasks automatically. Instead of sitting idle waiting for queries, Copilot will be able to do things like monitor email inboxes and automate a series of tasks or data entry that employees normally have to do manually [Details].
3. Phi-Silica, a 3.3B parameter model made for Copilot+ PC NPUs [Details].
4. The Phi-3 lightweight open model family is now generally available. Phi-3-mini does better than models twice its size, and Phi-3-small and Phi-3-medium outperform much larger models, including GPT-3.5T [Details].
5. Phi-3-vision, a 4.2B parameter open multimodal model (128K context length) with language and vision capabilities. It outperforms larger models such as Claude-3 Haiku and Gemini 1.0 Pro V across general visual reasoning tasks, OCR, table and chart understanding tasks [Details].
OpenBMB released MiniCPM-Llama3-V 2.5, the latest model in the MiniCPM-V series designed for vision-language understanding. With 8B parameters, the model surpasses GPT-4V-1106, Gemini Pro, Qwen-VL-Max and Claude 3 in overall performance. The model is built on SigLip-400M and Llama3-8B-Instruct [Details].
Cohere released Aya 23, a family of open-weights multilingual instruction-tuned language models (8B and 35B) based on Cohere’s Command model and supporting 23 languages. Aya-23-35B achieves the highest results for the languages covered in multilingual benchmarks [Report | Hugging Face].
Mistral released new 7B base and instruct models, Mistral-7B-v0.3 and Mistral-7B-Instruct-v0.3. Compared to v0.2, they have an extended vocabulary and support function calling [Details].
Truecaller partners with Microsoft to let its AI respond to calls in your own voice [Details].
OpenAI shared a demo at the VivaTech conference featuring Sora, ChatGPT and VoiceEngine models [Link].
Google AI introduced LANISTR, a new framework that enables multimodal learning by ingesting unstructured (image, text) and structured (time series, tabular) data, performing alignment and fusion, and generating predictions [Details].
Arc Search’s new Call Arc feature lets you ask questions by ‘making a phone call’ [Details].
IDEA Research introduced Grounding DINO 1.5 models for detecting objects in images and videos, even those not seen during training. They come in two versions: Grounding DINO 1.5 Pro, which offers high accuracy, and Grounding DINO 1.5 Edge, optimized for real-time performance on devices with limited computing power [Details | Demo].
Hollywood agency CAA teamed up with AI tech company Veritone to help stars manage their own AI likenesses. CAA clients can now store their AI digital doubles and other assets within a secure personal hub in the CAAvault, which can only be accessed by authorized users, allowing them to share and monetize their content as they see fit [Details].
Meta AI developed Chameleon, a series of foundation models that can generate and reason with sequences containing a mix of text, images, and code. Chameleon uses a single, uniform architecture that is trained end-to-end on an interleaved mixture of all modalities from the ground up [Paper].

Source: AI Brews. Links were removed from this post due to auto-delete, but they are present in the newsletter. It's free to join and sent only once a week, with bite-sized news, learning resources and selected tools. Thanks!

submitted by /u/wyem