AI — weekly megathread!

News provided by aibrews.com

  1. Adept open-sources Fuyu-8B - a multimodal model designed from the ground up for digital agents, so it can support arbitrary image resolutions, answer questions about graphs and diagrams, answer UI-based questions and more. It has a much simpler architecture and training procedure than other multimodal models: there is no image encoder [Details].
  2. Meta AI researchers present an AI system that can be deployed in real time to reconstruct, from brain activity, the images perceived and processed by the brain at each instant. It uses magnetoencephalography (MEG), a non-invasive neuroimaging technique in which thousands of brain activity measurements are taken per second [Details].
  3. Scaled Foundations released GRID (General Robot Intelligence Development) - a platform that combines foundation models, simulation and large language models for rapid prototyping of AI capabilities in robotics. GRID can ingest entire sensor/control APIs of any robot, and for a given task, generate code that goes from sensor -> perception -> reasoning -> control commands [Details].
  4. DALL·E 3 is now available in ChatGPT Plus and Enterprise. OpenAI shares the DALL·E 3 research paper [Details | Paper].
  5. PlayHT released PlayHT Turbo - a new version of their conversational voice model, PlayHT 2.0, that generates speech in under 300 ms over the network [Details].
  6. Google announced a new feature of Google Search that helps English learners practice speaking words in context. Responses are analyzed to provide helpful, real-time suggestions and corrections [Details].
  7. Researchers from EleutherAI present Llemma: an open language model for math trained on up to 200B tokens of mathematical text. The performance of Llemma 34B approaches Google's Minerva 62B despite having half the parameters [Details].
  8. Midjourney partnered with Japanese game company Sizigi Studios to launch Niji Journey, an Android and iOS app. Users can generate an entire range of art styles, including non-niji images, by selecting “v5” in the settings. Existing Midjourney subscribers can log into it using their Discord credentials without paying more [Details].
  9. Microsoft Azure AI presents Idea2Img - a multimodal iterative self-refinement system that enhances any T2I model for automatic image design and generation, enabling various new image creation functionalities together with better visual quality [Details].
  10. China’s Baidu unveiled the newest version of its LLM, Ernie 4.0, and several AI-native applications, including Baidu Maps for AI-powered navigation, ride-hailing, restaurant recommendations, hotel booking, etc. [Details].
  11. Stability AI released stable-audio-tools - repo for training and inference of generative audio models [Link].
  12. Microsoft announced the new Microsoft AI bug bounty program with awards up to $15,000 to discover vulnerabilities in the AI-powered Bing experience [Details].
  13. Google researchers present PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger [Paper].
  14. Morph Labs released Morph Prover v0 7B, the first open-source model trained as a conversational assistant for Lean users. Morph Prover v0 7B is a chat fine-tune of Mistral 7B that performs better than the original Mistral model on some benchmarks [Details].
  15. Microsoft Research presented HoloAssist: a multimodal dataset for next-gen AI copilots for the physical world [Details].
  16. YouTube gets new AI-powered ads that let brands target special cultural moments [Details].
  17. Anthropic Claude is now available in 95 countries [Link].
  18. Runway AI is launching a 3-month paid Runway Acceleration Program to help software engineers become ML practitioners [Details].

🔦 Weekly Spotlight

  1. Twitter/X thread on the finalists at the TED Multimodal AI Hackathon [Link].
  2. 3D to Photo: an open-source package by Dabble that combines three.js and Stable Diffusion to build a virtual photo studio for product photography [Link].
  3. Multi-modal prompt injection image attacks against GPT-4V [Link].
  4. Meet two open source challengers to OpenAI’s ‘multimodal’ GPT-4V [Link].
  5. From physics to generative AI: An AI model for advanced pattern generation [Link].

- - -

Welcome to the r/artificial weekly megathread. This is where you can discuss Artificial Intelligence - talk about new models, recent news, ask questions, make predictions, and chat about other related topics.

Click here for discussion starters for this thread or for a separate post.

Self-promo is allowed in these weekly discussions. If you want to make a separate post, please read and go by the rules or you will be banned.

Previous Megathreads & Subreddit revamp and going forward

submitted by /u/jaketocake