/u/Successful-Western27

Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs

When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heav…
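The approach the title alludes to can be pictured as a simple loop: the model alternates free-form reasoning with code snippets, each snippet is executed, and the result is fed back into the context. A minimal sketch, assuming a regex-detectable ```python block and a toy stub in place of a real LLM call (all names here are hypothetical, not the paper's code):

```python
# Minimal sketch of a tool-integrated reasoning loop.
# "model" is any callable that maps a transcript to the next step.
import re

def run_python(snippet: str) -> str:
    """Execute a generated snippet and read back its `result` variable."""
    scope = {}
    exec(snippet, scope)  # in practice this would be sandboxed
    return str(scope.get("result"))

def tool_integrated_reasoning(model, question: str, max_rounds: int = 4) -> str:
    transcript = question
    for _ in range(max_rounds):
        step = model(transcript)  # model emits text and/or a code block
        transcript += "\n" + step
        match = re.search(r"```python\n(.*?)```", step, re.DOTALL)
        if match:  # code block found: run it and feed the output back
            transcript += "\nOutput: " + run_python(match.group(1))
        else:      # plain text: treat it as the final answer
            return step
    return transcript

# Toy model: first asks for a computation, then reads the result off.
def toy_model(ctx: str) -> str:
    if "Output:" not in ctx:
        return "```python\nresult = 12 * 7 + 5\n```"
    return "The answer is " + ctx.split("Output: ")[-1].strip()

print(tool_integrated_reasoning(toy_model, "What is 12*7+5?"))  # -> The answer is 89
```

The point of the loop is that arithmetic and symbolic manipulation are delegated to the interpreter instead of being done "in the model's head".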

Meta and INRIA researchers find that explicit registers eliminate ViT attention spikes

When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embed…
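The proposed fix is simple to picture: a handful of extra learnable "register" tokens are appended to the input sequence next to the [CLS] and patch tokens, then discarded at the output. A minimal numpy sketch of just the token bookkeeping (sizes are illustrative, not the paper's code):

```python
# Sketch: appending learnable register tokens to a ViT token sequence.
import numpy as np

rng = np.random.default_rng(0)
batch, n_patches, dim, n_registers = 2, 196, 768, 4

patch_tokens = rng.normal(size=(batch, n_patches, dim))
cls_token    = rng.normal(size=(1, 1, dim))            # learnable, shared over batch
registers    = rng.normal(size=(1, n_registers, dim))  # learnable, shared over batch

tokens = np.concatenate(
    [np.broadcast_to(cls_token, (batch, 1, dim)),
     patch_tokens,
     np.broadcast_to(registers, (batch, n_registers, dim))],
    axis=1,
)
# The transformer blocks would run on `tokens` here.
output = tokens  # stand-in for transformer(tokens)

# At the output, the register slots are simply dropped:
cls_out   = output[:, 0]
patch_out = output[:, 1:1 + n_patches]
print(tokens.shape)  # (2, 201, 768)
```

The registers give "overflow" attention somewhere harmless to go, so the patch tokens keep clean, object-focused attention maps.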

Show-1: Marrying Pixel and Latent Diffusion Models for Efficient and High-Quality Text-to-Video Generation

A new paper proposes Show-1, a hybrid model that combines pixel and latent diffusion for efficient high-quality text-to-video generation. Both of these approaches have tradeoffs, so researchers at the National University of Singapore tried a hybrid app…

UNC Researchers Present VideoDirectorGPT: Using AI to Generate Multi-Scene Videos from Text

Generating coherent videos spanning multiple scenes from text descriptions poses unique challenges for AI. While recent progress enables creating short clips, smoothly transitioning across diverse events and maintaining continuity remains difficult. A …

Microsoft Researchers Propose AI Morality Test for LLMs in New Study

Researchers from Microsoft have just proposed using a psychological assessment tool called the Defining Issues Test (DIT) to evaluate the moral reasoning capabilities of large language models (LLMs) like GPT-3, ChatGPT, etc. The DIT presents moral dile…

DeepMind: Increasing learning rate in small models lets you reproduce errors in large ones

Training giant AI models like GPT-3 requires enormous resources – thousands of GPUs running for months. As a solo researcher without access to that kind of scale, I can't easily reproduce experiments and findings from papers on huge models. But a new …

Researchers announce GPT4Tools: a method for teaching LLMs how to use tools for visual tasks

LLMs are great with words but can't handle visual tasks like understanding images. Teaching them to use visual tools could make them much more capable. A new paper introduces GPT4Tools – a method to efficiently teach existing LLMs to invoke tools f…

Meet ALMA: A New Training Method That Boosts Translation Performance for Large Language Models

TLDR: A new training approach enables smaller AI models to achieve state-of-the-art translation performance. Large AI models like GPT-3 perform well on translation tasks, but smaller models struggle. Researchers from Johns Hopkins and Micros…

LongLoRA: New method extends LLAMA2 7B to 100k context length, 70B to 32k context length on a single 8 × A100 machine

As AI models get bigger, training them requires more and more computing power. Researchers are looking for ways to train these large AI models without needing Google-scale resources. A new paper proposes LongLoRA, a fine-tuning approach that can extend…
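LongLoRA builds on LoRA, which freezes the pretrained weight and trains only a small low-rank additive update. A minimal numpy sketch of that building block (sizes and rank are illustrative; LongLoRA's shifted-attention changes are not shown):

```python
# Sketch of the LoRA building block that LongLoRA extends:
# frozen weight W plus a trainable low-rank update B @ A.
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                        # hidden size, LoRA rank

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x):                 # x: (batch, d)
    # Only A and B would receive gradients during fine-tuning.
    return x @ W.T + x @ A.T @ B.T

x = rng.normal(size=(4, d))
# Because B starts at zero, the adapted layer initially matches the
# pretrained layer exactly:
print(np.allclose(lora_forward(x), x @ W.T))  # True
```

Training d×r + r×d parameters per layer instead of d×d is what keeps the memory footprint small enough for a single machine.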

[I read the paper for you] LLMs compress images 43% better than PNG, and audio nearly 2x better than MP3

Edit: FLAC is the tested audio format, not MP3. I read the new paper from DeepMind so you don't have to. Here are the key highlights: Despite training on text, language models compressed images 43% better than PNG, and audio nearly 2x better tha…
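The mechanism behind these numbers is that any predictive model plus arithmetic coding compresses data down to roughly the model's total negative log-likelihood. A toy sketch with a smoothed bigram model standing in for the LLM (it even "trains" on the data it compresses, so the figure is purely illustrative):

```python
# Sketch: compression via prediction. The ideal arithmetic-coding length
# of a sequence is the sum of -log2 p(next byte | context).
import math
from collections import Counter

data = b"abababababcbcbcbcb"

counts = Counter(zip(data, data[1:]))   # bigram counts (toy "model")
context_totals = Counter(data[:-1])

def toy_prob(prev: int, nxt: int) -> float:
    """Laplace-smoothed bigram probability p(nxt | prev) over 256 bytes."""
    return (counts[(prev, nxt)] + 1) / (context_totals[prev] + 256)

bits = sum(-math.log2(toy_prob(p, n)) for p, n in zip(data, data[1:]))
print(f"{bits / 8:.1f} bytes vs {len(data)} raw")
```

An LLM's next-token distributions are far sharper than this bigram's, which is where the wins over PNG and FLAC come from.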