LongLoRA: New method extends LLaMA2 7B to 100k context length, 70B to 32k context length on a single 8× A100 machine

As AI models get bigger, training them requires more and more computing power. Researchers are looking for ways to train these large AI models without needing Google-scale resources.

A new paper proposes LongLoRA, a fine-tuning approach that extends LLaMA2 7B to a 100k context length and the 70B model to 32k, on a single 8× A100 machine.

Here are my highlights from the paper:

The big one, of course: LongLoRA efficiently fine-tunes large models on much longer contexts

Key points:

  • Approximates standard attention with "shift short attention" during training (see the sketch after this list)
  • Tunes only a small subset of weights via LoRA, plus the embedding and normalization layers
  • Fine-tunes the 7B-parameter model to a 100k-token context on a single machine
  • Much lower training cost than full fine-tuning at long context lengths
  • Performance close to full fine-tuning

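To make the attention trick concrete, here is a rough PyTorch sketch of the group-wise pattern with half the heads shifted, as I understand it from the paper. This is not the authors' implementation: it assumes already-projected q/k/v tensors, skips causal masking, and `group_size` is just an illustrative parameter.

```python
import torch
import torch.nn.functional as F

def shift_short_attention(q, k, v, group_size):
    # q, k, v: (batch, num_heads, seq_len, head_dim); assumes seq_len % group_size == 0.
    bsz, num_heads, seq_len, head_dim = q.shape
    half = num_heads // 2

    def roll_half_heads(x, shift):
        # Shift tokens in the second half of the heads so information
        # can flow across group boundaries.
        x = x.clone()
        x[:, half:] = torch.roll(x[:, half:], shifts=shift, dims=2)
        return x

    # Shift half the heads by half a group before splitting into groups.
    q, k, v = (roll_half_heads(t, -group_size // 2) for t in (q, k, v))

    # Fold the sequence into groups and attend within each group only.
    def to_groups(x):
        return x.reshape(bsz, num_heads, seq_len // group_size, group_size, head_dim)

    out = F.scaled_dot_product_attention(to_groups(q), to_groups(k), to_groups(v))
    out = out.reshape(bsz, num_heads, seq_len, head_dim)

    # Undo the shift on the second half of the heads.
    return roll_half_heads(out, group_size // 2)
```

Since standard full attention is restored at inference time, this approximation only has to be good enough to train with.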
The core insight is that an approximation of full attention is good enough for efficient training, while standard attention is retained for final inference. Combined with selective weight tuning, this sharply reduces the compute needed to extend context length.
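The selective tuning side can be sketched just as simply. Assuming a Hugging Face LLaMA-style model with LoRA adapters already attached, something like this marks only the adapters, embeddings, and norm layers as trainable; the substring checks are illustrative, not the paper's exact configuration.

```python
def mark_trainable(model):
    # Freeze the base weights; train only LoRA adapters, embeddings, and norms.
    for name, param in model.named_parameters():
        if "lora_" in name or "embed" in name or "norm" in name:
            param.requires_grad = True
        else:
            param.requires_grad = False
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable:,} / {total:,}")
```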

I think this demonstrates the potential to train more capable AI without unreasonable resources. Efficient training techniques = more powerful LLMs on the same hardware.

Full summary here.

submitted by /u/Successful-Western27