TLDR: New training approach enables smaller AI models to achieve state-of-the-art translation performance
Large AI models like GPT-3 perform well on translation tasks, but smaller models have generally struggled to match them.
Researchers from Johns Hopkins and Microsoft propose a new 2-stage fine-tuning method called ALMA that unlocks stronger translation abilities in smaller models with just 7-13 billion parameters.
How it works:
- First, fine-tune on monolingual data in the non-English target languages to improve the model's comprehension of those languages
- Then, further fine-tune on a small set of high-quality, human-translated parallel text (a rough code sketch of both stages follows the list)
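For anyone wondering what the two-stage recipe looks like in code, here's a minimal sketch using Hugging Face Transformers. The model name, data files, prompt format, and hyperparameters are placeholders I picked for illustration, not the authors' actual setup:

```python
# Minimal sketch of the two-stage fine-tuning recipe (placeholder data/config).
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

base_model = "meta-llama/Llama-2-7b-hf"   # a 7B base model, as in the paper
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Stage 1: continued training on monolingual text in the non-English target
# languages to strengthen the model's understanding of them.
mono = load_dataset("text", data_files={"train": "monolingual_de_cs_ru.txt"})["train"]
mono = mono.map(tokenize, batched=True, remove_columns=["text"])
Trainer(model=model,
        args=TrainingArguments(output_dir="stage1",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=mono, data_collator=collator).train()

# Stage 2: fine-tune on a small set of high-quality, human-translated
# parallel pairs, formatted as translation prompts.
def to_prompt(ex):
    return {"text": f"Translate German to English:\n{ex['src']}\n{ex['tgt']}"}

parallel = load_dataset("json", data_files={"train": "parallel_pairs.jsonl"})["train"]
parallel = parallel.map(to_prompt)
parallel = parallel.map(tokenize, batched=True,
                        remove_columns=parallel.column_names)
Trainer(model=model,
        args=TrainingArguments(output_dir="stage2",
                               per_device_train_batch_size=4,
                               num_train_epochs=2),
        train_dataset=parallel, data_collator=collator).train()
```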
The authors claim this achieves SOTA-level translation using far less data and compute than conventional methods:
- Matches the performance of the 175B-parameter GPT-3 and the 54B-parameter NLLB with only 7-13B parameters
- Reaches NLLB-level quality with just 1 billion monolingual tokens and 18 hours of training
I think this shows that smaller models can reach SOTA translation quality with specialized fine-tuning, so we may not need ever-bigger datasets and models to keep improving performance. Deliberate tuning that targets the key language skills may matter more than raw scale.