Edit: FLAC is the tested audio format, not MP3
I read the new paper from DeepMind so you don't have to. Here are the key highlights:
- Despite being trained only on text, language models compressed images 43% better than PNG and audio nearly 2x better than FLAC (a minimal sketch of the prediction-as-compression link follows this list).
- Confirmation of scaling laws - bigger models compressed better. But model size must match dataset size.
- There are tradeoffs between model scale, data size, and compression performance. More data enables bigger models.
- Tokenization (like BPE) generally hurts compression slightly by making prediction harder.
- Longer contexts let models exploit more sequential dependencies.
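For context on why prediction and compression are two sides of the same problem: with an arithmetic coder, a sequence can be stored in roughly the sum of -log2 p(symbol | context) bits under the model's predictions, so a better predictor directly means a smaller file. Below is a minimal Python sketch of that bound; it uses a hypothetical toy model (next_symbol_probs is a placeholder, not the paper's actual setup or any real LM API):

    import math

    # Hypothetical next-token model: maps a context string to a probability
    # distribution over possible next symbols. A stand-in for a real LM.
    def next_symbol_probs(context: str) -> dict[str, float]:
        # Toy uniform model over lowercase letters and space; a real LM
        # would return much sharper probabilities conditioned on context.
        alphabet = "abcdefghijklmnopqrstuvwxyz "
        return {ch: 1 / len(alphabet) for ch in alphabet}

    def ideal_compressed_bits(text: str) -> float:
        """Arithmetic-coding bound: sum of -log2 p(symbol | context)."""
        total = 0.0
        for i, ch in enumerate(text):
            p = next_symbol_probs(text[:i]).get(ch)
            if p is None or p <= 0:
                raise ValueError(f"model assigns zero probability to {ch!r}")
            total += -math.log2(p)
        return total

    msg = "hello world"
    bits = ideal_compressed_bits(msg)
    print(f"{bits:.1f} bits vs {8 * len(msg)} bits raw")  # better model => fewer bits

With the toy uniform model the bound is worse than raw bytes would suggest for real text; the paper's point is that a strong LM makes those per-symbol probabilities high for the true continuation, shrinking the total even for image or audio bytes.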
Implications:
- Models have learned very general capabilities beyond just text. Their strong compression reflects a deep statistical understanding of images, audio, etc.
- I got some new perspective on model scaling laws and links between prediction and generalization.
- There's potential for practical applications in compressing images, video, etc., but the large model size is an issue.
- Overall it shows these models are very capable general purpose learners, not just for language.
Full summary here if you want more details. Original paper is here.