Edit: FLAC is the tested audio format, not MP3
I read the new paper from DeepMind so you don't have to. Here are the key highlights:
- Despite being trained only on text, language models compressed images 43% better than PNG and audio nearly 2x better than FLAC (a minimal sketch of the prediction-as-compression link follows this list).
- Confirmation of scaling laws - bigger models compressed better. But model size must match dataset size.
- There are tradeoffs between model scale, data size, and compression performance. More data enables bigger models.
- Tokenization (like BPE) generally hurts compression slightly by making prediction harder.
- Longer contexts let models exploit more sequential dependencies.
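For context on why prediction and compression are two sides of the same problem: with an arithmetic coder, a sequence can be stored in roughly the sum of -log2 p(symbol | context) bits under the model's predictions, so a better predictor directly means a smaller file. Below is a minimal Python sketch of that bound; it uses a hypothetical toy model (next_symbol_probs is a placeholder, not the paper's actual setup or any real LM API):

    import math

    # Hypothetical next-token model: maps a context string to a probability
    # distribution over possible next symbols. A stand-in for a real LM.
    def next_symbol_probs(context: str) -> dict[str, float]:
        # Toy uniform model over lowercase letters and space; a real LM
        # would return much sharper probabilities conditioned on context.
        alphabet = "abcdefghijklmnopqrstuvwxyz "
        return {ch: 1 / len(alphabet) for ch in alphabet}

    def ideal_compressed_bits(text: str) -> float:
        """Arithmetic-coding bound: sum of -log2 p(symbol | context)."""
        total = 0.0
        for i, ch in enumerate(text):
            p = next_symbol_probs(text[:i]).get(ch)
            if p is None or p <= 0:
                raise ValueError(f"model assigns zero probability to {ch!r}")
            total += -math.log2(p)
        return total

    msg = "hello world"
    bits = ideal_compressed_bits(msg)
    print(f"{bits:.1f} bits vs {8 * len(msg)} bits raw")  # better model => fewer bits

With the toy uniform model the bound is worse than raw bytes would suggest for real text; the paper's point is that a strong LM makes those per-symbol probabilities high for the true continuation, shrinking the total even for image or audio bytes.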
Implications:
- Models have learned very general capabilities beyond just text. Their strong compression reflects a deep statistical understanding of images, audio, etc.
- I got some new perspective on model scaling laws and links between prediction and generalization.
- There's potential for practical applications in compressing images, video, etc., but the large model size is an issue.
- Overall it shows these models are very capable general purpose learners, not just for language.
Full summary here if you want more details. Original paper is here.