[I read the paper for you] LLMs compress images 43% better than PNG, and audio nearly 2x better than MP3

Edit: FLAC is the audio format tested in the paper, not MP3

I read the new paper from DeepMind so you don't have to. Here are the key highlights:

  • Despite training only on text, language models compressed images 43% better than PNG, and audio nearly 2x better than FLAC (see the sketch after this list for how prediction turns into compression).
  • Confirmation of scaling laws: bigger models compressed better, but only when model size is matched to the dataset size.
  • There are tradeoffs between model scale, data size, and compression performance; more data is what justifies running bigger models.
  • Tokenization (like BPE) generally hurts compression slightly by making prediction harder.
  • Longer contexts let models exploit more sequential dependencies.
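
To make the prediction-compression link concrete: under arithmetic coding, the compressed length of a sequence is roughly the sum of -log2 of the probabilities the model assigns to each next symbol, so better prediction directly means smaller files. Here's a minimal Python sketch of that accounting; the `adaptive_byte_model` below is just a hypothetical stand-in for an LLM's next-token predictions (the paper feeds actual transformer predictions into a real arithmetic coder).

```python
import math
from collections import Counter

def compressed_size_bits(data: bytes, predict) -> float:
    """Ideal compressed size under arithmetic coding:
    the sum of -log2 P(next byte | preceding bytes)."""
    total = 0.0
    for i, b in enumerate(data):
        probs = predict(data[:i])       # distribution over the 256 byte values
        total += -math.log2(probs[b])
    return total

def adaptive_byte_model(context: bytes):
    """Toy stand-in for an LLM: Laplace-smoothed frequencies
    of the bytes seen so far in the context."""
    counts = Counter(context)
    total = len(context) + 256          # +1 smoothing for each of the 256 symbols
    return [(counts.get(s, 0) + 1) / total for s in range(256)]

if __name__ == "__main__":
    data = b"abracadabra " * 50
    bits = compressed_size_bits(data, adaptive_byte_model)
    print(f"raw: {len(data) * 8} bits, model-coded: {bits:.0f} bits "
          f"({bits / (len(data) * 8):.1%} of original)")
```

The better the predictor (a large transformer instead of this toy frequency counter), the smaller that bit count gets, which is exactly the effect the paper measures across text, images, and audio.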

Implications:

  • Models have learned very general capabilities beyond just text; their strong compression suggests they've picked up the statistical structure of images, audio, etc.
  • I got some new perspective on model scaling laws and links between prediction and generalization.
  • There's potential for practical applications compressing images, video, etc., but the large model sizes are an issue.
  • Overall it shows these models are very capable general purpose learners, not just for language.

Full summary here if you want more details. Original paper is here.
