LLMs can beat Balatro!

LLMs can beat base-difficulty Balatro when passed the game state as JSON, with a win rate matching that of human players. Admittedly, getting a stringified version of the game is a big boost over having to learn to navigate the game UI with clicks, but aside from this the agents acted with no additional help in the form of specifically coded harnesses, loops, prompting, or hand-written strategy.

I tested models from Anthropic, OpenAI, and Google, and the best performer was a big surprise. Spoiler: it was not the biggest or most expensive one.

submitted by /u/mlemlemleeeem