Opus 4.8 ARC-AGI-3 Replay

https://reddit.com/link/1ty3xhz/video/dzede49lhk5h1/player

Link to the replay.

What does everyone think of this?

I know the benchmark has gotten a lot of criticism for being “too difficult” from a scoring perspective, but after watching the replay, it honestly looks like the models just aren’t that close to solving it yet.

I’m not saying the benchmark is perfect, but the failures don’t really look like minor scoring issues. They look more like the model still doesn’t understand the task well enough to complete it reliably.

submitted by /u/ClickedMoss5