ChatGPT 4o just got some kind of upgrade. It’s now the first model I’ve ever tested to pass the 4×4 grid test, the test that has brought every other model to its knees.

The raw conversation, zero shot:

https://ibb.co/JQWC2XJ

https://ibb.co/b54j4d3

https://ibb.co/JQbfmpt

https://ibb.co/2Wv7tHs

In short, the challenge is for the AI to create a 4x4 alphanumeric grid filled with interesting relationships, secrets, and creative references. It's a pretty intense challenge that every model has failed spectacularly up until now. Most fail to follow the basic instructions: their grids aren't alphanumeric and include all manner of symbols, even when repeatedly asked not to. Those that do finally manage to create a grid (including ChatGPT in previous tests, before tonight) end up hallucinating all sorts of things about the grid they just created, such as claiming numbers are there which aren't.

So my standards for a basic 'pass' are that the AI creates a grid on the first try that satisfies the requirements (4x4 alphanumeric) and then can at least coherently explain the truthful contents and relationships inside the grid.
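
For anyone who wants to check the mechanical half of that pass standard themselves, here is a rough Python sketch. It is not part of the original test. It assumes the grid is pasted as four lines of whitespace-separated cells with exactly one character per cell, which is my guess at a reasonable format, and the function names are made up for illustration. It only checks the shape, the alphanumeric rule, and whether cells the model claims to have used actually appear; judging whether the explained relationships are coherent still takes a human.

```python
def parse_grid(text):
    """Split pasted grid text into rows of cells (assumes whitespace-separated cells)."""
    return [line.split() for line in text.strip().splitlines() if line.strip()]


def is_valid_grid(rows, size=4):
    """Return True if the grid is size x size and every cell is one ASCII letter or digit."""
    if len(rows) != size:
        return False
    for row in rows:
        if len(row) != size:
            return False
        for cell in row:
            # Assumption: one character per cell; symbols and multi-character cells fail.
            if len(cell) != 1 or not cell.isascii() or not cell.isalnum():
                return False
    return True


def claimed_cells_present(rows, claimed):
    """Return True if every character the model claims is in the grid really is there."""
    present = {cell for row in rows for cell in row}
    return all(c in present for c in claimed)
```

For example, calling `is_valid_grid(parse_grid(pasted_text))` on a grid containing a `#` returns `False`, and `claimed_cells_present` catches the case where the model says a number is in the grid when it isn't.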

I'm aware that the grid above is not terribly majestic or complex, but it does have some relationships inside of it and they are explained accurately by the AI in the very same output.

This is remarkable, and no AI has been able to do this before tonight. I'm simply stunned.

submitted by /u/katiecharm