On Model Failures (GPT, Claude etc.)

The way the current consumer-facing versions of frontier LLMs (mainly GPT, Claude, Gemini) are designed is just… weirdly off, across models. They now seem to require us, the end users, to fix their issues ourselves first if we want to avoid spending _a lot_ of time on troubleshooting and frustration.

Before we can even properly customize one of these models through the UI, we now need to mitigate its structural failure modes; otherwise our attempts will be futile.

And the failure modes are not only behavioral issues (such as obsessive push-back, sycophancy, pointless corrections, or general confabulation). There is another layer to them, one that I believe needs to be targeted first, and it has to do with the way the current system prompts are built.

It's not fair, obviously, and it doesn't even make much sense that things would be this way, but this is what is actually happening.

Now, the structural (sic) issue is the way the models replace the user's use case, object, or topic with their own adjacent version of it, one that prioritizes the system prompt over what the user brought to the table. The linked articles analyze how that happens in different models, and the "antidote" prompts included in them are designed to fix it.

I would encourage all GPT / Claude users to test out the solutions provided in the articles. Links to the pieces covering the GPT-5 series and Opus 4.8 are in the comments.

_(Yes, they are softly paywalled, partly because I am targeting the system prompts of OpenAI and Anthropic models. You can bypass the paywall by grabbing the free complimentary article. I'm saying this out loud because some Redditors consider any paywall grounds for personal attacks. Please don't 🙏🏻 Discussion and constructive criticism are super welcome, though; all prompts are subject to regular updates and constant improvement!)_

submitted by /u/traumfisch