I see a lot of people shipping AI agents that work perfectly in demos and fall apart the moment a real user touches them. After building automation systems for multiple clients, I've noticed the failures almost never come from choosing the wrong LLM. They come from three things:

1. Bad chunking in RAG pipelines. Everyone's so focused on picking the right vector DB that they don't think about how they're splitting documents. Garbage in, garbage out. If your chunks cut sentences in half or separate a sentence from the context it depends on, your retrieval will always be mediocre.

2. Prompts written for demos, not edge cases. Demo inputs are clean. Real user inputs are weird, vague, and sometimes intentionally broken. If you didn't stress-test your prompt with bad inputs, it will fail publicly.

3. No fallback logic. When the agent is confused, what does it do? Most builders never answer this question, so the agent either hallucinates confidently or returns nothing. Both are bad.

The model is usually the last thing to blame. Fix the scaffolding first.

Anyone else running into this? Curious what failure patterns you've seen.
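For point 1, here's a minimal sketch of what "preserving context across sentences" can look like in practice. It packs whole sentences into chunks and repeats the last sentence of each chunk at the start of the next (sentence-based chunking with overlap). This is one common approach, not the only fix. The function names, the 500-character default, and the naive regex sentence splitter are assumptions for illustration and don't come from any particular library.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
    # Real documents with abbreviations or decimals need a proper sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_by_sentence(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    The last `overlap` sentences of each chunk are repeated at the start of the
    next one, so context that spans a chunk boundary shows up in both chunks.
    A single sentence longer than max_chars becomes its own oversized chunk
    rather than being cut in half.
    """
    if max_chars <= 0 or overlap < 0:
        raise ValueError("max_chars must be positive and overlap non-negative")

    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap:] if overlap else []
            # Drop carried sentences if they would push the next chunk over the limit.
            while current and len(" ".join(current + [sentence])) > max_chars:
                current.pop(0)
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    doc = ("Refunds take 5 days. They require a receipt. "
           "Store credit is instant. It never expires.")
    for chunk in chunk_by_sentence(doc, max_chars=50):
        print(chunk)
    # Refunds take 5 days. They require a receipt.
    # They require a receipt. Store credit is instant.
    # Store credit is instant. It never expires.
```

In this toy example, "They require a receipt." always lands in a chunk alongside a neighboring sentence instead of being cut mid-sentence by a fixed character window. The overlap costs some duplicated storage, which is usually a reasonable trade for not losing cross-sentence context at chunk boundaries.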