/u/coolandy00

Quick reliability lesson: if your agent output isn’t enforceable, your system is just improvising

I used to think “better prompt” would fix everything. Then I watched my system break because the agent returned:

Sure! { "route": "PLAN", }

So now I treat agent outputs like API responses:
– Strict JSON only (no “helpful” prose)
– Exac…
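Treating agent output like an API response can be as simple as a hard gate before anything downstream runs. A minimal sketch (the `"route"` field and the `PLAN`/`EXECUTE` values are hypothetical, not from the original post):

```python
import json

ALLOWED_ROUTES = {"PLAN", "EXECUTE"}  # hypothetical route values for illustration

def parse_agent_route(raw: str) -> dict:
    """Accept only a bare JSON object with exactly the keys we expect.

    Anything else -- prose prefixes like "Sure!", trailing commas,
    unknown keys -- is rejected instead of being "helpfully" tolerated.
    """
    raw = raw.strip()
    if not (raw.startswith("{") and raw.endswith("}")):
        raise ValueError("non-JSON prose in agent output")
    # json.loads raises on invalid JSON such as {"route": "PLAN",}
    data = json.loads(raw)
    if set(data) != {"route"} or data["route"] not in ALLOWED_ROUTES:
        raise ValueError(f"unexpected payload: {data}")
    return data
```

The point is that both failure modes from the post (prose wrapping and the trailing comma) raise immediately, so the system fails loudly instead of improvising.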

Using a Christmas-themed use case to think through agent design 🎄😊

Since it’s Christmas, I ended up thinking through a Christmas-themed use case, mostly as a way to explain how I approach agent design beyond the foundational layer. The theme itself doesn’t matter much. It just gives you a nice mix of: vague, emotiona…

AI work feels hard because we keep redoing the same setup

Something I don’t see talked about enough: how much time AI builders spend repeating setup work. Every project:
– Pull data
– Clean it
– Structure it
– Validate outputs
– Fix edge cases
– Re-run when something changes
None of this is the interesting pa…

The unsexy part of AI apps: glue work that breaks everything (and how we stopped it)

I used to think building an AI feature was mostly model choice + prompts. Then we shipped one. What went wrong: The assistant started giving different answers to the same questions. We didn’t change the model. We didn’t change the UI. It looked like th…

What I learned building and debugging a RAG + agent workflow stack

After building RAG + multi-step agent systems, three lessons stood out: Good ingestion determines everything downstream. If extraction isn’t deterministic, nothing else is. Verification is non-negotiable. Without schema/citation checking, errors sprea…

Adding verification nodes made our agent system way more stable

In our multi-step workflow where each step depended on the previous one’s output, the problems we observed were silent errors: malformed JSON, missing fields, incorrect assumptions, etc. We added verification nodes between steps:
– check structure
– check sch…
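A verification node between steps can be sketched as a plain function that returns a list of problems instead of raising, so the orchestrator can decide whether to halt, retry, or log. The field names here (`summary`, `score`) are hypothetical placeholders, not from the original post:

```python
def verify_step_output(output: dict, required: dict[str, type]) -> list[str]:
    """Check one step's output against the fields the next step depends on.

    Returns a list of problem descriptions; an empty list means the
    output passes and the pipeline may continue.
    """
    problems = []
    for field, expected_type in required.items():
        if field not in output:
            problems.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            problems.append(
                f"wrong type for {field}: {type(output[field]).__name__}"
            )
    return problems

# Hypothetical usage between two steps:
step_output = {"summary": "shipped", "score": 3}
issues = verify_step_output(step_output, {"summary": str, "score": int})
if issues:
    raise RuntimeError(f"step failed verification: {issues}")
```

Returning a list (rather than raising on the first problem) makes it easy to log every silent error in one pass, which is what turns these from invisible failures into visible ones.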

We found that badly defined tool contracts cause unpredictable AI behavior

We were debugging a workflow where several steps were orchestrated by an AI agent. At first glance, the failures looked like reasoning errors. But the more we investigated, the clearer the pattern became: The tools themselves were unreliable. Examples:…
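One way to make tool contracts explicit is to wrap each tool so that a violated contract raises as a tool error rather than surfacing later as an apparent reasoning failure. A minimal sketch, assuming a tool that is supposed to return a string but sometimes returns `None` (the `lookup_order` tool is hypothetical):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolContract:
    """A tool plus its declared output type, enforced at call time."""
    name: str
    run: Callable[..., Any]
    returns: type

    def __call__(self, **kwargs):
        result = self.run(**kwargs)
        if not isinstance(result, self.returns):
            raise TypeError(
                f"tool {self.name!r} violated its contract: "
                f"expected {self.returns.__name__}, "
                f"got {type(result).__name__}"
            )
        return result

# Hypothetical unreliable tool: returns None instead of a string.
lookup = ToolContract(name="lookup_order", run=lambda order_id: None, returns=str)
```

Calling `lookup(order_id="A1")` now raises a `TypeError` naming the tool, so the failure is attributed to the contract violation instead of being blamed on the agent’s reasoning.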

We found our agent workflow failures were architecture bugs

We were debugging a pretty complex automation pipeline and kept blaming the model for inconsistent behavior. Turns out… the model wasn’t the problem. The actual failure points were architectural: Tasks weren’t specific enough -> different agents in…

For agent systems, which metrics give you the clearest signal during evaluation?

When evaluating an agent system that changes its behavior as tools and planning steps evolve, it can be hard to choose metrics that actually explain what went wrong. We tried several complex scoring schemes before realizing that a simple grouping works…

How do you handle JSON validation for evolving agent systems during evaluation?

Agent systems change shape as you adjust tools, add reasoning steps, or rewrite planners. One challenge I ran into is that the JSON output shifts while the evaluation script expects a fixed structure. A small structural drift in the output can make an …
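One way to keep the evaluation script stable under structural drift is to validate only the core fields it actually scores and ignore everything else, so a planner rewrite that adds keys doesn’t fail the run. A minimal sketch (the `route` and `steps` field names are hypothetical):

```python
import json

# The core fields the evaluator actually scores; extra keys from a
# rewritten planner are ignored rather than treated as failures.
REQUIRED = {"route", "steps"}

def check_core_shape(raw: str) -> tuple[bool, str]:
    """Pass if the output is valid JSON containing the required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    missing = REQUIRED - data.keys()
    if missing:
        return False, f"missing: {sorted(missing)}"
    return True, "ok"
```

The trade-off: this won’t catch a renamed core field until you update `REQUIRED`, but it stops unrelated drift from invalidating every eval run.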