What happens when you give an AI agent a structured mistake log and let it write its own behavioral rules?

I've been running a persistent AI agent as an operational manager for the past couple of weeks. Not a chatbot, not a one-off coding assistant. A stateful agent that maintains identity, accumulates knowledge, and runs autonomous jobs across CLI, messaging platforms, and scheduled tasks.

The part I want to discuss is the self-correction architecture, because I think it gets at something fundamental about how we should be thinking about agent behavior.

The problem with static instructions:

Most agent setups rely on upfront instructions. You write a system prompt, maybe add some few-shot examples, and hope the model follows them. When it doesn't, you add more instructions. This doesn't scale. You can't anticipate every failure mode, and the instruction set gets bloated with edge cases.

The alternative: earned directives

Instead of writing all the rules upfront, I built a pipeline where:

  1. Every mistake gets logged to a structured ledger with six fields: what happened, why, what should have happened, the named pattern, severity, and the specific signal the agent misread
  2. A background process counts pattern frequency
  3. When the same pattern appears 3+ times, a new behavioral directive is auto-generated and written to the agent's active rule set
  4. If the directive still gets violated, its priority escalates

The result: 13 behavioral rules that I never wrote. The agent generated them from its own operational mistakes. These directives carry more weight than my original static instructions because they're grounded in specific failure cases.

Signal tracing is the key mechanism

The most important field in the mistake log isn't "what happened" or even "why." It's "signal_traced," which forces the agent to identify the specific signal it misread that led to the mistake. Not "I wasn't listening" but "I interpreted 'can you check X' as a request for an opinion rather than a request to actually run the check." That level of specificity is what drives real behavioral change on the next occurrence.
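A concrete ledger entry makes the point. This is a hypothetical example I constructed from the "can you check X" case in the paragraph above; the field values and the pattern name `opinion_vs_action` are illustrative, not taken from the actual ledger.

```python
# A hypothetical entry illustrating the six-field schema; note that
# signal_traced names the specific misread cue, not a vague excuse.
entry = {
    "what_happened": "Replied with an opinion about X instead of running the check",
    "why": "Treated the message as a discussion prompt",
    "what_should_have_happened": "Run the check and report the actual result",
    "pattern": "opinion_vs_action",  # hypothetical pattern name
    "severity": "medium",
    "signal_traced": (
        "interpreted 'can you check X' as a request for an opinion "
        "rather than a request to actually run the check"
    ),
}

# Enforce the schema so the agent can't log a vague, untraceable mistake.
REQUIRED_FIELDS = {
    "what_happened", "why", "what_should_have_happened",
    "pattern", "severity", "signal_traced",
}
assert set(entry) == REQUIRED_FIELDS
```

Making `signal_traced` a required field is what forces the specificity: an entry that omits it simply fails validation.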

What I'm curious about:

  • Has anyone seen similar approaches to automated behavioral rule generation in other agent frameworks?
  • The pattern threshold of 3 occurrences before promotion was chosen intuitively. Is there research on optimal thresholds for behavioral rule adoption in adaptive systems?
  • Signal tracing feels related to root cause analysis in reliability engineering. Are there formal frameworks I should be looking at?

I open-sourced the full architecture (schemas, templates, patterns) here: github

Detailed write-up: roryteehan.com

submitted by /u/teeheEEee27