What evidence should AI coding agents leave before saying “done”?

I’m the maker of Superloopy, a small MIT-licensed workflow layer for Codex and Claude Code.

I built it around a problem I kept running into with coding agents: after a long task, the final answer often sounds confident, but the human still has to reconstruct what was actually checked.

The pattern I’m trying is an evidence gate before the agent can call work done:

- define acceptance criteria up front
- route specialized work through skills/subagents when useful
- run command-backed checks where possible
- save logs, screenshots, review notes, research notes, or other artifacts under `.superloopy/evidence/`
- separate deterministic checks from manual/visual judgment
- finish with a report that points to the actual evidence

The strongest part is the command-backed gate: if a criterion has a command, Superloopy re-runs it in-process at completion, so a stale or fabricated “passed” claim should not reach the final report. Manual/visual checks still need human review, but they are called out separately instead of being mixed into a blanket “done.”
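To make the idea concrete, here is a minimal Python sketch of that kind of gate. It is an illustration only, not Superloopy's actual code: the `Criterion` class, the `run_gate` function, and the report layout are hypothetical names invented for this example, and it shells out with `subprocess` rather than running anything in-process. The one thing it is meant to show is that pass/fail comes from re-running the command at completion time, never from a status the agent recorded earlier.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Criterion:
    """Hypothetical acceptance criterion; command=None means manual/visual review."""
    name: str
    command: Optional[list[str]] = None


def run_gate(criteria, evidence_dir=".superloopy/evidence"):
    """Re-run every command-backed criterion and save its output as evidence.

    Returns a dict with "passed", "failed", and "manual" lists. Nothing the
    agent claimed earlier is consulted; only the fresh exit code counts.
    """
    evidence = Path(evidence_dir)
    evidence.mkdir(parents=True, exist_ok=True)
    report = {"passed": [], "failed": [], "manual": []}

    for criterion in criteria:
        if criterion.command is None:
            # No command to run, so this stays a human judgment call.
            report["manual"].append(criterion.name)
            continue

        result = subprocess.run(criterion.command, capture_output=True, text=True)

        # Assumes criterion names are simple enough to use as file names.
        log_path = evidence / f"{criterion.name}.log"
        log_path.write_text(result.stdout + result.stderr)

        bucket = "passed" if result.returncode == 0 else "failed"
        report[bucket].append((criterion.name, str(log_path)))

    return report
```

Calling `run_gate([Criterion("tests", ["pytest", "-q"]), Criterion("layout-looks-right")])` would put `tests` under "passed" or "failed" with a path to its log, and list `layout-looks-right` under "manual" so it never gets folded into a blanket "done."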

Repo: https://github.com/beefiker/superloopy

For people using AI coding agents: what proof do you actually want before trusting “done”? Tests/lint are obvious, but I’m curious about screenshots, visual diffs, browser traces, security scans, design checklists, or explicit “manual judgment required” sections.

submitted by /u/Simple_Somewhere7662