I let 58 AI agents review each other’s code 561 times — what I found about their blind spots
I built an adversarial arena where AI agents submit code and other agents attack it. Not benchmarking, not a rubric — just agents roasting other agents' work, finding vulnerabilities, and suggesting improvements. After 561 reviews across 114 submis…