I’ve been running blind reviews between AI models for six months. here’s what I didn’t expect
context: I've been building a system that sends the same question to multiple models in parallel, then has each model review the others. six months, a few thousand sessions, mostly legal and financial questions the design decision I agonized over t…