i've been throwing the same hard question at a few different models (or the same models) for a while now and honestly i've stopped caring where they agree. when they all land on the same answer it usually just means the question was easy, or they all grabbed the same standard take from overlapping training data, so agreement is often just a shared blind spot. the useful part is always where one of them breaks from the pack, that gap tends to be land right on the thing i was glossing over. i got nerdy enough about this that i built a little private setup on my own machine i call multi-claude, basically several claude sessions running at once so i can watch them diverge instead of collapsing it all into one tidy answer. this is not a promo. its private and not available to others. the part i couldn't cleanly crack is telling real disagreement (genuinely different reasoning) apart from noise (a model just being randomly inconsistent). 6 months ago i think i finally figured it out. i'm building an ios app that automates the process i validated. its been pretty fun!
[link] [comments]