more models more better. one expensive model is losing to three cheap ones, and there’s a paper on it
ok this one bugged me. there's a mixture-of-agents paper (arxiv 2406.04692) where a stack of open models, none of them frontier, gets layered into a committee and beats gpt-4o on alpacaeval 2.0, 65.1% to 57.5% on length-controlled win rate. cheaper parts, better result. and it line…