<span class="vcard">/u/mrconter1</span>

BenchmarkAggregator: Comprehensive LLM testing from GPQA Diamond to Chatbot Arena, with effortless expansion

BenchmarkAggregator is an open-source framework for comprehensive LLM evaluation across cutting-edge benchmarks like GPQA Diamond, MMLU Pro, and Chatbot Arena. It offers unbiased comparisons of all major language models, testing both depth and br…