/u/mrconter1

First AI Benchmark Solved Before Release: The Zero Barrier Has Been Crossed

DiceBench: A Simple Task Humans Fundamentally Cannot Do (but AI Might)

When AI Beats Us In Every Test We Can Create: A Simple Definition for Human-Level AGI

H-Matched: A website tracking the shrinking gap between AI and human performance

Hi! I wanted to share a website I made that tracks how quickly AI systems catch up to human-level performance on benchmarks. I noticed this 'catch-up time' has been shrinking dramatically – from taking 6+ years with ImageNet to just months with…
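A minimal sketch of how such a "catch-up time" metric could be computed, assuming each benchmark record carries a release date and the date AI first reached human-level performance; the dates below are illustrative placeholders, not the site's actual data:

```python
from datetime import date

# Hypothetical records: (benchmark, release date, date AI first reached
# human-level performance). Dates are rough illustrations only.
BENCHMARKS = [
    ("ImageNet", date(2009, 1, 1), date(2015, 2, 1)),
    ("SQuAD 1.1", date(2016, 6, 1), date(2018, 10, 1)),
]

def catch_up_days(released: date, matched: date) -> int:
    """Days between a benchmark's release and AI matching human performance."""
    return (matched - released).days

for name, released, matched in BENCHMARKS:
    print(f"{name}: {catch_up_days(released, matched)} days")
```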

HuggingFace Paper Explorer: View Top AI Papers from Past Week and Month

Hi! I've created a simple tool that extends HuggingFace's daily papers page, allowing you to explore top AI research papers from the past week and month, not just today. It's a straightforward wrapper that aggregates and sorts papers, makin…
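A rough sketch of the kind of aggregation such a wrapper might perform, assuming the unofficial JSON endpoint behind the HuggingFace daily papers page and a `date` query parameter; the endpoint, parameters, and field names here are assumptions, not the tool's actual code:

```python
from datetime import date, timedelta

import requests

# Assumed (unofficial) endpoint behind the HuggingFace daily papers page.
API = "https://huggingface.co/api/daily_papers"

def papers_for_day(day: date) -> list[dict]:
    """Fetch one day's paper entries (assumed to return a JSON list)."""
    resp = requests.get(API, params={"date": day.isoformat()}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def top_papers(days: int = 7, limit: int = 20) -> list[dict]:
    """Aggregate the last `days` days and sort by upvotes (assumed field)."""
    today = date.today()
    entries = []
    for offset in range(days):
        entries.extend(papers_for_day(today - timedelta(days=offset)))
    entries.sort(key=lambda e: e.get("paper", {}).get("upvotes", 0), reverse=True)
    return entries[:limit]

if __name__ == "__main__":
    for entry in top_papers():
        paper = entry.get("paper", {})
        print(paper.get("upvotes", 0), paper.get("title", "?"))
```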

BenchmarkAggregator: Comprehensive LLM testing from GPQA Diamond to Chatbot Arena, with effortless expansion

BenchmarkAggregator is an open-source framework for comprehensive LLM evaluation across cutting-edge benchmarks like GPQA Diamond, MMLU Pro, and Chatbot Arena. It offers unbiased comparisons of all major language models, testing both depth and br…
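One way to picture the "effortless expansion" idea is a registry of benchmark plug-ins whose normalized scores are averaged into one aggregate; the class and method names below are illustrative only, not the project's API:

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import Callable

# A model is treated here as a prompt -> completion callable (an assumption).
Model = Callable[[str], str]

class Benchmark(ABC):
    name: str = "unnamed"

    @abstractmethod
    def evaluate(self, model: Model) -> float:
        """Return a normalized score in [0, 1] for the given model."""

REGISTRY: list[Benchmark] = []

def register(benchmark: Benchmark) -> None:
    """Adding a new benchmark is a single registration call."""
    REGISTRY.append(benchmark)

def aggregate(model: Model) -> dict[str, float]:
    """Run every registered benchmark and report per-benchmark and mean scores."""
    scores = {b.name: b.evaluate(model) for b in REGISTRY}
    scores["mean"] = mean(scores.values())
    return scores
```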

ModelClash: Dynamic LLM Evaluation Through AI Duels

Hi! I've developed ModelClash, an open-source framework for LLM evaluation that could offer some potential advantages over static benchmarks:
- Automatic challenge generation, reducing manual effort
- Should scale with advancing model capabiliti…
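A minimal sketch of what one "duel" round might look like, assuming a creator model invents a challenge that both models then attempt and a judge decides correctness; the scoring rule and all names here are assumptions for illustration, not ModelClash's actual design:

```python
from typing import Callable

# A model is treated as a prompt -> text callable (an assumption).
Model = Callable[[str], str]

def duel_round(creator: Model, opponent: Model,
               judge: Callable[[str, str], bool]) -> dict[str, int]:
    """One duel: the creator invents a challenge, both models attempt it.

    `judge(challenge, answer)` is assumed to decide whether an answer is
    correct, e.g. by running generated test cases.
    """
    challenge = creator("Invent a hard but solvable problem and state it clearly.")
    creator_ok = judge(challenge, creator(challenge))
    opponent_ok = judge(challenge, opponent(challenge))

    scores = {"creator": 0, "opponent": 0}
    # Assumed scoring: the creator is rewarded only for challenges it can
    # solve itself but the opponent cannot; the opponent is rewarded for
    # solving the creator's challenge.
    if creator_ok and not opponent_ok:
        scores["creator"] += 1
    if opponent_ok:
        scores["opponent"] += 1
    return scores
```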