/u/mrconter1

H-Matched: A website tracking the shrinking gap between AI and human performance

Hi! I wanted to share a website I made that tracks how quickly AI systems catch up to human-level performance on benchmarks. I noticed this 'catch-up time' has been shrinking dramatically – from taking 6+ years with ImageNet to just months with…
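For reference, the "catch-up time" here is just the interval between a benchmark's release and the date an AI system first matched human performance. A minimal sketch of that calculation, with illustrative dates only (not data from the site):

    from datetime import date

    # Illustrative records: (benchmark, release date, date an AI system
    # first matched human-level performance). Dates are approximate.
    benchmarks = [
        ("ImageNet", date(2009, 6, 1), date(2015, 12, 10)),
        ("NewerBenchmark", date(2023, 1, 1), date(2023, 9, 1)),
    ]

    for name, released, matched in benchmarks:
        years = (matched - released).days / 365.25
        print(f"{name}: human-level reached in {years:.1f} years")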

HuggingFace Paper Explorer: View Top AI Papers from Past Week and Month

Hi! I've created a simple tool that extends HuggingFace's daily papers page, allowing you to explore top AI research papers from the past week and month, not just today. It's a straightforward wrapper that aggregates and sorts papers, makin…
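For the curious, the aggregation itself only takes a few lines. A sketch of the idea, noting that the daily_papers endpoint and its response fields below are assumptions inferred from the public site, not a documented API:

    import requests
    from datetime import date, timedelta

    # Assumed endpoint mirroring the daily-papers page; the response schema
    # (a list of {"paper": {"title": ..., "upvotes": ...}}) is an assumption.
    API = "https://huggingface.co/api/daily_papers"

    def top_papers(days=7, limit=10):
        papers = []
        for offset in range(days):
            day = date.today() - timedelta(days=offset)
            resp = requests.get(API, params={"date": day.isoformat()}, timeout=10)
            if resp.ok:
                papers.extend(resp.json())
        # Sort the aggregated window by upvotes, highest first.
        papers.sort(key=lambda p: p.get("paper", {}).get("upvotes", 0), reverse=True)
        return papers[:limit]

    for p in top_papers():
        print(p["paper"]["upvotes"], p["paper"]["title"])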

BenchmarkAggregator: Comprehensive LLM testing from GPQA Diamond to Chatbot Arena, with effortless expansion

BenchmarkAggregator is an open-source framework for comprehensive LLM evaluation across cutting-edge benchmarks like GPQA Diamond, MMLU Pro, and Chatbot Arena. It offers unbiased comparisons of all major language models, testing both depth and br…
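As a rough sketch of the aggregation step, one simple approach is an unweighted mean over per-benchmark scores on a common 0-100 scale. The model names and scores below are placeholders, not results from the framework:

    # Placeholder scores on a 0-100 scale; real values would come from
    # running each model through the framework's benchmarks.
    scores = {
        "model-a": {"GPQA Diamond": 53.0, "MMLU Pro": 72.0, "Chatbot Arena": 95.0},
        "model-b": {"GPQA Diamond": 41.0, "MMLU Pro": 63.0, "Chatbot Arena": 88.0},
    }

    def aggregate(per_benchmark):
        # An unweighted mean avoids privileging any single benchmark suite.
        return sum(per_benchmark.values()) / len(per_benchmark)

    for model, results in sorted(scores.items(), key=lambda kv: -aggregate(kv[1])):
        print(f"{model}: {aggregate(results):.1f}")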