/u/Efistoffeles

Re-evaluating MedQA: Why Current Benchmarks Overstate AI Diagnostic Skills

I recently ran a study evaluating top LLMs on the MedQA dataset (Vals.ai, 09 May 2025). These tests normally consist of multiple-choice questions with five answer choices (A–E). The reported scores are as follows: o1 96.5%, o3 96.1%, o4 Mini 9…

AI Users! This is how to export all your AI Chats to store them in a JSON file locally!

submitted by /u/Efistoffeles

Google released this video yesterday. As much as the show was a joke, this is actually amazing.

submitted by /u/Efistoffeles

Freepik just released their collaboration with Magnific. It can create endless zoom-in images and it’s looking Crazy!

submitted by /u/Efistoffeles