/u/Efistoffeles

Re-evaluating MedQA: Why Current Benchmarks Overstate AI Diagnostic Skills

I recently researched and evaluated top LLMs on the MedQA dataset (Vals.ai, 09 May 2025). These tests are normally multiple-choice questions with five answer choices (A–E). The results show the following: o1 96.5%, o3 96.1%, o4 Mini 9…

A prompt checker for enhancing prompts, which I created with Claude in 12 hours.

submitted by /u/Efistoffeles [link] [comments]

AI users! Here is how to export all your AI chats and store them locally in a JSON file!

Google released this video yesterday. Although the show was a joke, this is actually amazing.

Microsoft announced Copilot+, and this is what it does… in Minecraft.

Copilot suddenly has a limit of 5 messages per chat…

Connecting any version of GPT straight to Gemini is now possible.

Freepik just released its collaboration with Magnific. It can create endless zoom-in images, and it’s looking crazy!