artificial

Re-evaluating MedQA: Why Current Benchmarks Overstate AI Diagnostic Skills

I recently researched and ran an evaluation of top LLMs on the MedQA dataset (Vals.ai, 09 May 2025). Normally these tests are multiple-choice questions with five answer choices (A–E). They show the following: o1 96.5%, o3 96.1%, o4 Mini 9…

AI finally did something useful: made our cold emails feel human

Not sure if anyone else has felt this, but most AI sales tools today feel… off. We tested a bunch, and it always ended the same way: robotic follow-ups, missed context, and prospects ghosting harder than ever. So we built something different. Not an …

Ludus AI created entire game in Unreal Engine

Found out that people are making entire games in UE using the Ludus AI agent and documenting the process. Credit: rafalobrebski on YouTube. Submitted by /u/SmalecMoimBogiem

One-Minute Daily AI News 5/11/2025

SoundCloud changes policies to allow AI training on user content.[1] OpenAI agrees to buy Windsurf for about $3 billion, Bloomberg News reports.[2] Amazon offers peek at new human jobs in an AI bot world.[3] Visual Studio Code beefs up AI coding featu…

Gemini can identify sounds. This skill is new to me.

It's not perfect, but it does a pretty good job. I've been running around testing it on different things. Here's what I've found that it can recognize so far: - Clanging a knife against a metal French press coffee maker. It called it a m…

I emailed OpenAI about self-referential memory entries and the conversation led to a discussion on consciousness and ethical responsibility.

Note: When I wrote the reply on Friday night, I was honestly very tired and just wanted to finish it, so there were mistakes in some references that I didn't cross-check before sending it the next day. The statements are true, though; it's just that …

I made a tool that turns Ambition into Income

Hey people, I've been seeing so many people getting stuck at the ideation phase, and so many people who are inherently ambitious but don't know exactly what to do with all of their fire, people who wish to take control of their lives and …

Where does most AI/LLM happen? Reddit? Twitter?

I'm trying to monitor the best sources for AI news. It seems to me most of this is happening on Twitter and Reddit. Would you agree? Am I missing somewhere? submitted by /u/brainhack3r

mlop: A fully OSS alternative to wandb

Hey guys, I just launched a fully open-source alternative to wandb called mlop.ai, which is performant and secure (yes, our backend is in Rust). It's fully compatible with the wandb API, so migration is just a one-line change. WandB has pretty bad performanc…