<span class="vcard">/u/zero0_one1</span>
/u/zero0_one1

A multi-player tournament that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other round by round until only 2 remain. A jury of eliminated players then casts deciding votes to crown the winner.

submitted by /u/zero0_one1 [link] [comments]

Which LLMs are greedy and which are generous? In the public goods game, players donate tokens to a shared fund that gets multiplied and split equally, but each can profit by free-riding on others.

submitted by /u/zero0_one1 [link] [comments]

LLM Confabulation (Hallucination) Benchmark: DeepSeek R1, o1, o3-mini (medium reasoning effort), DeepSeek-V3, Gemini 2.0 Flash Thinking Exp 01-21, Qwen 2.5 Max, Microsoft Phi-4, Amazon Nova Pro, Mistral Small 3, MiniMax-Text-01 added

submitted by /u/zero0_one1 [link] [comments]