Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure