Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure

A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.
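The collision rule can be illustrated with a minimal sketch. This is not the repository's code: the names `ALLOWED_MOVES` and `resolve_round` are hypothetical, and the sketch assumes that a player whose number is unique advances by exactly that many steps, which the description implies but does not state outright.

```python
from collections import Counter

# Moves named in the description; the constant name is hypothetical.
ALLOWED_MOVES = (1, 3, 5)


def resolve_round(choices):
    """Return how many steps each player advances in one round.

    `choices` maps a player name to the move that player secretly picked.
    Assumption: a player with a unique move advances by that move;
    players who collide on the same number advance by 0.
    """
    for player, move in choices.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} picked an illegal move: {move}")

    # Count how many players picked each number.
    counts = Counter(choices.values())

    # A move only counts if nobody else picked the same number.
    return {
        player: (move if counts[move] == 1 else 0)
        for player, move in choices.items()
    }


if __name__ == "__main__":
    # A and B collide on 5, so only C moves.
    print(resolve_round({"A": 5, "B": 5, "C": 3}))
    # prints {'A': 0, 'B': 0, 'C': 3}
```

Under this rule the largest move is only worth taking if nobody else takes it, which is what gives the public conversation before each secret pick its strategic weight.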
Stars: 84
Forks: 2
Watchers: 84
Open Issues: 1
Recent commits

e24dc36  GPT-5.1, Gemini 3 Pro, Grok 4.1 Fast, Deepseek V3.2 Exp, Claude Sonnet 4.5, Kimi K2 Thinking, Claude Opus 4.5, Qwen 3 235B A22B 25-07, GLM-4.6, Qwen 3 Max Thinking, Mistral Large 3 added.
e18f0f1  GPT-5.1, Gemini 3 Pro, Grok 4.1 Fast, Deepseek V3.2 Exp, Claude Sonnet 4.5, Kimi K2 Thinking, Claude Opus 4.5, Qwen 3 235B A22B 25-07, GLM-4.6, Qwen 3 Max Thinking, Mistral Large 3 added.
aa41883  GPT-5.1, Gemini 3 Pro, Grok 4.1 Fast, Deepseek V3.2 Exp, Claude Sonnet 4.5, Kimi K2 Thinking, Claude Opus 4.5, Qwen 3 235B A22B 25-07, GLM-4.6, Qwen 3 Max Thinking, Mistral Large 3 added.
8a2d99d  DeepSeek V3.1, Mistral Medium 3.1 added. 5 new baseline silent strategies added.