An in-the-wild benchmark for AI agents in the OpenClaw Environment.
Stars: 249
Forks: 17
Watchers: 249
Open Issues: 1
ef1e125 fix: add two additional ground truth paths for google scholar search task
e50cf51 Use per-run unique ID to prevent parallel run collisions
745a8de refactor: extract hardcoded judge model to JUDGE_MODEL env var