AgentKernelArena provides an end-to-end, siloed benchmarking environment in which different LLM-powered agents (such as Cursor Agent, Claude Code, Codex, SWE-agent, and GEAK) can be evaluated side by side on the same GPU kernel tasks using objective, reproducible metrics.
Stars: 13
Forks: 3
Watchers: 13
Open Issues: 13
Commit counts: 35, 22, 7, 1, 1, 1
e1f2ec2 Support bytes_per_second_gs in benchmark JSON for speedup; increase default timeouts and add performance_timeout for benchmarks
ef5a73c Add repository task type and repository_language prompts; add post_clone_install to the rocPRIM tasks config; run optional post_clone_install after repository clone
5e571f6 Merge pull request #25 from sharareh-y/sharareh/timestamped-reports-with-comparison
3d3dd50 Update config with 60 representative tasks for benchmarking
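
The bytes_per_second_gs commit above refers to deriving a speedup from a throughput field in benchmark JSON. The following is a minimal sketch of how such a throughput-based speedup could be computed. Only the field name bytes_per_second_gs comes from the commit message; the record layout, the sample values, and the helper name throughput_speedup are assumptions for illustration and may not match the repository's actual code.

```python
import json

# Hypothetical benchmark records. Only the key name "bytes_per_second_gs"
# is taken from the commit message; the rest of the layout is assumed.
baseline_json = '{"name": "kernel", "bytes_per_second_gs": 120.0}'
optimized_json = '{"name": "kernel", "bytes_per_second_gs": 180.0}'


def throughput_speedup(baseline: dict, optimized: dict,
                       key: str = "bytes_per_second_gs") -> float:
    """Return optimized throughput divided by baseline throughput.

    Higher throughput is better, so a value above 1.0 means the
    optimized kernel is faster than the baseline.
    """
    base = float(baseline[key])
    opt = float(optimized[key])
    if base <= 0:
        raise ValueError("baseline throughput must be positive")
    return opt / base


speedup = throughput_speedup(json.loads(baseline_json),
                             json.loads(optimized_json))
print(f"speedup: {speedup:.2f}x")
```

Because throughput is a higher-is-better metric, the ratio is optimized over baseline, which is the inverse of a speedup computed from elapsed times.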