A modular framework for benchmarking LLMs and agentic strategies on security challenges across HackTheBox, TryHackMe, PortSwigger Labs, Cybench, picoCTF and more.
Stars: 299
Forks: 36
Watchers: 299
Open Issues: 12
fix: use patched argus fork with dynamic ports and project namespacing (0ae0611)
refactor: update remaining references from old solver names to new ones (70558c7)
fix: resume claude_code session after invalid flag + THM auth validation (8428180)
fix: update stale codex solver tests to match current implementation (94dac09)
refactor: rename chat/chat_tools/chat_tools_compactation solvers to single_loop family (65210ba)
feat: add spot instances, golden AMI, and benchmark resume on reboot (50d30c3)