Found 4 repositories (showing 4)
0ca
A modular framework for benchmarking LLMs and agentic strategies on security challenges across HackTheBox, TryHackMe, PortSwigger Labs, Cybench, picoCTF and more.
LLM agent solving traces, leaderboards, and benchmark results across security CTF and hacking platforms
evilsquid888
Repository containing machine solving attempts and results generated by BoxPwnr
Infrastructure, benchmarking, and dashboard for BoxPwnr