Found 14 repositories (showing 14)
aorwall
No description available
epoch-research
Docker image registry for SWE-bench, created by Epoch AI.
SubhanshuMG
Hard-level DevOps/SWE terminal task built using the Terminal Bench 2.0 framework. This scenario requires deep reasoning across Linux systems, dependency resolution, runtime debugging, and environment reconstruction inside a Dockerized setup. Designed to evaluate problem-solving ability beyond surface-level file manipulation.
JetBrains
End-to-end TeamCity framework to run AI agents on SWE-Bench Lite. Spin up isolated Docker images per task, extract patches, score with the official harness, and aggregate success rates. As an example, we'll look at Junie and Google Gemini CLI.
logic-star-ai
Heavily compressed Docker images for SWE-bench Verified
islem-esi
Create, customize, and manage SWE-Bench containers
greynewell
One-command SWE-bench eval harness in Go. Native ARM64 containers with 6.3x test runner speedup on Apple Silicon and AWS Graviton. Pre-built images on Docker Hub.
riverLaugh
No description available
john-b-yang
No description available
No description available
sauravpanda
Utilities for running SWE-Bench with pre-built Docker containers locally
vasu-bhai
SWE-bench style coding agent evaluation framework using Modal sandboxes and Docker
tordukhanov
Automated validation system for SWE-bench data points using Docker-based evaluation
hertera1
Utilities for running SWE-Bench with pre-built Docker containers and Modal