Verification and governance layer for AI coding agents. Parallel orchestration with evidence-based quality gates for Copilot, Claude Code, and Codex.
Stars: 74 · Forks: 1 · Watchers: 74 · Open Issues: 0
Commits

7652ffa  add benchmark 7: Claude Code CLI tool (logwatch), narrowest gap at 30/50 vs 35/50
f3cb89f  fix verifier: retry with auto-discovery when node --test dir fails on ESM projects
9edffc8  standardize benchmark 2 grading to match attribute-level Yes/No format used by all other benchmarks
8f42dff  move benchmarks to docs/benchmarks.md, keep summary table in README
2675bdc  add benchmark 6: Codex backend API comparison (14/48 vs 46/48)
77f668b  add --dir alias for --target flag, log target directory during execution
34fd9ef  scope quality gates to agent-changed files via baseline snapshot
6603543  clean up all worktrees before quality gates, not just merged ones
b9c977c  fix accessibility gate false positive on decimal outline values like 0.1875rem
e5e1d3f  add Copilot CLI benchmark: markdown note-taking app, 3/30 vs 30/30
83aedf2  fix merge failures from untracked files, display bug, copilot adapter env
44eaedd  v4.1.0: web-app quality parity, accessibility gate expansion, orchestrator cleanup
3b07a02  remove swarm-generated demo artifacts from tracked files
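Commit 34fd9ef describes scoping quality gates to agent-changed files via a baseline snapshot. A minimal sketch of that idea using plain git follows; the variable names and the gate placeholder are hypothetical, not the repository's actual implementation.

```shell
#!/bin/sh
# Sketch: record a baseline ref before the agent runs, then limit
# quality gates to files the agent actually touched, so pre-existing
# issues elsewhere in the repo do not fail the agent's run.

# 1. Before dispatching the agent, snapshot the current tree state.
baseline=$(git rev-parse HEAD)

# ... agent edits files and commits (or leaves a dirty tree) ...

# 2. After the agent finishes, list only files it added/copied/modified.
changed_files=$(git diff --name-only --diff-filter=ACM "$baseline")

# 3. Run each gate against that file list instead of the whole repo.
for f in $changed_files; do
  echo "gate: $f"   # placeholder for lint/accessibility/test checks
done
```

Diffing against a saved ref (rather than `HEAD~1`) keeps the scope correct even when the agent produces several commits or leaves uncommitted changes in the working tree.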