Regression testing for AI agents. Snapshot behavior,diff tool calls,catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.
Stars
80
Forks
17
Watchers
80
Open Issues
14
Overall repository health assessment
^11.1.0^2.3.4^4.28.0^5.3.0^8.0.1^20.11.16^5.3.3^1.2.2660
commits
27
commits
6
commits
3
commits
2
commits
2
commits
1
commits
1
commits
1
commits
1
commits
fix: use List[Any] for OpenAI messages to satisfy mypy
f2706a1View on GitHubfeat: add model comparison API (run_eval, compare_models, score)
12f6844View on GitHubfix: reconfigure stdout/stderr to UTF-8 on Windows to prevent UnicodeEncodeError
9f92e98View on GitHubfix(dogfood): add --no-judge to regression step, surface agent/snapshot logs
9837f9cView on GitHubfix: use _load_config_if_exists in cloud push token resolution
ca5a823View on GitHubfeat: add cloud push module for CLI-to-cloud result upload
ca140f7View on GitHubMerge pull request #147 from zeel2104/feat/discord-monitor-webhook
27425a7View on GitHubfix: resolve daily dogfood failures (max_tokens + missing baselines)
78a99e2View on GitHubfix: stable JSON contract for run_check subprocess path
e9600c7View on GitHub