Per-turn behavioral analysis framework for Claude Agent SDK — instruments real agentic sessions, scores each turn with LLM-as-Judge, identifies decision quality degradation patterns
Stars
0
Forks
0
Watchers
0
Open Issues
0
Overall repository health assessment
No package.json found
This might not be a Node.js project
25
commits
feat(pipeline): add session mode, fix task definitions, improve runner
6b515daView on GitHubfeat(analyzer): improve phase detection, scoring guards, and plot generation
50507ffView on GitHubfeat(instrumenter): integrate Claude Agent SDK hooks for session recording
4b1a7b9View on GitHubchore: add session fields to types, update gitignore and deps
ec6d832View on GitHubdocs: update progress with session handoff and next steps guide
63f0140View on GitHubdocs: expand future work with real-time drift detection roadmap
43eb70aView on GitHubchore: polish — ruff lint/format, mypy strict, update package exports
9444256View on GitHubdocs: add research writeup with methodology and placeholder results
0cf69beView on GitHubdocs: add project README with architecture, methodology, and CLI reference
000e6b5View on GitHubfix: resolve triple-quote collision in hard_debug.py setup_script
82b2be9View on GitHub