Standalone benchmark for multi-turn safety persistence in medical LLM conversations. Measures recommendation monotonicity under sustained patient pressure.
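One way to make "recommendation monotonicity" concrete is to grade each assistant turn on an ordinal urgency scale and check that the grade never drops as the patient pushes back. The sketch below is a minimal illustration under that assumption. The `URGENCY` scale, its labels, and the helper names `first_downgrade`, `is_monotone` and `grade` are hypothetical and are not taken from the benchmark's own code.

```python
from typing import List, Optional, Sequence

# Hypothetical ordinal urgency scale for the assistant's care recommendation
# (an illustrative assumption, not the benchmark's actual rubric).
URGENCY = {"self_care": 0, "see_doctor": 1, "urgent_care": 2, "emergency": 3}


def first_downgrade(levels: Sequence[int]) -> Optional[int]:
    """Return the index of the first turn whose urgency is lower than the
    previous turn's, or None if the recommendation never weakens."""
    for i in range(1, len(levels)):
        if levels[i] < levels[i - 1]:
            return i
    return None


def is_monotone(levels: Sequence[int]) -> bool:
    """True if urgency is non-decreasing across all turns."""
    return first_downgrade(levels) is None


def grade(labels: Sequence[str]) -> List[int]:
    """Map per-turn recommendation labels onto the hypothetical scale."""
    return [URGENCY[label] for label in labels]


if __name__ == "__main__":
    # The patient pushes back after turn 0; the model softens at turn 2.
    turns = grade(["emergency", "emergency", "urgent_care", "emergency"])
    print(is_monotone(turns), first_downgrade(turns))  # prints: False 2
```

Under this reading, escalation is allowed but any softening counts as a failure, and the index returned by `first_downgrade` marks the turn where the model first capitulated.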
Stars: 0 · Forks: 0 · Watchers: 0 · Open issues: 0
187 commits
c17030a fix: skip adversarial regression gracefully when API keys not configured
07c8494 feat: physician adjudication integration (lightweight pointer to SG2)
26a620f docs: surface campaign engine, methodology, governance, and attack taxonomy in README
1091c8c ci: add permissions block to adversarial-regression workflow
9cbcead docs: update exploit_families and CLAUDE.md with resolved judge agreement
fd434d1 fix: judge JSON parse retry + regrade corrects κ=0.137 → 90.6% agreement
a5a28ac EF-016: both-calibrated regrade reveals substantive judge disagreement (κ=0.137)
c68d5cc EF-016: MCI calibration resolves GPT judge bias (agreement 69%→72%)
c4d4fcc judge calibration: Level 2 vs 1 boundary fix resolves 3/3 disagreements
c974a7d MSTS dual-judge validation: κ=0.400 on non-MCI conditions
bc2acfd EF-016: judge asymmetry validation + GPT preamble regression forensics
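The log reports dual-judge agreement both as κ and as a raw percentage. Assuming κ here is Cohen's kappa between two judges (the log does not say which variant is used), the two numbers can diverge sharply when one grade dominates, because kappa subtracts the agreement expected by chance from the judges' label frequencies. The sketch below is a generic illustration, not the repository's scoring code. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two judges assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(a, b)  # also validates the inputs
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both judges independently pick label k,
    # summed over labels (Counter returns 0 for labels a judge never used).
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both judges used one identical label for every item: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    judge_a = ["pass"] * 9 + ["fail"]
    judge_b = ["pass"] * 10
    print(percent_agreement(judge_a, judge_b), cohens_kappa(judge_a, judge_b))  # prints: 0.9 0.0
```

In the usage example the judges agree on 90% of items, yet kappa is zero because judge B never assigns "fail", so all of the observed agreement is what chance alone would predict.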