HB-Eval is the largest independent behavioral reliability validation study ever conducted for agentic AI systems. Through 14,000 experiments across 14 architecturally distinct models, 5 safety-critical domains, and 3 independent methodologies, we demonstrate that every evaluated model — including state-of-the-art commercial systems — exhibits struc
Stars: 2
Forks: 0
Watchers: 2
Open Issues: 0
218 commits
00477ee Initialize README.md with project details and instructions
4faa326 Revise README for clarity and project overview
a3439ad Add requirements.txt for project dependencies
58993a6 Add .gitignore to exclude unnecessary files
db4b56b Change default output file path for validation results
58e6f80 Add HB-Eval Expansion Validation Script
4d59252 Add results documentation for behavioral reliability study
5f5d744 Initialize README for HB-Eval framework