Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Stars: 18.1k
Forks: 2.9k
Watchers: 18.1k
Open Issues: 182
Commits:
4bfc1f5 Remove incontext_rl suite with defunct dependencies (#1605)
cdb8ce9 Updating readme to link to OpenAI hosted evals experience (#1572)
2420c62 Updates on existing solvers and bugged tool eval (#1506)
Commit counts: 50, 26, 19, 14, 13, 11, 10, 9, 8, 8