Python framework to run the ADeLe benchmark at scale against models deployed in Azure AI Foundry, and to evaluate outputs using multiple LLM judges.
Stars: 2
Forks: 1
Watchers: 2
Open Issues: 0
33 commits
9a30e60: Added --force-run option to inference and judging to bypass idempotent skipping based on the dedup index
83d4c25: Implemented automatic resumption of pending batch jobs for run-judge
a9069f0: Implemented independent validate_config for inference and judging
82ccd4c: Implemented provider-agnostic concurrency and batch budgeting
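
The commit log mentions an idempotent run driven by a dedup index, with a --force-run override. The repository's actual implementation is not shown here, so the following is only a minimal sketch of how such a mechanism could work. The names item_key, select_pending, and run, the key scheme, and the in-memory set used as the index are all hypothetical and chosen for illustration.

```python
import hashlib
import json


def item_key(model: str, prompt: str) -> str:
    """Stable key for one (model, prompt) pair. Hypothetical scheme."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def select_pending(items, dedup_index, force_run=False):
    """Return the items that still need to run.

    items: iterable of (model, prompt) tuples.
    dedup_index: set of keys for work that has already completed.
    force_run: when True, ignore the index and rerun everything.
    """
    if force_run:
        return list(items)
    return [it for it in items if item_key(*it) not in dedup_index]


def run(items, dedup_index, force_run=False):
    """Process pending items and record each one in the index."""
    done = []
    for model, prompt in select_pending(items, dedup_index, force_run):
        # Placeholder for the real inference or judging call (assumption).
        done.append((model, prompt))
        dedup_index.add(item_key(model, prompt))
    return done
```

In this sketch, a second call to run with the same items and index does nothing, while passing force_run=True reprocesses every item regardless of what the index contains.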