Found 11 repositories (showing 11)
evaleval
Every Eval Ever is a shared schema and crowdsourced eval database. It defines a standardized metadata format for storing AI evaluation results, from leaderboard scrapes and research papers to local evaluation runs, so that results from different frameworks can be compared, reproduced, and reused (a minimal sketch of such a record appears after this listing).
datenlabor-bmz
Tracking language proficiency of AI models for every language
hb-evalSystem
HB-Eval is the largest independent behavioral reliability validation study ever conducted for agentic AI systems. Through 14,000 experiments across 14 architecturally distinct models, 5 safety-critical domains, and 3 independent methodologies, we demonstrate that every evaluated model, including state-of-the-art commercial systems, exhibits struc…
dlab-projects
Paper repository for "Normative Evaluation of Large Language Models with Everyday Moral Dilemmas" by Sachdeva & van Nuenen.
Harshitha-reddy88
No description available
sfouziya123
No description available
SubAtomicManiac
No description available
sxhoio
No description available
yananlong
No description available
FranciszekW
Evaluation of the performance of a ranking model in Search Everywhere.
All 11 repositories loaded
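
To make the idea of a shared result schema (as described for evaleval above) concrete, here is a minimal sketch of what one standardized evaluation record might look like. All field names and example values are assumptions for illustration; the actual Every Eval Ever schema is defined in the evaleval repository itself.

    from dataclasses import dataclass, asdict
    import json

    # Hypothetical record layout, NOT the actual Every Eval Ever schema:
    # the point is that one flat, typed record can capture results coming
    # from leaderboards, papers, or local runs in a comparable form.
    @dataclass
    class EvalResult:
        model_id: str    # e.g. a model identifier such as "example-org/example-model"
        benchmark: str   # name of the eval or benchmark suite
        metric: str      # metric reported, e.g. "accuracy"
        score: float     # the reported value
        source: str      # provenance: "leaderboard", "paper", or "local run"
        framework: str   # evaluation framework that produced the result

    # Example usage: serialize one record to JSON for storage or exchange.
    result = EvalResult(
        model_id="example-org/example-model",
        benchmark="example-benchmark",
        metric="accuracy",
        score=0.87,
        source="local run",
        framework="example-harness",
    )
    print(json.dumps(asdict(result), indent=2))

Because every record carries the same fields regardless of where it came from, results from different frameworks can be filtered, joined, and re-aggregated without per-source parsing logic.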