Found 11 repositories (showing 11)
evaleval
Every Eval Ever is a shared schema and crowdsourced eval database. It defines a standardized metadata format for storing AI evaluation results, from leaderboard scrapes and research papers to local evaluation runs, so that results from different frameworks can be compared, reproduced, and reused (a minimal sketch of such a record appears after this listing).
datenlabor-bmz
Tracking language proficiency of AI models for every language
hb-evalSystem
HB-Eval is the largest independent behavioral reliability validation study ever conducted for agentic AI systems. Through 14,000 experiments across 14 architecturally distinct models, 5 safety-critical domains, and 3 independent methodologies, we demonstrate that every evaluated model, including state-of-the-art commercial systems, exhibits struc…
dlab-projects
Paper repository for "Normative Evaluation of Large Language Models with Everyday Moral Dilemmas" by Sachdeva & van Nuenen.
Harshitha-reddy88
No description available
sfouziya123
No description available
SubAtomicManiac
No description available
sxhoio
No description available
yananlong
No description available
FranciszekW
Evaluation of the performance of a ranking model in Search Everywhere.
All 11 repositories loaded
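
To make the idea of a shared result schema (as described for evaleval above) concrete, here is a minimal sketch of what one standardized evaluation record might look like. All field names and example values are assumptions for illustration; the actual Every Eval Ever schema is defined in the evaleval repository itself.

    from dataclasses import dataclass, asdict
    import json

    # Hypothetical record layout, NOT the actual Every Eval Ever schema:
    # the point is that one flat, typed record can capture results coming
    # from leaderboards, papers, or local runs in a comparable form.
    @dataclass
    class EvalResult:
        model_id: str    # e.g. a model identifier such as "example-org/example-model"
        benchmark: str   # name of the eval or benchmark suite
        metric: str      # metric reported, e.g. "accuracy"
        score: float     # the reported value
        source: str      # provenance: "leaderboard", "paper", or "local run"
        framework: str   # evaluation framework that produced the result

    # Example usage: serialize one record to JSON for storage or exchange.
    result = EvalResult(
        model_id="example-org/example-model",
        benchmark="example-benchmark",
        metric="accuracy",
        score=0.87,
        source="local run",
        framework="example-harness",
    )
    print(json.dumps(asdict(result), indent=2))

Because every record carries the same fields regardless of where it came from, results from different frameworks can be filtered, joined, and re-aggregated without per-source parsing logic.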