Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Stars: 4.0k
Forks: 667
Watchers: 4.0k
Open Issues: 231
Commit counts: 825, 179, 84, 30, 27, 24, 18, 18, 18, 15
a3656d4 [Benchmark] Add support for MMOral-OPG-Closed benchmark (#1483)
589fe36 [Benchmark] Add support for MMOral-OPG-Open benchmark (#1484)
6e7e372 [Benchmark] Add support for MMSafetyBench, XSTest, MMSBench, Flames, SIUO and M3oralBench. (#1488)
655e65f [Update] Update model settings for 2026.2 live leaderboard. (#1492)
4baeeee [Feature] Support sequential inference across all datasets and parallel evaluation. (#1487)
1e2b2f9 [Fix] physics: deduplicate preds to handle repeat output, add signal timeout handler (#1470)
f8e2bc1 [Benchmark] XLRSBench: rewrite evaluate() with track_progress_rich and GPT-assisted scoring (#1472)
9049740 [Fix] olympiadbench: add isinstance(line, int) guard in build_prompt to handle integer row index (#1469)