Enterprise-grade LLM evaluation framework | Multi-model benchmarking, honest dashboards, system profiling | Academic metrics: MMLU, TruthfulQA, HellaSwag | Zero fake data | PyPI: llm-benchmark-toolkit | Blog: https://dev.to/nahuelgiudizi/building-an-honest-llm-evaluation-framework-from-fake-metrics-to-real-benchmarks-2b90
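The description above names the PyPI distribution `llm-benchmark-toolkit`, installable with `pip install llm-benchmark-toolkit`. A minimal sketch for checking programmatically whether the toolkit is present in the current environment, assuming the import name `llm_benchmark_toolkit` (distribution names and import names often differ, so that name is an assumption):

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if `package` is importable in the current environment."""
    return find_spec(package) is not None


# Import name assumed from the PyPI distribution "llm-benchmark-toolkit";
# verify against the project's own docs before relying on it.
print(is_installed("llm_benchmark_toolkit"))
```

This avoids a bare `import` that would raise `ImportError`, which is handy in setup scripts that want to fall back to installing the package.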
Stars: 1 · Forks: 1 · Watchers: 1 · Open Issues: 1
Overall repository health assessment: no package.json was found, which is expected; this is a Python project distributed on PyPI, not a Node.js project.
72 commits
56a0b83 fix: add redteam and prompt_injection to benchmark command
ad40f02 fix: update version to 2.4.1 and improve UI checkbox circles
ccd0cce chore(release): bump to v2.4.1 - metadata sync, unified CLI, analytics
3d11ed0 feat: complete AI Security CLI integration and testing
3ecdcf8 feat: improve API Keys UX and auto-refresh providers
f30b93d Release v2.4.0: API Keys UI, Complete Design System, Code Quality Improvements
13da002 fix(tests): Skip optional provider tests when packages not installed
e303901 fix(mypy): Add cross-platform type checking compatibility for winreg
c5d7c87 feat(dashboard): Add sorting and click-to-highlight to Model Comparison
35bf1f3 fix(dashboard): Fix benchmark score display and auto-scroll issues
c3fa1ff feat: Add Gemini provider support with retry logic, comprehensive tests, and documentation