Enterprise-grade LLM evaluation framework | Multi-model benchmarking, honest dashboards, system profiling | Academic metrics: MMLU, TruthfulQA, HellaSwag | Zero fake data | PyPI: llm-benchmark-toolkit | Blog: https://dev.to/nahuelgiudizi/building-an-honest-llm-evaluation-framework-from-fake-metrics-to-real-benchmarks-2b90
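The description above names the PyPI distribution `llm-benchmark-toolkit`, installable with `pip install llm-benchmark-toolkit`. A minimal sketch for checking programmatically whether the toolkit is present in the current environment, assuming the import name `llm_benchmark_toolkit` (distribution names and import names often differ, so that name is an assumption):

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if `package` is importable in the current environment."""
    return find_spec(package) is not None


# Import name assumed from the PyPI distribution "llm-benchmark-toolkit";
# verify against the project's own docs before relying on it.
print(is_installed("llm_benchmark_toolkit"))
```

This avoids a bare `import` that would raise `ImportError`, which is handy in setup scripts that want to fall back to installing the package.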
Stars: 1 · Forks: 1 · Watchers: 1 · Open Issues: 1
Overall repository health assessment: no package.json was found, which is expected; this is a Python project distributed on PyPI, not a Node.js project.
72 commits
56a0b83 fix: add redteam and prompt_injection to benchmark command
ad40f02 fix: update version to 2.4.1 and improve UI checkbox circles
ccd0cce chore(release): bump to v2.4.1 - metadata sync, unified CLI, analytics
3d11ed0 feat: complete AI Security CLI integration and testing
3ecdcf8 feat: improve API Keys UX and auto-refresh providers
f30b93d Release v2.4.0: API Keys UI, Complete Design System, Code Quality Improvements
13da002 fix(tests): Skip optional provider tests when packages not installed
e303901 fix(mypy): Add cross-platform type checking compatibility for winreg
c5d7c87 feat(dashboard): Add sorting and click-to-highlight to Model Comparison
35bf1f3 fix(dashboard): Fix benchmark score display and auto-scroll issues
c3fa1ff feat: Add Gemini provider support with retry logic, comprehensive tests, and documentation