Found 10 repositories (showing 10)
JoeyHendricks
A micro/macro benchmark framework for the Python programming language that helps you optimize your software.
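For context, a micro-benchmark in Python typically wraps a callable in repeated timed runs and reports per-call statistics. A minimal standard-library sketch of that pattern (the function names are illustrative, not this framework's API):

```python
import statistics
import timeit

def target() -> int:
    # Hypothetical function under test: sum of squares.
    return sum(i * i for i in range(1_000))

def micro_benchmark(func, repeats: int = 5, number: int = 1_000) -> dict:
    # Run `func` `number` times per sample, collect `repeats` samples,
    # then report per-call statistics in microseconds.
    samples = timeit.repeat(func, repeat=repeats, number=number)
    per_call_us = [s / number * 1e6 for s in samples]
    return {
        "min_us": min(per_call_us),
        "mean_us": statistics.mean(per_call_us),
        "stdev_us": statistics.stdev(per_call_us),
    }

if __name__ == "__main__":
    print(micro_benchmark(target))
```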
zzstoatzz
A benchmark harness for Python CLIs.
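A CLI benchmark harness of this kind generally shells out to the command under test and measures wall-clock time per invocation; a minimal sketch, where the command is a placeholder rather than anything from this repo:

```python
import statistics
import subprocess
import sys
import time

def time_cli(cmd: list[str], runs: int = 10) -> list[float]:
    # Invoke the CLI `runs` times and record wall-clock seconds per run.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    # Placeholder command: time the Python interpreter's own startup.
    timings = time_cli([sys.executable, "-c", "pass"])
    print(f"median: {statistics.median(timings):.4f}s  min: {min(timings):.4f}s")
```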
aerkanc
A minimal, reproducible harness to benchmark coding LLMs on Project Euler–style problems in Python. Runs each solution in isolation, measures runtime, checks correctness, scores by difficulty & latency, and ranks models on a shared scoreboard.
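Running each candidate solution in a separate process with a timeout, then comparing stdout against the known answer, is one common way to get that isolation. A hedged sketch of the pattern (the file path is invented; 233168 is the genuine answer to Project Euler problem 1):

```python
import subprocess
import sys
import time

def run_solution(path: str, expected: str, timeout_s: float = 10.0) -> dict:
    # Execute one solution file in a fresh interpreter so crashes,
    # leaked globals, and runaway loops cannot affect the harness itself.
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "reason": "timeout", "runtime_s": timeout_s}
    runtime = time.perf_counter() - start
    correct = proc.returncode == 0 and proc.stdout.strip() == expected
    return {"ok": correct, "runtime_s": runtime}

if __name__ == "__main__":
    # Hypothetical solution file expected to print the answer to problem 1.
    print(run_solution("solutions/problem_001.py", "233168"))
```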
cschubiner
Codenames AI: Python benchmarking harness + interactive web game with LLM spymasters.
EzraEngel
A python test harness to benchmark agent-based simulations across a range of representative computational kernels.
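A harness like this typically times each representative kernel over a grid of problem sizes. A minimal sketch, with a toy random-walk kernel standing in for a real agent-based simulation:

```python
import random
import time

def random_walk_kernel(n_agents: int, n_steps: int) -> list[float]:
    # Toy stand-in for an agent-based simulation kernel:
    # each agent takes `n_steps` unit steps in one dimension.
    positions = [0.0] * n_agents
    for _ in range(n_steps):
        for i in range(n_agents):
            positions[i] += random.choice((-1.0, 1.0))
    return positions

if __name__ == "__main__":
    # Sweep the agent count to see how the kernel scales.
    for n_agents in (100, 1_000, 10_000):
        start = time.perf_counter()
        random_walk_kernel(n_agents, n_steps=100)
        elapsed = time.perf_counter() - start
        print(f"{n_agents:>6} agents: {elapsed:.3f}s")
```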
misha-armatura
A minimal C++23 benchmark of JSON parse/transform/serialize using Glaze, RapidJSON, and nlohmann::json, with curl and Python test harnesses.
dicnunz
Tiny, reproducible harness for evaluating local LLMs through an OpenAI-compatible API. Topics: python, llm, evaluation, benchmarking, cli, local-llm
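Evaluating through an OpenAI-compatible API usually means posting to the standard /v1/chat/completions endpoint of a local server. A standard-library sketch of one such request; the base URL, model name, and prompt are assumptions, not taken from this repo:

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local server address
MODEL = "local-model"                  # placeholder model name

def chat(prompt: str) -> tuple[str, float]:
    # Send one chat-completion request and return (reply text, latency in s).
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    latency = time.perf_counter() - start
    return body["choices"][0]["message"]["content"], latency

if __name__ == "__main__":
    reply, latency = chat("What is 2 + 2?")
    print(f"{latency:.2f}s  {reply!r}")
```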
Lalu2002
A memory-optimized Python harness for sequential, large-scale LLM benchmarking. Utilizes Hugging Face models, lm-eval-harness, and 4-bit quantization with aggressive GPU memory management to run evaluations efficiently.
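The 4-bit loading plus aggressive cleanup pattern it describes typically looks like the sketch below, which assumes transformers, bitsandbytes, and a CUDA GPU are available; the model IDs are placeholders, and the evaluation step itself is elided:

```python
import gc

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_IDS = ["model-a", "model-b"]  # placeholder Hugging Face model IDs

def evaluate_one(model_id: str) -> None:
    # Load weights in 4-bit NF4 to cut weight memory roughly 4x.
    config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )
    # ... run lm-eval-harness or custom prompts here ...

    # Aggressive cleanup so the next model fits on the same GPU.
    del model, tokenizer
    gc.collect()
    torch.cuda.empty_cache()

if __name__ == "__main__":
    # Evaluate models one at a time, sequentially, on a single GPU.
    for model_id in MODEL_IDS:
        evaluate_one(model_id)
```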
shamspias
RAG-Scout is a Python test harness that automatically benchmarks multiple similarity-search strategies (sparse, dense, hybrid, late-interaction, rerankers) on any Q/A dataset and tells you which retriever stack is the best foundation for your Retrieval-Augmented Generation pipeline.
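Comparing retriever stacks usually reduces to scoring every strategy with the same metric, such as recall@k, over a shared Q/A set. A toy sketch of that scoring loop with fake retrievers; nothing here is RAG-Scout's actual API:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    # Fraction of relevant documents that appear in the top-k results.
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Toy Q/A dataset: question ID -> set of relevant document IDs.
DATASET = {
    "q1": {"doc1", "doc3"},
    "q2": {"doc2"},
}

# Fake retrievers standing in for sparse/dense/hybrid strategies.
RETRIEVERS = {
    "sparse": lambda q: ["doc1", "doc4", "doc3"],
    "dense": lambda q: ["doc2", "doc1", "doc5"],
}

if __name__ == "__main__":
    # Score every strategy on the same dataset and report the mean.
    for name, retrieve in RETRIEVERS.items():
        scores = [
            recall_at_k(retrieve(q), relevant) for q, relevant in DATASET.items()
        ]
        print(f"{name}: mean recall@5 = {sum(scores) / len(scores):.2f}")
```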
savourylie
Project Euler practice repo, language-agnostic. Each problem folder includes my hand-written solutions plus alternative solutions from different LLMs (with prompts, models, tests, and complexity notes). Reusable test harnesses, benchmarks, and CI help verify correctness across Python, JS/TS, Rust, Go, and more.
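On the Python side, a reusable Project Euler test harness mostly amounts to mapping solution callables to known answers and asserting equality. A minimal pytest-style sketch; the solution functions are placeholders, while 233168 and 4613732 are the genuine answers to problems 1 and 2:

```python
import pytest

def problem_001() -> int:
    # Sum of natural numbers below 1000 divisible by 3 or 5.
    return sum(n for n in range(1000) if n % 3 == 0 or n % 5 == 0)

def problem_002() -> int:
    # Sum of even Fibonacci numbers not exceeding four million.
    total, a, b = 0, 1, 2
    while b <= 4_000_000:
        if b % 2 == 0:
            total += b
        a, b = b, a + b
    return total

# Known answers let the same harness verify any implementation.
CASES = [(problem_001, 233168), (problem_002, 4613732)]

@pytest.mark.parametrize("solution, expected", CASES)
def test_solution(solution, expected):
    assert solution() == expected
```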