Found 10 repositories (showing 10)
JoeyHendricks
A micro/macro benchmark framework for the Python programming language that helps you optimize your software.
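For context, a micro-benchmark in Python typically wraps a callable in repeated timed runs and reports per-call statistics. A minimal standard-library sketch of that pattern (the function names are illustrative, not this framework's API):

```python
import statistics
import timeit

def target() -> int:
    # Hypothetical function under test: sum of squares.
    return sum(i * i for i in range(1_000))

def micro_benchmark(func, repeats: int = 5, number: int = 1_000) -> dict:
    # Run `func` `number` times per sample, collect `repeats` samples,
    # then report per-call statistics in microseconds.
    samples = timeit.repeat(func, repeat=repeats, number=number)
    per_call_us = [s / number * 1e6 for s in samples]
    return {
        "min_us": min(per_call_us),
        "mean_us": statistics.mean(per_call_us),
        "stdev_us": statistics.stdev(per_call_us),
    }

if __name__ == "__main__":
    print(micro_benchmark(target))
```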
zzstoatzz
A benchmark harness for Python CLIs.
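A CLI benchmark harness of this kind generally shells out to the command under test and measures wall-clock time per invocation; a minimal sketch, where the command is a placeholder rather than anything from this repo:

```python
import statistics
import subprocess
import sys
import time

def time_cli(cmd: list[str], runs: int = 10) -> list[float]:
    # Invoke the CLI `runs` times and record wall-clock seconds per run.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    # Placeholder command: time the Python interpreter's own startup.
    timings = time_cli([sys.executable, "-c", "pass"])
    print(f"median: {statistics.median(timings):.4f}s  min: {min(timings):.4f}s")
```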
aerkanc
A minimal, reproducible harness to benchmark coding LLMs on Project Euler–style problems in Python. Runs each solution in isolation, measures runtime, checks correctness, scores by difficulty & latency, and ranks models on a shared scoreboard.
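Running each candidate solution in a separate process with a timeout, then comparing stdout against the known answer, is one common way to get that isolation. A hedged sketch of the pattern (the file path is invented; 233168 is the genuine answer to Project Euler problem 1):

```python
import subprocess
import sys
import time

def run_solution(path: str, expected: str, timeout_s: float = 10.0) -> dict:
    # Execute one solution file in a fresh interpreter so crashes,
    # leaked globals, and runaway loops cannot affect the harness itself.
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "reason": "timeout", "runtime_s": timeout_s}
    runtime = time.perf_counter() - start
    correct = proc.returncode == 0 and proc.stdout.strip() == expected
    return {"ok": correct, "runtime_s": runtime}

if __name__ == "__main__":
    # Hypothetical solution file expected to print the answer to problem 1.
    print(run_solution("solutions/problem_001.py", "233168"))
```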
cschubiner
Codenames AI: Python benchmarking harness + interactive web game with LLM spymasters.
EzraEngel
A python test harness to benchmark agent-based simulations across a range of representative computational kernels.
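A harness like this typically times each representative kernel over a grid of problem sizes. A minimal sketch, with a toy random-walk kernel standing in for a real agent-based simulation:

```python
import random
import time

def random_walk_kernel(n_agents: int, n_steps: int) -> list[float]:
    # Toy stand-in for an agent-based simulation kernel:
    # each agent takes `n_steps` unit steps in one dimension.
    positions = [0.0] * n_agents
    for _ in range(n_steps):
        for i in range(n_agents):
            positions[i] += random.choice((-1.0, 1.0))
    return positions

if __name__ == "__main__":
    # Sweep the agent count to see how the kernel scales.
    for n_agents in (100, 1_000, 10_000):
        start = time.perf_counter()
        random_walk_kernel(n_agents, n_steps=100)
        elapsed = time.perf_counter() - start
        print(f"{n_agents:>6} agents: {elapsed:.3f}s")
```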
misha-armatura
A minimal C++23 benchmark of JSON parse/transform/serialize using Glaze, RapidJSON, and nlohmann::json, with curl and Python test harnesses.
dicnunz
Tiny, reproducible harness for evaluating local LLMs through an OpenAI-compatible API. Topics: python, llm, evaluation, benchmarking, cli, local-llm
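Evaluating through an OpenAI-compatible API usually means posting to the standard /v1/chat/completions endpoint of a local server. A standard-library sketch of one such request; the base URL, model name, and prompt are assumptions, not taken from this repo:

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local server address
MODEL = "local-model"                  # placeholder model name

def chat(prompt: str) -> tuple[str, float]:
    # Send one chat-completion request and return (reply text, latency in s).
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    latency = time.perf_counter() - start
    return body["choices"][0]["message"]["content"], latency

if __name__ == "__main__":
    reply, latency = chat("What is 2 + 2?")
    print(f"{latency:.2f}s  {reply!r}")
```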
Lalu2002
A memory-optimized Python harness for sequential, large-scale LLM benchmarking. Utilizes Hugging Face models, lm-eval-harness, and 4-bit quantization with aggressive GPU memory management to run evaluations efficiently.
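The 4-bit loading plus aggressive cleanup pattern it describes typically looks like the sketch below, which assumes transformers, bitsandbytes, and a CUDA GPU are available; the model IDs are placeholders, and the evaluation step itself is elided:

```python
import gc

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_IDS = ["model-a", "model-b"]  # placeholder Hugging Face model IDs

def evaluate_one(model_id: str) -> None:
    # Load weights in 4-bit NF4 to cut weight memory roughly 4x.
    config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )
    # ... run lm-eval-harness or custom prompts here ...

    # Aggressive cleanup so the next model fits on the same GPU.
    del model, tokenizer
    gc.collect()
    torch.cuda.empty_cache()

if __name__ == "__main__":
    # Evaluate models one at a time, sequentially, on a single GPU.
    for model_id in MODEL_IDS:
        evaluate_one(model_id)
```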
shamspias
RAG-Scout is a Python test harness that automatically benchmarks multiple similarity-search strategies (sparse, dense, hybrid, late-interaction, rerankers) on any Q/A dataset and tells you which retriever stack is the best foundation for your Retrieval-Augmented Generation pipeline.
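Comparing retriever stacks usually reduces to scoring every strategy with the same metric, such as recall@k, over a shared Q/A set. A toy sketch of that scoring loop with fake retrievers; nothing here is RAG-Scout's actual API:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    # Fraction of relevant documents that appear in the top-k results.
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Toy Q/A dataset: question ID -> set of relevant document IDs.
DATASET = {
    "q1": {"doc1", "doc3"},
    "q2": {"doc2"},
}

# Fake retrievers standing in for sparse/dense/hybrid strategies.
RETRIEVERS = {
    "sparse": lambda q: ["doc1", "doc4", "doc3"],
    "dense": lambda q: ["doc2", "doc1", "doc5"],
}

if __name__ == "__main__":
    # Score every strategy on the same dataset and report the mean.
    for name, retrieve in RETRIEVERS.items():
        scores = [
            recall_at_k(retrieve(q), relevant) for q, relevant in DATASET.items()
        ]
        print(f"{name}: mean recall@5 = {sum(scores) / len(scores):.2f}")
```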
savourylie
Project Euler practice repo, language-agnostic. Each problem folder includes my hand-written solutions plus alternative solutions from different LLMs (with prompts, models, tests, and complexity notes). Reusable test harnesses, benchmarks, and CI help verify correctness across Python, JS/TS, Rust, Go, and more.
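On the Python side, a reusable Project Euler test harness mostly amounts to mapping solution callables to known answers and asserting equality. A minimal pytest-style sketch; the solution functions are placeholders, while 233168 and 4613732 are the genuine answers to problems 1 and 2:

```python
import pytest

def problem_001() -> int:
    # Sum of natural numbers below 1000 divisible by 3 or 5.
    return sum(n for n in range(1000) if n % 3 == 0 or n % 5 == 0)

def problem_002() -> int:
    # Sum of even Fibonacci numbers not exceeding four million.
    total, a, b = 0, 1, 2
    while b <= 4_000_000:
        if b % 2 == 0:
            total += b
        a, b = b, a + b
    return total

# Known answers let the same harness verify any implementation.
CASES = [(problem_001, 233168), (problem_002, 4613732)]

@pytest.mark.parametrize("solution, expected", CASES)
def test_solution(solution, expected):
    assert solution() == expected
```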