Found 89,702 repositories (showing 30)
wg
Modern HTTP benchmarking tool
open-mmlab
OpenMMLab Detection Toolbox and Benchmark
sharkdp
A command-line benchmarking tool
toon-format
Token-Oriented Object Notation (TOON) - Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.
openai
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
trycua
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
zalandoresearch
A MNIST-like fashion product database and benchmark.
dotnet
Powerful .NET library for benchmarking
xmrig
RandomX, KawPow, CryptoNight and GhostRider unified CPU/GPU miner and RandomX benchmark
open-mmlab
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
mcollina
fast HTTP/1.1 benchmarking tool written in Node.js
TechEmpower
Source for the TechEmpower Framework Benchmarks project
aquasecurity
Checks whether Kubernetes is deployed according to security best practices as defined in the CIS Kubernetes Benchmark
open-mmlab
OpenMMLab Pose Estimation Toolbox and Benchmark.
codesenberg
Fast cross-platform HTTP benchmarking tool written in Go
akopytov
Scriptable database and system performance benchmark
cleverhans-lab
An adversarial example library for constructing attacks, building defenses, and benchmarking both
JoeDog
Siege is an http load tester and benchmarking utility
erikbern
Benchmarks of approximate nearest neighbor libraries in Python
bestiejs
A benchmarking library. As used on jsPerf.com.
AgentOps-AI
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and CamelAI
bheisler
Statistics-driven benchmarking library for Rust
brianfrankcooper
Yahoo! Cloud Serving Benchmark
open-mmlab
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
locuslab
Sequence modeling benchmarks and temporal convolutional networks
six-ddc
A high-performance HTTP benchmarking tool that includes a real-time web UI and terminal display
LearningCircuit
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.
CLUEbenchmark
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
open-compass
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
open-mmlab
OpenMMLab Pre-training Toolbox and Benchmark