Found 85 repositories (showing 30)
Kiln-AI
Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
ScriptedAlchemy
An MCP server exposing full Chrome DevTools Protocol debugging: breakpoints, step/run, call stacks, eval, and source maps.
mclenhard
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.
indranilbanerjee
Claude Code plugin: 115 commands, 25 agents, 64 scripts, 67 MCP servers, 143 reference files. Eval/QA layer (hallucination detection, claim verification, A+ through F grading). Multilingual (Sarvam AI, DeepL, Google Cloud Translation). Full execution with approval workflow.
moneo
A complete, driver-based RAG pipeline for Laravel with pgvector & sqlite-vec, streaming, agentic retrieval, hybrid search, evals, MCP server, and Filament admin.
CalvinSturm
Local-first agent runtime for MCP workflows with explicit trust controls, replayable runs, and built-in evals.
novyxlabs
Persistent memory for AI agents. 107 MCP tools for remember, recall, rollback, audit, knowledge graph, eval, cortex, replay, governed actions, threat intel, auto-defense, Runtime v2 agents/missions, and more. Works locally (zero config) or with Novyx Cloud.
dylibso
An open-ended eval framework for mcp.run tools
lastmile-ai
Lightweight eval framework for MCP servers, built on mcp-agent
zazencodes
Lightweight framework for generating, running, and reviewing MCP evals.
kongyo2
EVE Online Market MCP Server - A Model Context Protocol server for accessing EVE Online market data through ESI API
iris-eval
The agent eval standard for MCP — score output quality, catch safety failures, enforce cost budgets
gleanwork
Playwright-based testing and eval framework for MCP servers with LLM-as-a-judge
WaterPistolAI
A local MCP server for accessing the EVE Online ESI API
stainless-api
Evals using Braintrust for Stripe MCP servers
wolfeidau
A Go library and CLI for evaluating Model Context Protocol (MCP) servers using Claude.
Tejas-TA
Bridging Generative AI and Classical ML. A production-grade multi-agent system using ReAct orchestration, MCP, A2A, and LMMs for hyper-local property valuation with Evals.
k-celal
A step-by-step educational repo that teaches AI agent development from scratch—agent loop, MCP, reflection, tool use, evals, and multi-agent—with concrete examples and code.
smaht-ai
Starter kit for MCP-powered AI agents: includes a sample client, eval tools, and an easy integration guide for Claude Code & beyond.
ElMoorish
🚀 The Definitive Field Manual for AI Engineering. 12 chapters covering RAG, Agents, MCP, Evals, and Token Economics. 50+ proven patterns for shipping mission-critical LLM products. Vendor-agnostic, senior-level documentation for the 2026 AI tech stack.
scorecard-ai
MCP Evals
buildwithlayer
A CLI tool for evaluating MCP servers
darinkishore
MCP Evals CLI (Deno + Cliffy): import traces, browse viewer, ask, config
Use-Tusk
Order processing API for Drift MCP evals
tedfytw1209
MCP Server for EVE fleet manager
Fuuijin
MCP server for EVE Online market data
ldrmqs
Code for the blog post "AI Eval for MCP in AIOps".
ahmedmusawir
Intended to be the first complete agent in a harness with MCPs, RAG, Session File Memory, Skills, Usage Meter, Evals, etc. First agent: GEMINI ARCHITECT
JediLuke
A robust MCP server for Elixir/BEAM applications — logs, eval, docs over persistent TCP with auto-reconnect
Deep-De-coder
Adversarial eval harness for any LLM agent pipeline — Claude, OpenAI, or your own. CLI + REST API + MCP server for Cursor/Antigravity.