Search Results

Found 85 repositories (showing 30)

Kiln

Kiln-AI

💛73

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

4.7k

351

NOASSERTION

Python

Updated 14 minutes ago

ai, chain-of-thought, collaboration, +17

devtools-debugger-mcp

ScriptedAlchemy

🧡61

An MCP server exposing full Chrome DevTools Protocol debugging: breakpoints, step/run, call stacks, eval, and source maps.

342

MIT

JavaScript

Updated 2 weeks ago

mcp-evals

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

127

MIT

TypeScript

Updated 1 week ago

ai, evals, mcp

digital-marketing-pro

indranilbanerjee

💛70

Claude Code plugin: 115 commands, 25 agents, 64 scripts, 67 MCP servers, 143 reference files. Eval/QA layer (hallucination detection, claim verification, A+ through F grading). Multilingual (Sarvam AI, DeepL, Google Cloud Translation). Full execution with approval workflow.

MIT

Python

Updated 19 minutes ago

aeo, ai-agents, analytics, +17

laravel-rag

moneo

🧡65

A complete, driver based RAG pipeline for Laravel with pgvector & sqlite-vec, streaming, agentic retrieval, hybrid search, evals, MCP server, Filament admin.

MIT

PHP

Updated 11 hours ago

ai, laravel, pg-vector, +4

LocalAgent

CalvinSturm

🧡60

Local-first agent runtime for MCP workflows with explicit trust controls, replayable runs, and built-in evals.

MIT

Rust

Updated 3 weeks ago

novyx-mcp

novyxlabs

💛70

Persistent memory for AI agents. 107 MCP tools for remember, recall, rollback, audit, knowledge graph, eval, cortex, replay, governed actions, threat intel, auto-defense, Runtime v2 agents/missions, and more. Works locally (zero config) or with Novyx Cloud.

MIT

Python

Updated 1 day ago

agents, ai-agents, audit-trail, +8

mcpx-eval

dylibso

❤️45

An open-ended eval framework for mcp.run tools

BSD-3-Clause

Python

Updated 2 months ago

mcp-eval

lastmile-ai

❤️45

Lightweight eval framework for MCP servers, built on mcp-agent

Apache-2.0

Python

Updated 2 weeks ago

arbiter-mcp-evals

zazencodes

❤️45

Lightweight framework for generating, running, and reviewing MCP evals.

MIT

Python

Updated 2 months ago

eve-online-mcp

kongyo2

❤️40

EVE Online Market MCP Server - A Model Context Protocol server for accessing EVE Online market data through ESI API

MIT

TypeScript

Updated 3 months ago

eveonline, mcp, mcp-server, +1

mcp-server

iris-eval

🧡55

The agent eval standard for MCP — score output quality, catch safety failures, enforce cost budgets

MIT

TypeScript

Updated 59 minutes ago

agent-evaluation, ai-agent, claude, +9

mcp-server-tester

gleanwork

🧡55

Playwright-based testing and eval framework for MCP servers with LLM-as-a-judge

MIT

TypeScript

Updated 23 hours ago

EveOnlineMCP

WaterPistolAI

❤️40

A local MCP server for accessing the EVE Online ESI API

GPL-3.0

Python

Updated 3 months ago

esi, eve, eveonline, +5

mcp-evals-harness

stainless-api

🧡60

Evals using Braintrust for Stripe MCP servers

MIT

TypeScript

Updated 1 week ago

mcp-evals

wolfeidau

❤️30

A Go library and CLI for evaluating Model Context Protocol (MCP) servers using Claude.

Apache-2.0

Updated 2 months ago

ai, claude, evals, +2

deep-agent-real-estate

Tejas-TA

🧡55

Bridging Generative AI and Classical ML. A production-grade multi-agent system using ReAct orchestration, MCP, A2A, and LMMs for hyper-local property valuation with Evals.

Jupyter Notebook

Updated 1 week ago

a2a-protocol, agentic-rag, fastapi, +12

agentic-ai-systems

k-celal

❤️45

A step-by-step educational repo that teaches AI agent development from scratch—agent loop, MCP, reflection, tool use, evals, and multi-agent—with concrete examples and code.

Python

Updated 1 month ago

smaht-agent-kit

smaht-ai

❤️40

Starter kit for MCP-powered AI agents: Includes sample client, eval tools, and easy integration guide for Claude Code & beyond.

MIT

Python

Updated 5 months ago

ai-engineer-vault

ElMoorish

🧡55

🚀 The Definitive Field Manual for AI Engineering. 12 chapters covering RAG, Agents, MCP, Evals, and Token Economics. 50+ proven patterns for shipping mission-critical LLM products. Vendor-agnostic, senior-level documentation for the 2026 AI tech stack.

Updated 3 weeks ago

ai-agents, ai-engineering, architecture-patterns, +12

mcp-eval

scorecard-ai

❤️25

MCP Evals

TypeScript

Updated 2 months ago

mcp-evals

buildwithlayer

❤️35

A CLI tool for evaluating MCP servers

TypeScript

Updated 7 months ago

mcp-evals-cli

darinkishore

❤️30

MCP Evals CLI (Deno + Cliffy): import traces, browse viewer, ask, config

TypeScript

Updated 5 months ago

drift-demo-order-api

Use-Tusk

🧡50

Order processing API for Drift MCP evals

TypeScript

Updated 4 weeks ago

mcp-server-EVEfleet

tedfytw1209

❤️40

MCP Server for EVE fleet manager

MIT

Python

Updated 6 months ago

tradergrader

Fuuijin

❤️40

MCP server for EVE Online market data

MIT

Rust

Updated 8 months ago

eval-ai-mcp-aiops

ldrmqs

❤️35

Code for the blog post "AI Eval for MCP in AIOps".

Python

Updated 7 months ago

adk-agent-harness-v1

ahmedmusawir

🧡55

This should be the first complete agent in a harness w/ MCPs, RAG, Session File Memory, Skills, Usage Meter, Evals etc. FIRST AGENT: GEMINI ARCHITECT

Python

Updated 1 week ago

BeamScope-MCP

JediLuke

❤️45

A robust MCP server for Elixir/BEAM applications — logs, eval, docs over persistent TCP with auto-reconnect

MIT

Elixir

Updated 1 week ago

Gauntlet

Deep-De-coder

🧡60

Adversarial eval harness for any LLM agent pipeline — Claude, OpenAI, or your own. CLI + REST API + MCP server for Cursor/Antigravity.

NOASSERTION

Python

Updated 1 week ago

adversarial-testing, anthropic, claude, +5
