Found 10 repositories (showing 10)
meshkovQA
Comprehensive AI Model Evaluation Framework with advanced techniques including Temperature-Controlled Verdict Aggregation via Generalized Power Mean. Support for multiple LLM providers and 15+ evaluation metrics for RAG systems and AI agents.
firstlinesoftware
Comprehensive AI Evaluation Framework with advanced techniques including Temperature-Controlled Verdict Aggregation via Generalized Power Mean. Support for multiple LLM providers and 15+ evaluation metrics for RAG systems and AI agents.
microsoft
No description available
microsoft
A plugin for AI agent evaluation. Plan evals, generate test cases, interpret results for Copilot Studio agents. Grounded in Microsoft's Eval Scenario Library & Triage Playbook.
Dmunch04
A Python library for interacting with Eve and creating your own AI with it
AirVetra
Automated LLM testing pipeline for LM Studio using Eval AI Library. Features dynamic model loading/unloading, interactive CLI, multiple metrics (RAG, Security, Deterministic), and integrated web dashboard.
gcampton
AI professional firm for Claude Code: 29 eval-tested specialists (lawyers, accountants, designers, SEO, copywriters, and more) with real methodologies and reference libraries. Lightweight coordinator loads skills on demand.
arjunghosh
A Python CLI tool and library for converting Word .docx files into .JSON and .JSONL files (e.g., for AI Foundry Eval upload)
cylijinpeng
A practical builder library for AI agents: prompts, skills, MCP, frameworks, RAG, evals, and starter packs.
trehansalil
AI Agent Workflow Studio: paste a business process, auto-generate prompts/tool schemas/evals, red-team for prompt injection, data leakage & tool misuse, with pass/fail traces, attack libraries, regression tests, and a live hardening checklist.