Found 77 repositories (showing 30)
THUDM
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Hugging-Face-KREW
No description available
eth-sri
No description available
VIA-Research
The set of AI agent model implementations, benchmarks, and others used in our paper "The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective"
glee4810
Code and Data for FHIR-AgentBench
cxcscmu
Benchmark Test-Time Scaling of General LLM Agents
agentbench
No description available
jackjin1997
The open benchmark for AI agent task execution. Claude Code vs Gemini CLI — who wins? Live leaderboard inside.
yongPhone
A lightweight, type-safe Go framework for testing AI agents with customizable scorers, concurrent execution, and flexible configuration
agentbench
No description available
keijiro
General-purpose (non-project specific) workbench for AI coding agents
sauremilk
Real-world evaluation framework for AI coding agents — measures safety, containment, cost, and autonomy beyond just correctness.
Leu3ery
No description available
michaelwinczuk
Framework-agnostic CLI tool for benchmarking AI agents across standardized tasks
Z-ZHHH
Small adjustment to AgentBench v0.2
chu2bard
Evaluation framework for AI coding agents
dx2ztm76-new
No description available
OmnionixAI
A comprehensive evaluation framework and benchmark suite designed to rigorously assess the performance, reliability, and reasoning capabilities of autonomous AI agents.
Shreyas-Yadav
A comprehensive evaluation framework for GitHub agents, built using LlamaIndex and Arize Phoenix telemetry. It supports both single-agent and multi-agent architectures, enabling automated assessment of agent reasoning, tool selection, and execution efficiency. Ideal for developers aiming to benchmark and enhance AI-driven GitHub automation tools.
AbdulElahOthmanGwaith
edia application
NCCYUNSONG
No description available
NurcholishAdam
Green-Quantum AgentBench: advancing sustainability-aware agent benchmarking with Quantum Limit Graph architectures. Integrated with Quantum Error Correction (QEC) and Multilingual Provenance modules for AgentBeats.
jiniac-v2
AgentBench logs
to-real
AI agent evaluation platform - Complete evaluation platform for AI Agents
JakeB-5
AI Agent Evaluation, Testing & Monitoring Platform - Ship reliable AI agents with confidence
wingtonrbrito
No description available
stevenkozeniesky02
Standardized benchmark framework for comparing AI coding agents (Claude Code, Codex, Cursor)
general-agentbench
Project website for General AgentBench
Helm-Development
Evaluation framework for agentic coding flows
dhruvvenkat
AI evolution framework to see how your agents are performing