Search Results

Found 108 repositories (showing 30)

tau-bench

sierra-research

💛73

Code and Data for Tau-Bench

1.2k

189

MIT

Python

Updated 4 hours ago

tau2-bench

sierra-research

🧡68

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

968

243

MIT

Python

Updated 3 hours ago

ai, benchmark, conversational-agents, +2

tau2-bench-verified

τ²-Bench-Verified is a corrected and verified version of the original τ²-bench benchmark. This release addresses issues discovered in the original dataset where task definitions, expected actions, and evaluation criteria did not properly align with the stated policies or database contents.

MIT

Python

Updated 5 days ago

agentify-example-tau-bench

agentbeats

❤️30

Example code for agentifying tau-bench, accompanying the blog post `Agentify the Agent Assessment`.

Python

Updated 2 months ago

streetbeat-tau-bench

dev-streetbeat

❤️40

Streetbeat Tackles τ-bench: Evaluating Advanced Agentic Capabilities in Realistic Scenarios

MIT

Updated 10 months ago

tau2-bench-revised

AGI-Eval-Official

❤️40

No description available

MIT

Updated 1 month ago

tau2-bench

oscaralvaro

🧡55

Copy of the original repository https://github.com/sierra-research/tau2-bench?tab=readme-ov-file

MIT

Python

Updated 2 days ago

tau-bench

safikhanSoofiyani

❤️40

Code and Data for Tau-Bench

MIT

Python

Updated 4 months ago

agentic-rl

Maxusmusti

🧡55

Agentic RLVR PoC: ART vs Agent Lightning on tau-bench

Python

Updated 2 weeks ago

tauri-ipc-benchmark

federico-terzi

❤️40

A benchmark of possible Tauri/wry IPC methods

MIT

Rust

Updated 1 year ago

taugpt-kvcache-bench

tuned-org-uk

❤️45

A benchmark of KV-Cache efficiency of tauformer. Paper -> https://github.com/tuned-org-uk/tauformer-paper

NOASSERTION

Rust

Updated 2 months ago

amazon-tau-bench-utilities

OmarElsendiony

❤️25

No description available

Python

Updated 4 months ago

tauri-bin-transfer-bench

insopitus

❤️35

A small benchmark comparing Tauri's default readBinaryFile API (JSON serialization) with a custom method using base64 encoding

TypeScript

Updated 1 year ago

tauri

tau-kg-benchmarking-GPT

SCAI-BIO

❤️40

Benchmarking the performance of GPT-based LLMs against human curated Tau KG from Human Brain Pharmacome (HBP)

Apache-2.0

Jupyter Notebook

Updated 2 years ago

tauri-vs-electron-benchmark

dkisb

❤️40

This benchmark serves the purpose of seeing how much resource tauri and electron use on the same exact projects.

TypeScript

Updated 2 months ago

Test-Tau-Bench-Repository

sarmad-t

❤️10

No description available

Updated 5 months ago

tau-bench-app

preethisesh

❤️30

No description available

MIT

Python

Updated 3 months ago

tau-bench-improved

abdallah197

❤️35

TAU Bench repository

Python

Updated 6 months ago

tau-b

sert121

❤️40

repr for tau-bench

MIT

Python

Updated 6 months ago

amazon-tau-bench-tasks-main

turing-raghava

❤️20

amazon-tau-bench-tasks-main

Python

Updated 5 months ago

tau-bench

abhiklodh

❤️25

Code and Data for Tau-Bench

MIT

Python

Updated 6 months ago

tau-bench

shahabeddin

❤️30

No description available

MIT

Jupyter Notebook

Updated 6 months ago

tau-bench

benchflow-yaml

❤️25

No description available

Python

Updated 11 months ago

tau-bench

kunato

❤️30

No description available

MIT

Python

Updated 7 months ago

Tau-bench

salmantask123-prog

❤️25

No description available

Updated 6 months ago

tau-bench

ShayanPervez

❤️25

No description available

Jupyter Notebook

Updated 8 months ago

tau-bench

ABHINAV2400

❤️30

No description available

MIT

Python

Updated 7 months ago

tau-bench

BhaveshBalaji

🧡60

Running agentic AI benchmark experiment aiming to improve the performance with agentic architecture.

MIT

Python

Updated 2 weeks ago

Improving-Agent-Success-Rates-using-Prompt-Engineering

Rajitb002

❤️45

Improving Task Success Rate in Tau Square - Bench

Jupyter Notebook

Updated 2 months ago

Tau-bench

Mav11Young

❤️30

No description available

MIT

Python

Updated 3 months ago
