Found 108 repositories (showing 30)
sierra-research
Code and Data for Tau-Bench
sierra-research
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
amazon-agi
τ²-Bench-Verified is a corrected and verified version of the original τ²-bench benchmark. This release addresses issues discovered in the original dataset where task definitions, expected actions, and evaluation criteria did not properly align with the stated policies or database contents.
agentbeats
Example code of agentifying tau-bench for the blog `Agentify the Agent Assessment`.
dev-streetbeat
Streetbeat Tackles τ-bench: Evaluating Advanced Agentic Capabilities in Realistic Scenarios
AGI-Eval-Official
No description available
oscaralvaro
Copy of the original repository https://github.com/sierra-research/tau2-bench?tab=readme-ov-file
safikhanSoofiyani
Code and Data for Tau-Bench
Maxusmusti
Agentic RLVR PoC: ART vs Agent Lightning on tau-bench
federico-terzi
A benchmark of possible Tauri/wry IPC methods
tuned-org-uk
A benchmark of KV-Cache efficiency of tauformer. Paper -> https://github.com/tuned-org-uk/tauformer-paper
OmarElsendiony
No description available
insopitus
A small benchmark comparing Tauri's default readBinaryFile API (JSON serialization) with a custom method using base64 encoding
SCAI-BIO
Benchmarking the performance of GPT-based LLMs against the human-curated Tau KG from Human Brain Pharmacome (HBP)
This benchmark measures how many resources Tauri and Electron use on the exact same projects.
sarmad-t
No description available
preethisesh
No description available
abdallah197
TAU Bench repository
sert121
repr for tau-bench
turing-raghava
amazon-tau-bench-tasks-main
abhiklodh
Code and Data for Tau-Bench
shahabeddin
No description available
benchflow-yaml
No description available
kunato
No description available
salmantask123-prog
No description available
ShayanPervez
No description available
ABHINAV2400
No description available
BhaveshBalaji
Running agentic AI benchmark experiment aiming to improve the performance with agentic architecture.
Improving Task Success Rate in Tau Square - Bench
Mav11Young
No description available