Found 4 repositories (showing 4)
petergpt
BullshitBench, created by Peter Gostev, measures whether AI models challenge nonsensical prompts instead of confidently answering them.
Agent-Evaluation
No description available
petergpt
Standalone BullshitBench reasoning-trace annotation lab.
fdietze
no bullshit relational vs graph database benchmarks