Search Results

Found 4 repositories (showing 4)

petergpt

💛72

BullshitBench, created by Peter Gostev, measures whether AI models challenge nonsensical prompts instead of answering them confidently.

1.4k

MIT

Python

Updated 5 hours ago

Agent-Evaluation

🧡60

No description available

MIT

Python

Updated 5 hours ago

petergpt

🧡60

Standalone BullshitBench reasoning-trace annotation lab.

Python

Updated 6 days ago

fdietze

❤️35

No-bullshit relational vs. graph database benchmarks

PLpgSQL

Updated 7 years ago


GitHub Explorer