Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal
Stars: 56
Forks: 34
Watchers: 56
Open Issues: 17
Commit counts: 177, 158, 28, 5, 2, 2, 2, 1
9c834e3 Add heredoc size guidance to solution_quality criterion (#98)
1ea7b49 Merge pull request #54 from AaronFeller/add-geometric-pharmacophore-alignment
d41a612 Update solver script, threshholds for scoring, and instruction.
6821049 Merge pull request #82 from harbor-framework/fix/overview-triple-quoted-toml