PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
Stars: 30
Forks: 11
Watchers: 30
Open Issues: 13
0bbad97 Merge pull request #81 from pinchbench/feat/consolidate-versions-by-semver
37216d6 feat: consolidate versions by semver in dropdown (#79)
57f1c31 Merge pull request #75 from pinchbench/gt/flint/c7246f65
c3765a9 docs: update Benchmark Versioning section for semver scheme
cab8635 Update about page with semver versioning documentation
751bf71 Merge pull request #61 from doubledare704/feat/48/daily-weekly-monthly-badges
77b662e feat(badges): add period normalization and aliases for badge metrics
6d8e8c4 feat(badges): implement daily, weekly, monthly badge support
ce5880a Merge pull request #60 from doubledare704/feature/task-heatmap-category-filter
85579d6 Merge pull request #65 from NianJiuZst/fix/calculate-ranks-tie-bug
1be2e0f Merge pull request #63 from doubledare704/feature/model-landing-pages-search
8d42e17 Merge pull request #62 from doubledare704/feat/47/show-overal-rank-every-run
13a8d4e feat: add model landing pages with search and score trend