GitHub Explorer

by Alexey Ratnikov

promptstats

ianarawjo•PUBLIC

Statistical analysis methods for comparing prompt and model performance in LLM evaluations.

Other

Created on Mar 3, 2026

Updated on Apr 4, 2026

Repository Health Score

70/100 (Good)

Overall repository health assessment

Score Breakdown

Activity: 30/30 (100%). Active development, updated this week.

Recent Commits

Improve SEO

Ian Arawjo•5 hours ago

0691a8c

Remove point advantage in favor of abs mean CIs; Add p-value printing method to `analyze` and `cli`

ianarawjo•6 hours ago

a239bd1

Add option to print p-values to analyze() and the CLI. Update index of website

Ian Arawjo•6 hours ago

d436681

Finish removing point_advantage across the repository, in favor of abs point estimates.

Ian Arawjo•11 hours ago

5db02eb

WIP: Remove point advantage in favor of abs mean CIs

Ian Arawjo•12 hours ago

0736333

Fix inconsistencies in best-prompt notebook

Ian Arawjo•2 days ago

1ff7bd4

Add discussion of 'Statistical Comparisons of Classifiers over Multiple Data Sets' to the 'Statistical Debates in LLM Evals' section

Ian Arawjo•2 days ago

c8d126c

Improve critical difference diagram plots. Use E[rank] for CD diagram numeric ranks---an estimation-based analogue to the Friedman test's scores.

Ian Arawjo•2 days ago

4b42147

WIP: Improve critical diff diagrams

Ian Arawjo•2 days ago

7f6a245

WIP: Best-prompt improvements

Ian Arawjo•2 days ago

aaebe39

Add heatmap plot for models x prompts

Ian Arawjo•2 days ago

e04e7f3

Update analyze on cli to be consistent with the new analyze

Ian Arawjo•2 days ago

4800236

Update package version and add section on max-T to which method page.

Ian Arawjo•4 days ago

e5114bd

Switch to regular bootstrap for non-binary inputs, when N>=200

Ian Arawjo•4 days ago

8678dcc

Remove p-value correction printing in headers when no p-values asked for

Ian Arawjo•4 days ago

60310c1
