GitHub Explorer

by Alexey Ratnikov

AidanBench

aidanmclaughlin•PUBLIC

Aidan Bench attempts to measure <big_model_smell> in LLMs.

Created on Aug 5, 2024

Updated on Apr 1, 2026

Stars: 318

Forks

Watchers: 318

Open Issues

Repository Health Score

65/100 (Fair)

Score Breakdown

Activity: 30/30 (100%). Active development, updated this week.

Issues Analytics

Total Issues (all time)

Open (0% of total)

Closed

Recent Commits

Revert "model mapping fixes for broken api endpoints etc"

Anuja•9 months ago

a6bb325

model mapping fixes for broken api endpoints etc

Anuja•9 months ago

fcc5a5d

requirements file update

Anuja•9 months ago

0fc675b

model additions

Anuja•9 months ago

8dc0106

fixes to load env vars properly, bugged out before and got annoying

Anuja•9 months ago

79ec81a

Add python-dotenv dependency to requirements.txt

Anuja•9 months ago

6636460

Add latest AI models: Claude 4 (Opus/Sonnet + thinking), OpenAI (o3/o3-pro/o4-mini), Gemini 2.0/2.5, Grok 3 thinking variants, DeepSeek R1-0528, new Mistral models

Anuja•9 months ago

82ee00f

add atomic file writes to prevent results.json corruption

Anuja•9 months ago

68806e3

fix import errors when API keys are missing, add clearer error messages

Anuja•9 months ago

36fdd60

add requirements.txt with pinned dependencies

Anuja•9 months ago

2724805

fix max_tokens not being passed to API calls

Anuja•9 months ago

a384c71

timeout and token limit research; empirical analysis of processing times and token usage patterns, timeout testing experiments, discovery that historical slow scenarios now complete quickly

Anuja•9 months ago

ed92c9c

time limit experiment analysis and results; statistical experiment comparing model performance over time, a pilot thinking experiment with timing baselines, results dashboard and analysis docs, timeaware model configs and prompts

Anuja•9 months ago

8cba33c

grok3 mini reasoning levels

James Campbell•11 months ago

082853f

add 4.1 to results

James Campbell•11 months ago

890ccce
