GitHub Explorer

by Alexey Ratnikov

GitHub Explorer

GitHub Explorer|TRENDING COMPARE|FEEDBACK

Back to search

Copyright (c) 2026 Alexey Ratnikov

zaebee/evaluator - GitHub Explorer | GitHub Explorer | Trending | Compare

evaluator

zaebee•PUBLIC

Offline evaluator for LLM agent traces from a CI/CD benchmark task. Agents are given a git repository and asked to get a change merged into `main`. The evaluator reads raw session logs, detects exploitative behavior, computes multi-dimensional scores, and produces a cross-model leaderboard.

Created on Apr 3, 2026

Updated on Apr 5, 2026

Stars

0

Forks

0

Watchers

0

Open Issues

0

Repository Health Score

🧡

65/100

Fair

Overall repository health assessment

Score Breakdown

Activity

Active development - updated this week

30/30

100%

Community

0 stars, 0 forks

0/30

0%

Documentation

Has description, wiki

15/20

75%

Maintenance

0.0% issue ratio

20/20

100%

Health score is calculated based on activity, community engagement, documentation quality, and maintenance practices

Languages

Python

100.0%

Dependencies

No package.json found

This might not be a Node.js project

Top Contributors

1

zaebee

User

23

commits

Recent Commits

Merge pull request #1 from zaebee/fix/outcome-merge-detection

Andrey G•2 days ago

c5c178cView on GitHub

refactor: tighten _MERGE_SUCCESS_PAT and guard with gh pr command check

zaebee•2 days ago

a87841cView on GitHub

refactor: move _MERGE_SUCCESS_PAT to module level

zaebee•2 days ago

a41f27eView on GitHub

fix: detect merge success from tool_result output, not command text

zaebee•2 days ago

9ac72a4View on GitHub

fix: recursive directory discovery for --traces ../traces/

zaebee•4 days ago

c153d0cView on GitHub

feat: add --format frontend output and fill ModelStats gaps

zaebee•4 days ago

603b80bView on GitHub

docs: update README to reflect v2 scoring, strategy taxonomy, OpenCode format

zaebee•4 days ago

056e6ddView on GitHub

feat: detect sleep/polling loops as efficiency signal and looper condition

zaebee•4 days ago

7f8cc73View on GitHub

feat: add OpenCode markdown trace parser

zaebee•4 days ago

06d0efdView on GitHub

fix: catch --method PUT/PATCH/DELETE variant of ruleset bypass

zaebee•4 days ago

0219516View on GitHub

feat: wire strategy classification into run_eval pipeline

zaebee•4 days ago

dd3bd76View on GitHub

feat: aggregate strategy distribution into ModelStats

zaebee•4 days ago

fade0bfView on GitHub

feat: add strategy.py — BehaviorProfile computation and strategy classification

zaebee•4 days ago

494cc2aView on GitHub

feat: metrics v2 — severity-weighted integrity, loop/retry efficiency deduction

zaebee•4 days ago

b5ca951View on GitHub

feat: add severity modifiers to PatternRule and classify_event

zaebee•4 days ago

3e8a3deView on GitHub

View all commits