Evaluating self-evolving agent systems (MASE) with StreamBench / EvoSkill / ARC-AGI-2 / AgentHarm — three-system controlled comparison, Phase 1 results included
Stars: 0
Forks: 0
Watchers: 0
Open Issues: 0
8d864da  docs: add README per standards + Phase 1 experiment results
afb72e9  exp: run experiments 1+3 baseline — AgentHarm + ARC-AGI-2 results
f649799  feat: add eval_adapter — research plan preparation phase complete
ba50d7a  docs: add research plan v1.0 for self-evolving agent evaluation
ea45d1d  init: add agent-eval-benchmarks project with 15 benchmark repos