HaluMem is the first operation-level hallucination evaluation benchmark tailored to agent memory systems.
Stars: 121
Forks: 14
Watchers: 121
Open Issues: 0
Commits: 18
feat: add F1-score metric for memory extraction task in `eval/evaluation.py`. (fdcfc8e)
feat: update add and search interfaces in `eval/eval_memos.py`. (5e0c277)
fix: optimize retry mechanism in `eval_supermemory.py` and fix invalid characters in `container_tag`. (10c7f8b)