Found 5 repositories (showing 5)
29shnick
A multilingual benchmark evaluating LLM logic, slang, and technical jargon under strict system-prompt constraints
Med-865
Comprehensive Stress-Test Suite for Arabic Large Language Models (LLMs)
vip529
A full-stack, open-source suite of tools for evaluating, stress-testing, and improving the reliability of large language models (LLMs) and agentic AI systems
nandanchitale
A local-first red-team harness for stress-testing LLM guardrails. Runs adversarial prompt suites against pluggable LLM providers (Ollama, Gemini), applies layered guardrails, and evaluates refusals, jailbreaks, and false positives with transparent incident logging.
jofainita
Experimental framework to evaluate operational risk in LLM-assisted cybersecurity decisions under uncertainty and stress conditions. The project introduces a risk-oriented metric (Unsafe Disruptive Decision Rate, UDR) and a stress testing suite to analyze decision-making degradation beyond action correctness.
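The listing does not give the exact definition of the UDR metric; a minimal sketch, assuming UDR is the fraction of decisions that are both unsafe and disruptive (the `Decision` fields and this definition are assumptions, not taken from the jofainita repository):

```python
from dataclasses import dataclass


@dataclass
class Decision:
    unsafe: bool      # assumed: decision violates a safety constraint
    disruptive: bool  # assumed: decision changes operational state


def udr(decisions: list[Decision]) -> float:
    """Hypothetical UDR: share of decisions both unsafe and disruptive."""
    if not decisions:
        return 0.0
    bad = sum(1 for d in decisions if d.unsafe and d.disruptive)
    return bad / len(decisions)


# Example: 1 of 4 decisions is both unsafe and disruptive.
sample = [
    Decision(unsafe=True, disruptive=True),
    Decision(unsafe=True, disruptive=False),
    Decision(unsafe=False, disruptive=True),
    Decision(unsafe=False, disruptive=False),
]
print(udr(sample))  # → 0.25
```

Separating "unsafe" from "disruptive" reflects the project's stated aim of measuring risk beyond simple action correctness.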
All 5 repositories loaded