Found 5 repositories (showing 5)
29shnick
A multilingual benchmark evaluating LLM logic, slang, and technical jargon under strict system-prompt constraints
Med-865
Comprehensive Stress-Test Suite for Arabic Large Language Models (LLMs)
vip529
A full-stack, open-source suite of tools for evaluating, stress-testing, and improving the reliability of large language models (LLMs) and agentic AI systems
nandanchitale
A local-first red-team harness for stress-testing LLM guardrails. Runs adversarial prompt suites against pluggable LLM providers (Ollama, Gemini), applies layered guardrails, and evaluates refusals, jailbreaks, and false positives with transparent incident logging.
jofainita
Experimental framework to evaluate operational risk in LLM-assisted cybersecurity decisions under uncertainty and stress conditions. The project introduces a risk-oriented metric (Unsafe Disruptive Decision Rate, UDR) and a stress testing suite to analyze decision-making degradation beyond action correctness.
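The listing does not give the exact definition of the UDR metric; a minimal sketch, assuming UDR is the fraction of decisions that are both unsafe and disruptive (the `Decision` fields and this definition are assumptions, not taken from the jofainita repository):

```python
from dataclasses import dataclass


@dataclass
class Decision:
    unsafe: bool      # assumed: decision violates a safety constraint
    disruptive: bool  # assumed: decision changes operational state


def udr(decisions: list[Decision]) -> float:
    """Hypothetical UDR: share of decisions both unsafe and disruptive."""
    if not decisions:
        return 0.0
    bad = sum(1 for d in decisions if d.unsafe and d.disruptive)
    return bad / len(decisions)


# Example: 1 of 4 decisions is both unsafe and disruptive.
sample = [
    Decision(unsafe=True, disruptive=True),
    Decision(unsafe=True, disruptive=False),
    Decision(unsafe=False, disruptive=True),
    Decision(unsafe=False, disruptive=False),
]
print(udr(sample))  # → 0.25
```

Separating "unsafe" from "disruptive" reflects the project's stated aim of measuring risk beyond simple action correctness.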
All 5 repositories loaded