Found 997 repositories (showing 30)
llm-attacks
Universal and Transferable Attacks on Aligned Language Models
MorDavid
Advanced LLM-powered brute-force tool combining AI-driven intelligence with automated login attacks
ethz-spylab
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
agencyenterprise
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
knostic
OpenAnt from Knostic is an open source LLM-based vulnerability discovery product that helps defenders proactively find verified security flaws while minimizing both false positives and false negatives. Stage 1 detects. Stage 2 attacks. What survives is real.
liu00222
This repository provides a benchmark for prompt injection attacks and defenses in LLMs
tml-epfl
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
Yu-Fangxu
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
praetorian-inc
LLM security testing framework for detecting prompt injection, jailbreaks, and adversarial attacks — 190+ probes, 28 providers, single Go binary
PKU-YuanGroup
An attack to induce hallucinations in LLMs
romovpa
Automated research on LLM adversarial attacks
usail-hkust
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
BishopFox
A productionized greedy coordinate gradient (GCG) attack tool for large language models (LLMs)
microsoft
A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.
GodXuxilie
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
MrMoshkovitz
Automated red-team toolkit for stress-testing LLM defences - Vector Attacks on LLMs (Gendalf Case Study)
niconi19
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
DmitrL-dev
AI Security Platform: Defense (61 Rust engines + Micro-Model Swarm) + Offense (39K+ payloads)
uw-nsl
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
SaFo-Lab
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and further assess the robustness and safety of MLLMs against a variety of jailbreak attacks.
Junjie-Chu
This is the public code repository for the paper 'Comprehensive Assessment of Jailbreak Attacks Against LLMs'
OSU-NLP-Group
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs
ezztahoun
Find the logs, events, and alerts relevant to all of your incidents. [Attack Flows, Attack Chains, & Root Cause Discovery - NO LLMs, NO Queries, Just Explainable Machine Learning] >> Use it for free here: https://app.cypienta.io
LiuYuancheng
The objective of this program is to leverage AI-LLM technology to process human-language CTI documents and succinctly summarize the attack flow paths outlined in such materials, mapping attack behaviors to MITRE ATT&CK and matching vulnerabilities to MITRE CWE.
Buyun-Liang
[NeurIPS 2025] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
XHMY
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
Beijing-AISI
Panda Guard is designed for researching jailbreak attacks, defenses, and evaluation algorithms for large language models (LLMs).
requie
A comprehensive reference for securing Large Language Models (LLMs). Covers OWASP GenAI Top-10 risks, prompt injection, adversarial attacks, real-world incidents, and practical defenses. Includes catalogs of red-teaming tools, guardrails, and mitigation strategies to help developers, researchers, and security teams deploy AI responsibly.
facebookresearch
Repo for the paper "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks".
datasec-lab
[USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection