Found 189 repositories (showing 30)
OWASP-Benchmark
OWASP Benchmark is a test suite designed to verify the speed and accuracy of software vulnerability detection tools. A fully runnable web app written in Java, it supports analysis by Static (SAST), Dynamic (DAST), and Runtime (IAST) tools that support Java. The idea is that since it is fully runnable and all the vulnerabilities are actually exploitable, it’s a fair test for any kind of vulnerability detection tool. For more details on this project, please see the OWASP Benchmark Project home page.
bytedance
PatchEval: A New Benchmark for Evaluating LLMs on Patching Real-World Vulnerabilities
uiuc-kang-lab
CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
ossf-cve-benchmark
The OpenSSF CVE Benchmark consists of code and metadata for over 200 real-life CVEs, as well as tooling to analyze the vulnerable codebases using a variety of static analysis security testing (SAST) tools and to generate reports that evaluate those tools.
nccgroup
Whalescan is a vulnerability scanner for Windows containers; it performs several benchmark checks and also checks for CVEs and vulnerable packages on the container.
HTBridge
Created by High-Tech Bridge, the Purposefully Insecure and Vulnerable Android Application (PIVAA) replaces the outdated DIVA as a benchmark for mobile vulnerability scanners.
alt-research
Solidity/EVM smart contract security auditor — 104 vulnerability patterns, 8 tools, 100% CTF + EVMBench benchmark (120/120)
Hustcw
This is a benchmark for evaluating the vulnerability discovery ability of automated approaches, including Large Language Models (LLMs), deep learning methods, and static analyzers.
lucagioacchini
This repo contains the code for the penetration-testing benchmark for generative agents presented in the paper "AutoPenBench: Benchmarking Generative Agents for Penetration Testing". It also contains instructions to install, develop, and test new vulnerable containers to include in the benchmark.
timothee-chauvin
Future-proof vulnerability detection benchmark based on CVEs in open-source repos
Mobile-IoT-Security-Lab
The OWApp Benchmark: an OWASP-compliant Vulnerable Android App Dataset
yubol-bobo
This repo investigates LLMs' tendency to exhibit acquiescence bias in sequential QA interactions. Includes evaluation methods, datasets, benchmarks, and experiment code to assess and mitigate vulnerabilities in conversational consistency and robustness, offering a reproducible framework for future research.
Sweetaroo
A novel benchmark evaluating the deep vulnerability-detection capability of large language models
Troublor
Smart contract front-running vulnerability benchmark
CASTLE-Benchmark
The CASTLE Benchmark is a modern micro-benchmarking suite for testing static analyzers and LLMs on vulnerability detection.
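To make the scoring concrete, here is a minimal Python sketch of how a detection micro-benchmark of this kind is typically scored, assuming hypothetical labeled cases and per-tool verdicts (the names, fields, and cases are invented for illustration and are not taken from CASTLE):

    from dataclasses import dataclass

    @dataclass
    class Case:
        name: str
        vulnerable: bool  # ground-truth label for the micro-benchmark case
        flagged: bool     # whether the tool under test reported it

    def score(cases: list[Case]) -> dict[str, float]:
        tp = sum(c.vulnerable and c.flagged for c in cases)
        fp = sum(not c.vulnerable and c.flagged for c in cases)
        fn = sum(c.vulnerable and not c.flagged for c in cases)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return {"precision": precision, "recall": recall}

    cases = [
        Case("sqli_01", vulnerable=True, flagged=True),
        Case("safe_query_01", vulnerable=False, flagged=True),  # false positive
        Case("xss_01", vulnerable=True, flagged=False),         # missed finding
    ]
    print(score(cases))  # {'precision': 0.5, 'recall': 0.5}

Precision penalizes false positives and recall penalizes misses, which is why micro-benchmarks of this sort usually report both rather than a single accuracy number.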
a101e-lab
IoTVulBench is an open-source benchmark dataset for IoT security research, containing firmware-related vulnerabilities and the corresponding toolkits for building firmware emulations and verifying vulnerabilities.
OWASP-Benchmark
The OWASP Benchmark for Python is a test suite designed to verify the accuracy of Python software vulnerability detection tools. A fully runnable web app written in Python, it supports analysis by Static (SAST), Dynamic (DAST), and Runtime (IAST) tools that support Python. For more details, see the OWASP Benchmark Project home page.
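To make "fully runnable and actually exploitable" concrete, here is a minimal sketch of the kind of deliberately vulnerable endpoint such suites contain. It is invented for illustration using Flask and sqlite3, is not taken from the Benchmark itself, and assumes a local users.db with a users table:

    import sqlite3
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/user")
    def lookup_user():
        name = request.args.get("name", "")
        conn = sqlite3.connect("users.db")
        # Deliberately vulnerable: user input is concatenated into the SQL
        # string, so ?name=x' OR '1'='1 returns every row (CWE-89, SQLi).
        rows = conn.execute(
            "SELECT id, name FROM users WHERE name = '" + name + "'"
        ).fetchall()
        return {"rows": [list(r) for r in rows]}

Because the flaw is reachable over HTTP and genuinely exploitable, a SAST tool can flag the string concatenation statically, while DAST and IAST tools can confirm it at runtime; that is the property that lets a single suite score all three tool classes.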
FaroukDaboussi0
This project aims to fine-tune a pre-trained LLM using CTI-specific data and evaluate its performance with CTIBench, a benchmark designed for cybersecurity tasks. CTIBench helps assess how well the model performs on tasks like identifying threat actors, mapping attack techniques, and correlating vulnerabilities.
agentlisa
LISABench - Smart Contract Vulnerability Detection Benchmark
nross12
PEVuln: A Benchmark Dataset for Using Machine Learning to Detect Vulnerabilities in PE Malware
socsecresearch
No description available
satty-br
LOKI (Leverage Offensive Knowledge Intelligently) is an AI-powered tool that creates pull requests with realistic code and dependency vulnerabilities to test and benchmark SAST/SCA tools.
mikusher
Benchmark collection for analysis. The idea is to have a collection of projects in several languages, plus various SAST applications, to run scans and comparisons. Ultimately the intention is to reduce the number of false positives across the benchmark projects.
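One common way to cut false positives when comparing scanners is cross-tool agreement: a finding reported by several independent tools is more likely real than one reported by a single tool. Whether this repo takes that approach is not stated; the following is a hedged Python sketch with invented tool names and finding keys:

    from collections import Counter

    def triage(findings_by_tool: dict[str, set[str]], quorum: int = 2) -> set[str]:
        # Count how many independent scanners reported each finding.
        counts = Counter(f for tool in findings_by_tool.values() for f in tool)
        return {finding for finding, n in counts.items() if n >= quorum}

    findings = {
        "tool_a": {"app.py:42:sqli", "app.py:77:xss"},
        "tool_b": {"app.py:42:sqli"},
        "tool_c": {"util.py:9:path-traversal"},
    }
    print(triage(findings))  # {'app.py:42:sqli'}; single-tool hits go to manual review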
Cristian-Curaba
We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.
Amitelazari
This is the #legalbugbounty standardization project. As I explain in my Enigma talk and my papers, the legal landscape of bug bounties is currently lacking: safe harbor is the exception, not the standard, and thousands upon thousands of hunters are put in legal harm's way. I've suggested that bug bounty legal terms, starting with safe harbor, could and should be standardized. Once standardization of bug bounty legal language is achieved, the bug bounty economy will become an alternate private legal regime in which white-hat hacking is celebrated through regulatory incentives. Standardization will start a race to the top over the quality of bug bounty terms.

This project, supported by CLTC, aims to achieve standardization of bug bounty legal terms across platforms, industries, and sponsors, in line with the DOJ framework and akin to the licenses employed by Creative Commons and the open-source industry. This will reduce the informational burden and increase hackers' awareness of terms (salience). It could also signal whether a particular platform or company conforms with the standard terms that are considered best practice. Finally, it could reduce the drafting costs of the platform or sponsoring program, as well as the transactional costs. While some organizations (such as governmental or financial organizations) might require adjustments, the legal concerns of bug bounty sponsors and platforms are generally similar and could be addressed in standardized language. Moreover, standardization should be used to ensure that hackers have authorized access to any third-party data or components implemented in the bug bounty administrator's product or network, and to facilitate coordinated disclosure of third-party vulnerabilities found (and ethically disclosed). Companies and platforms should coordinate to ensure that such clauses are included in all terms, fostering a best-practice mentality in the industry.

The benefits of standardizing the legal language of bug bounties/CVDs across industries and platforms, in light of the DOJ framework:
+ One language of safe harbor, akin to Creative Commons/open source
+ Create an industry standard that will serve as a benchmark and signal to hackers when companies don't adopt it
+ Reduce the informational burden and increase hackers' awareness of terms
+ Reduce transaction and drafting costs
+ Create a reputation system for legal terms

You must consult with a lawyer.

Disclaimer: this report does not constitute legal advice, and the author is not admitted to practice law in the U.S. The information contained herein is for general guidance on matters of interest only. The application and impact of laws can vary widely based on the specific facts involved. Given the changing nature of laws, terms, rules, and regulations, there may be delays, omissions, or inaccuracies in the information contained herein. Accordingly, the information is provided with the understanding that the author is not herein engaged in rendering legal or other professional advice and services. As such, it should not be used as a substitute for consultation with professional legal or other competent advisers. Before making any decision or taking any action, you should consult a professional. All information is provided "as is", with no guarantee of completeness, accuracy, timeliness, or of the results obtained from the use of this information, and without warranty of any kind, express or implied, including, but not limited to, warranties of performance, merchantability, and fitness for a particular purpose. In no event will the author be liable to you or anyone else for any decision made or action taken in reliance on the information herein, or for any consequential, special, or similar damages.
toxy4ny
Kidnapp-AI-Benchmark is a modular, extensible framework designed to systematically test and evaluate privacy leakage, data extraction, and adversarial vulnerabilities in large language models (LLMs) and other generative AI systems. Built for red teamers, penetration testers, and AI security researchers.
secure-software-engineering
Achilles - Benchmark for assessing OSS vulnerability scanners
getastra
HypeJab is a deliberately vulnerable web application intended for benchmarking automated scanners.
FuzzingLabs
Benchmarking 12 LLMs for vulnerability research
rapticore
A multi-LLM benchmark suite for evaluating security analysis and vulnerability detection capabilities across models from OpenAI, Anthropic, and Google.
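For a sense of the shape such a multi-model harness can take, here is a minimal Python sketch with the provider calls stubbed out as plain callables; the model names and the fake_model stub are placeholders, not real SDK calls from OpenAI, Anthropic, or Google:

    from typing import Callable

    def evaluate(models: dict[str, Callable[[str], str]],
                 snippet: str, expected_cwe: str) -> dict[str, bool]:
        # Send the same vulnerable snippet to every model and check
        # whether its answer names the expected CWE.
        prompt = f"Identify the vulnerability (CWE id) in:\n{snippet}"
        return {name: expected_cwe in ask(prompt) for name, ask in models.items()}

    def fake_model(prompt: str) -> str:
        # Stub standing in for a real provider client (illustration only).
        return "This looks like CWE-89 (SQL injection)."

    models = {"provider_a": fake_model, "provider_b": fake_model}
    print(evaluate(models, "cursor.execute('SELECT * FROM t WHERE id=' + uid)", "CWE-89"))

Keeping each provider behind a plain prompt-to-text callable is what makes the same benchmark cases reusable across different vendor SDKs.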