Found 39 repositories (showing 30)
zzyfight
GenAI compliance benchmark is an evaluation benchmark for generative AI in regulated industries.
sgl-project
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
TIGER-AI-Lab
Code and Data for "GenAI Arena: An Open Evaluation Platform for Generative Models" [NeurIPS 2024]
facebookresearch
GenAI Media Generation Challenge Benchmark
hiamitabha
Code to benchmark APIs available from LLM vendors and demonstrate how they work
kangruobing
VQualA 2025 GenAI-Bench AIGC Video Quality Assessment Challenge
seanbetts
Benchmark tests to evaluate LLM's marketing knowledge, understanding and capabilities
speglich
This project provides a complete Terraform infrastructure setup for benchmarking Generative AI models, specifically designed for Oracle Cloud Infrastructure (OCI). It automates the deployment of compute instances with pre-configured benchmarking tools and includes performance comparison capabilities between different AI platforms.
This project aims to provide hands-on experience with three major classes of generative models: Generative Adversarial Networks, Variational Autoencoders, and Diffusion Models. Students will implement simplified versions of each model, train them on image datasets, and compare their generated samples, training dynamics, and evaluation metrics.
enterprisebot-community
No description available
Testing-AI-Security-Dashboard-Org
No description available
Run OpenVINO GenAI LLM_BENCH in a batch
nearai
No description available
Testing-AI-Security-Dashboard-Org
No description available
pdtgct
Yet another Generative AI Performance dataset generation and benchmarking toolset.
guytonde
Energy profiling and benchmarking suite for inference optimizations across LLMs and diffusion models.
ptakpiotr
Simple TUI for running benchmarks (tasks) for locally-run AI models using Ollama
HaoZhang615
No description available
jelyoussefi
No description available
Nvillaluenga
A few benchmarks on different agentic architectural approaches
key4ng
A Rust reimplementation of genai-bench for benchmarking LLM serving systems at high concurrency with accurate timing and industry-standard metrics.
wanheo09
No description available
SumitKochar
GenAI_Benchmarking_Models
GazzoA
Benchmarking generative AI tools for literature retrieval and summarization in genomic variant interpretation
Collinsbrefo123
This repository contains a multi-model generative AI evaluation project comparing lightweight, open-source LLMs under identical inference conditions. It focuses on analyzing instruction-following behavior, response quality, and model trade-offs for practical GenAI system design.
RazumAI-ch
GxP benchmark for ALCOA+ deviation detection and audit validation
igorrazumny
A structured benchmark for evaluating Generative AI models (e.g., OpenAI GPT-4o, Claude, Gemini) on their ability to identify quality deviations in healthcare manufacturing recipes. Focuses on GxP-relevant issues, model comparison, and long-term reproducibility.
ramirez-ai-labs
Lakehouse-native evaluation framework for measuring regional Spanish LLM performance (SV vs PE) using Delta tables, Spark, and Databricks. Demonstrates Bronze/Silver/Gold architecture and production-ready GenAI evaluation patterns.
Saivinay24
Physics-grounded evaluation harness for auditing Generative Video (Optical Flow metrics).
Chenik00Anas
Measuring the cost of accuracy in generative AI models — TER I3S Lab, Université Côte d'Azur