Search Results

Found 129 repositories (showing 30)

DECEIVE

splunk

💛71

DECeption with Evaluative Integrated Validation Engine (DECEIVE): Let an LLM do all the hard honeypot work!

281

MIT

Python

Updated 1 day ago

dslighting

usail-hkust

💛70

🔥🔥🔥 DSLighting is an LLM-driven autonomous data science execution engine that turns task descriptions and datasets into iterative code generation, execution, evaluation, and refinement workflows.

NOASSERTION

Python

Updated 2 days ago

GAGE

HiThink-Research

❤️40

General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.

Python

Updated 1 week ago

agent, game-arena, llm +3

ruby_llm-evals

sinaptia

🧡55

LLM evaluation engine for Rails.

Ruby

Updated 3 days ago

rails, ruby-llm

LLM-Inference-Deployment-Tutorial

modelize-ai

❤️35

Tutorial for LLM developers covering engine design, service deployment, evaluation/benchmarking, etc. Provides an optimized client/server-style LLM inference engine.

Apache-2.0

Python

Updated 1 year ago

gpt, inference, llama +5

GEM-Bench

First complete benchmark for Generative Engine Marketing (GEM), an emerging field that focuses on monetizing generative AI by seamlessly integrating advertisements into Large Language Model (LLM) responses. Our work addresses the core problem of ad-injected response (AIR) generation and provides a framework for its evaluation.

Apache-2.0

Python

Updated 5 days ago

ai, benchmark, gem +4

rpgbench-public

boson-ai

🧡50

Evaluation of LLMs as RPG Game Engines

Apache-2.0

Python

Updated 1 month ago

sfguide-prompt-engineering-and-llm-evaluation

Snowflake-Labs

❤️30

No description available

Apache-2.0

PLpgSQL

Updated 2 years ago

LLM-Evaluation-Engine

Dextergao14

🧡50

An early-access release of the FNSIE 4D LLM-Evaluation Engine

MIT

Python

Updated 1 month ago

metareason-core

metareason-ai

🧡50

Open-source LLM evaluation engine with statistical confidence scoring

MIT

Python

Updated 1 day ago

ai-governance, bayesian-inference, confidence-scoring +2

llm-engineering-lab

aman-bhaskar-codes

🧡55

Production-grade LLM systems built through learning-by-building. Covers extraction engines, RAG pipelines, evaluation systems, agents, and LLM observability.

Python

Updated 2 weeks ago

vikaasloop

LucidAkshay

🧡60

An autonomous, self-improving 5-agent engine for end-to-end LLM fine-tuning. Automates data generation, QLoRA training, and evaluation.

AGPL-3.0

Python

Updated 1 week ago

hooda-hiring-ai

yashhooda1

❤️45

AI-powered hiring engine that parses resumes, extracts structured candidate intelligence with LLMs, and evaluates job fit against a job description. Built with Python, Streamlit, and OpenAI.

Python

Updated 1 month ago

Intelligent-Movie-Recommendation-System

krishnapanjiyar

❤️35

A hybrid recommendation engine built with Python, Pandas, Scikit-learn, and Flask, combining SVD matrix factorization and cosine similarity for personalized movie recommendations. Features include real-time REST API, LLM-powered natural language queries, evaluation with RMSE / Precision@k, optional Streamlit UI, and Docker + CI/CD support.

Python

Updated 4 months ago

LLM-Evaluation-for-Software-Engineering

ztavakolirad

🧡50

This project evaluates and compares two LLMs on various software engineering tasks, including code generation, test generation, and documentation. The models used are phi-2 and Cohere Command.

Python

Updated 4 weeks ago

cohere, llm, nlp +1

IrvineHack-2026-Backend-generative-AI-model-

yizhel17

💛70

An AI-powered underwriting engine for real estate risk evaluation using cross-document validation and LLM-based analysis

MIT

JavaScript

Updated 6 days ago

QA_Ai-Automation-Project

Shan1252424

❤️40

End-to-end Automated QA Engine for LLM testing with logging, evaluation metrics, SQL, JIRA, and Google Sheets integration.

MIT

Updated 5 months ago

turify-prompts

turify

❤️40

🧠 Build an advanced prompt optimisation engine with AI scoring and recommendations - contributors needed for LLM evaluation algorithms. Next.js 15, LangChain, Prism, Shadcn.

MIT

TypeScript

Updated 10 months ago

genai, langchain, langgraph +7

welfare-ai

sidx04

🧡55

Explainable Eligibility and Impact Engine for Indian public welfare schemes that combines deterministic, rule-based eligibility evaluation with an LLM-assisted explanation layer.

Python

Updated 1 week ago

astraeus-ai-research-engine

0DevDutt0

🧡55

Autonomous multi-agent AI research and evaluation engine built with FastAPI, Streamlit, CrewAI and local LLMs (Ollama). Implements structured research generation, verification, critique, scoring, and quality analysis pipeline.

Python

Updated 1 week ago

ai-systems, artificial-intelligence, autonomous-agents +10

rule-engine

GTRPGM

❤️30

A fixed rule engine for LLM-driven TRPG that evaluates actions, applies world rules, and produces deterministic state transitions with contextual inputs.

Python

Updated 1 month ago

search-engine-comparison

ojasvatsyayan

❤️35

🔍 A Python-based comparison of Bing and Google search engines using LangChain and official API wrappers. Collects, stores, and analyzes search result data for future use in LLM evaluation, ranking, or NLP pipelines.

Jupyter Notebook

Updated 1 year ago

semantic-scoring-agent

tianzq13184

❤️35

A lightweight AI-powered evaluation engine for short-answer and technical questions. Semantic Scoring Agent combines LLM semantic understanding, rubric-based reasoning, and rule-enhanced scoring to deliver consistent, explainable, and teacher-reviewable assessments.

Python

Updated 3 months ago

genai-rag-engine

sankarbaseone

❤️35

A modular Retrieval-Augmented Generation (RAG) engine for building enterprise AI assistants. Supports document ingestion, chunking, embeddings, vector search, and LLM-based answer generation. Includes evaluation tools and an extensible architecture for chatbots, knowledge bases, and AI copilots.

Python

Updated 4 months ago

ai-safety-governance-engine

Alkur123

🧡60

Built a pre-generation AI governance engine that evaluates user prompts for harm, prompt injection, medical risk, and PHI exposure before LLM invocation. The system enforces ALLOW/ABSTAIN/BLOCK decisions with uncertainty scoring, explainable decision traces, and FP/FN evaluation to make safety trade-offs transparent and auditable.

Apache-2.0

Python

Updated 2 weeks ago

Risk-Engine-LLM

AnkitMaheshwariIn

❤️35

Risk-Engine-LLM is an AI-powered microservice that evaluates fraud risk for payment transactions using rule-based heuristics. It simulates a real-world fraud scoring engine focused on explainability and modularity. Built with Node.js, TypeScript, and Express, it uses LLMs to generate human-readable risk explanations.

TypeScript

Updated 9 months ago

IntelliRisk-AI-Powered-Risk-Evaluation-Engine

achyuth-2308

❤️40

IntelliRisk is an AI-powered compliance risk evaluation engine built using Google Gemini APIs. It parses engineering specs and technical documents, identifies non-compliance using RAG and LLMs, and delivers explainable insights to assist product engineers in regulatory decision-making.

MIT

Python

Updated 8 months ago

geo-pulse

baltagiyc

❤️45

GEO Pulse is a brand audit application for GEO (Generative Engine Optimization). It evaluates a brand's visibility in LLM responses (ChatGPT, Gemini, Perplexity, etc.) and generates strategic recommendations to improve this visibility.

Python

Updated 2 months ago

enterprise-llm-selector

anilatambharii

🧡60

A full-stack decision engine for enterprises to evaluate and rank LLMs (Llama 4, GPT-5, Gemini) based on domain constraints like HIPAA compliance, financial determinism, and latency. Built with Next.js, FastAPI, and Docker

MIT

Python

Updated 2 weeks ago

spark-agent-showcase

davidade10

🧡55

Full-stack algorithmic trading agent for Iron Condors (Sanitized Showcase). Built with Python, FastAPI, Next.js, and TimescaleDB. Integrates Charles Schwab API data with local LLM reasoning and a deterministic rules engine to evaluate, score, and execute options trades via a high-density React dashboard.

Python

Updated 2 weeks ago
