Found 129 repositories (showing 30)
splunk
DECeption with Evaluative Integrated Validation Engine (DECEIVE): Let an LLM do all the hard honeypot work!
usail-hkust
🔥🔥🔥 DSLighting is an LLM-driven autonomous data science execution engine that turns task descriptions and datasets into iterative code generation, execution, evaluation, and refinement workflows.
HiThink-Research
General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.
sinaptia
LLM evaluation engine for Rails.
modelize-ai
Tutorial for LLM developers on engine design, service deployment, evaluation/benchmarking, etc. Provides a C/S-style optimized LLM inference engine.
Generative-Engine-Marketing
First complete benchmark for Generative Engine Marketing (GEM), an emerging field that focuses on monetizing generative AI by seamlessly integrating advertisements into Large Language Model (LLM) responses. Our work addresses the core problem of ad-injected response (AIR) generation and provides a framework for its evaluation.
boson-ai
Evaluation of LLMs as RPG Game Engines
Snowflake-Labs
No description available
Dextergao14
An early-access release of the FNSIE 4D LLM-Evaluation Engine
metareason-ai
Open-source LLM evaluation engine with statistical confidence scoring
aman-bhaskar-codes
Production-grade LLM systems built through learning-by-building. Covers extraction engines, RAG pipelines, evaluation systems, agents, and LLM observability.
LucidAkshay
An autonomous, self-improving 5-agent engine for end-to-end LLM fine-tuning. Automates data generation, QLoRA training, and evaluation.
yashhooda1
AI-powered hiring engine that parses resumes, extracts structured candidate intelligence with LLMs, and evaluates job fit against a job description. Built with Python, Streamlit, and OpenAI.
krishnapanjiyar
A hybrid recommendation engine built with Python, Pandas, Scikit-learn, and Flask, combining SVD matrix factorization and cosine similarity for personalized movie recommendations. Features include real-time REST API, LLM-powered natural language queries, evaluation with RMSE / Precision@k, optional Streamlit UI, and Docker + CI/CD support.
ztavakolirad
This project evaluates and compares two LLMs on various software engineering tasks, including code generation, test generation, and documentation. The models used are phi-2 and Cohere Command.
An AI-powered underwriting engine for real estate risk evaluation using cross-document validation and LLM-based analysis
Shan1252424
End-to-end Automated QA Engine for LLM testing with logging, evaluation metrics, SQL, JIRA, and Google Sheets integration.
turify
🧠 Build an advanced prompt optimisation engine with AI scoring and recommendations. Contributors needed for LLM evaluation algorithms. Next.js 15, LangChain, Prism, Shadcn.
sidx04
Explainable Eligibility and Impact Engine for Indian public welfare schemes that combines deterministic, rule-based eligibility evaluation with an LLM-assisted explanation layer.
0DevDutt0
Autonomous multi-agent AI research and evaluation engine built with FastAPI, Streamlit, CrewAI, and local LLMs (Ollama). Implements a structured pipeline for research generation, verification, critique, scoring, and quality analysis.
GTRPGM
A fixed rule engine for LLM-driven TRPG that evaluates actions, applies world rules, and produces deterministic state transitions with contextual inputs.
ojasvatsyayan
🔍 A Python-based comparison of Bing and Google search engines using LangChain and official API wrappers. Collects, stores, and analyzes search result data for future use in LLM evaluation, ranking, or NLP pipelines.
tianzq13184
A lightweight AI-powered evaluation engine for short-answer and technical questions. Semantic Scoring Agent combines LLM semantic understanding, rubric-based reasoning, and rule-enhanced scoring to deliver consistent, explainable, and teacher-reviewable assessments.
sankarbaseone
A modular Retrieval-Augmented Generation (RAG) engine for building enterprise AI assistants. Supports document ingestion, chunking, embeddings, vector search, and LLM-based answer generation. Includes evaluation tools and an extensible architecture for chatbots, knowledge bases, and AI copilots.
Alkur123
Built a pre-generation AI governance engine that evaluates user prompts for harm, prompt injection, medical risk, and PHI exposure before LLM invocation. The system enforces ALLOW/ABSTAIN/BLOCK decisions with uncertainty scoring, explainable decision traces, and FP/FN evaluation to make safety trade-offs transparent and auditable.
AnkitMaheshwariIn
Risk-Engine-LLM is an AI-powered microservice that evaluates fraud risk for payment transactions using rule-based heuristics. It simulates a real-world fraud scoring engine focused on explainability and modularity. Built with Node.js, TypeScript, and Express, it uses LLMs to generate human-readable risk explanations.
achyuth-2308
IntelliRisk is an AI-powered compliance risk evaluation engine built using Google Gemini APIs. It parses engineering specs and technical documents, identifies non-compliance using RAG and LLMs, and delivers explainable insights to assist product engineers in regulatory decision-making.
baltagiyc
GEO Pulse is a brand audit application for GEO (Generative Engine Optimization). It evaluates a brand's visibility in LLM responses (ChatGPT, Gemini, Perplexity, etc.) and generates strategic recommendations to improve this visibility.
anilatambharii
A full-stack decision engine for enterprises to evaluate and rank LLMs (Llama 4, GPT-5, Gemini) based on domain constraints like HIPAA compliance, financial determinism, and latency. Built with Next.js, FastAPI, and Docker
davidade10
Full-stack algorithmic trading agent for Iron Condors (Sanitized Showcase). Built with Python, FastAPI, Next.js, and TimescaleDB. Integrates Charles Schwab API data with local LLM reasoning and a deterministic rules engine to evaluate, score, and execute options trades via a high-density React dashboard.