Found 17 repositories (showing 17)
HiThink-Research
BizFinBench.v2: A Unified Offline–Online Bilingual Benchmark for Expert-Level Financial Capability Evaluation of LLMs
s010m00n
A unified benchmark for evaluating continual agent memory in LLM-based systems across 5 evaluation modes (Online, Offline, Replay, Transfer, Repair) and 6 interactive tasks, supporting both system and personal memory mechanisms.
krtarunsingh
Starter repo for building an offline Android Chat + Translate app with multi-path LLM backends: llama.cpp (Adreno OpenCL), MLC-LLM (TVM), and WebLLM/WebGPU fallback. Includes JNI stubs, Kotlin UI, PWA shell, benchmark harness, and scripts for GGUF/MLC model packs.
omprakash0702
CPU-based offline LLM benchmarking platform with Prometheus observability and Docker deployment.
tannernicol
Structured prompt injection testing against local LLMs — 37 attack vectors, 8 signal detectors, model benchmarks, fully offline
LLM-powered Wordle solver achieving a 100% win rate with 3.7–3.8 average tries, outperforming the GPT-5 benchmark by 35%. Supports the Groq API (online) and Ollama (offline).
alishafique3
This project benchmarks vLLM against Hugging Face Transformers for offline LLM inference, leveraging vLLM’s optimizations such as PagedAttention and continuous batching for faster generation and more efficient GPU memory usage.
divyathakran
A framework for benchmarking and evaluating locally running LLMs across latency, throughput, and structured output reliability.
SrileakhanaMangapathi
Offline LLM benchmarking system using Ollama to evaluate latency, quality, and resource usage across models.
vaibhav4046
Offline benchmark suite for local Ollama LLMs on Windows — measures latency per model, enforces JSON schema output guardrails, generates comparative reports
BhaveshMakhija
EdgeMind: Offline LLM Benchmarking Platform using Ollama, FastAPI, and React. Run and compare quantized LLMs locally with a modern UI, analyzing latency, performance, and quality on CPU-only systems.
pateldhruvkumar
Agentic Kahoot auto‑answer bot that joins live games with Puppeteer, sends each question to an n8n + LLM research workflow, and automatically clicks the answer while also supporting offline MCQ benchmarking.
This project benchmarks offline Masked and Causal Language Models (MLMs, CLMs) such as DistilBERT, MobileBERT, TinyLLaMA, Phi-2, and Gemma-2B. It evaluates accuracy, efficiency, and text quality using BLEU, ROUGE, and perplexity, supporting privacy-focused AI applications. The results guide optimal offline LLM selection.
yerramsettysuchita
O RAG, a fully offline Retrieval-Augmented Generation Android app that runs a 1.5B-parameter quantized LLM and a hybrid BM25/TF-IDF/embedding retriever entirely on-device, achieving perfect retrieval quality on an employee-handbook benchmark.
AahanaGanjewar
A fully offline benchmarking platform for evaluating Small Language Models (SLMs) using **Ollama** and **Gradio**. This project compares multiple locally hosted LLMs across **latency, token throughput, memory usage, and response quality** to understand real-world **speed vs quality tradeoffs**.
Haus-Nous
A hands-on project exploring local LLM/SLM inference using Ollama, focused on real-world constraints like privacy, latency, and cost. Benchmarks multiple models on the same hardware, comparing speed vs quality trade-offs, and analyzes performance for practical deployment scenarios — all fully offline.
zaebee
Offline evaluator for LLM agent traces from a CI/CD benchmark task. Agents are given a git repository and asked to get a change merged into `main`. The evaluator reads raw session logs, detects exploitative behavior, computes multi-dimensional scores, and produces a cross-model leaderboard.