Found 17 repositories (showing 17)
HiThink-Research
BizFinBench.v2: A Unified Offline–Online Bilingual Benchmark for Expert-Level Financial Capability Evaluation of LLMs
s010m00n
A unified benchmark for evaluating continual agent memory in LLM-based systems across 5 evaluation modes (Online, Offline, Replay, Transfer, Repair) and 6 interactive tasks, supporting both system and personal memory mechanisms.
krtarunsingh
Starter repo for building an offline Android Chat + Translate app with multi-path LLM backends: llama.cpp (Adreno OpenCL), MLC-LLM (TVM), and WebLLM/WebGPU fallback. Includes JNI stubs, Kotlin UI, PWA shell, benchmark harness, and scripts for GGUF/MLC model packs.
omprakash0702
CPU-based offline LLM benchmarking platform with Prometheus observability and Docker deployment.
tannernicol
Structured prompt injection testing against local LLMs — 37 attack vectors, 8 signal detectors, model benchmarks, fully offline
LLM-powered Wordle solver achieving a 100% win rate with 3.7–3.8 average tries, outperforming the GPT-5 benchmark by 35%. Supports the Groq API (online) and Ollama (offline).
alishafique3
This project benchmarks vLLM against Hugging Face Transformers for offline LLM inference, leveraging vLLM’s optimizations such as PagedAttention and continuous batching for faster generation and more efficient GPU memory usage.
divyathakran
A framework for benchmarking and evaluating locally running LLMs across latency, throughput, and structured output reliability.
SrileakhanaMangapathi
Offline LLM benchmarking system using Ollama to evaluate latency, quality, and resource usage across models.
vaibhav4046
Offline benchmark suite for local Ollama LLMs on Windows — measures latency per model, enforces JSON schema output guardrails, generates comparative reports
BhaveshMakhija
EdgeMind: Offline LLM Benchmarking Platform using Ollama, FastAPI, and React. Run and compare quantized LLMs locally with a modern UI, analyzing latency, performance, and quality on CPU-only systems.
pateldhruvkumar
Agentic Kahoot auto‑answer bot that joins live games with Puppeteer, sends each question to an n8n + LLM research workflow, and automatically clicks the answer while also supporting offline MCQ benchmarking.
This project benchmarks offline Masked and Causal Language Models (MLMs, CLMs) such as DistilBERT, MobileBERT, TinyLLaMA, Phi-2, and Gemma-2B. It evaluates accuracy, efficiency, and text quality using BLEU, ROUGE, and perplexity, supporting privacy-focused AI applications. The results guide optimal offline LLM selection.
yerramsettysuchita
O RAG, a fully offline Retrieval-Augmented Generation Android app that runs a 1.5B-parameter quantized LLM and a hybrid BM25/TF-IDF/embedding retriever entirely on-device, achieving perfect retrieval quality on an employee-handbook benchmark.
AahanaGanjewar
A fully offline benchmarking platform for evaluating Small Language Models (SLMs) using **Ollama** and **Gradio**. This project compares multiple locally hosted LLMs across **latency, token throughput, memory usage, and response quality** to understand real-world **speed vs quality tradeoffs**.
Haus-Nous
A hands-on project exploring local LLM/SLM inference using Ollama, focused on real-world constraints like privacy, latency, and cost. Benchmarks multiple models on the same hardware, comparing speed vs quality trade-offs, and analyzes performance for practical deployment scenarios — all fully offline.
zaebee
Offline evaluator for LLM agent traces from a CI/CD benchmark task. Agents are given a git repository and asked to get a change merged into `main`. The evaluator reads raw session logs, detects exploitative behavior, computes multi-dimensional scores, and produces a cross-model leaderboard.