Found 791 repositories (showing 30)
intel
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
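A minimal sketch of the drop-in transformers-style API ipex-llm exposes for Intel XPUs; the model ID and prompt are illustrative, and an `ipex-llm[xpu]` install is assumed:

```python
# Minimal ipex-llm sketch: load a Hugging Face model with 4-bit weights
# on an Intel XPU. Model ID and prompt are illustrative assumptions.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in transformers API

model_id = "Qwen/Qwen2-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True,
                                             trust_remote_code=True).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is an NPU?", return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```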
johnbean393
A native macOS app that allows users to chat with a local LLM that can respond with information from files, folders and websites on your Mac without installing any other software. Powered by llama.cpp.
RahulSChand
Calculate tokens/s and GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization.
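The repository computes this per quantization format; as a rough illustration of the idea, a generic back-of-envelope estimate looks like this (the constants are assumptions, not the repo's exact formula):

```python
# Generic back-of-envelope estimate: weights + KV cache + a fudge factor
# for activations and buffers. Not the repo's exact formula.
def estimate_gpu_mem_gb(n_params_b: float, bits_per_weight: int,
                        n_layers: int, hidden_dim: int,
                        context_len: int, kv_bytes: int = 2) -> float:
    weights = n_params_b * 1e9 * bits_per_weight / 8               # quantized weights
    kv_cache = 2 * n_layers * hidden_dim * context_len * kv_bytes  # K and V tensors
    overhead = 0.10 * weights                                      # rough fudge factor
    return (weights + kv_cache + overhead) / 1e9

# e.g. a 7B model at 4-bit with a 4096-token context (Llama-2-7B-like shapes):
print(f"~{estimate_gpu_mem_gb(7, 4, 32, 4096, 4096):.1f} GB")
```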
milanglacier
💃 Dance with Intelligence in Your Code. Minuet offers code completion as-you-type from popular LLMs including OpenAI, Gemini, Claude, Ollama, Llama.cpp, Codestral, and more.
ngxson
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
devoxx
DevoxxGenie is a plugin for IntelliJ IDEA that uses local LLMs (Ollama, LMStudio, GPT4All, Jan, and Llama.cpp) and cloud-based LLMs to help review, test, and explain your project code. The latest version also supports Spec-Driven Development with CLI Runners.
Maximilian-Winter
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), letting users chat with models, execute structured function calls, and get structured output. It also works with models not fine-tuned for JSON output or function calling.
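As a hedged sketch of the underlying technique (not llama-cpp-agent's own API), the widely used llama-cpp-python package can grammar-constrain a local GGUF model to valid JSON:

```python
# Constrain a local GGUF model to emit valid JSON via llama-cpp-python.
# This stands in for the idea; llama-cpp-agent adds its own abstractions.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=4096)  # path is illustrative
resp = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Return a JSON object with fields name and age."}],
    response_format={"type": "json_object"},  # grammar-enforced JSON output
)
print(resp["choices"][0]["message"]["content"])
```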
TheBlewish
A Python-based web-assisted large language model (LLM) search assistant using Llama.cpp
milanglacier
💃 Dance with LLM in Your Code. Minuet offers code completion as-you-type from popular LLMs including OpenAI, Gemini, Claude, Ollama, Llama.cpp, Codestral, and more.
lucasjinreal
A pure-Rust inference engine for LLM, VLM, VLA, TTS, and OCR models, powered by Candle. An alternative to llama.cpp, but much simpler and cleaner.
Siddhesh2377
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion), tool calling, AI personas, RAG knowledge packs, TTS/STT. Fully offline, zero subscriptions, open-source.
mgonzs13
llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2
baileytec-labs
Deploy llama.cpp compatible Generative AI LLMs on AWS Lambda!
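A hypothetical boto3 client for such a deployment; the function name and payload schema are assumptions for illustration, not this repo's actual contract:

```python
# Invoke a hypothetical llama.cpp-backed Lambda function.
import json
import boto3

client = boto3.client("lambda")
resp = client.invoke(
    FunctionName="llama-cpp-inference",  # hypothetical function name
    Payload=json.dumps({"prompt": "Hello!", "max_tokens": 64}).encode(),
)
print(json.loads(resp["Payload"].read()))
```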
phronmophobic
Run LLMs locally. A Clojure wrapper for llama.cpp.
QuantiusBenignus
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp OFFLINE. Speak with local LLMs via llama.cpp.
NetEase-Media
A higher-performance OpenAI-compatible LLM service than vLLM serve: a pure C++ implementation built with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
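Since the service speaks the OpenAI protocol, any OpenAI client should work against it; the base URL, API key, and model name below are assumptions, so check the server's own configuration:

```python
# Point the official openai client at a local OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
resp = client.chat.completions.create(
    model="local-model",  # placeholder; use whatever the server registers
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```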
simonw
LLM plugin for running models using llama.cpp
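Besides the CLI, llm exposes a Python API; the model ID below is hypothetical and depends on which llama.cpp plugin and models you have installed:

```python
# llm's Python API; run `llm models` after installing a llama.cpp plugin
# to see which model IDs are actually available.
import llm

model = llm.get_model("llama-2-7b-chat")   # hypothetical model ID
response = model.prompt("Three facts about llamas")
print(response.text())
```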
implyinfer
Start running LLMs, vision models, and camera pipelines in under an hour. This fully-configured single board computer comes with a bootable NVMe pre-loaded with Ollama, Llama.cpp, Roboflow vision inference, various LLM and VLM models, and 20+ applications ready to run. Save 40-120 hours of setup time.
samestrin
A simple NPM interface for seamlessly interacting with 36 Large Language Model (LLM) providers, including OpenAI, Anthropic, Google Gemini, Cohere, Hugging Face Inference, NVIDIA AI, Mistral AI, AI21 Studio, LLaMA.CPP, and Ollama, with access to hundreds of models.
A Discord Bot for chatting with LLaMA, Vicuna, Alpaca, MPT, or any other Large Language Model (LLM) supported by text-generation-webui or llama.cpp.
HaujetZhao
Exports the LLM portion of Qwen3-ASR to GGUF for accelerated inference with llama.cpp, which supports Vulkan and CUDA acceleration.
BjornMelin
DocMind AI is a powerful, open-source Streamlit application leveraging LlamaIndex, LangGraph, and local Large Language Models (LLMs) via Ollama, LMStudio, llama.cpp, or vLLM for advanced document analysis. Analyze, summarize, and extract insights from a wide array of file formats, securely and privately, all offline.
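A hedged sketch of the basic LlamaIndex flow an app like this builds on; DocMind's actual pipeline (LangGraph agents, local backends) is more involved, and the directory name is illustrative:

```python
# Basic LlamaIndex indexing + query flow; by default this uses OpenAI for
# embeddings/LLM unless Settings is pointed at a local backend.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader("./documents").load_data()   # PDFs, docx, md, ...
index = VectorStoreIndex.from_documents(docs)             # chunk, embed, store
answer = index.as_query_engine().query("Summarize the key findings.")
print(answer)
```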
peva3
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
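A toy illustration of the semantic-caching idea: reuse a cached answer when a new prompt's embedding is close enough to an earlier one. The embedding source is left abstract here; this is not SmarterRouter's implementation:

```python
# Toy semantic cache: cosine similarity over prompt embeddings.
import numpy as np

# (embedding, cached response) pairs; a real router would persist these.
cache: list[tuple[np.ndarray, str]] = []

def lookup(emb: np.ndarray, threshold: float = 0.95) -> str | None:
    """Return a cached answer whose prompt embedding is similar enough."""
    for cached_emb, answer in cache:
        cos = float(emb @ cached_emb) / (np.linalg.norm(emb) * np.linalg.norm(cached_emb))
        if cos >= threshold:
            return answer  # semantic cache hit: skip the LLM call
    return None

def store(emb: np.ndarray, answer: str) -> None:
    cache.append((emb, answer))
```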
ddh0
Python package wrapping llama.cpp for on-device LLM inference
tylike
OpenAI ChatGPT or a local LLM (llama.cpp, GGUF format) + TTS + STT + Word + Excel.
HaujetZhao
The fastest Qwen3-TTS inference solution: exports the LLM portion of Qwen3-TTS to GGUF for accelerated inference with llama.cpp, which supports Vulkan and CUDA acceleration.
ferranpons
True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image Generation — powered by llama.cpp, whisper.cpp and stable-diffusion.cpp.
docusealco
Ruby FFI bindings for llama.cpp to run open-source LLMs such as GPT-OSS, Qwen 3.5, Gemma 4, and Llama 3 locally with Ruby.
ngxson
Web UI for Alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM
ADT109119
A distributed LLM inference program based on llama.cpp that lets multiple computers on a local network cooperate on distributed inference of large language models, with a cross-platform desktop application UI built with Electron.