Found 882 repositories (showing 30)
ngxson
Real-time webcam demo with SmolVLM and llama.cpp server
mostlygeek
Reliable model swapping for any local OpenAI/Anthropic-compatible server - llama.cpp, vLLM, etc.
abi
Fully private LLM chatbot that runs entirely in the browser with no server needed. Supports Mistral and Llama 3.
waybarrios
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
alexziskind1
Interactive launcher and benchmarking harness for llama.cpp server throughput, with tests, sweeps, and round‑robin load tools.
ardanlabs
Your personal engine for running open-source models locally. Use Go for hardware-accelerated local inference with llama.cpp directly integrated into your Go applications via the yzma module. Kronk provides a high-level API that feels similar to using an OpenAI-compatible API. Kronk also provides a model server to run local workloads.
iaalm
An OpenAI-API-compatible REST server for LLaMA.
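Servers like this one put a local LLaMA model behind the standard OpenAI chat-completions route. A minimal client sketch, assuming the server listens on `http://localhost:8000` and serves `/v1/chat/completions` (the host, port, and model name here are illustrative assumptions, not details from this listing):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request (constructed, not sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8000", "llama-2-7b-chat", "Say hello.")
# With a server running, urllib.request.urlopen(req) would return the completion.
```

Because the route mirrors the OpenAI API, the official `openai` client library can usually be pointed at such a server by overriding its base URL.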
Najmul190
A Discord chatbot / selfbot that lets users talk to AI powered by the Groq API (Meta Llama 3), or use your own ChatGPT API key. The AI runs on a genuine Discord account, not a bot account, so it can be added to any server without any permissions. Try it out at: https://discord.gg/yUWmzQBV4P
trzy
LLaVA server (llama.cpp).
GobinFan
MCP server for querying the technical documentation of mainstream agent frameworks (supports both stdio and SSE transport protocols). Supports langchain, llama-index, autogen, agno, openai-agents-sdk, mcp-doc, camel-ai, and crew-ai.
nuance1979
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
ortegaalfredo
Native GUI for several AI services plus local llama.cpp AIs.
thad0ctor
No description available
lordmathis
Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
run-llama
An MCP server connecting to managed indexes on LlamaCloud.
avilum
A client/server for LLaMA (Large Language Model Meta AI) that can run ANYWHERE.
yazon
🚀 FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GPU support
nicknochnack
An end-to-end walkthrough of LLaMA CPP's server.
herrera-luis
Demo Python script to interact with a llama.cpp server using the Whisper API, microphone, and webcam devices.
willbnu
Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
llm-use
LLM orchestration toolkit for agent workflows: planner + workers + synthesis, optional router (LLM + learned fallback), supports OpenAI/Anthropic/Ollama/llama.cpp, real scraping with caching, MCP server integration, and a TUI chat UI.
SamuelTallet
A lightweight LLaMA.cpp HTTP server Docker image based on Alpine Linux.
vmlinuzx
One-stop shop: local-first RAG stack with intelligent polyglot code/docs, remote code execution, local llama enrichment, progressive-disclosure tools, MCP server, and sandboxed security.
m18coppola
No-messing-around sh client for llama.cpp's server
simonw
LLM plugin for interacting with llama-server models
kurnevsky
A client for the llama-cpp server.
jhud
An easily-trained baby GPT that can stand in for the real thing. Based on Andrej Karpathy's makemore, but set up to mimic a llama-cpp server. This is not production-ready; it's a toy implementation for educational purposes.
AI-powered voice-calling assistant using Twilio as the telephony server and Meta LLaMA as the agent model.
hwpoison
A lightweight chat terminal-interface for llama.cpp server written in C++ with many features and windows/linux support.
Use two different methods (DeepSpeed and the SageMaker model parallelism library) to fine-tune a LLaMA model on SageMaker, then deploy the fine-tuned model on SageMaker with server-side batching.