Found 791 repositories (showing 30)
intel
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
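A minimal sketch of the drop-in transformers-style API ipex-llm exposes for Intel XPUs; the model ID and prompt are illustrative, and an `ipex-llm[xpu]` install is assumed:

```python
# Minimal ipex-llm sketch: load a Hugging Face model with 4-bit weights
# on an Intel XPU. Model ID and prompt are illustrative assumptions.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in transformers API

model_id = "Qwen/Qwen2-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True,
                                             trust_remote_code=True).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is an NPU?", return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```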
johnbean393
A native macOS app that allows users to chat with a local LLM that can respond with information from files, folders and websites on your Mac without installing any other software. Powered by llama.cpp.
RahulSChand
Calculate tokens/s and GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization.
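The repository computes this per quantization format; as a rough illustration of the idea, a generic back-of-envelope estimate looks like this (the constants are assumptions, not the repo's exact formula):

```python
# Generic back-of-envelope estimate: weights + KV cache + a fudge factor
# for activations and buffers. Not the repo's exact formula.
def estimate_gpu_mem_gb(n_params_b: float, bits_per_weight: int,
                        n_layers: int, hidden_dim: int,
                        context_len: int, kv_bytes: int = 2) -> float:
    weights = n_params_b * 1e9 * bits_per_weight / 8               # quantized weights
    kv_cache = 2 * n_layers * hidden_dim * context_len * kv_bytes  # K and V tensors
    overhead = 0.10 * weights                                      # rough fudge factor
    return (weights + kv_cache + overhead) / 1e9

# e.g. a 7B model at 4-bit with a 4096-token context (Llama-2-7B-like shapes):
print(f"~{estimate_gpu_mem_gb(7, 4, 32, 4096, 4096):.1f} GB")
```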
milanglacier
💃 Dance with Intelligence in Your Code. Minuet offers code completion as-you-type from popular LLMs including OpenAI, Gemini, Claude, Ollama, Llama.cpp, Codestral, and more.
ngxson
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
devoxx
DevoxxGenie is a plugin for IntelliJ IDEA that uses local LLMs (Ollama, LMStudio, GPT4All, Jan, and Llama.cpp) and cloud-based LLMs to help review, test, and explain your project code. The latest version also supports Spec-Driven Development with CLI Runners.
Maximilian-Winter
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), letting users chat with models, execute structured function calls, and get structured output. It also works with models not fine-tuned for JSON output or function calling.
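As a hedged sketch of the underlying technique (not llama-cpp-agent's own API), the widely used llama-cpp-python package can grammar-constrain a local GGUF model to valid JSON:

```python
# Constrain a local GGUF model to emit valid JSON via llama-cpp-python.
# This stands in for the idea; llama-cpp-agent adds its own abstractions.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=4096)  # path is illustrative
resp = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Return a JSON object with fields name and age."}],
    response_format={"type": "json_object"},  # grammar-enforced JSON output
)
print(resp["choices"][0]["message"]["content"])
```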
TheBlewish
A Python-based web-assisted large language model (LLM) search assistant using Llama.cpp
milanglacier
💃 Dance with LLM in Your Code. Minuet offers code completion as-you-type from popular LLMs including OpenAI, Gemini, Claude, Ollama, Llama.cpp, Codestral, and more.
lucasjinreal
A pure-Rust inference engine for LLM, VLM, VLA, TTS, and OCR models, powered by Candle. An alternative to llama.cpp, but much simpler and cleaner.
Siddhesh2377
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion), tool calling, AI personas, RAG knowledge packs, TTS/STT. Fully offline, zero subscriptions, open-source.
mgonzs13
llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2
baileytec-labs
Deploy llama.cpp compatible Generative AI LLMs on AWS Lambda!
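A hypothetical boto3 client for such a deployment; the function name and payload schema are assumptions for illustration, not this repo's actual contract:

```python
# Invoke a hypothetical llama.cpp-backed Lambda function.
import json
import boto3

client = boto3.client("lambda")
resp = client.invoke(
    FunctionName="llama-cpp-inference",  # hypothetical function name
    Payload=json.dumps({"prompt": "Hello!", "max_tokens": 64}).encode(),
)
print(json.loads(resp["Payload"].read()))
```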
phronmophobic
Run LLMs locally. A Clojure wrapper for llama.cpp.
QuantiusBenignus
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp OFFLINE. Speak with local LLMs via llama.cpp.
NetEase-Media
A higher-performance OpenAI-compatible LLM service than vLLM serve: a pure C++ implementation built with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
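Since the service speaks the OpenAI protocol, any OpenAI client should work against it; the base URL, API key, and model name below are assumptions, so check the server's own configuration:

```python
# Point the official openai client at a local OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
resp = client.chat.completions.create(
    model="local-model",  # placeholder; use whatever the server registers
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```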
simonw
LLM plugin for running models using llama.cpp
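Besides the CLI, llm exposes a Python API; the model ID below is hypothetical and depends on which llama.cpp plugin and models you have installed:

```python
# llm's Python API; run `llm models` after installing a llama.cpp plugin
# to see which model IDs are actually available.
import llm

model = llm.get_model("llama-2-7b-chat")   # hypothetical model ID
response = model.prompt("Three facts about llamas")
print(response.text())
```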
implyinfer
Start running LLMs, vision models, and camera pipelines in under an hour. This fully-configured single board computer comes with a bootable NVMe pre-loaded with Ollama, Llama.cpp, Roboflow vision inference, various LLM and VLM models, and 20+ applications ready to run. Save 40-120 hours of setup time.
samestrin
A simple NPM interface for seamlessly interacting with 36 Large Language Model (LLM) providers, including OpenAI, Anthropic, Google Gemini, Cohere, Hugging Face Inference, NVIDIA AI, Mistral AI, AI21 Studio, LLaMA.CPP, and Ollama, with access to hundreds of models.
A Discord Bot for chatting with LLaMA, Vicuna, Alpaca, MPT, or any other Large Language Model (LLM) supported by text-generation-webui or llama.cpp.
HaujetZhao
Exports the LLM portion of Qwen3-ASR to GGUF for accelerated inference with llama.cpp, which supports Vulkan and CUDA acceleration.
BjornMelin
DocMind AI is a powerful, open-source Streamlit application leveraging LlamaIndex, LangGraph, and local Large Language Models (LLMs) via Ollama, LMStudio, llama.cpp, or vLLM for advanced document analysis. Analyze, summarize, and extract insights from a wide array of file formats, securely and privately, all offline.
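A hedged sketch of the basic LlamaIndex flow an app like this builds on; DocMind's actual pipeline (LangGraph agents, local backends) is more involved, and the directory name is illustrative:

```python
# Basic LlamaIndex indexing + query flow; by default this uses OpenAI for
# embeddings/LLM unless Settings is pointed at a local backend.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader("./documents").load_data()   # PDFs, docx, md, ...
index = VectorStoreIndex.from_documents(docs)             # chunk, embed, store
answer = index.as_query_engine().query("Summarize the key findings.")
print(answer)
```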
peva3
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
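A toy illustration of the semantic-caching idea: reuse a cached answer when a new prompt's embedding is close enough to an earlier one. The embedding source is left abstract here; this is not SmarterRouter's implementation:

```python
# Toy semantic cache: cosine similarity over prompt embeddings.
import numpy as np

# (embedding, cached response) pairs; a real router would persist these.
cache: list[tuple[np.ndarray, str]] = []

def lookup(emb: np.ndarray, threshold: float = 0.95) -> str | None:
    """Return a cached answer whose prompt embedding is similar enough."""
    for cached_emb, answer in cache:
        cos = float(emb @ cached_emb) / (np.linalg.norm(emb) * np.linalg.norm(cached_emb))
        if cos >= threshold:
            return answer  # semantic cache hit: skip the LLM call
    return None

def store(emb: np.ndarray, answer: str) -> None:
    cache.append((emb, answer))
```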
ddh0
Python package wrapping llama.cpp for on-device LLM inference
tylike
OpenAI ChatGPT or a local LLM (llama.cpp, GGUF format) + TTS + STT + Word + Excel.
HaujetZhao
The fastest Qwen3-TTS inference solution: exports the LLM portion of Qwen3-TTS to GGUF for accelerated inference with llama.cpp, which supports Vulkan and CUDA acceleration.
ferranpons
True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image Generation — powered by llama.cpp, whisper.cpp and stable-diffusion.cpp.
docusealco
Ruby FFI bindings for llama.cpp to run open-source LLMs such as GPT-OSS, Qwen 3.5, Gemma 4, and Llama 3 locally with Ruby.
ngxson
Web UI for Alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM
ADT109119
A distributed LLM inference program based on llama.cpp that lets multiple computers on a local network cooperate on distributed inference of large language models, with a cross-platform desktop application UI built with Electron.