Found 54 repositories (showing 30)
Simple Qwen3-VL GGUF model loader for ComfyUI.
GGUFloader
GGUF Loader with Agentic Mode, a floating button, and AI models | Open Source & Offline. Supports Mistral, DeepSeek, Llama, Gemma, and Qwen.
christopherkarani
🚀 LLM inference engine in Swift/Metal. Loads GGUF and safetensors models with no conversion, no C++, pure Swift.
r-vage
Comprehensive ComfyUI custom node suite featuring Smart Loaders (multi-format checkpoint support with Nunchaku/GGUF quantization), a Smart Prompt system with wildcards, a sophisticated pipe ecosystem, universal type converters, image/video utilities, and workflow helpers.
AidenTran900
A C++/Python machine learning library built from scratch. Features classic ML algorithms and a GGUF-compatible inference loader for transformers.
laelhalawani
glai (GGUF LLAMA AI): a package for simplified model handling and text generation with Llama models quantized to GGUF format. Provides APIs for automatically downloading and loading models, and includes a database of models at various scales and quantizations. With this high-level API you need one line to load a model and one to generate text completions.
ml-rust
Production-grade inference server for LLMs. Supports standard HuggingFace models (Llama, Mistral, Qwen, Phi, Gemma, DeepSeek) and custom hybrid architectures (Mamba2, MLA, MoE). Loads SafeTensors, AWQ, GPTQ, and GGUF formats.
mgrigajtis
Godot 4 GDExtension plugin for local LLM dialogue with llama.cpp. Includes GGUF model loading, context creation, streaming text generation, and a runnable demo scene with GUI controls for NPC persona, world state, memory, and generation settings.
Krzyzyk33
Local two-file AI app for running GGUF models via llama-cpp-python: model loading, terminal chat, and more.
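Several entries here build on llama-cpp-python. For orientation, a minimal sketch of the load-and-chat loop an app like this wraps; the model path is a placeholder, and this is not the repo's actual code:

```python
# Minimal llama-cpp-python sketch: load a GGUF model, then run a short terminal chat.
# "models/model.Q4_K_M.gguf" is a placeholder path; any chat-tuned GGUF file works.
from llama_cpp import Llama

llm = Llama(model_path="models/model.Q4_K_M.gguf", n_ctx=4096, verbose=False)

messages = [{"role": "system", "content": "You are a helpful assistant."}]
while True:
    user = input("you> ")
    if user.strip().lower() in {"exit", "quit"}:
        break
    messages.append({"role": "user", "content": user})
    reply = llm.create_chat_completion(messages=messages, max_tokens=256)
    text = reply["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": text})
    print(text)
```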
marduk191
A comprehensive set of custom nodes for working with GGUF (GPT-Generated Unified Format) quantized models in ComfyUI. These nodes enable you to load and use quantized diffusion models, reducing memory usage while maintaining quality.
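ComfyUI custom nodes all implement the same small class interface, which is what node suites like this plug into. A skeleton under that convention; the loading logic is stubbed, and load_gguf_unet is a hypothetical name, not this suite's API:

```python
# Skeleton of the ComfyUI custom-node convention that loader suites follow.
# INPUT_TYPES/RETURN_TYPES/FUNCTION/CATEGORY are ComfyUI's expected class fields;
# load_gguf_unet is a hypothetical stub standing in for the real dequantizing loader.
def load_gguf_unet(path):
    raise NotImplementedError("parse the GGUF file and build the diffusion model here")

class GGUFModelLoader:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"gguf_path": ("STRING", {"default": "model.gguf"})}}

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "load"
    CATEGORY = "loaders/gguf"

    def load(self, gguf_path):
        return (load_gguf_unet(gguf_path),)

# ComfyUI discovers custom nodes through this mapping at import time.
NODE_CLASS_MAPPINGS = {"GGUFModelLoader": GGUFModelLoader}
```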
tonyma163
Using llama_cpp to load a Llama GGUF model.
abbaing
GGUF Reader .NET facilitates reading GGUF files from different LLMs in .NET Core 8. It includes features for dynamic DLL loading, GGUF file interpretation, and interactive prompt execution for advanced operations.
Fimeg
GAML accelerates GGUF model loading using GPU parallel processing instead of slow, sequential CPU operations.
fredrikpaulin
Metal GPU compute for Bun — tensors, autograd, transformers, convnets, GGUF/safetensors model loading, and vision (ResNet, CLIP) on Apple Silicon
JPaulDuncan
A pure C# LLM inference engine built from scratch — no Python, no llama.cpp bindings, no ONNX Runtime. SharpInfer loads GGUF and Safetensors models directly, dequantizes weights in managed code, and runs the full transformer forward pass natively on .NET 8.
Guney-olu
Learning and loading the GGUF format.
AngelCookiesLab
ComfyUI custom nodes for sharding SD/SDXL/Flux checkpoints across multiple GPUs: no GGUF or quantization, just multi-GPU loading and sampling with debug diagnostics.
CarapaceUDE
llama.cpp fork: Qwen 3.5 hybrid GGUF + loader fixes; syncs with ggml-org/llama.cpp
gtrias
Docker setup for llama.cpp server with router mode, supporting multiple GGUF models with lazy loading
Defilan
A Rust library and CLI for parsing GGUF model file headers — extract metadata, architecture, quantization, and tensor info without loading weights.
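Header-only inspection like this is possible because the GGUF preamble has a fixed layout: magic bytes, version, tensor count, and metadata KV count, read before any weights. A small Python sketch of the same idea (Python rather than Rust, to keep one language across these examples), following the published GGUF spec:

```python
# Read just the fixed GGUF header: magic, version, tensor count, metadata KV count.
# Per the GGUF spec, files are little-endian by default; the two counts are u32 in
# version 1 and u64 from version 2 onward.
import struct
import sys

def read_gguf_header(path):
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        fmt = "<II" if version == 1 else "<QQ"
        tensor_count, kv_count = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
        return version, tensor_count, kv_count

if __name__ == "__main__":
    print(read_gguf_header(sys.argv[1]))
```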
deeflect
GPU inference in one command. Auto-picks a cheap Vast.ai GPU, loads any GGUF model, gives you an OpenAI-compatible endpoint.
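Any OpenAI-compatible endpoint, including one like this, can be queried with the standard openai Python client by pointing base_url at it. The URL, API key, and model name below are placeholders, not deeflect specifics:

```python
# Querying an OpenAI-compatible endpoint with the standard openai client.
# base_url, api_key, and model are placeholder values for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://<instance-ip>:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="local-gguf-model",
    messages=[{"role": "user", "content": "Hello from a rented GPU!"}],
)
print(resp.choices[0].message.content)
```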
RomanAILabs-Auth
A runtime and builder enabling direct Python execution inside GGUF-based LLMs using the 4DLLM container format. Supports modular cognition, dependency-aware module loading, and programmable model behavior at inference time.
GrandFuzard
Ready-to-run Colab notebook for GLM-4.7-Flash Finetuned on Claude Opus 4.5 xHigh-Reasoning (GGUF) with llama.cpp, featuring GPU/CPU split loading, streaming chat, a multi-chat manager, and a Gradio web UI, optimized for free T4 environments.
SunPCSolutions
FineTuneOrch is a web-based orchestration dashboard that simplifies fine-tuning language models using easy-dataset and LLaMA-Factory. It provides a unified interface to monitor services, manage end-to-end workflows (data prep, fine-tuning, GGUF conversion, Ollama loading), and deploy models seamlessly.
MckAnissa
A fully local AI environment where two (or more) LLM agents debate, argue, collaborate, or melt down in real time. Built with Python + Streamlit + llama-cpp, Agent Arena lets you load quantized GGUF models, assign personalities, configure behavior, and watch unsupervised agent-to-agent conversations. Includes conversation logging and personality creation.
zihaomu
No description available
winternewt
No description available
tiny LLM loader
Lolik612
A loader for GGUF models.
Zenthrose
Universal Vulkan GGUF loader. Loads v1, v2, and v3 GGUF files and all quantized formats.