Found 54 repositories (showing 30)
Simple Qwen3-VL GGUF model loader for ComfyUI.
GGUFloader
GGUF Loader with Agentic Mode, a floating button, and AI models | Open Source & Offline. Supports Mistral, DeepSeek, Llama, Gemma, and Qwen.
christopherkarani
🚀 LLM inference engine in Swift/Metal. Loads GGUF and safetensors models with no conversion, no C++, pure Swift.
r-vage
Comprehensive ComfyUI custom node suite featuring Smart Loaders (multi-format checkpoint support with Nunchaku/GGUF quantization), a Smart Prompt system with wildcards, a sophisticated pipe ecosystem, universal type converters, image/video utilities, and workflow helpers.
AidenTran900
A C++/Python machine learning library built from scratch. Features classic ML algorithms and a GGUF-compatible inference loader for transformers.
laelhalawani
glai (GGUF LLAMA AI): a package for simplified model handling and text generation with Llama models quantized to GGUF format. Provides APIs for automatically downloading and loading models, and includes a database of models at various scales and quantizations. With this high-level API you need one line to load a model and one to generate text completions.
ml-rust
Production-grade inference server for LLMs. Supports standard HuggingFace models (Llama, Mistral, Qwen, Phi, Gemma, DeepSeek) and custom hybrid architectures (Mamba2, MLA, MoE). Loads SafeTensors, AWQ, GPTQ, and GGUF formats.
mgrigajtis
Godot 4 GDExtension plugin for local LLM dialogue with llama.cpp. Includes GGUF model loading, context creation, streaming text generation, and a runnable demo scene with GUI controls for NPC persona, world state, memory, and generation settings.
Krzyzyk33
Local two-file AI app for running GGUF models via llama-cpp-python: model loading, terminal chat, and more.
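Several entries here build on llama-cpp-python. For orientation, a minimal sketch of the load-and-chat loop an app like this wraps; the model path is a placeholder, and this is not the repo's actual code:

```python
# Minimal llama-cpp-python sketch: load a GGUF model, then run a short terminal chat.
# "models/model.Q4_K_M.gguf" is a placeholder path; any chat-tuned GGUF file works.
from llama_cpp import Llama

llm = Llama(model_path="models/model.Q4_K_M.gguf", n_ctx=4096, verbose=False)

messages = [{"role": "system", "content": "You are a helpful assistant."}]
while True:
    user = input("you> ")
    if user.strip().lower() in {"exit", "quit"}:
        break
    messages.append({"role": "user", "content": user})
    reply = llm.create_chat_completion(messages=messages, max_tokens=256)
    text = reply["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": text})
    print(text)
```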
marduk191
A comprehensive set of custom nodes for working with GGUF (GPT-Generated Unified Format) quantized models in ComfyUI. These nodes enable you to load and use quantized diffusion models, reducing memory usage while maintaining quality.
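ComfyUI custom nodes all implement the same small class interface, which is what node suites like this plug into. A skeleton under that convention; the loading logic is stubbed, and load_gguf_unet is a hypothetical name, not this suite's API:

```python
# Skeleton of the ComfyUI custom-node convention that loader suites follow.
# INPUT_TYPES/RETURN_TYPES/FUNCTION/CATEGORY are ComfyUI's expected class fields;
# load_gguf_unet is a hypothetical stub standing in for the real dequantizing loader.
def load_gguf_unet(path):
    raise NotImplementedError("parse the GGUF file and build the diffusion model here")

class GGUFModelLoader:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"gguf_path": ("STRING", {"default": "model.gguf"})}}

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "load"
    CATEGORY = "loaders/gguf"

    def load(self, gguf_path):
        return (load_gguf_unet(gguf_path),)

# ComfyUI discovers custom nodes through this mapping at import time.
NODE_CLASS_MAPPINGS = {"GGUFModelLoader": GGUFModelLoader}
```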
tonyma163
Using llama_cpp to load a Llama GGUF model.
abbaing
GGUF Reader .NET facilitates reading GGUF files from different LLMs in .NET Core 8. It includes features for dynamic DLL loading, GGUF file interpretation, and interactive prompt execution for advanced operations.
Fimeg
GAML accelerates GGUF model loading using GPU parallel processing instead of slow, sequential CPU operations.
fredrikpaulin
Metal GPU compute for Bun — tensors, autograd, transformers, convnets, GGUF/safetensors model loading, and vision (ResNet, CLIP) on Apple Silicon
JPaulDuncan
A pure C# LLM inference engine built from scratch — no Python, no llama.cpp bindings, no ONNX Runtime. SharpInfer loads GGUF and Safetensors models directly, dequantizes weights in managed code, and runs the full transformer forward pass natively on .NET 8.
Guney-olu
Learning and loading the GGUF format.
AngelCookiesLab
ComfyUI custom nodes for sharding SD/SDXL/Flux checkpoints across multiple GPUs: no GGUF or quantization, just multi-GPU loading and sampling with debug diagnostics.
CarapaceUDE
llama.cpp fork: Qwen 3.5 hybrid GGUF + loader fixes; syncs with ggml-org/llama.cpp
gtrias
Docker setup for llama.cpp server with router mode, supporting multiple GGUF models with lazy loading
Defilan
A Rust library and CLI for parsing GGUF model file headers — extract metadata, architecture, quantization, and tensor info without loading weights.
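Header-only inspection like this is possible because the GGUF preamble has a fixed layout: magic bytes, version, tensor count, and metadata KV count, read before any weights. A small Python sketch of the same idea (Python rather than Rust, to keep one language across these examples), following the published GGUF spec:

```python
# Read just the fixed GGUF header: magic, version, tensor count, metadata KV count.
# Per the GGUF spec, files are little-endian by default; the two counts are u32 in
# version 1 and u64 from version 2 onward.
import struct
import sys

def read_gguf_header(path):
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        fmt = "<II" if version == 1 else "<QQ"
        tensor_count, kv_count = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
        return version, tensor_count, kv_count

if __name__ == "__main__":
    print(read_gguf_header(sys.argv[1]))
```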
deeflect
GPU inference in one command. Auto-picks a cheap Vast.ai GPU, loads any GGUF model, gives you an OpenAI-compatible endpoint.
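Any OpenAI-compatible endpoint, including one like this, can be queried with the standard openai Python client by pointing base_url at it. The URL, API key, and model name below are placeholders, not deeflect specifics:

```python
# Querying an OpenAI-compatible endpoint with the standard openai client.
# base_url, api_key, and model are placeholder values for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://<instance-ip>:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="local-gguf-model",
    messages=[{"role": "user", "content": "Hello from a rented GPU!"}],
)
print(resp.choices[0].message.content)
```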
RomanAILabs-Auth
A runtime and builder enabling direct Python execution inside GGUF-based LLMs using the 4DLLM container format. Supports modular cognition, dependency-aware module loading, and programmable model behavior at inference time.
GrandFuzard
Ready-to-run Colab notebook for GLM-4.7-Flash Finetuned on Claude Opus 4.5 xHigh-Reasoning (GGUF) with llama.cpp, featuring GPU/CPU split loading, streaming chat, a multi-chat manager, and a Gradio web UI, optimized for free T4 environments.
SunPCSolutions
FineTuneOrch is a web-based orchestration dashboard that simplifies fine-tuning language models using easy-dataset and LLaMA-Factory. It provides a unified interface to monitor services, manage end-to-end workflows (data prep, fine-tuning, GGUF conversion, Ollama loading), and deploy models seamlessly.
MckAnissa
A fully local AI environment where two (or more) LLM agents debate, argue, collaborate, or melt down in real time. Built with Python + Streamlit + llama-cpp, Agent Arena lets you load quantized GGUF models, assign personalities, configure behavior, and watch unsupervised agent-to-agent conversations. Includes conversation logging and personality creation.
zihaomu
No description available
winternewt
No description available
tiny LLM loader
Lolik612
A loader for GGUF models.
Zenthrose
Universal Vulkan GGUF loader. Loads v1, v2, and v3 GGUF files and all quantized formats.