Found 77 repositories (showing 30)
b4rtaz
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
michaelneale
Reference implementation using llama.cpp, compiled for distributed inference across machines, with a real end-to-end demo
ADT109119
A distributed LLM inference program based on llama.cpp that lets multiple computers on a local network collaborate on distributed inference of large language models, with a cross-platform desktop UI built with Electron.
tsaol
🚀 Fine-tune Large Language Models on AWS SageMaker using LLaMA Factory - End-to-end pipeline for distributed LLM training, evaluation & deployment
Github-Scalers-AI
Serve Llama 2 (7B/13B/70B) Large Language Models efficiently at scale by leveraging heterogeneous Dell™ PowerEdge™ Rack servers in a distributed manner.
aws-samples
End-to-end solution for cold-start recommendations using vLLM, DeepSeek Llama (8B & 70B), and FAISS on AWS Trainium (Trn1) with the Neuron SDK and NeuronX Distributed. Includes LLM-based interest expansion, embedding comparisons (T5 & SentenceTransformers), and scalable retrieval workflows.
sajosam
Self-spawning AI agents born from tasks. Zero pre-built agents. Distributed memory. 4-layer guardrails. Fossil record. Groq + LLaMA.
zosma-ai
Task Manager for Distributed LLaMA 2 inference network
arseniy0924
Web UI for orchestrating distributed llama.cpp RPC GPU clusters with auto node discovery, telemetry, and one-click deployment.
HichamAgueny
LLM course for distributed fine-tuning and inference on HPC systems using PyTorch and LLaMA model for summarization & QA.
Siritao
Deploy Llama 2 serving on multiple GPUs via Flask
zosma-ai
LLAMA-2 inference node that works with distributed cluster
Ptchwir3
Turn any Kubernetes cluster into a private LLM endpoint. One Helm command deploys distributed inference across commodity hardware: Raspberry Pis, old servers, mixed architectures. OpenAI-compatible API powered by llama.cpp RPC
LambdaLabsML
No description available
saakethtypes
No description available
Romyull-Islam
No description available
stillandcalm
Full FineTuning of Llama-3-8B on distributed GPU nodes using Deepspeed
fabiofalopes
No description available
stafel
A distributed language model service for Alpaca / Llama
himanishpuri
Local RAG chatbot with semantic search (ChromaDB), Redis-backed caching and queues, distributed workers, and real-time streaming via SSE — powered by llama.cpp
mnouira02
A distributed, local-first AI Race Engineer for F1 202x. Uses Computer Vision, UDP Telemetry, and Llama 3.2 to provide real-time strategy without cloud latency.
bar6132
AI-powered distributed video platform using FastAPI, RabbitMQ, and Next.js 16. Features a local Generative AI pipeline (Llama 3.2 + Moondream) for video summarization, zero-hallucination analysis, and dynamic transcoding.
rammaruboina-rgb
Production-grade LLaMA fine-tuning framework with Go orchestration. Features LoRA/QLoRA adapters, 4-bit quantization, distributed training, and seamless deployment via vLLM, BentoML, and cloud platforms. Optimized for domain-specific AI models.
rinoScremin
High-performance distributed matrix computation for AI workloads. Supports CPUs, Vulkan/Metal GPUs, PyTorch CUDA nodes, and LLaMA/ggml backends. Uses shard-based distribution with ZeroMQ networking, RAM/disk storage, and flexible environment-based configuration for multi-node clusters.
ssr9857
Distributed LLaMA inference
rzredg
No description available
llamasearchai
No description available
vedantjh2
No description available
fromthefox
No description available