Found 38 repositories (showing 30)
ztxz16
fastllm is a high-performance large-model inference library with no backend dependencies. It supports tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of VRAM can run the full DeepSeek model. On a dual-socket 9004/9005 server with a single GPU, the original full-precision DeepSeek model reaches 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps under multiple concurrent requests.
567-labs
A collection of LLM services you can self-host via Docker or Modal Labs to support your application development
FreedomIntelligence
Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]
Rexhaif
High-performance parallel LLM API request tool
shafvfshkga
Fastllm-based chatbot
thansen0
A low-latency, fault-tolerant API for accessing LLMs, written in C++ using llama.cpp.
siemonchan
A fork of ztxz16/fastllm, supporting ThinkForce's TFACC
eigen2017
Triton backend for fastllm
shellyfung
fastllm windows
clemens33
Fast and easy wrapper around LLMs.
joao-savietto
No description available
yuebo
An OpenAI API wrapper using fastllm
ArtemisDicoTiar
No description available
henryyantq
This repo provides the "main" binary built with FastLLM for current mainstream ARMv8-based Android mobile devices (i.e. smartphones and tablets). The binary has been tested for compatibility with the Qualcomm Snapdragon 8+ Gen 1.
ArtificialZeng
fastllm-explained
chensiyi0904
A Python project that outperforms Ollama across different servers, making LLM generation fast.
StormMapleleaf
fastllm
HanHoupu
No description available
blkt
FastLLM - Rust based LLM Inference API
thansen0
Website for FastLLM.cpp
deppcyan
No description available
jomarweb
No description available
Ashwyn28
No description available
fredrickboscoqxxe
No description available
UPTK-GPGPU
No description available
eigen2017
Triton stream backend for fastllm
ARES3366
No description available
HanHoupu
No description available
soodaryan
LLM Inference Optimization Pipelines using Quantization Techniques
eigen2017
No description available