Found 38 repositories (showing 30)
ztxz16
fastllm is a high-performance large-model inference library with no backend dependencies. It supports tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of VRAM can run the full DeepSeek model. On a dual-socket 9004/9005 server with a single GPU, the original full-precision DeepSeek model reaches 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps under multiple concurrent requests.
567-labs
A collection of LLM services you can self-host via Docker or Modal Labs to support your application development
FreedomIntelligence
Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]
Rexhaif
High-performance parallel LLM API request tool
shafvfshkga
Fastllm-based chatbot
thansen0
A low-latency, fault-tolerant API for accessing LLMs, written in C++ using llama.cpp.
siemonchan
A fork of ztxz16/fastllm, supporting ThinkForce's TFACC
eigen2017
Triton backend for fastllm
shellyfung
fastllm windows
clemens33
Fast and easy wrapper around LLMs.
joao-savietto
No description available
yuebo
An OpenAI API wrapper using fastllm
ArtemisDicoTiar
No description available
henryyantq
This repo provides the "main" binary built with FastLLM for current mainstream ARMv8-based Android mobile devices (i.e. smartphones and tablets). The binary has been tested for compatibility with the Qualcomm Snapdragon 8+ Gen 1.
ArtificialZeng
fastllm-explained
chensiyi0904
A Python project that outperforms Ollama across different servers, making LLM generation fast.
StormMapleleaf
fastllm
HanHoupu
No description available
blkt
FastLLM - Rust based LLM Inference API
thansen0
Website for FastLLM.cpp
deppcyan
No description available
jomarweb
No description available
Ashwyn28
No description available
fredrickboscoqxxe
No description available
UPTK-GPGPU
No description available
eigen2017
Triton stream backend for fastllm
ARES3366
No description available
HanHoupu
No description available
soodaryan
LLM Inference Optimization Pipelines using Quantization Techniques
eigen2017
No description available