Search Results

Found 813 repositories (showing 30)

omlx

jundot

💛82

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

9.1k

772

Apache-2.0

Python

Updated 34 minutes ago

apple-silicon, inference-server, llm, +3

LitServe

Lightning-AI

💛72

A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.

3.9k

278

Apache-2.0

Python

Updated 9 hours ago

ai, api, artificial-intelligence, +6

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op-adapt (upsample), dynamic_shape.

2.3k

477

MIT

Python

Updated 5 days ago

batch-normalization-fuse, bnn, convolutional-networks, +17

segment-anything-fast

meta-pytorch

🧡67

A batched offline inference oriented version of segment-anything

1.3k

Apache-2.0

Python

Updated 2 hours ago

qwen600

yassa9

💛71

Static suckless single batch CUDA-only qwen3-0.6B mini inference engine

552

MIT

Cuda

Updated 4 days ago

cuda, cuda-programming, gpu, +6

VisionMamba

kyegomez

💛71

Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It's 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-res images.

484

MIT

Python

Updated 20 hours ago

ai, machine-learning, mamba, +3

stable-diffusion-deploy

Lightning-Universe

❤️31

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU-inference, micro-services working together via the Lightning Apps framework.

391

Apache-2.0

Python

Updated 4 months ago

model-serving, stable-diffusion

onnx_tensorrt_project

ttanzhiqiang

❤️36

Supports Yolov5(4.0)/Yolov5(5.0)/YoloR/YoloX/Yolov4/Yolov3/CenterNet/CenterFace/RetinaFace/Classify/Unet. Converts darknet/libtorch/pytorch/mxnet models to ONNX and then to TensorRT.

210

C++

Updated 6 months ago

batch-inference, centerface, centernet, +12

CVPR2018_attention

USTCPCS

❤️46

Context Encoding for Semantic Segmentation; MegaDepth: Learning Single-View Depth Prediction from Internet Photos; LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation; PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume; On the Robustness of Semantic Segmentation Models to Adversarial Attacks; SPLATNet: Sparse Lattice Networks for Point Cloud Processing; Left-Right Comparative Recurrent Model for Stereo Matching; Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior; Unsupervised CCA; Discovering Point Lights with Intensity Distance Fields; CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation; Learning a Discriminative Feature Network for Semantic Segmentation; Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation; Unsupervised Deep Generative Adversarial Hashing Network; Monocular Relative Depth Perception with Web Stereo Data Supervision; Single Image Reflection Separation with Perceptual Losses; Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains; EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry; FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds; Decorrelated Batch Normalization; Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints; PU-Net: Point Cloud Upsampling Network; Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer; Tell Me Where To Look: Guided Attention Inference Network; Residual Dense Network for Image Super-Resolution; Reflection Removal for Large-Scale 3D Point Clouds; PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image; Fully Convolutional Adaptation Networks for Semantic Segmentation; CRRN: Multi-Scale Guided Concurrent Reflection Removal Network; DenseASPP: Densely Connected Networks for Semantic Segmentation; SGAN: An Alternative Training of Generative Adversarial Networks; Multi-Agent Diverse Generative Adversarial Networks; Robust Depth Estimation from Auto Bracketed Images; AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation; DeepMVS: Learning Multi-View Stereopsis; GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose; GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation; Single-Image Depth Estimation Based on Fourier Domain Analysis; Single View Stereo Matching; Pyramid Stereo Matching Network; A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation; Image Correction via Deep Reciprocating HDR Transformation; Occlusion Aware Unsupervised Learning of Optical Flow; PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing; Surface Networks; Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation; TextureGAN: Controlling Deep Image Synthesis with Texture Patches; Aperture Supervision for Monocular Depth Estimation; Two-Stream Convolutional Networks for Dynamic Texture Synthesis; Unsupervised Learning of Single View Depth Estimation and Visual Odometry with Deep Feature Reconstruction; Left/Right Asymmetric Layer Skippable Networks; Learning to See in the Dark

179

Updated 1 month ago

batched

mixedbread-ai

🧡60

The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching of inference workloads.

160

Apache-2.0

Python

Updated 2 weeks ago

mini-infer

psmarter

💛70

LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving

157

MIT

Python

Updated 3 hours ago

continuous-batching, cuda, inference, +13

SDBI

YellowOldOdd

❤️35

Simple Dynamic Batching Inference

145

Python

Updated 5 months ago

novae

MICS-Lab

🧡65

Graph-based foundation model for spatial transcriptomics data. Zero-shot spatial domain inference, batch-effect correction, and many other features.

125

BSD-3-Clause

Python

Updated 2 days ago

niches, self-supervised-learning, spatial-domains, +2

fox

ferrumox

💛70

High-performance LLM inference engine — drop-in replacement for Ollama with faster multi-turn inference, lower TTFT, and higher throughput through prefix caching and continuous batching.

122

NOASSERTION

Rust

Updated 17 hours ago

e2e-llm-workflows

anyscale

🧡55

Fine-tune an LLM to perform batch inference and online serving.

120

Python

Updated 1 week ago

NeuPIMs

casys-kaist

🧡55

NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

112

Jupyter Notebook

Updated 1 week ago

multimodal-ai

anyscale

❤️45

Multimodal AI workloads: batch inference, model training and online serving.

107

Jupyter Notebook

Updated 1 month ago

batch-inference

microsoft

🧡50

Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.

106

MIT

Python

Updated 1 month ago

deep-learning, dynamic-batching, gpt, +4

photon_infer

lumia431

🧡60

A High-Performance LLM Inference Engine with vLLM-Style Continuous Batching

104

MIT

C++

Updated 1 week ago

ai-infra, continuous-batching, inference-engine, +4

yolov5_torchserve

louisoutin

❤️30

TorchServe server running a YoloV5 model in Docker with GPU support and static batch inference, for production-ready, real-time inference.

101

Python

Updated 3 months ago

batch-inference, deep-learning, docker, +4

H2-LLM-ISCA-2025

leesou

🧡55

H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference

Apache-2.0

Python

Updated 1 week ago

vertex-ai-alphafold-inference-pipeline

GoogleCloudPlatform

🧡50

This repository compiles prescriptive guidance and code samples demonstrating how to operationalize AlphaFold batch inference using Vertex AI Pipelines.

Apache-2.0

Python

Updated 1 day ago

gpt-accelera

Edward-Sun

💛70

Simple and efficient pytorch-native transformer training and inference (batched)

BSD-3-Clause

Python

Updated 2 days ago

batch-prompting

xlang-ai

🧡55

[EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches.

Python

Updated 3 weeks ago

inference-server

jpmorganchase

🧡65

Deploy your AI/ML model to Amazon SageMaker for Real-Time Inference and Batch Transform using your own Docker container image.

Apache-2.0

Python

Updated 8 hours ago

artificial-intelligence, docker, http-server, +6

RWKV-Infer

OpenMOSE

🧡50

Large-scale RWKV v7 (World, PRWKV, Hybrid-RWKV) inference. Can run inference by combining multiple states (Pseudo MoE). Easy to deploy on Docker. Supports true multi-batch generation and dynamic state switching. CUDA and ROCm supported :)

Apache-2.0

Python

Updated 1 month ago

GNN-ARCH

GraphSAINT

❤️45

[ASAP 2020; FPGA 2020] Hardware architecture to accelerate GNNs (common IP modules for minibatch training and full batch inference)

Verilog

Updated 2 months ago

accelerator, fpga, gcn, +2

Yolov3_Dynamic_Batch_TensorRT_Triton

MAhaitao999

❤️30

Convert a Yolov3 model into a TensorRT model that supports dynamic-batch TensorRT inference and can be deployed on Triton Inference Serving.

Python

Updated 8 months ago

batch_fish_speech

mkgs210

🧡60

Boost your efficiency with Fish Speech Batch Inference. Easily process multiple texts and achieve consistently great results. 🗨️🐟

Apache-2.0

Python

Updated 3 weeks ago

qwen3.c

gigit0000

🧡55

Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.

MIT

Updated 2 weeks ago

c, fp32, gguf, +3
