Found 813 repositories(showing 30)
jundot
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
Lightning-AI
A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.
666DZY666
micronet, a model compression and deploy lib. compression: 1、quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference)、Low-Bit(≤2b)/Ternary and Binary(TWN/BNN/XNOR-Net); post-training-quantization(PTQ), 8-bit(tensorrt); 2、 pruning: normal、regular and group convolutional channel pruning; 3、 group convolution structure; 4、batch-normalization fuse for quantization. deploy: tensorrt, fp32/fp16/int8(ptq-calibration)、op-adapt(upsample)、dynamic_shape
meta-pytorch
A batched offline inference oriented version of segment-anything
yassa9
Static suckless single batch CUDA-only qwen3-0.6B mini inference engine
kyegomez
Implementation of Vision Mamba from the paper: "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model" It's 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-res images
Lightning-Universe
Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU-inference, micro-services working together via the Lightning Apps framework.
ttanzhiqiang
Support Yolov5(4.0)/Yolov5(5.0)/YoloR/YoloX/Yolov4/Yolov3/CenterNet/CenterFace/RetinaFace/Classify/Unet. use darknet/libtorch/pytorch/mxnet to onnx to tensorrt
USTCPCS
Context Encoding for Semantic Segmentation MegaDepth: Learning Single-View Depth Prediction from Internet Photos LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume On the Robustness of Semantic Segmentation Models to Adversarial Attacks SPLATNet: Sparse Lattice Networks for Point Cloud Processing Left-Right Comparative Recurrent Model for Stereo Matching Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior Unsupervised CCA Discovering Point Lights with Intensity Distance Fields CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation Learning a Discriminative Feature Network for Semantic Segmentation Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation Unsupervised Deep Generative Adversarial Hashing Network Monocular Relative Depth Perception with Web Stereo Data Supervision Single Image Reflection Separation with Perceptual Losses Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds Decorrelated Batch Normalization Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints PU-Net: Point Cloud Upsampling Network Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer Tell Me Where To Look: Guided Attention Inference Network Residual Dense Network for Image Super-Resolution Reflection Removal for Large-Scale 3D Point Clouds PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image Fully Convolutional Adaptation Networks for Semantic Segmentation CRRN: Multi-Scale Guided Concurrent Reflection Removal Network DenseASPP: Densely Connected Networks for Semantic Segmentation SGAN: An Alternative Training of Generative Adversarial Networks Multi-Agent Diverse Generative Adversarial Networks Robust Depth Estimation from Auto Bracketed Images AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation DeepMVS: Learning Multi-View Stereopsis GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation Single-Image Depth Estimation Based on Fourier Domain Analysis Single View Stereo Matching Pyramid Stereo Matching Network A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation Image Correction via Deep Reciprocating HDR Transformation Occlusion Aware Unsupervised Learning of Optical Flow PAD-Net: Multi-Tasks Guided Prediciton-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing Surface Networks Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation TextureGAN: Controlling Deep Image Synthesis with Texture Patches Aperture Supervision for Monocular Depth Estimation Two-Stream Convolutional Networks for Dynamic Texture Synthesis Unsupervised Learning of Single View Depth Estimation and Visual Odometry with Deep Feature Reconstruction Left/Right Asymmetric Layer Skippable Networks Learning to See in the Dark
mixedbread-ai
The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching of inference workloads.
psmarter
LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving
YellowOldOdd
Simple Dynamic Batching Inference
MICS-Lab
Graph-based foundation model for spatial transcriptomics data. Zero-shot spatial domain inference, batch-effect correction, and many other features.
ferrumox
High-performance LLM inference engine — drop-in replacement for Ollama with faster multi-turn inference, lower TTFT, and higher throughput through prefix caching and continuous batching.
anyscale
Fine-tune an LLM to perform batch inference and online serving.
casys-kaist
NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing
anyscale
Multimodal AI workloads: batch inference, model training and online serving.
microsoft
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
lumia431
A High-Performance LLM Inference Engine with vLLM-Style Continuous Batching
louisoutin
Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready and real time inference.
leesou
H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
GoogleCloudPlatform
This repository compiles prescriptive guidance and code samples demonstrating how to operationalize AlphaFold batch inference using Vertex AI Pipelines.
Edward-Sun
Simple and efficient pytorch-native transformer training and inference (batched)
xlang-ai
[EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches.
jpmorganchase
Deploy your AI/ML model to Amazon SageMaker for Real-Time Inference and Batch Transform using your own Docker container image.
OpenMOSE
A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy on docker. Supports true multi-batch generation and dynamic State switching. CUDA and Rocm Supported :)
GraphSAINT
[ASAP 2020; FPGA 2020] Hardware architecture to accelerate GNNs (common IP modules for minibatch training and full batch inference)
MAhaitao999
将Yolov3模型转成可以进行动态Batch的TensorRT推理以及Triton Inference Serving上部署的TensorRT模型
mkgs210
Boost your efficiency with Fish Speech Batch Inference. Easily process multiple texts and achieve consistently great results. 🗨️🐟
gigit0000
Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.