A production-ready template for serving large language models (LLMs) over gRPC with streaming token generation. Built with Python, PyTorch, Hugging Face Transformers, and gRPC. Supports any causal language model from Hugging Face, with configurable sampling parameters (temperature, top_p, top_k).
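To make the sampling parameters concrete, here is a minimal sketch of how temperature, top_k, and top_p can shape the next-token distribution, and how tokens can be yielded one at a time in a streaming fashion. It is not the repository's code: it uses only the Python standard library as a stand-in for PyTorch and Transformers, it leaves out the gRPC layer entirely, and the names `filter_probs`, `stream_tokens`, and `next_logits` are hypothetical. Applying top_k before top_p is one common ordering, assumed here.

```python
import math
import random
from typing import Callable, Dict, Iterator, List, Optional


def filter_probs(logits: List[float], temperature: float = 1.0,
                 top_k: int = 0, top_p: float = 1.0) -> Dict[int, float]:
    """Map raw logits to a renormalized {token_id: probability} dict
    after temperature scaling, top-k truncation, and top-p (nucleus) truncation."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]
    mass = sum(probs[i] for i in order)

    # top_p: keep the smallest prefix whose renormalized mass reaches top_p.
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def stream_tokens(next_logits: Callable[[List[int]], List[float]],
                  prompt_ids: List[int], max_new_tokens: int,
                  eos_id: Optional[int] = None,
                  rng: Optional[random.Random] = None,
                  **sampling: float) -> Iterator[int]:
    """Yield sampled token ids one by one, as a streaming endpoint would send them.
    `next_logits` is a hypothetical stand-in for a model forward pass."""
    rng = rng or random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        dist = filter_probs(next_logits(ids), **sampling)
        token = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        yield token
        if token == eos_id:
            break
        ids.append(token)
```

In a real server the generator body would call the model and each yielded id would be decoded and written to the response stream; a toy `next_logits` such as `lambda ids: [0.0, 5.0, 0.0]` is enough to exercise the sketch locally.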
Stars: 0
Forks: 0
Watchers: 0
Open Issues: 0
Commits: 6