Found 2 repositories (showing 2)
ashankgupta
A production-ready template for serving Large Language Models via gRPC with streaming token generation. Built with Python, PyTorch, Hugging Face Transformers, and gRPC. Supports any causal language model from Hugging Face with configurable sampling parameters (temperature, top_p, top_k).
mjcastner
It is surprisingly difficult to find a working bzlmod template you can give to LLMs. This aims to be one such example. Includes working build targets for C++, Python, and gRPC / protobuf.