JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs to come in the future -- PRs welcome).
Stars: 420
Forks: 63
Watchers: 420
Open Issues: 27
Overall repository health assessment
Not a Node.js project (no package.json); JetStream is written in Python.
Commit counts for the top ten contributors: 43, 19, 15, 14, 12, 11, 9, 9, 9, 8
Recent commits:
- acd4f5a: Update README with archival notice for Jetstream (#278)
- 29329e8: Merge pull request #275 from AI-Hypercomputer:log_prob_pytree_node
- 261f250: Merge pull request #222 from AI-Hypercomputer:amangu-lora-3
- 8839d1a: JetStream changes for Jax based implementation of unified_lora_params for decoding batch of multiple different lora adapters.
- 89acc8c: Merge pull request #271 from AI-Hypercomputer:lihao/fix
- 2756c6f: Merge pull request #269 from AI-Hypercomputer:yuyan-prefix-cache
- f40d0da: Refactor(PrefixCache): New load API, per-layer Tries, async ops & stats
- 4aafd76: Merge pull request #268 from AI-Hypercomputer:yuyan-prefix-cache-benchmark
- 219e5a1: Merge pull request #266 from AI-Hypercomputer:lihao/bos
- 97a3011: Merge pull request #267 from AI-Hypercomputer:gsutil-bug-fix
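One of the commits above describes a JAX-based "unified_lora_params" change for decoding a batch that mixes multiple LoRA adapters. As a rough illustration of the general idea (not JetStream's actual API -- the names `lora_A`, `lora_B`, and `adapter_ids` here are hypothetical), the per-adapter low-rank factors can be stacked along a leading axis so each request in a decode batch gathers its own adapter's weights:

```python
# Hypothetical sketch of unified (stacked) LoRA params for mixed-adapter
# decoding; names and shapes are illustrative, not JetStream's real code.
import jax
import jax.numpy as jnp

num_adapters, d_model, rank = 3, 8, 2
kA, kB, kx = jax.random.split(jax.random.PRNGKey(0), 3)

# One stacked array per LoRA factor: the leading axis indexes the adapter.
lora_A = jax.random.normal(kA, (num_adapters, d_model, rank))
lora_B = jax.random.normal(kB, (num_adapters, rank, d_model))

def lora_delta(x, adapter_id):
    # Gather this request's adapter weights and apply the low-rank update.
    A = lora_A[adapter_id]  # (d_model, rank)
    B = lora_B[adapter_id]  # (rank, d_model)
    return x @ A @ B

# A decode batch in which each row belongs to a different adapter.
x = jax.random.normal(kx, (4, d_model))
adapter_ids = jnp.array([0, 2, 1, 0])
deltas = jax.vmap(lora_delta)(x, adapter_ids)  # (4, d_model)
```

Because the adapters live in one stacked pytree, `jax.vmap` can trace a single program for the whole batch instead of dispatching per adapter, which is what makes serving many LoRA adapters in one decode batch practical.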