Stars: 32
Forks: 7
Watchers: 32
Open Issues: 0
a391a98 perf: Triton-accelerate full pipeline (quantize + dequantize + WHT rotation)
c1126ab perf: Triton WHT rotation kernel for O(d log d) fused rotation
6018fbf perf: wire Triton fused dequantize kernel into GenerationCache
40b7d03 perf: add speed benchmark for Triton vs Python quantize/dequantize
afef829 feat: overnight validation — 236B model running on single RTX 4090
c86f04e feat: cross-layer KV cache with shared codebook/rotation resources
6b8a749 feat: 1-bit value quantization with correction (V compression is free)
f4c10ba feat: self-correcting KV cache with periodic refresh (prevents error accumulation)
3688a3e feat: ultra-streaming engine for 200B+ models on consumer GPUs
a40eaab perf: switch to WHT rotation as default (O(d log d), 98% quality match)
dc904ec feat: unified 70B-on-4090 launcher with auto-configured KV compression
9959393 feat: integrate HybridCache + fixed eviction + streaming into autoresearch sweep
c94127f feat: TurboQuant weight compression (TQ-W) for ultra-low-bit model deployment
f22e6a1 feat: HybridCache combining boundary anchoring + gradient bits + per-head allocation