Stars: 172 · Forks: 17 · Watchers: 172 · Open Issues: 0
Overall repository health assessment: no package.json found — this might not be a Node.js project.
Merge pull request #8 from cksac/copilot/implement-turboquant-model
c486c48 Address code review: use actual prime for table_size, clarify test assertions
bd516de Add hash-based weight compression module (TurboQuant-Model)
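The hash-based weight compression added in bd516de (with the prime `table_size` fix from c486c48) is not spelled out in the log, but the commit titles suggest HashedNet-style random weight sharing: virtual weights index into a small shared table via a hash, with a prime table size to spread collisions more evenly. A minimal sketch under that assumption — the hash function, mixing constants, and names here are illustrative, not the repository's actual API:

```python
import numpy as np

def hashed_weight(table, i, j, seed=0x9E3779B1):
    """Look up the virtual weight W[i, j] in a small shared table.

    Instead of storing M*N parameters, only len(table) values are kept;
    each (i, j) position is mapped to a table slot by an integer hash.
    """
    h = (i * seed) ^ (j * 0x85EBCA77)   # illustrative multiplicative mix
    return table[h % len(table)]

TABLE_SIZE = 1009   # a prime, as the c486c48 review fix suggests
table = np.random.default_rng(0).standard_normal(TABLE_SIZE)

def materialize(rows, cols):
    """Expand the virtual (rows, cols) weight matrix from the table."""
    return np.array([[hashed_weight(table, i, j) for j in range(cols)]
                     for i in range(rows)])
```

Lookups are deterministic, so the full matrix never needs to be stored; it can be rematerialized (or looked up elementwise) from the table at any time.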
af4ae1f feat: per-group blockwise calibration (4-bit PPL -0.51, KLD -26%)

Add per-group alpha correction to blockwise calibration. Instead of a
single scalar per row (M,), each group gets its own learnable correction
(M,G), injected into weight_norms during the forward pass for gradient
flow through the per-group norm scaling.

Results on Qwen3.5-0.8B-Base (4-bit, 4s/50i):
- Per-row: PPL 13.6971 (-0.259), KLD 0.1170 (-10%), 12.9 min
- Per-group: PPL 13.4427 (-0.514), KLD 0.0959 (-26%), 14.0 min

Per-group is 2x better PPL and 2.6x better KLD at only 8% more time.
Recovers 28.1% of the quantization gap (vs 14.2% for per-row).

Changes:
- CalibrationConfig.per_group: default True
- Blockwise cal: per-group alpha via weight_norms injection
- _fold_alpha: handle (M,G) alpha shape
- Test script: --per-group flag
- Updated docs and site with per-group results
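The per-group correction this commit describes — a learnable (M, G) alpha folded into the per-group norm scaling — can be sketched as a plain array operation. This is a standalone illustration of the shape manipulation only; the function name, `group_size` parameter, and shapes are assumptions, and the real code injects alpha into `weight_norms` inside a forward pass rather than operating on a dequantized matrix directly:

```python
import numpy as np

def apply_per_group_alpha(weight, alpha, group_size):
    """Scale each column group of `weight` by its own correction factor.

    weight: (M, N) dequantized weight matrix
    alpha:  (M, G) per-group corrections, where G = N // group_size

    Scaling a group of columns by alpha[m, g] is equivalent to scaling
    that group's norm at dequantization time, which is what makes the
    correction learnable through the per-group norm path.
    """
    M, N = weight.shape
    G = N // group_size
    w = weight.reshape(M, G, group_size)
    return (w * alpha[:, :, None]).reshape(M, N)
```

With alpha set to all ones the weight is unchanged; a per-row scheme is the special case G = 1, which is why the per-row (M,) variant in the log is strictly less expressive.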
16c7de7 feat: block-wise norm calibration (4-bit PPL -0.26, KLD -10%)

Add block-wise end-to-end norm calibration that optimizes per-row norms
through each transformer block sequentially, using MSE + angular + KLD
loss against pre-captured FP targets.

Key results on Qwen3.5-0.8B-Base (4-bit):
- PPL: 13.9564 → 13.6971 (-0.2592)
- KLD: 0.1301 → 0.1170 (-10.1%)
- Calibration time: ~13 min (4 samples, 50 iters)

Per-layer calibration was harmful (+0.07 PPL), confirming that
locally optimal norms don't compose through the network.
4+4 residual doesn't benefit (already cos≈1.000).

Changes:
- norm_calibration.py: calibrate_norms_blockwise() with sequential
  block processing, FP target pre-capture, exp parameterization
- cli.py: --calibrate flag and calibrate subcommand now use blockwise
- Defaults: n_samples=4, n_iters=50 (3.9x faster, equal quality)
- docs/techniques/blockwise-calibration.md
- site/src/app/techniques/blockwise-calibration/page.tsx
- Updated nav chain: Norm Compression → Block-wise Cal → QJL
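The core mechanism this commit names — per-row norm corrections learned against pre-captured FP targets, with an exp parameterization to keep scales positive — can be reduced to a toy gradient loop. A minimal sketch, assuming a plain MSE objective on a single tensor (the repository's `calibrate_norms_blockwise()` additionally uses angular and KLD terms and optimizes through whole transformer blocks; everything below is illustrative):

```python
import numpy as np

def calibrate_row_norms(q_out, fp_target, n_iters=50, lr=0.1):
    """Fit per-row scales so that scale * q_out matches fp_target (MSE).

    q_out:     (M, T) outputs computed with quantized weights
    fp_target: (M, T) pre-captured full-precision targets

    The scale is parameterized as exp(s), so it stays strictly positive
    no matter what gradient steps are taken.
    """
    s = np.zeros(q_out.shape[0])            # log-scales, init exp(0) = 1
    for _ in range(n_iters):
        scale = np.exp(s)[:, None]
        err = scale * q_out - fp_target     # residual against FP targets
        # d/ds of per-row MSE: mean over cols of 2 * err * q_out * exp(s)
        grad = 2.0 * (err * q_out).mean(axis=1) * np.exp(s)
        s -= lr * grad
    return np.exp(s)
```

The log's observation that per-layer calibration was harmful while block-wise helped is consistent with this picture: each row's locally optimal scale changes once the corrected output is fed through the next layers, so the objective has to be evaluated at least a block downstream.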
6825487 feat: CPU offload for pass 2 + embedding quantization (INT8/INT4)
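The embedding quantization in this commit is only named, not described. A common scheme that fits the INT8 case is per-row symmetric quantization with one float scale per embedding row; a minimal sketch under that assumption (function names and the codec layout are illustrative, and the repository's INT4 path would pack two values per byte on top of this):

```python
import numpy as np

def quantize_embedding_int8(emb):
    """Per-row symmetric INT8 quantization of an embedding table.

    Each row keeps a single float scale; values are rounded into
    [-127, 127], so dequantization is just q * scale.
    """
    scale = np.abs(emb).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero rows
    q = np.clip(np.round(emb / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float embedding table."""
    return q.astype(np.float32) * scale
```

Per-row error is bounded by half a quantization step (scale / 2), which is why embedding tables, whose rows vary widely in magnitude, benefit from per-row rather than per-tensor scales.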
d54fc7d feat: add comprehensive survey of recent quantization papers and their compatibility with TurboQuant
39ae95e feat: add MMLU benchmark for f16 reference and 4+4bit per-layer rotation with factored_int8 norm
1edf89a feat: implement per-layer rotation strategy in quantization and add corresponding tests
b3c8cbd feat: enhance norm codec support with factored_int4 and update related tests
db4c66d Add comprehensive tests for polar decomposition quantization and compression methods
aaf0945 Add documentation for quantization techniques and optimizations
632a1f8 Add norm codec documentation and implement entropy coding technique
8067ceb