Found 1 repository (showing 1)
gyunggyung
52 Layers 4B (0.6B Active) MoE | Nemotron-3 Style + Teon Optimizer + Mamba-2 SSM + FP8 Training on H100