petermartens98
Lightweight LLM inspired by Qwen3, built from scratch in PyTorch. Includes a full training pipeline and the core transformer components: RMSNorm, Rotary Position Embeddings (RoPE), Grouped-Query Attention (GQA), and SwiGLU feed-forward layers. Trained with a hybrid Muon + AdamW optimizer, with causal masking, efficient batching, and evaluation tools.
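Of the components listed, RMSNorm is the simplest to show in isolation. The sketch below is illustrative only and is not taken from the repository: it uses plain Python lists instead of PyTorch tensors so it runs without dependencies, and the names `rms_norm`, `weight`, and `eps` (with its default of `1e-6`) are assumptions chosen for the example.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then apply a per-element gain.

    Hypothetical helper for illustration; a PyTorch version would do the
    same arithmetic over the last tensor dimension.
    """
    # Mean of the squared activations (no mean subtraction, unlike LayerNorm).
    mean_sq = sum(v * v for v in x) / len(x)
    # eps keeps the division stable when all activations are near zero.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

hidden = [3.0, 4.0]   # example activations
gain = [1.0, 1.0]     # learnable scale, shown here at an assumed initial value of 1
normed = rms_norm(hidden, gain)
```

With a gain of all ones, the output has a root mean square of approximately 1 while the ratios between elements are preserved, which is the property the normalization is meant to provide.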