Found 20 repositories (showing 20)
MingyuJ666
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe that massive values are concentrated in low-frequency dimensions across different attention heads, appearing exclusively in attention queries (Q) and keys (K) while absent from values (V).
s-chh
Simple and easy-to-understand PyTorch implementation of Large Language Models (LLMs) GPT and LLaMA from scratch with detailed steps. Implemented: Byte-Pair Encoding tokenizer, Rotary Positional Embedding (RoPE), SwiGLU, RMSNorm, Mixture of Experts (MoE). Tested on a Taylor Swift song-lyrics dataset.
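Several of these from-scratch repos implement RoPE. A minimal sketch of the idea, assuming a (seq_len, dim) tensor with even dim (the function name and base are illustrative, not from any listed repo):

```python
import torch

def rope(x, base=10000.0):
    # x: (seq_len, dim) with even dim; rotate channel pairs by position-dependent angles
    seq_len, dim = x.shape
    half = dim // 2
    # one inverse frequency per channel pair, geometrically spaced
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    pos = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)            # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied to each (x1_i, x2_i) channel pair
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because each pair is only rotated, vector norms are preserved and position 0 is left unchanged; applied to queries and keys, the rotation makes attention scores depend on relative position.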
merterbak
Train LLM from scratch with SOTA techniques like RoPE, GQA and KV caching.
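KV caching, mentioned here, avoids recomputing keys and values for already-generated tokens during decoding. A minimal single-head sketch (the cache layout and function name are assumptions for illustration):

```python
import torch

def attend_with_cache(q_t, k_t, v_t, cache):
    # q_t, k_t, v_t: (d,) projections for the newest token only
    # cache: dict of lists holding K/V for all previous tokens; we append, never recompute
    cache["k"].append(k_t)
    cache["v"].append(v_t)
    K = torch.stack(cache["k"])            # (t, d) — all keys so far
    V = torch.stack(cache["v"])            # (t, d) — all values so far
    scores = (K @ q_t) / q_t.numel() ** 0.5
    return scores.softmax(dim=0) @ V       # (d,) attention output for the new token
```

Per step this costs O(t·d) instead of recomputing the full O(t²·d) attention over the sequence.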
petermartens98
Lightweight LLM inspired by Qwen3, built from scratch in PyTorch. Full training pipeline with transformer components including RMSNorm, Rotary Position Embeddings (RoPE), Grouped-Query Attention (GQA), and SwiGLU layers. Trained with hybrid Muon + AdamW optimizer, causal masking, efficient batching, and evaluation tools.
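Grouped-Query Attention (GQA), used in this and several other entries, lets groups of query heads share one K/V head to shrink the KV cache. A minimal sketch with causal masking (tensor layout is an assumption, not this repo's actual code):

```python
import torch

def grouped_query_attention(q, k, v, n_kv_heads):
    # q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d)
    # each group of n_heads // n_kv_heads query heads shares one K/V head
    n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=0)   # expand K/V heads to match query heads
    v = v.repeat_interleave(group, dim=0)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    # causal mask: position i attends only to positions <= i
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return scores.softmax(dim=-1) @ v       # (n_heads, seq, d)
```

With n_kv_heads == n_heads this reduces to standard multi-head attention; with n_kv_heads == 1 it is multi-query attention (MQA).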
sudhanshukuumar
📚 Enhance your interview preparation for LLM algorithm internships with insights on DeepSeek, PPO, RoPE, and RLHF core concepts.
Hasin-Al
An LLM built from scratch using decoupled RoPE, Multi-Head Latent Attention, and Transformer blocks with both pre- and post-normalization, plus a Mixture of Experts (MoE).
AhriCat
A new type of LLM / machine-learning model, based on RoPE, Hymba, and the Kronecker transform, combined with a ternary tokenizer using a [-1, 1] token space.
Built a Qwen3-style large language model from scratch in Python, implementing transformer architecture with GQA, SwiGLU activations, RoPE embeddings, and a custom Muon optimizer, gaining hands-on experience in LLM training, optimization, and dataset handling.
Build an end-to-end Large Language Model from scratch: implement transformers, train a tiny LLM, modernize it with RoPE and RMSNorm, scale training, add Mixture-of-Experts, perform Supervised Fine-Tuning, train a Reward Model, and apply RLHF with both PPO and GRPO for alignment.
gauravkumarsl
No description available
Shumatsurontek
Vision-LLM integration with RoPE for arbitrary resolution support and temporal downsampling
sealsnipe
🚀 Complete LLM Training System - GPU-optimized with torch.compile, GQA, RoPE, SwiGLU, and production-ready inference for consumer hardware (RTX 4070 Ti optimized)
milasd
Implementation of a Byte-Pair Encoding tokenizer, RoPE embeddings, and Transformer LLM distributed training & inference from scratch with PyTorch (and MLX), plus a Flash Attention 2 Triton kernel.
VidyasagarDudekula
An end-to-end framework for analyzing LLM behavior. Implements a Llama-style architecture with Grouped Query Attention (GQA) and RoPE, coupled with a comparative analysis suite for deterministic vs. stochastic sampling algorithms.
brianmeyer
I built a tiny LLM from scratch to understand how GPT-4 and LLaMA actually work. 10M params, trained on Shakespeare, modernized with RMSNorm + SwiGLU + RoPE + KV cache. Every mistake documented.
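The "RMSNorm + SwiGLU" modernization mentioned here and in several other entries is small enough to sketch directly (weight shapes and function names are illustrative assumptions):

```python
import torch

def rms_norm(x, weight, eps=1e-6):
    # normalize by root-mean-square over the last dim; no mean subtraction, no bias
    rms = x.pow(2).mean(dim=-1, keepdim=True).add(eps).sqrt()
    return x / rms * weight

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: silu(x @ w_gate) gates (x @ w_up), then project back down
    return (torch.nn.functional.silu(x @ w_gate) * (x @ w_up)) @ w_down
```

RMSNorm drops LayerNorm's mean-centering and bias, which is cheaper and works well in practice; SwiGLU replaces the plain two-layer MLP with a gated variant.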
kirsten-1
High-performance Triton kernel library for LLM training with 12 fused operators (AttnRes, RMSNorm, RoPE, CrossEntropy, GRPO, JSD, FusedLinear, etc.) — up to 24x faster than PyTorch with 78% memory savings, outperforming Liger-Kernel on RTX 5090
NguyenQuangTrung19
Deep Learning final project exploring advanced attention mechanisms in LLMs (self-attention, MQA, GQA, Flash/linear/sparse attention, RoPE) with PyTorch demos, plus a CNN + Transformer-Decoder OCR model for image-to-text with evaluation on test data.
subramanyasrevankar
GPT-OSS 270B is a 270B-parameter open-source LLM built with transformer architecture, using token embeddings, RMSNorm, sliding/full attention, RoPE positional encodings, and feed-forward layers. Optimized for efficient training, inference, and high-quality next-token prediction.
ralolooafanxyaiml
A from-scratch PyTorch LLM implementing Sparse Mixture-of-Experts (MoE) with Top-2 gating. Integrates modern Llama-3 components (RMSNorm, SwiGLU, RoPE, GQA) and a custom-coded Byte-Level BPE tokenizer. Pre-trained on a curated corpus of existential & dark philosophical literature.
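The Top-2 gating this entry describes routes each token to its two highest-scoring experts and mixes their outputs by renormalized gate weights. A minimal dense-loop sketch (real MoE layers batch this more efficiently; names are illustrative):

```python
import torch

def moe_top2(x, gate_w, experts):
    # x: (tokens, d); gate_w: (d, n_experts); experts: list of callables (tokens, d) -> (tokens, d)
    logits = x @ gate_w
    top_vals, top_idx = logits.topk(2, dim=-1)   # pick the 2 best experts per token
    weights = top_vals.softmax(dim=-1)           # renormalize gate scores over the top-2
    out = torch.zeros_like(x)
    for slot in range(2):
        for e, expert in enumerate(experts):
            sel = top_idx[:, slot] == e          # tokens whose slot-th choice is expert e
            if sel.any():
                out[sel] += weights[sel, slot, None] * expert(x[sel])
    return out
```

Because the two gate weights sum to 1 per token, identity experts reproduce the input exactly; in a real layer each expert is its own feed-forward block and an auxiliary loss balances the routing.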