This is an LLM built with Decoupled RoPE, Multi-Head Latent Attention, Transformer blocks that use both pre- and post-normalization, and a Mixture-of-Experts (MoE) layer. The basic idea is to build an LLM from scratch.
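As a rough illustration of the rotary position embedding (RoPE) idea named above, here is a minimal pure-Python sketch. It is not taken from this repository: the function names `rope` and `dot`, the interleaved pairing of dimensions, and the base of 10000 are assumptions chosen for illustration. In the decoupled variant, a rotation like this is typically applied only to a separate slice of the query and key rather than to the whole vector, and that part is not shown here.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle pos * base**(-i/d).

    Hypothetical helper for illustration; assumes interleaved pairs
    and an even dimension d.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# The same relative offset (3) at different absolute positions
# should give the same attention score, up to rounding.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 10), rope(k, 7))
```

The point of the sketch is that the query-key score depends only on the distance between the two positions, which is the property that makes RoPE a relative position encoding.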
Stars: 1
Forks: 0
Watchers: 1
Open Issues: 0
10 commits
37c0c5c check dir exists or not and add merged.txt as training and val file
718ec4d Merge branch 'main' of https://github.com/Hasin-Al/ShomsherLLM
7fb0c00 Merge branch 'main' of https://github.com/Hasin-Al/ShomsherLLM