Found 56 repositories (showing 30)
EleutherAI
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
bigscience-workshop
Ongoing research training transformer language models at scale, including: BERT & GPT-2
genggui001
No description available
ModelTC
Built upon Megatron-DeepSpeed and the HuggingFace Trainer, EasyLLM reorganizes the code with a focus on usability while preserving training efficiency.
FreedomIntelligence
Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CUDA fused kernels + compiler]
DataStates
LLM checkpointing for DeepSpeed/Megatron
Anonymous1252022
No description available
yuguo-Jack
GLM-Pretrain in Megatron-Deepspeed for DCU
SulRash
Minimal yet high-performance code for pretraining LLMs. Attempts to implement some SOTA features. Implements training through DeepSpeed, Megatron-LM, and FSDP. WIP
kojimano
No description available
woojinsoh
Execute Megatron-DeepSpeed using Slurm for multi-node distributed training
okoge-kaz
Turing Tech Blog Repository
llm-jp
A fork of microsoft/Megatron-DeepSpeed.
Eugene29
Fork of Megatron-DeepSpeed with ViT bug fixes and model parallelism (TP, TP-SP, Ulysses, etc.) enabled for ViT. Pipeline parallelism is not yet enabled.
George614
GPU Memory Calculator for LLM Training - Calculate GPU memory requirements for training Large Language Models with support for multiple training engines including PyTorch DDP, DeepSpeed ZeRO, Megatron-LM, and FSDP.
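As a rough illustration of the kind of estimate such a calculator performs (a minimal sketch under common assumptions, not the repository's actual formula): mixed-precision Adam training is often approximated at ~16 bytes per parameter of model state (fp16 weights + fp16 gradients + fp32 master weights, momentum, and variance), with ZeRO stages sharding progressively more of these states across the data-parallel group.

```python
def model_state_bytes(num_params: int, zero_stage: int = 0, dp_degree: int = 1) -> float:
    """Rough per-GPU model-state memory for mixed-precision Adam training.

    Assumes the common 16-bytes/param breakdown:
      2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 master weights + Adam m, v).
    ZeRO stages shard progressively more state across dp_degree GPUs.
    Illustrative approximation only; excludes activations and fragmentation.
    """
    weights, grads, optim = 2.0, 2.0, 12.0
    if zero_stage >= 1:          # ZeRO-1 shards optimizer states
        optim /= dp_degree
    if zero_stage >= 2:          # ZeRO-2 also shards gradients
        grads /= dp_degree
    if zero_stage >= 3:          # ZeRO-3 also shards the fp16 weights
        weights /= dp_degree
    return num_params * (weights + grads + optim)

# Example: a 7B-parameter model with ZeRO-2 across 8 GPUs
gib = model_state_bytes(7_000_000_000, zero_stage=2, dp_degree=8) / 2**30
print(f"~{gib:.1f} GiB of model state per GPU")
```

Activation memory, which depends on batch size, sequence length, and checkpointing strategy, must be estimated separately.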
wangbluo
Build a LLaMA fine-tuning script from scratch using PyTorch and the transformers API. It needs to support four optional features: gradient checkpointing, mixed precision, data parallelism, and tensor parallelism. Do not use the ColossalAI/Megatron/DeepSpeed frameworks; you may refer to their code.
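The tensor-parallelism feature named above can be sketched in a single process by splitting a linear layer's weight columns across simulated ranks; concatenating the per-rank partial outputs stands in for the all-gather a real distributed run would perform (a minimal NumPy sketch, not code from the repository):

```python
import numpy as np

def column_parallel_matmul(x, w, world_size):
    """Simulate a column-split tensor-parallel linear layer on one process.

    The weight's output dimension is split into world_size shards; each
    simulated "rank" computes its partial output, and concatenation stands
    in for the all-gather of a real distributed run. Illustrative only.
    """
    shards = np.split(w, world_size, axis=1)      # shard weight columns
    partials = [x @ shard for shard in shards]    # per-rank local matmul
    return np.concatenate(partials, axis=-1)      # "all-gather" of outputs

# Sanity check: the sharded computation matches the unsharded one.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 16))
assert np.allclose(column_parallel_matmul(x, w, world_size=4), x @ w)
```

A real implementation would replace the Python loop with one local matmul per GPU and a `torch.distributed.all_gather` collective, but the arithmetic decomposition is the same.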
henkdr
Megatron-Deepspeed benchmark on LUMI
okoge-kaz
For details on environment setup, see the link below.
kungfu-team
Checkpoint structure with Deepspeed and Megatron-LM
hannawong
No description available
Gabriel4256
No description available
jianbangzhang
Large language models with Chinese-language support
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Xuweijia-buaa
No description available
Eutenacity
No description available
zhenghh04
No description available
hannawong
No description available
quorvath
This repository is based on Megatron-DeepSpeed, incorporating the block coordinate descent method for training large-scale models.
jmerizia
A (WIP) lightweight implementation of DeepSpeed/Megatron-LM style 3D parallelism, along with some models and helpful utilities.
anilatambharii
LLM Pretraining Framework (100B+ Params): Megatron-LM + DeepSpeed + FSDP. Open-source, HPC-ready system with tiny GPT simulation, distributed training, tokenizer tools, dataset pipelines, and deployment scripts for Slurm, AWS, Azure, and Docker.