Search Results

Found 250 repositories (showing 30)

PaLM-rlhf-pytorch

lucidrains

💛85

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

7.9k

680

MIT

Python

Updated 18 hours ago

artificial-intelligence, attention-mechanisms, deep-learning, +3

trlx

CarperAI

💛80

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

4.7k

484

MIT

Python

Updated 2 days ago

machine-learning, pytorch, reinforcement-learning

safe-rlhf

PKU-Alignment

🧡68

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

1.6k

132

Apache-2.0

Python

Updated 4 days ago

ai-safety, alpaca, beaver, +17

TextRL

voidful

🧡61

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

566

MIT

Python

Updated 1 week ago

chatgpt, controlled-nlg, gpt-2, +7

Vicuna-LoRA-RLHF-PyTorch

A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna

221

MIT

Python

Updated 3 months ago

chatgpt, finetune, gpt, +10

AIDoctor

Jerry-XDL

💛70

AIDoctor trains a medical GPT model with the ChatGPT training pipeline: implementation of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning), and DPO (Direct Preferenc…

203

Apache-2.0

Python

Updated 1 day ago

LLM-RLHF-Tuning-with-PPO-and-DPO

raghavc

🧡65

Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.

189

Python

Updated 4 days ago

instructGOOSE

xrsrke

🧡50

Implementation of Reinforcement Learning from Human Feedback (RLHF)

174

MIT

Jupyter Notebook

Updated 1 month ago

chatgpt, human-feedback, instructgpt, +2

llama-qrlhf

lucidrains

❤️40

Implementation of the Llama architecture with RLHF + Q-learning

170

MIT

Python

Updated 3 months ago

artificial-intelligence, attention, deep-learning, +1

ChatGLM-LoRA-RLHF-PyTorch

jackaduma

🧡60

A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM

139

MIT

Python

Updated 1 week ago

chatglm, chatglm-6b, chatgpt, +11

reasoning_models_how_to

rkinas

🧡50

This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest research, methodologies, and techniques for fine-tuning language models.

134

Python

Updated 1 week ago

llm, rl, rlhf

alpaca-rlhf

l294265421

🧡55

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

118

MIT

Python

Updated 3 weeks ago

alpaca, chatgpt, language-model, +5

ChiMed-GPT

synlp

🧡65

ChiMed-GPT is a Chinese medical large language model (LLM) built by continually training Ziya-v2 on Chinese medical data, where pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) are comprehensively performed on it.

105

MIT

Updated 4 days ago

awesome-rlhf

louieworth

❤️40

An index of algorithms for reinforcement learning from human feedback (RLHF)

Apache-2.0

Updated 6 months ago

Online_RLHF

ZinYY

🧡55

A PyTorch implementation of the paper "Provably Efficient Online RLHF with One-Pass Reward Modeling". This repository provides a flexible and modular approach to Online Reinforcement Learning from Human Feedback (Online RLHF).

Python

Updated 2 weeks ago

large-language-model, llm, post-training, +1

RWKV-LM-RLHF

OpenMOSE

💛70

Reinforcement Learning Toolkit for RWKV (v6, v7, ARWKV). Distillation, SFT, RLHF (DPO, ORPO), infinite context training, aligning. Exploring the possibilities for deeper fine-tuning of RWKV.

Apache-2.0

Python

Updated 1 day ago

Alpaca-LoRA-RLHF-PyTorch

jackaduma

🧡50

A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca

MIT

Python

Updated 1 month ago

alpaca, chatgpt, deepspeed, +10

oreilly-llm-rl-alignment

sinanuozdemir

🧡65

This training offers an intensive exploration into the frontier of reinforcement learning techniques with large language models (LLMs). We will explore advanced topics such as Reinforcement Learning with Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), Reasoning LLMs, and demonstrate practical applications such as fine-tuning

Jupyter Notebook

Updated 12 hours ago

agents, ai, deepseek, +8

InstructLLaMA

michaelnny

🧡65

Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT, but on a much smaller scale.

MIT

Jupyter Notebook

Updated 4 days ago

4bit-fine-tune, instructgpt, llam2, +3

CleanRL

firechecking

💛70

Reinforcement Learning algorithms and use-cases, including DQN, PG, A3C, PPO etc. and RLHF, AlphaZero implementations. Designed for clarity, ease of use, and educational purposes.

MIT

Python

Updated 1 day ago

Uni-RLHF-Platform

pickxiguapi

❤️40

Uni-RLHF platform for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR2024)

MIT

Python

Updated 4 months ago

Clean-Offline-RLHF

pickxiguapi

💛70

Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR2024)

MIT

Python

Updated 1 day ago

LLM-Alignment-Project

astorfi

🧡55

A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.

MIT

Python

Updated 1 week ago

ai, alignment, deep-learning, +6

modsysML

cloudguruab

🧡55

Human reinforcement learning (RLHF) framework for AI models. Evaluate and compare LLM outputs, test quality, catch regressions and automate.

Apache-2.0

Python

Updated 3 days ago

ai, automation-framework, data-science, +9

hidden-context

cassidylaidlaw

🧡50

Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"

Python

Updated 3 weeks ago

Reinforcement-Learning-from-bandits-to-RLHF

pyshka501

💛70

This repository contains lecture notes, practical materials, and implementations for the course "Reinforcement Learning: from Bandits to RLHF". The course is designed to provide a deep and systematic understanding of RL, combining solid mathematical foundations, intuitive explanations, practical implementations, and modern research insights.

MIT

Jupyter Notebook

Updated 4 days ago

rlhf_korean_dataset

JoJo0217

❤️35

For an RLHF learning environment in Korean

Python

Updated 7 months ago

PipelineLLM

iBacklight

🧡65

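PipelineLLM is a systematic learning project on large language model (LLM) post-training, covering the full technical stack from supervised fine-tuning (SFT) to preference optimization (DPO), reinforcement learning (RLHF/PPO/GRPO), and continual learning.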

MIT

Python

Updated 8 hours ago

continual-learning, fine-tuning, llm-infrastructure, +8

nanoRLHF

li-plus

❤️45

Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)

MIT

Python

Updated 1 month ago

deep-reinforcement-learning, llama, llm, +3

LLM-Improving-Trained-Models-with-RLHF

Mattral

🧡55

Experimented with the three essential Reinforcement Learning with Human Feedback (RLHF) process stages. It starts by revisiting the Supervised Fine-Tuning (SFT) process, then proceeds with the training of a reward model, and finally concludes with the reinforcement learning phase. We explored and applied methods such as 4-bit quantization and LoRA

Jupyter Notebook

Updated 1 week ago

