Found 250 repositories (showing 30)
lucidrains
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
CarperAI
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
PKU-Alignment
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
voidful
Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
jackaduma
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
Jerry-XDL
AIDoctor: training a medical GPT model with the ChatGPT training pipeline; implementation of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning), and DPO (Direct Preferenc…
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.
xrsrke
Implementation of Reinforcement Learning from Human Feedback (RLHF)
lucidrains
Implementation of the Llama architecture with RLHF + Q-learning
jackaduma
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
rkinas
This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest research, methodologies, and techniques for fine-tuning language models.
l294265421
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
synlp
ChiMed-GPT is a Chinese medical large language model (LLM) built by continually training Ziya-v2 on Chinese medical data, where pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) are comprehensively performed on it.
louieworth
An index of algorithms for reinforcement learning from human feedback (RLHF)
ZinYY
A PyTorch implementation of the paper "Provably Efficient Online RLHF with One-Pass Reward Modeling". This repository provides a flexible and modular approach to Online Reinforcement Learning from Human Feedback (Online RLHF).
OpenMOSE
Reinforcement Learning Toolkit for RWKV (v6, v7, ARWKV): distillation, SFT, RLHF (DPO, ORPO), infinite-context training, and alignment. Exploring the possibilities for deeper fine-tuning of RWKV.
jackaduma
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
sinanuozdemir
This training offers an intensive exploration into the frontier of reinforcement learning techniques with large language models (LLMs). We will explore advanced topics such as Reinforcement Learning with Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), Reasoning LLMs, and demonstrate practical applications such as fine-tuning
michaelnny
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT, but on a much smaller scale.
firechecking
Reinforcement Learning algorithms and use-cases, including DQN, PG, A3C, PPO etc. and RLHF, AlphaZero implementations. Designed for clarity, ease of use, and educational purposes.
pickxiguapi
Uni-RLHF platform for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR 2024)
pickxiguapi
Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR 2024)
astorfi
A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.
cloudguruab
Reinforcement learning from human feedback (RLHF) framework for AI models. Evaluate and compare LLM outputs, test quality, catch regressions, and automate.
cassidylaidlaw
Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"
This repository contains lecture notes, practical materials, and implementations for the course "Reinforcement Learning: from Bandits to RLHF". The course is designed to provide a deep and systematic understanding of RL, combining solid mathematical foundations, intuitive explanations, practical implementations, and modern research insights.
JoJo0217
An RLHF learning environment for Koreans
iBacklight
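PipelineLLM is a systematic learning project on large language model (LLM) post-training, covering the full technical stack from supervised fine-tuning (SFT) to preference optimization (DPO), reinforcement learning (RLHF/PPO/GRPO), and continual learning.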
li-plus
Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)
Experimented with the three essential Reinforcement Learning with Human Feedback (RLHF) process stages. It starts by revisiting the Supervised Fine-Tuning (SFT) process, then proceeds with the training of a reward model, and finally concludes with the reinforcement learning phase. We explored and applied methods such as 4-bit quantization and LoRA