Search Results

Found 21 repositories (showing 21)

Vicuna-LoRA-RLHF-PyTorch

jackaduma

❤️35

A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna.

221

MIT

Python

Updated 3 months ago

chatgpt, finetune, gpt (+10 more)

ChatGLM-LoRA-RLHF-PyTorch

jackaduma

🧡60

A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM.

139

MIT

Python

Updated 1 week ago

chatglm, chatglm-6b, chatgpt (+11 more)

Alpaca-LoRA-RLHF-PyTorch

jackaduma

🧡50

A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca.

MIT

Python

Updated 1 month ago

alpaca, chatgpt, deepspeed (+10 more)

Reinforcement-Learning-Full-Pipeline

calisweetleaf

💛70

This repository provides a production-grade implementation of the Reinforcement Learning from Human Feedback (RLHF) pipeline. It mirrors the post-training infrastructure used by major research labs, optimized for consumer hardware — including CPU-only environments with zero GPU requirement.

GPL-3.0

Python

Updated 2 days ago

dpo, grpo, ppo (+2 more)

RL-Humanoid-Robot-Walking-Framework

lborogzj997

🧡50

A complete framework for training humanoid robots to walk using Reinforcement Learning in Isaac Gym. This project covers the full pipeline from simulation training to real-world deployment.

CC0-1.0

Updated 2 months ago

Autonomous_AI_Trading_Agent

kunal-ppatil

❤️40

An autonomous AI trading agent using Deep Reinforcement Learning (PPO). Unlike rule-based bots, it learns profitable strategies via trial-and-error using technical indicators (RSI, SMA). Built with Stable-Baselines3, Gymnasium, and yfinance. Includes full training and backtesting pipelines.

MIT

Jupyter Notebook

Updated 4 months ago

safe-llm-adaptation-peft-rlhf

nabeelshan78

❤️35

An end-to-end pipeline for adapting FLAN-T5 for dialogue summarization, exploring the full spectrum of modern LLM tuning. Implements and compares Full Fine-Tuning, PEFT (LoRA), and Reinforcement Learning (RLHF) for performance and alignment. Features a PPO-tuned model to reduce toxicity, in-depth analysis notebooks, and an interactive Streamlit demo.

Jupyter Notebook

Updated 6 months ago

deep-learning, fine-tuning, flan-t5 (+14 more)

mini-mamba-agent-1.58b

venim1103

💛70

Agentic-1.58b: A BitMamba reasoning engine built for consumer GPUs. By fusing 1.58-bit ternary quantization with Mamba-2 State Space Models via custom Triton kernels, this pipeline achieves massive context scaling on a single RTX 3090. Includes full scripts for pre-training, SFT, and GRPO reinforcement learning.

Apache-2.0

Python

Updated 1 day ago

Autonomous_Rover

ThomasBlalock

❤️35

Full pipeline codebase to train and run an autonomous rover with behavioral cloning and reinforcement learning.

Python

Updated 1 year ago

rl_8puzzle

Chirag314

❤️45

Reinforcement Learning (Q-Learning) agent that solves the 8-Puzzle with a full end-to-end pipeline + 3D animation output (MP4/GIF).

Python

Updated 1 month ago

reinforcement-learning-q-learning-mdp-agent

RAProject_MedicalAILab

tianluoboding

❤️35

Built a full LLM–RF relapse prediction pipeline (BioClinicalBERT embeddings + Random Forest reward modeling + PPO reinforcement learning) for longitudinal MS patient data.

Python

Updated 4 months ago

data-lineage-tracker

Abhishekjha18

🧡65

An OpenEnv-compliant reinforcement learning environment for training AI agents to track and visualize data lineage across complex enterprise data pipelines with full regulatory compliance support.

Python

Updated 10 hours ago

Crypto-Einstein-AI---Advanced-Causal-Transformer---RL-Trading-Engine

Haqibm2003

❤️30

A research-grade crypto trading framework featuring Causal Transformers, Volatility Forecasting Models, Reinforcement Learning agents, heuristic strategies, and full backtesting + visualization pipelines, all written in pure PyTorch.

Python

Updated 4 months ago

Malliavin_IRL

LukeSnow0

🧡65

Passive Malliavin calculus-based inverse reinforcement learning. See https://arxiv.org/abs/2604.01345. The repo currently holds a simple Jupyter demo replicating the paper's numerical experiments; the full implementation pipeline is in progress.

Jupyter Notebook

Updated 4 days ago

llm-chatbot

SystemSolution21

❤️35

A full-stack AI chatbot application that combines Retrieval-Augmented Generation (RAG) for real-time knowledge access and a complete Reinforcement Learning from Human Feedback (RLHF) pipeline for continuous model improvement.

MIT

Python

Updated 3 months ago

Reinforcement-Learning-Based-Fine-Tuning-of-Large-Language-Model-

AkashBadhautiya

❤️40

Implementing a full RLHF (Reinforcement Learning from Human Feedback) pipeline to fine-tune a pre-trained transformer (GPT-2) using PPO and GRPO optimization methods. The project integrates Supervised Fine-Tuning (SFT), Reward Modeling (RM), and Reinforcement Learning (RL) stages to align model behavior with human preference signals.

Jupyter Notebook

Updated 1 month ago

BACE1_de_novo_design

idd-lab

🧡65

An open-source, dual-paradigm de novo design workflow integrating ligand-based reinforcement learning and structure-based evolution. Includes full execution scripts, Colab notebooks, and MD/MMGBSA validation pipelines, benchmarked on BACE1.

Jupyter Notebook

Updated 6 days ago

neurosnake-rl

sanikasurose

🧡55

An end-to-end reinforcement learning project that uses a Deep Q-Network (DQN) to train a neural network–based agent to play Snake, supported by a full experiment pipeline for training, evaluation, and visualization.

Python

Updated 1 week ago

reinforcement-learning-for-robot-navigation

alizangeneh

❤️35

Research-grade reinforcement learning framework for robot navigation, covering discrete, obstacle-aware, continuous-control, and multi-agent environments with PPO and DQN, full evaluation pipeline, reproducible experiments, and LaTeX paper template for PhD-level research.

Python

Updated 4 months ago

control, dqn, multi-agent (+4 more)

Polypharmacy-MORL-Multi-Objective-Reinforcement-Learning-for-Safe-Drug-Prescriptions

LusmicSam

❤️40

A full end-to-end research pipeline using Multi-Objective Reinforcement Learning (MORL) to optimize polypharmacy decisions across competing clinical objectives — drug efficacy, DDI (drug–drug interaction) risk, and patient tolerability. Includes a custom environment, GPIPD implementation, Pareto analysis, weight sweeps,

MIT

Jupyter Notebook

Updated 4 months ago

deep-rl-rebalance

JaiAnshSB26

🧡60

Deep Reinforcement Learning framework for cost-aware multi-asset portfolio rebalancing. Implements a PPO agent trained on 13 years of ETF data, benchmarked against classical strategies under realistic transaction costs. Includes reproducible experiments, evaluation pipeline, and full research report - deeprl.jaiansh.me

MIT

Python

Updated 3 weeks ago

ai, finance, gymnasium (+6 more)
