Found 21 repositories (showing 21)
jackaduma
A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
jackaduma
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
jackaduma
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
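The LoRA step these three pipelines share can be illustrated with a minimal sketch, assuming the standard LoRA formulation: a frozen weight W plus a trainable low-rank update B·A scaled by alpha / r, with B initialized to zero so the adapted layer starts out identical to the base layer. Plain Python lists stand in for tensors, and the helper names (`matvec`, `lora_forward`) are hypothetical, not taken from the repositories.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Compute W x + (alpha / r) * B (A x).

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    Only a and b would be updated during fine-tuning.
    """
    base = matvec(w, x)
    low_rank = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y + scale * z for y, z in zip(base, low_rank)]


# Example: d_in = 3, d_out = 2, rank r = 1.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
A = [[0.5, 0.5, 0.5]]     # shape (r, d_in)
B_zero = [[0.0], [0.0]]   # shape (d_out, r), zero-initialized
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_zero, x, alpha=16.0, r=1))  # [7.0, -1.0], identical to W x
print(lora_forward(W, A, [[0.1], [0.0]], x, alpha=16.0, r=1))
```

Because only A and B are trained, the number of trainable parameters per layer drops from d_out·d_in to r·(d_in + d_out), which is what makes fine-tuning feasible on consumer GPUs.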
calisweetleaf
This repository provides a production-grade implementation of the Reinforcement Learning from Human Feedback (RLHF) pipeline. It mirrors the post-training infrastructure used by major research labs and is optimized for consumer hardware, including CPU-only environments that need no GPU.
lborogzj997
A complete framework for training humanoid robots to walk using Reinforcement Learning in Isaac Gym. This project covers the full pipeline from simulation training to real-world deployment.
kunal-ppatil
An autonomous AI trading agent using Deep Reinforcement Learning (PPO). Unlike rule-based bots, it aims to learn profitable strategies through trial and error, using technical indicators (RSI, SMA). Built with Stable-Baselines3, Gymnasium, and yfinance. Includes full training and backtesting pipelines.
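The two indicators named above can be computed with a short sketch. It assumes a list of closing prices as input and uses the simple-average form of RSI (plain averages of gains and losses over the window); Wilder's original RSI uses exponential smoothing instead, and the repository may use either or a library implementation. The function names are hypothetical.

```python
from typing import List, Optional


def sma(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until a full window is available."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def rsi_simple(prices: List[float], window: int = 14) -> List[Optional[float]]:
    """RSI from plain averages of gains and losses over `window` price changes."""
    out: List[Optional[float]] = [None] * len(prices)
    changes = [prices[i] - prices[i - 1] for i in range(1, len(prices))]
    for i in range(window, len(prices)):
        recent = changes[i - window:i]  # the `window` changes ending at prices[i]
        avg_gain = sum(c for c in recent if c > 0) / window
        avg_loss = sum(-c for c in recent if c < 0) / window
        if avg_loss == 0:
            # All gains -> 100; a completely flat window -> 50 (one common convention).
            out[i] = 100.0 if avg_gain > 0 else 50.0
        else:
            rs = avg_gain / avg_loss
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out


prices = [10, 11, 12, 11, 13, 14]
print(sma(prices, 3))
print(rsi_simple(prices, window=3))
```

In an RL setup like the one described, values such as these would typically become part of the observation vector the agent sees at each step; that wiring is an assumption, not something the description specifies.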
nabeelshan78
An end-to-end pipeline for adapting FLAN-T5 for dialogue summarization, exploring the full spectrum of modern LLM tuning. Implements and compares Full Fine-Tuning, PEFT (LoRA), and Reinforcement Learning (RLHF) for performance and alignment. Features a PPO-tuned model to reduce toxicity, in-depth analysis notebooks, and an interactive Streamlit demo.
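The PPO step mentioned above optimizes a clipped surrogate objective. Below is a minimal per-sample sketch of the standard PPO-clip objective, written without any RL library and with epsilon = 0.2 as an assumed default. In a detoxification setup the advantages would be derived from a reward such as a toxicity score; that connection is an assumption about this project rather than a detail from it.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).

    ratio = exp(logp_new - logp_old) is the probability ratio between the
    current policy and the policy that generated the samples. The result is
    maximized; its negative would be the loss handed to an optimizer.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A sample whose probability doubled under the new policy, with a positive
# advantage, only gets credit up to ratio 1 + eps.
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))  # prints 1.2
```

Taking the minimum of the clipped and unclipped terms makes the objective pessimistic: the policy gains nothing from moving the ratio further than 1 ± eps in the direction the advantage favors, which keeps each update close to the sampling policy.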
venim1103
Agentic-1.58b: A BitMamba reasoning engine built for consumer GPUs. By fusing 1.58-bit ternary quantization with Mamba-2 State Space Models via custom Triton kernels, this pipeline achieves massive context scaling on a single RTX 3090. Includes full scripts for pre-training, SFT, and GRPO reinforcement learning.
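The "1.58-bit" figure comes from restricting each weight to the three values {-1, 0, +1}, since log2(3) ≈ 1.58 bits. The sketch below uses the absmean scheme described for BitNet b1.58: scale by the mean absolute weight, round, and clip to [-1, 1]. Whether this repository's Triton kernels use the same scheme is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple


def ternary_quantize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Absmean ternary quantization.

    Returns integer codes in {-1, 0, 1} and a per-tensor scale gamma such
    that weights[i] is approximated by codes[i] * gamma.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    codes = []
    for w in weights:
        q = round(w / (gamma + eps))      # nearest integer
        codes.append(max(-1, min(1, q)))  # clip to the ternary range
    return codes, gamma


codes, gamma = ternary_quantize([0.9, -0.05, 0.4, -1.2])
print(codes, gamma)
print([c * gamma for c in codes])  # dequantized approximation
```

With ternary codes, the matrix multiply reduces to additions and subtractions of activations followed by a single multiply by gamma, which is the property custom kernels can exploit.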
ThomasBlalock
Full pipeline codebase to train and run an autonomous rover with behavioral cloning and reinforcement learning.
Chirag314
Reinforcement Learning (Q-Learning) agent that solves the 8-Puzzle, with a full end-to-end pipeline and 3D animation output (MP4/GIF).
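The core of a tabular Q-learning agent for the 8-Puzzle is a move generator plus the one-step update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. A minimal sketch, assuming states are 9-tuples in row-major order with 0 for the blank and actions named by the direction the blank moves; these encodings, the names, and the reward value are illustrative assumptions, not the repository's.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

State = Tuple[int, ...]
GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)
MOVES = {"up": -3, "down": 3, "left": -1, "right": 1}


def legal_actions(state: State) -> List[str]:
    """Directions the blank (0) can move on the 3x3 board."""
    row, col = divmod(state.index(0), 3)
    actions = []
    if row > 0:
        actions.append("up")
    if row < 2:
        actions.append("down")
    if col > 0:
        actions.append("left")
    if col < 2:
        actions.append("right")
    return actions


def step(state: State, action: str) -> State:
    """Swap the blank with the neighbouring tile in the given direction."""
    blank = state.index(0)
    target = blank + MOVES[action]
    cells = list(state)
    cells[blank], cells[target] = cells[target], cells[blank]
    return tuple(cells)


def q_update(q: Dict[Tuple[State, str], float], s: State, a: str, r: float,
             s_next: State, alpha: float = 0.5, gamma: float = 0.9) -> None:
    """One tabular Q-learning update; the goal state is treated as terminal."""
    if s_next == GOAL:
        best_next = 0.0
    else:
        best_next = max(q[(s_next, a2)] for a2 in legal_actions(s_next))
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


q: Dict[Tuple[State, str], float] = defaultdict(float)
start: State = (1, 2, 3, 4, 5, 6, 7, 0, 8)  # one move away from GOAL
nxt = step(start, "right")
q_update(q, start, "right", r=1.0, s_next=nxt)
print(nxt == GOAL, q[(start, "right")])  # True 0.5
```

A full training loop would repeat this update over many episodes with an exploration policy such as epsilon-greedy; the 8-Puzzle has 181,440 reachable states, so a dictionary-backed table stays manageable.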
tianluoboding
Built a full LLM–RF relapse prediction pipeline (BioClinicalBERT embeddings + Random Forest reward modeling + PPO reinforcement learning) for longitudinal MS patient data.
Abhishekjha18
An OpenEnv-compliant reinforcement learning environment for training AI agents to track and visualize data lineage across complex enterprise data pipelines with full regulatory compliance support.
A research-grade crypto trading framework featuring Causal Transformers, Volatility Forecasting Models, Reinforcement Learning agents, heuristic strategies, and full backtesting + visualization pipelines, all written in pure PyTorch.
LukeSnow0
Passive Malliavin calculus-based inverse reinforcement learning. See https://arxiv.org/abs/2604.01345. The repo currently holds a simple Jupyter demo replicating the paper's numerical experiments; the full implementation pipeline is in progress.
SystemSolution21
A full-stack AI chatbot application that combines Retrieval-Augmented Generation (RAG) for real-time knowledge access and a complete Reinforcement Learning from Human Feedback (RLHF) pipeline for continuous model improvement.
Implements a full RLHF (Reinforcement Learning from Human Feedback) pipeline that fine-tunes a pre-trained transformer (GPT-2) using the PPO and GRPO optimization methods. The project integrates Supervised Fine-Tuning (SFT), Reward Modeling (RM), and Reinforcement Learning (RL) stages to align model behavior with human preference signals.
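What distinguishes GRPO from PPO in a pipeline like this is that it drops the learned value baseline and instead normalizes each completion's reward against the other completions sampled for the same prompt. A minimal sketch of that group-relative advantage, assuming the population standard deviation and a small epsilon for numerical safety (implementations differ on both details); the function name is hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Advantage of each sampled completion relative to its own group.

    rewards: scalar rewards (e.g. from a reward model) for G completions of
    one prompt. Returns (r_i - mean) / (std + eps) for each completion.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt: two preferred, two not.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [1.0, -1.0, -1.0, 1.0]
```

These advantages then plug into the same clipped-ratio objective PPO uses. When every completion in a group gets the same reward, all advantages are zero and that prompt contributes no policy-gradient signal.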
idd-lab
An open-source, dual-paradigm de novo design workflow integrating ligand-based reinforcement learning and structure-based evolution. Includes full execution scripts, Colab notebooks, and MD/MMGBSA validation pipelines, benchmarked on BACE1.
sanikasurose
An end-to-end reinforcement learning project that uses a Deep Q-Network (DQN) to train a neural network–based agent to play Snake, supported by a full experiment pipeline for training, evaluation, and visualization.
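The learning signal in a DQN like this one is a regression target built from a separate, periodically synced target network: y = r + γ · max_a' Q_target(s', a') for non-terminal transitions, and y = r when the episode ends (for example, when the snake collides). A minimal sketch of that target and the mean squared TD error over a batch, with plain lists standing in for network outputs; the names are hypothetical and γ = 0.5 is chosen only to keep the arithmetic easy to follow.

```python
from typing import List, Sequence


def dqn_targets(rewards: Sequence[float], next_q_target: Sequence[Sequence[float]],
                dones: Sequence[bool], gamma: float = 0.99) -> List[float]:
    """TD targets: r + gamma * max_a' Q_target(s', a'), or just r if terminal."""
    return [r if done else r + gamma * max(q_next)
            for r, q_next, done in zip(rewards, next_q_target, dones)]


def mse_td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) for the taken actions and the targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# Batch of two transitions; the second ends the episode (e.g. a collision).
targets = dqn_targets(rewards=[1.0, -1.0],
                      next_q_target=[[0.5, 2.0, 1.0], [3.0, 3.0, 3.0]],
                      dones=[False, True], gamma=0.5)
print(targets)                               # [2.0, -1.0]
print(mse_td_loss([1.0, -1.0], targets))     # 0.5
```

Masking the bootstrap term on terminal transitions matters here: without it, the agent would keep valuing states after a collision as if the game continued.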
alizangeneh
Research-grade reinforcement learning framework for robot navigation, covering discrete, obstacle-aware, continuous-control, and multi-agent environments with PPO and DQN, a full evaluation pipeline, reproducible experiments, and a LaTeX paper template for PhD-level research.
A full end-to-end research pipeline using Multi-Objective Reinforcement Learning (MORL) to optimize polypharmacy decisions across competing clinical objectives — drug efficacy, DDI (drug–drug interaction) risk, and patient tolerability. Includes a custom environment, GPIPD implementation, Pareto analysis, weight sweeps,
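Two of the MORL tools listed above, weight sweeps via linear scalarization and Pareto analysis, fit in a short sketch. It assumes every objective has been oriented so that larger is better (for example, negated DDI risk). The policy scores are made-up illustrative numbers, not results from the repository, and GPIPD itself is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Dict[str, Scores]) -> List[str]:
    """Names of the points not dominated by any other point."""
    return [name for name, p in points.items()
            if not any(dominates(q, p) for other, q in points.items() if other != name)]


def best_by_weights(points: Dict[str, Scores], weights: Sequence[float]) -> str:
    """Pick the point with the highest weighted sum (linear scalarization)."""
    return max(points, key=lambda n: sum(w * s for w, s in zip(weights, points[n])))


# Hypothetical scores: (efficacy, -DDI risk, tolerability).
policies = {
    "A": (0.9, -0.6, 0.5),
    "B": (0.7, -0.2, 0.6),
    "C": (0.6, -0.3, 0.5),   # dominated by B
}
print(pareto_front(policies))                      # ['A', 'B']
for w in [(1.0, 0.0, 0.0), (0.2, 0.6, 0.2)]:
    print(w, best_by_weights(policies, w))
```

A weight sweep only ever selects points on the convex hull of the Pareto front, which is one reason MORL methods also keep explicit Pareto sets instead of relying on scalarization alone.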
JaiAnshSB26
Deep Reinforcement Learning framework for cost-aware multi-asset portfolio rebalancing. Implements a PPO agent trained on 13 years of ETF data, benchmarked against classical strategies under realistic transaction costs. Includes reproducible experiments, an evaluation pipeline, and a full research report (deeprl.jaiansh.me).
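A common way to make a rebalancing reward cost-aware is to charge proportional transaction costs on turnover, the sum of absolute weight changes. A minimal sketch, assuming a flat cost rate in basis points and weights that sum to 1. The repository's description does not specify its cost model or reward shaping, so both the formula and the names here are assumptions.

```python
from typing import Sequence


def turnover(w_old: Sequence[float], w_new: Sequence[float]) -> float:
    """Sum of absolute changes in portfolio weights."""
    return sum(abs(n - o) for o, n in zip(w_old, w_new))


def net_return(w_old: Sequence[float], w_new: Sequence[float],
               asset_returns: Sequence[float], cost_bps: float = 10.0) -> float:
    """One-period portfolio return after rebalancing, minus proportional costs.

    cost_bps: cost per unit of turnover in basis points (10 bps = 0.10%).
    """
    gross = sum(w * r for w, r in zip(w_new, asset_returns))
    cost = turnover(w_old, w_new) * cost_bps / 10_000
    return gross - cost


# Shift 20% of the portfolio from the first ETF to the second.
old = [0.6, 0.4]
new = [0.4, 0.6]
rets = [0.01, 0.02]
print(turnover(old, new))           # about 0.4
print(net_return(old, new, rets))   # gross 1.6% minus 0.04% cost
```

Using a net return like this as the per-step reward penalizes frequent small trades directly, so an agent trained on it has a built-in incentive not to churn the portfolio.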