Found 90 repositories (showing 30)
ZHONGJunjie86
Accepted by AROB 2021. A car agent navigates complex traffic conditions using the Mixed_Input_PPO_CNN_LSTM model.
EricChen0104
This project implements a Proximal Policy Optimization (PPO) reinforcement learning agent to train the Minitaur robot to walk in the MinitaurBulletEnv-v0 environment using PyBullet. The agent uses a multilayer perceptron (MLP) to model the policy and value networks and learns to control the robot in a continuous action space.
This repository contains a project that leverages reinforcement learning to make a humanoid robot walk in a PyBullet simulation. It uses a custom Gym environment, a Proximal Policy Optimization (PPO) agent, and a provided URDF file for the robot model. The training process prints rewards per generation and visualizes the robot's behavior.
Zunchao
Training PPO for mm KUKA + Husky; models and agent from PyBullet.
Ali-Najar
**This is a reinforcement learning model using PPO to train an agent to defeat Dark Souls 3 enemies.**
Nikelroid
Adversarial Co-Evolution of RL and LLM Agents: A framework for training high-performance PPO agents against Large Language Models in Gin Rummy, utilizing curriculum learning and knowledge distillation.
A reinforcement learning–based adaptive learning system that trains DQN and PPO teacher agents for question selection, using a Deep Knowledge Tracing (DKT) student model and alternative parametric student models.
Yuanzhe-Nikola-Chen
A benchmark comparing MPC and PPO for safe autonomous lane-keeping using a kinematic bicycle model. Includes a Gym-style environment, shooting-based MPC, and a PyTorch PPO agent. Not affiliated with Mobileye, but inspired by industry ADAS/AV planning practices.
justiNNovick
A research project that experiments with integrating Reinforcement Learning into the Real Business Cycle economic model. This multi-agent environment features learning via Proximal Policy Optimization (PPO) and tests different utility functions from economic theory as reward signals.
SruthiVihitha
An end-to-end project that trains, evaluates, and visualizes RL trading agents which use RNNs (LSTM / GRU) or MLPs as market-state encoders. Includes an ablation study (LSTM vs GRU vs MLP), a production-grade LSTM agent trained with PPO, and a Streamlit GUI to load models, backtest and generate live signals.
Used a generative neural network as an opponent model in Reinforcement Learning to improve a PPO agent's performance in heads-up Leduc poker. The opponent predictor is trained simultaneously with the policy and value networks. Two-stage training with a random opponent and self-play. Successfully increased reward and win rate.
Mel-Meijer
BSc Computer Science with Security and Forensics Final Project. Explores the capability of autonomous deep reinforcement learning models to defend a network against a simulated attacker in the CAGE challenge 1 environment scenario. Trains and compares PPO, DQN, and A2C in their ability to defend a network from the two different attacking agents.
Shafwansafi06
Use historical stock data and train a Deep Reinforcement Learning (DRL) agent using PPO to model market trends.
Devanik21
Dark Zero Point Genesis: PPO Latent World Models Under Thermodynamic Scarcity. 256 agents. 128D Latent Manifolds. Zero supervision. Agents utilize PPO-clipped surrogate objectives. Survival = Predictive Error Coding (PEC) × Energy Efficiency across a 50/15 Seasonal Cycle.
green-hat-001
2D orbital rocket sim with PPO in PyTorch. Models thrust, drag, gravity, fuel; agent learns efficient ascent. Includes telemetry & visualization
Next Financial Decision Model: A reinforcement learning project that develops a trading agent for financial environments. The agent is implemented using the PPO algorithm from TensorFlow's tf-agents library. The project includes a custom financial environment and a data-driven reward system.
Labeeb-coder
Flappy Bird Reinforcement Learning Agent. This project trains an AI agent to play the Flappy Bird game using Deep Reinforcement Learning techniques (DQN, PPO, and Stable-Baselines3). It includes game logic, training scripts, evaluation tools, and pre-trained models.
This repository contains code for training an AI agent to play the Super Mario Bros game using reinforcement learning algorithms such as DQN, A2C, and PPO. It also includes evaluation of the trained models and testing of the best-performing model.
BENKRIMEN
Reinforcement Learning-based Underwater Wireless Sensor Network Routing using PPO. This project implements a complete UWSN environment with acoustic propagation, energy models, path-loss, and a Proximal Policy Optimization agent for optimal routing.
Bautistao2
This project implements a robotic arm control system using Deep Reinforcement Learning. The model is trained using Proximal Policy Optimization (PPO) in a simulated environment built with PyBullet. The goal is to achieve precise target-reaching motions through optimized agent training.
varshil247
An analysis of Actor-Critic based A2C and PPO trading agents in a custom-built OpenAI Gym environment. Evaluates the effects of environment knowledge (OHLCV vs Technical Indicators), reward functions (Log Returns vs Sharpe), and model architecture (MLP vs LSTM) on profitability and model stability metrics. Backtested against the SPY index.
gavisangavi2502-max
Deep Reinforcement Learning is used to train a trading agent on synthetic financial time-series data. A custom Gym environment models Buy, Sell, Hold decisions. PPO learns to maximize returns vs a moving-average baseline using Sharpe ratio, drawdown, and cumulative profit metrics.
Automated LLM-Guided Reinforcement Learning Testbed. This project leverages the modern BipedalWalker-v3 environment from Gymnasium to orchestrate a continuous cycle of agent training and intelligent reward shaping. By combining Stable Baselines3's PPO algorithm with the reasoning capabilities of Large Language Models (LLMs)
pragyan2905
A hybrid quantitative framework fusing Bi-LSTM networks for latent alpha factor extraction and Proximal Policy Optimization (PPO) for model-free execution. The system processes high-dimensional sentiment and technical signals to drive a stochastic policy gradient agent, optimizing dynamic portfolio allocation for maximal risk-adjusted returns.
Licensed-Driver
A template for training a single‑ticker reinforcement learning (RL) trader entirely via backtesting. It pulls OHLCV data from the Alpaca API, adds many technical indicators, simulates realistic execution with bid/ask spread and IBKR commission models, and trains an LSTM PPO agent (Stable‑Baselines3) inside a Gymnasium environment.
soudeepan
Welcome to my repository showcasing my adventure into Reinforcement Learning with the CartPole-v1 environment using the powerful Proximal Policy Optimization (PPO) model. Here, you'll find the code and resources detailing my journey as I trained an AI agent to balance a pole on a moving cart.
Sovereign default prediction using World Bank and FRED macro data. Compares two-tower neural embeddings vs tree-based models, then applies PPO reinforcement learning for risk-aware bond allocation across 117 countries. RL agent achieves 13-20% improvement over equal-weight baseline. Temporal validation on 34 years of data from 1990-2023.
Acrazt03
A neural network is trained to land the Eagle lander on a virtual moon. It starts at an altitude of 80m at a random position and lands without crashing or running out of fuel. The model was trained using PPO and the ml-agents library, and the environment was built in the Unity game engine.
ignius299792458
Implemented and benchmarked a PPO agent in PyTorch across OpenAI Gym environments (CartPole, LunarLander, MountainCar), studying policy gradient convergence, reward shaping, and hyperparameter sensitivity under continuous and discrete action spaces.
feiLinX
Deep Multi-agent Reinforcement Learning Model: DQN, PPO, ACKTR