Found 28 repositories (showing 28)
Liuziyu77
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
OpenBMB
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.
KhoomeiK
Fine-tune LLM agents with online reinforcement learning
RL4VLM
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
irisx3
Attention-based Deep Reinforcement Learning framework for portfolio allocation on S&P 500 equities. Includes custom environment, policy architecture with cross-sectional attention, PPO/A2C/REINFORCE agents, training/evaluation pipeline, and fine-tuning grid search.
Stanford-ILIAD
PantheonRL is a package for training and testing multi-agent reinforcement learning environments. PantheonRL supports cross-play, fine-tuning, ad-hoc coordination, and more.
MiChaelinzo
A trading agent AI is an artificial intelligence system that uses computational intelligence methods such as machine learning and deep reinforcement learning to automatically discover, implement, and fine-tune strategies for autonomous adaptive automated trading in financial markets
eval-protocol
Eval Protocol (EP) is an open solution for doing reinforcement learning fine-tuning on existing agents — across any language, container, or framework.
No description available
SuZeAI
This repository contains an AI agent for playing Tetris using the Deep Q-Learning (DQL) algorithm with fine-tuned rewards. It focuses on optimizing decision-making and performance through reinforcement learning techniques.
SandroHub013
🧪 Advanced LLM fine-tuning framework with Reinforcement Learning (GRPO/DPO), Multi-Agent Swarm Training, Adaptive Optimization, and Unsloth integration (2x faster, 70% less VRAM). Train 1.5B-70B+ models on 8GB+ GPUs with QLoRA, PEFT, LUFFY off-policy reasoning, and Search-R1. RAG-enabled with smart chunking.
projrlftsim
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning [Under Review]
seea-r1
official repo for SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
rchauhan1001
Distilling Multi-Agent Social Reasoning: Compressing MetaMind's Cognitive Architecture via Reinforcement Learning and Supervised Fine-Tuning
msritian
Benchmark and improve VLMs' capabilities as sequential traders in complex, evolving markets - LLM post-training, Reinforcement Learning Fine-Tuning and Agentic AI
henyoushili111
Led the R&D of the first closed-loop intelligent agent framework for visual content generation that integrates "underlying model capability perception" with "high-level reinforcement fine-tuning," significantly boosting complex semantic alignment and execution success rates.
praveenmada
Reinforcement Learning agents to fine-tune telecom networks.
felattaoui
Workshop on Agentic finetuning for AOAI models
annavirvi-0x0598
Official Tuning Visual Fine Reinforcement Fine Reinforcement of ARFT Tuning Visual repository Agentic Visual Visual RFT
viditjain88
healthcare claim adjudication reinforcement fine-tuning agent
armundl3
Fine-Tuning AI Agents with Reinforcement Learning
JoshuaWenHIT
We present Query-MARFT, a query-guided multi-agent reinforcement fine-tuning framework.
AurumTian
official repo for SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
satyampurwar
Unlocking the Power of Generative AI: In-Context Learning, Instruction Fine-Tuning, Reinforcement Learning Fine-Tuning, Retrieval Augmented Generation and LangGraph Workflows for AI Agents.
mcar18
Core idea: train a reinforcement learning agent that improves reasoning prompts for an LLM. Instead of fine-tuning the LLM directly, the agent learns to optimize the reasoning process.
ankushsil17
Multi-agent debate meets reinforcement learning through game theory. Four specialized LLM agents (Researcher→Reasoning→Critic→Refiner) debate math problems; Nash equilibrium convergence and debate quality metrics provide rich reward signals for GRPO fine-tuning. Achieves +30% accuracy over baseline on GSM8K with Qwen2.5-3B-Instruct and LoRA.
This project focuses on creating a realistic, enterprise-scale seed dataset simulating how a large B2B SaaS organization uses Asana for project management. The generated dataset is designed to serve as seed data for a reinforcement learning (RL) environment, enabling evaluation and fine-tuning of AI agents.
ArumugamKrishnan
This project demonstrates how Agentic AI systems can be aligned using Reinforcement Learning from Human Feedback (RLHF) for aerospace engineering tasks. We build a browser-based aerospace dataset, create human preference pairs, and fine-tune a small language model using Direct Preference Optimization (DPO) to improve domain alignment.