Found 1,156 repositories (showing 30)
OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
BytedTsinghua-SIA
An Open-source RL System from ByteDance Seed and Tsinghua AIR
WangJingyao07
Codebase of GRPO: Implementations and Resources of GRPO and Its Variants
opendilab
LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Framework
saikiranrallabandi
InfraMind: Fine-tuning toolkit for training SLMs on Infrastructure-as-Code using GRPO/DAPO. Achieves 97.3% accuracy on IaC generation.
Ruijian-Zha
🚀 A New DAPO Algorithm for Stock Trading (arXiv:2505.06408). Implementation of our IEEE IDS 2025 accepted algorithm combining Dynamic Sampling Policy Optimization (DAPO), Group Relative Policy Optimization (GRPO), and LLM-driven risk/sentiment signals for efficient and profitable stock trading on the NASDAQ-100 index.
komi22
Zero Trust Integrated Security Solution
mbzuai-oryx
Open Ended Medical Reinforcement Learning
lns
Source code for the paper "Divergence-Augmented Policy Optimization"
piXelicidio
Polygon Painter for Low-Poly style 3D Models. Plugin for Unity.
MystenLabs
DAPOL+ Proof of Liabilities using Bulletproofs and Sparse Merkle trees
KulunuOS
6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation
egin10
Scrapes school data from the Dapodik website (Reference Data): https://referensi.data.kemdikbud.go.id/index11.php
TeenLucifer
No description available
Yinghui-Li-New
No description available
A category that extends UINavigationController, UINavigationItem and UIViewController. You can customize the UINavigationBar for each view controller and enjoy your life.
boschresearch
Accompanying code for paper "DAPO: Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation"
putradimas
Unofficial Dapodik SDK for PHP
dapodix
Python SDK for the Dapodik application.
myaser
Dialectal Arabic Part Of Speech Tagger
ai-in-pm
This repository contains an implementation of the Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO) algorithm for reinforcement learning with language models.
gjskywalker
No description available
egin10
Command-line app for scraping school data from the Dapodik website (Reference Data): https://referensi.data.kemdikbud.go.id
rupc
DaPoA is an effort to enhance Ethereum PoA Clique algorithm using DAG-based BFT Consensus (ICBC 2024)
Dylsimple60
🤖 Enhance reinforcement learning stability and efficiency with advanced algorithms like TRPO, PPO, DPO, GRPO, DAPO, and GSPO for optimized policy training.
DevDizzle
An iterative pipeline for optimizing prompt engineering strategies to generate high-quality structured requirements documents. Uses Dynamic Adaptive Prompt Optimization (DAPO) and an LLM-as-a-Judge to evaluate and refine prompts automatically.
novay
Dapodik Unofficial API.
ztlmememe
Scripts and recipes for running DAPO training on NSCC cluster with Singularity and Ray.
AchoWu
Group Contrastive Policy Optimization. Read the paper on arXiv: 👉 https://arxiv.org/abs/2510.07790
ahmaddyd
Custom Dapoer Idita module for Odoo 14