Found 2 repositories (showing 2)
Direct Preference Optimization (DPO) for large language diffusion models (LLaDA-8B), using a Monte Carlo ELBO-based preference loss, LoRA adapters, and 8-bit quantization for efficient single-GPU training. Achieves a +4% win rate over the baseline on the Anthropic HH-RLHF preference dataset.
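To illustrate the technique named in the description, here is a minimal sketch of a Monte Carlo ELBO-based DPO loss for a masked diffusion LM. It is a reconstruction under assumptions, not the repository's actual code: `model`/`ref_model` are assumed to be HF-style modules returning `.logits`, and `mask_token_id` is the model's mask token. Because exact log-likelihoods are intractable for diffusion LMs, per-sequence ELBO estimates stand in for them inside the standard DPO objective.

```python
import torch
import torch.nn.functional as F

def mc_elbo(model, input_ids, mask_token_id, n_samples=4):
    """Monte Carlo estimate of the masked-diffusion ELBO, a lower bound on
    log p(input_ids): sample a masking ratio t ~ U(0, 1], mask each token
    independently with probability t, then score only the masked positions,
    importance-weighted by 1/t."""
    batch, seq_len = input_ids.shape
    estimates = []
    for _ in range(n_samples):
        t = torch.rand(batch, 1, device=input_ids.device).clamp(min=1e-3)
        mask = torch.rand(batch, seq_len, device=input_ids.device) < t
        noised = input_ids.masked_fill(mask, mask_token_id)
        logits = model(noised).logits  # (B, L, V); HF-style output (assumption)
        logp = torch.log_softmax(logits, dim=-1)
        tok_logp = logp.gather(-1, input_ids.unsqueeze(-1)).squeeze(-1)
        estimates.append((tok_logp * mask / t).sum(dim=-1))
    return torch.stack(estimates).mean(dim=0)  # (B,)

def dpo_elbo_loss(model, ref_model, chosen_ids, rejected_ids,
                  mask_token_id, beta=0.1):
    """DPO loss with Monte Carlo ELBOs standing in for exact log-likelihoods."""
    pol_c = mc_elbo(model, chosen_ids, mask_token_id)
    pol_r = mc_elbo(model, rejected_ids, mask_token_id)
    with torch.no_grad():  # reference model is frozen
        ref_c = mc_elbo(ref_model, chosen_ids, mask_token_id)
        ref_r = mc_elbo(ref_model, rejected_ids, mask_token_id)
    margin = beta * ((pol_c - ref_c) - (pol_r - ref_r))
    return -F.logsigmoid(margin).mean()
```

With LoRA adapters, the frozen base model can double as the reference by disabling the adapters for the `ref_*` passes, which is one way such a setup fits on a single GPU alongside 8-bit weights.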
demo11122
A novel DPO framework that incorporates preference strength into preference optimization. The framework is evaluated on state-of-the-art diffusion models and LLMs, and all experiments in the submitted paper can be reproduced with this code.
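The description does not show the loss itself; the sketch below is one plausible way to fold a per-pair strength score into DPO, via soft labels, and is an illustrative variant rather than this repository's formulation. The log-ratio inputs and the `strength` score in [0, 1] are assumptions.

```python
import torch
import torch.nn.functional as F

def strength_weighted_dpo_loss(chosen_logratio, rejected_logratio,
                               strength, beta=0.1):
    """DPO loss with soft labels derived from preference strength.

    chosen_logratio / rejected_logratio: log pi(y|x) - log pi_ref(y|x)
        per pair, shape (B,).
    strength: per-pair preference strength in [0, 1], shape (B,);
        0 means a near-tie, 1 a maximally confident preference.
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    # Map strength s to a soft label p = (1 + s) / 2: s = 0 gives p = 0.5
    # (a tie contributes symmetrically), while s = 1 recovers standard DPO.
    p = (1.0 + strength) / 2.0
    return -(p * F.logsigmoid(margin)
             + (1.0 - p) * F.logsigmoid(-margin)).mean()
```

Weighting by strength means weakly preferred pairs pull the policy less than strongly preferred ones, instead of every pair receiving the same hard label as in standard DPO.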