Found 100 repositories (showing 30)
modelscope
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).
Zeyi-Lin
Qwen3 Fine-tuning: Medical R1 Style Chat
CodeDuoGun
LoRA SFT on medical-domain data, based on the DeepSeek and Qwen3 large models.
leeguandong
E-commerce LLMs based on the Qwen3 series, fine-tuned (SFT) on e-commerce data. An upgraded version of https://github.com/leeguandong/EcommerceLLM and EcommerceLLMQwen2.5.
Implementing Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3 and DeepSeek-Math models. Includes experimental code, training logs, and insights on improving mathematical reasoning in LLMs.
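For the RL stage of math-reasoning pipelines like this one, a common design is a verifiable exact-match reward on the final answer. The sketch below is illustrative only (the function name and matching rule are assumptions, not taken from the repository): it extracts the last number in a completion and compares it with the reference answer.

```python
import re

def math_reward(completion: str, reference: str) -> float:
    """Hypothetical exact-match reward for math RL: 1.0 if the last
    number appearing in the completion equals the reference answer,
    else 0.0. Real pipelines usually normalize answers more carefully."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference.strip() else 0.0
```

A binary reward like this is what makes RL on math tractable: correctness is checkable, so no learned reward model is needed.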
junqiangchen
A medical question-answering system fine-tuned from Qwen3 using LoRA and SFT.
ChenChiShui
Weibo Robert LLM: a Weibo comment-bot training project based on Qwen3-4B and the CommentR Interaction Dataset. Through multi-stage training (SFT → Reward Model → RL), it learns to generate high-quality comment replies aligned with human preferences.
A project that performs supervised fine-tuning (SFT) of the Qwen3-4B model with the LLaMA-Factory framework, focused on question answering in the Chinese medical and health domain. It uses QLoRA (4-bit quantization + LoRA) for efficient fine-tuning; the training data comes from the Huatuo26M-Lite Chinese medical QA dataset, with about 26,000 QA pairs. The project provides a complete training, evaluation, and inference pipeline, and uses ChatGPT to compare the fine-tuned model's generated answers against reference answers, achieving a significant improvement in medical QA accuracy.
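A pipeline like this first converts raw QA pairs into the Alpaca-style records that LLaMA-Factory consumes for SFT. The sketch below shows that conversion step under stated assumptions: the `question`/`answer` field names and the file name are hypothetical, not taken from the Huatuo26M-Lite release.

```python
import json

# Hypothetical QA pairs; the field names here are illustrative.
qa_pairs = [
    {"question": "What are common symptoms of iron-deficiency anemia?",
     "answer": "Fatigue, pallor, and shortness of breath are typical."},
]

# LLaMA-Factory's SFT datasets use Alpaca-format records:
# instruction / input / output.
records = [
    {"instruction": p["question"], "input": "", "output": p["answer"]}
    for p in qa_pairs
]

# Write the dataset file that a dataset_info.json entry would point at.
with open("huatuo_sft.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```

The resulting JSON file is then registered in LLaMA-Factory's dataset configuration and referenced from the training arguments.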
yehchunhung
A fast MoE finetuning for ultimate efficiency.
tamashi486
A medical agent system based on Qwen3-32B: an end-to-end solution covering SFT+DPO alignment, hybrid-retrieval RAG, a reflective agent, and high-performance inference with vLLM.
c925777075
A Qwen3-VL SFT training framework with LigerKernel support.
LiYu0524
No description available
PRITHIVSAKTHIUR
A Gradio-based demonstration for the AllenAI SAGE-MM-Qwen3-VL-4B-SFT_RL multimodal model, specialized in video reasoning tasks. Users upload MP4 videos, provide natural language prompts (e.g., "Describe this video in detail" or custom questions), and receive detailed textual analyses.
taegyeong-lee
Qwen3-VL-SFT-GRPO-Tutorial with Bitcoin Prediction
GRPO and SFT fine-tuning of Qwen3 using Unsloth: reasoning and non-reasoning datasets.
This repository contains an end-to-end pipeline for Supervised finetuning (SFT) of Qwen3-VL Vision–Language Model (VLM) for ADAS and Autonomous Driving video understanding using multi-image inputs with QLoRA, designed to run efficiently on Google Colab free-tier (T4 GPU)
slkhms777
[LLM post-training] A supervised fine-tuning (SFT) system built on Qwen3 | Core modules: end-to-end SFT training, parameter-efficient LoRA fine-tuning, and a model-level data acquisition and cleaning pipeline.
teleportjxh
Supervised fine-tuning (SFT) of the Qwen3-4B model with the LLaMA-Factory framework, focused on question answering in the Chinese medical and health domain.
Weibo SentimentBot is built by fine-tuning Qwen3.5-0.6B. It is trained through a complete fine-tuning pipeline on the AutoDL cloud computing infrastructure, demonstrating a complete LLM alignment pipeline: starting from Qwen3.5-0.6B and passing through SFT training, LoRA, and DPO preference optimization, the model gradually evolves fr
xuxufei12
Fine-tuning Qwen3-1.7B for chain-of-thought medical Q&A with visualization via SwanLab.
zhaiwangyuxuan
Technical Report for Experiment 5: Large Language Model Development Experiment of Railway Intelligent Information Processing, Beijing Jiaotong University
cudnah124
Vision-language AI for chart question answering using Qwen3-VL with SFT and GRPO training
yilenpan
Play against a Qwen3 SFT model on PokerBench.
xiaoyh43-alt
A legal-consultation assistant.
FuzzyFade
Qwen3.5-9B-Base tool-calling SFT with Unsloth+TRL. 100k CN/EN training data, Colab notebook included.
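Tool-calling SFT projects like this train on conversations where the assistant emits a structured function call and then answers from the tool result. The sketch below shows one such training sample in the widely used OpenAI-style `messages` format; the tool name, arguments, and fields are illustrative assumptions, not taken from the repository's 100k dataset.

```python
import json

# One hypothetical tool-calling SFT sample (names/fields illustrative).
sample = {
    "messages": [
        {"role": "system", "content": "You can call tools to answer."},
        {"role": "user", "content": "What's the weather in Beijing?"},
        # The assistant turn the model learns to produce: a tool call.
        {"role": "assistant", "content": None,
         "tool_calls": [{
             "type": "function",
             "function": {"name": "get_weather",
                          "arguments": json.dumps({"city": "Beijing"})}}]},
        # The tool's JSON result, fed back into the conversation.
        {"role": "tool", "content": '{"temp_c": 21, "sky": "clear"}'},
        # The final grounded answer the model also learns to produce.
        {"role": "assistant",
         "content": "It is 21 °C and clear in Beijing."},
    ]
}
```

During SFT, the loss is typically applied only to the assistant turns, so the model learns both when to call the tool and how to use its output.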
xiangqian12345678
No description available
GodRayyy
Bilingual-SQL-Coder is a fine-tuned Text-to-SQL solution designed to robustly handle both English and Chinese queries. Built upon the powerful Qwen3-4B-Instruct, it achieves high execution accuracy through efficient SFT.
2Elian
Notes on post-training algorithms for the Qwen3.5-0.8B-Base model, used to run the SFT and RLHF pipelines end to end. Also explores post-training with some new operator architectures.
JKYovo
A complete training framework for building a 0.1B-parameter Chinese language model from scratch, covering the full Tokenizer → pre-training → SFT pipeline. Multiple open-source datasets are combined into a 1.5B-token pre-training corpus and 2M+ SFT dialogue samples; the Qwen3-like dense model architecture and all training code are hand-written. The project completes multi-GPU distributed pre-training and instruction fine-tuning, ultimately giving the model Chinese multi-turn dialogue and instruction-following capabilities.