Found 100 repositories (showing 30)
modelscope
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).
Zeyi-Lin
Qwen3 Fine-tuning: Medical R1 Style Chat
CodeDuoGun
LoRA SFT on medical-domain data, based on the DeepSeek and Qwen3 large models.
leeguandong
E-commerce LLMs based on the Qwen3 series, fine-tuned (SFT) on e-commerce data. An upgraded version of https://github.com/leeguandong/EcommerceLLM and EcommerceLLMQwen2.5.
Implementing Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3 and DeepSeek-Math models. Includes experimental code, training logs, and insights on improving mathematical reasoning in LLMs.
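For the RL stage of math-reasoning pipelines like this one, a common design is a verifiable exact-match reward on the final answer. The sketch below is illustrative only (the function name and matching rule are assumptions, not taken from the repository): it extracts the last number in a completion and compares it with the reference answer.

```python
import re

def math_reward(completion: str, reference: str) -> float:
    """Hypothetical exact-match reward for math RL: 1.0 if the last
    number appearing in the completion equals the reference answer,
    else 0.0. Real pipelines usually normalize answers more carefully."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference.strip() else 0.0
```

A binary reward like this is what makes RL on math tractable: correctness is checkable, so no learned reward model is needed.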
junqiangchen
A medical question-answering system fine-tuned from Qwen3 using LoRA and SFT.
ChenChiShui
Weibo Robert LLM: a Weibo comment-bot training project based on Qwen3-4B and the CommentR Interaction Dataset. Through multi-stage training (SFT → Reward Model → RL), it learns to generate high-quality comment replies aligned with human preferences.
A project that performs supervised fine-tuning (SFT) of the Qwen3-4B model with the LLaMA-Factory framework, focused on question answering in the Chinese medical and health domain. It uses QLoRA (4-bit quantization + LoRA) for efficient fine-tuning; the training data comes from the Huatuo26M-Lite Chinese medical QA dataset, with about 26,000 QA pairs. The project provides a complete training, evaluation, and inference pipeline, and uses ChatGPT to compare the fine-tuned model's generated answers against reference answers, achieving a significant improvement in medical QA accuracy.
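A pipeline like this first converts raw QA pairs into the Alpaca-style records that LLaMA-Factory consumes for SFT. The sketch below shows that conversion step under stated assumptions: the `question`/`answer` field names and the file name are hypothetical, not taken from the Huatuo26M-Lite release.

```python
import json

# Hypothetical QA pairs; the field names here are illustrative.
qa_pairs = [
    {"question": "What are common symptoms of iron-deficiency anemia?",
     "answer": "Fatigue, pallor, and shortness of breath are typical."},
]

# LLaMA-Factory's SFT datasets use Alpaca-format records:
# instruction / input / output.
records = [
    {"instruction": p["question"], "input": "", "output": p["answer"]}
    for p in qa_pairs
]

# Write the dataset file that a dataset_info.json entry would point at.
with open("huatuo_sft.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```

The resulting JSON file is then registered in LLaMA-Factory's dataset configuration and referenced from the training arguments.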
yehchunhung
A fast MoE finetuning for ultimate efficiency.
tamashi486
A medical agent system based on Qwen3-32B: an end-to-end solution covering SFT+DPO alignment, hybrid-retrieval RAG, a reflective agent, and high-performance inference with vLLM.
c925777075
A Qwen3-VL SFT training framework with LigerKernel support.
LiYu0524
No description available
PRITHIVSAKTHIUR
A Gradio-based demonstration for the AllenAI SAGE-MM-Qwen3-VL-4B-SFT_RL multimodal model, specialized in video reasoning tasks. Users upload MP4 videos, provide natural language prompts (e.g., "Describe this video in detail" or custom questions), and receive detailed textual analyses.
taegyeong-lee
Qwen3-VL-SFT-GRPO-Tutorial with Bitcoin Prediction
GRPO and SFT fine-tuning of Qwen3 using Unsloth: reasoning and non-reasoning datasets.
This repository contains an end-to-end pipeline for Supervised finetuning (SFT) of Qwen3-VL Vision–Language Model (VLM) for ADAS and Autonomous Driving video understanding using multi-image inputs with QLoRA, designed to run efficiently on Google Colab free-tier (T4 GPU)
slkhms777
[LLM post-training] A supervised fine-tuning (SFT) system built on Qwen3 | Core modules: end-to-end SFT training, parameter-efficient LoRA fine-tuning, and a model-level data acquisition and cleaning pipeline.
teleportjxh
Supervised fine-tuning (SFT) of the Qwen3-4B model with the LLaMA-Factory framework, focused on question answering in the Chinese medical and health domain.
Weibo SentimentBot is built by fine-tuning Qwen3.5-0.6B. It is trained through a complete fine-tuning pipeline on the AutoDL cloud computing infrastructure, demonstrating a complete LLM alignment pipeline: starting from Qwen3.5-0.6B and passing through SFT training, LoRA, and DPO preference optimization, the model gradually evolves fr
xuxufei12
Fine-tuning Qwen3-1.7B for chain-of-thought medical Q&A with visualization via SwanLab.
zhaiwangyuxuan
Technical Report for Experiment 5: Large Language Model Development Experiment of Railway Intelligent Information Processing, Beijing Jiaotong University
cudnah124
Vision-language AI for chart question answering using Qwen3-VL with SFT and GRPO training
yilenpan
Play against a Qwen3 SFT model on PokerBench.
xiaoyh43-alt
A legal-consultation assistant.
FuzzyFade
Qwen3.5-9B-Base tool-calling SFT with Unsloth+TRL. 100k CN/EN training data, Colab notebook included.
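Tool-calling SFT projects like this train on conversations where the assistant emits a structured function call and then answers from the tool result. The sketch below shows one such training sample in the widely used OpenAI-style `messages` format; the tool name, arguments, and fields are illustrative assumptions, not taken from the repository's 100k dataset.

```python
import json

# One hypothetical tool-calling SFT sample (names/fields illustrative).
sample = {
    "messages": [
        {"role": "system", "content": "You can call tools to answer."},
        {"role": "user", "content": "What's the weather in Beijing?"},
        # The assistant turn the model learns to produce: a tool call.
        {"role": "assistant", "content": None,
         "tool_calls": [{
             "type": "function",
             "function": {"name": "get_weather",
                          "arguments": json.dumps({"city": "Beijing"})}}]},
        # The tool's JSON result, fed back into the conversation.
        {"role": "tool", "content": '{"temp_c": 21, "sky": "clear"}'},
        # The final grounded answer the model also learns to produce.
        {"role": "assistant",
         "content": "It is 21 °C and clear in Beijing."},
    ]
}
```

During SFT, the loss is typically applied only to the assistant turns, so the model learns both when to call the tool and how to use its output.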
xiangqian12345678
No description available
GodRayyy
Bilingual-SQL-Coder is a fine-tuned Text-to-SQL solution designed to robustly handle both English and Chinese queries. Built upon the powerful Qwen3-4B-Instruct, it achieves high execution accuracy through efficient SFT.
2Elian
Notes on post-training algorithms for the Qwen3.5-0.8B-Base model, used to run the SFT and RLHF pipelines end to end. Also explores post-training with some new operator architectures.
JKYovo
A complete training framework for building a 0.1B-parameter Chinese language model from scratch, covering the full Tokenizer → pre-training → SFT pipeline. Multiple open-source datasets are combined into a 1.5B-token pre-training corpus and 2M+ SFT dialogue samples; the Qwen3-like dense model architecture and all training code are hand-written. The project completes multi-GPU distributed pre-training and instruction fine-tuning, ultimately giving the model Chinese multi-turn dialogue and instruction-following capabilities.