Found 36 repositories (showing 30)
jasonvanf
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
hellangleZ
TRL training script with automatic DDP support and TP support
hellangleZ
ww
luochang212
Supervised fine-tuning (SFT) implemented three ways: LLaMA Factory, trl, and unsloth
wkenjii
QLoRA fine-tuning of Llama-3.2-1B on the Dolly-15k dataset using PEFT and TRL.
kevinmantyniemi98
No description available
Suyash84270
Fine-tuning Meta-Llama-3.1 for product price prediction (QLoRA, PEFT, TRL)
hanoi0126
This repository demonstrates how to use the TRL (Transformer Reinforcement Learning) library's SFTTrainer together with PEFT (Parameter-Efficient Fine-Tuning) to perform supervised fine-tuning (SFT) of large language models such as LLaMA or LLM-JP using LoRA (Low-Rank Adapters).
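The SFTTrainer + PEFT LoRA combination this entry describes typically looks like the minimal sketch below. This is not taken from the listed repo; the model checkpoint, dataset, and hyperparameters are illustrative placeholders, and the API shown follows recent trl/peft releases.

```python
# Minimal SFT-with-LoRA sketch (placeholder model/dataset names).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Any conversational/instruction dataset in a format SFTTrainer understands.
dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=16,                                  # LoRA rank
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B",       # placeholder causal-LM checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="./sft-lora", max_steps=100),
    peft_config=peft_config,               # only adapter weights are trained
)
trainer.train()
```

Because only the low-rank adapter matrices receive gradients, this fits on far less GPU memory than full fine-tuning; the resulting adapter can be merged into the base model or shipped separately.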
ahmad-act
This repository demonstrates how to fine-tune the Phi-3 Mini 4K Instruct model using Unsloth, LoRA (Low-Rank Adaptation), and trl’s SFTTrainer. It uses a chat-style dataset formatted in .jsonl with user and assistant roles. Final output is a GGUF quantized model ready for use with efficient inference engines like llama.cpp.
dharsandip
In this project, FAQ-style data is prepared from scratch, converted and structured into the right format for fine-tuning, used to fine-tune an LLM (Llama-3.2), and finally the fine-tuned model is evaluated. Unsloth, the LoRA (Low-Rank Adaptation) technique, and SFTTrainer (trl) are used.
zhiyu-zhao-ucas
No description available
Nehanth
No description available
No description available
schnappi0723
LLM fine-tuning based on three methods: LLaMA Factory, trl, and unsloth
saurav-14
Fine-tuned LLaMA 2 with LoRA adapters using PEFT, Transformers, and TRL for efficient supervised fine-tuning.
saurav-14
Fine-tuned LLaMA 2 with LoRA adapters using PEFT, Transformers, and TRL for efficient supervised fine-tuning.
harishjan
This Colab notebook fine-tunes Llama on the Hugging Face custom dataset "travel-conversations-finetuning" using Unsloth and Hugging Face TRL's SFTTrainer
winkash
Fine-tune Llama 3 using PyTorch FSDP and QLoRA with the help of Hugging Face TRL, Transformers, PEFT & Datasets.
fmlucero
Hybrid Retrieval + Re-ranking, Semantic Context Chunking, Fine-Tuning Scripts with PEFT and TRL, Automatic Evaluation Metrics, LLaMA Prompt Engineering
Ayanp345
This repository showcases end-to-end supervised fine-tuning of the Llama 3.2B model using Unsloth, Hugging Face’s Transformers, and TRL libraries
h-abid97
🔧 Fine-tune LLaMA 3.2B (4-bit) with Unsloth, LoRA, and TRL on the FineTome-100k dataset — optimized for fast, memory-efficient instruction tuning.
rohitmanurkar
A project demonstrating how to efficiently fine-tune the meta-llama/Llama-2-7b-chat-hf model for financial news sentiment analysis using QLoRA (4-bit quantization and LoRA) with the Hugging Face TRL library.
Rushikesh-Chavan-777
Fine-tuning of Llama-2-7b-chat on a custom dataset (mlabonne/guanaco-llama2-1k) using the SFTTrainer from the trl library in Google Colab.
End-to-end pipeline to fine-tune Llama-3 8B for medical assistant conversations using QLoRA (4-bit + LoRA). Includes dataset formatting, SFT training (Unsloth/TRL).
hasnat23
End-to-end LLM fine-tuning pipeline using RLHF, LoRA, and QLoRA. Fine-tunes LLaMA-2 and Mistral models for instruction following using TRL, PEFT, and Hugging Face.
Fine-tune LLaMA-2 7B for chatbots using LoRA and 4-bit quantization with Hugging Face Transformers and TRL. Efficient low-memory training with GPU support and Hugging Face Hub integration.
SaraTerani
Fine-tuning Llama 2 with QLoRA and BitsAndBytes 4-bit quantization for efficient training on limited GPU resources. Includes dataset loading, LoRA configuration, supervised fine-tuning with trl.SFTTrainer, and evaluation.
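The QLoRA recipe this entry describes — 4-bit NF4 quantization via BitsAndBytes, with LoRA adapters trained on top — usually starts from a loading step like the sketch below. The checkpoint name and dtype choices are illustrative assumptions, not taken from the repo.

```python
# QLoRA loading sketch: quantize the frozen base model to 4-bit NF4,
# then train LoRA adapters on top (placeholder checkpoint name).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
    bnb_4bit_use_double_quant=True,         # nested quantization saves more memory
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",                      # spread layers across available GPUs
)
# From here, wrap with a peft LoraConfig and hand the model to trl.SFTTrainer.
```

Keeping the base weights in 4-bit while computing in bfloat16 is what lets a 7B model train on a single consumer GPU.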
t1sun1012
HumanEval-style SFT + (attempted) PPO to improve pass@1 for Llama-3-8B-Instruct. Synthetic HumanEval-format tasks and body-only solutions are distilled from GPT-5; SFT via LLaMA-Factory (LoRA), with deterministic greedy evaluation using the official HumanEval harness. PPO (TRL + DeepSpeed ZeRO-3) attempted on MBPP→HumanEval-style tasks.
afcoral124
Undergraduate thesis: building an intelligent virtual assistant capable of assessing the maturity of technologies under development, using LLMs such as LLaMa 2 and GPT-3.5 Turbo (CRISP-ML(Q) methodology and NASA's TRL scale)
Shridharpawar77
A lightweight pipeline for fine-tuning Llama 3.2 Vision on custom image–text datasets. Converts local CSV + images into chat-format messages, applies QLoRA adapters, and trains using TRL’s SFTTrainer for high-quality vision-instruction generation.