Found 36 repositories (showing 30)
jasonvanf
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
hellangleZ
TRL training script with automatic DDP support and TP support
hellangleZ
ww
luochang212
Supervised fine-tuning (SFT) implemented three ways: LLaMA Factory, trl, and unsloth
wkenjii
QLoRA fine-tuning of Llama-3.2-1B on the Dolly-15k dataset using PEFT and TRL.
kevinmantyniemi98
No description available
Suyash84270
Fine-tuning Meta-Llama-3.1 for product price prediction (QLoRA, PEFT, TRL)
hanoi0126
This repository demonstrates how to use the TRL (Transformer Reinforcement Learning) library's SFTTrainer together with PEFT (Parameter-Efficient Fine-Tuning) to perform supervised fine-tuning (SFT) of large language models such as LLaMA or LLM-JP using LoRA (Low-Rank Adapters).
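The SFTTrainer + PEFT LoRA combination this entry describes typically looks like the minimal sketch below. This is not taken from the listed repo; the model checkpoint, dataset, and hyperparameters are illustrative placeholders, and the API shown follows recent trl/peft releases.

```python
# Minimal SFT-with-LoRA sketch (placeholder model/dataset names).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Any conversational/instruction dataset in a format SFTTrainer understands.
dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=16,                                  # LoRA rank
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B",       # placeholder causal-LM checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="./sft-lora", max_steps=100),
    peft_config=peft_config,               # only adapter weights are trained
)
trainer.train()
```

Because only the low-rank adapter matrices receive gradients, this fits on far less GPU memory than full fine-tuning; the resulting adapter can be merged into the base model or shipped separately.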
ahmad-act
This repository demonstrates how to fine-tune the Phi-3 Mini 4K Instruct model using Unsloth, LoRA (Low-Rank Adaptation), and trl’s SFTTrainer. It uses a chat-style dataset formatted in .jsonl with user and assistant roles. Final output is a GGUF quantized model ready for use with efficient inference engines like llama.cpp.
dharsandip
In this project, FAQ-style data is prepared from scratch, converted and structured into the right format for fine-tuning, used to fine-tune an LLM (Llama-3.2), and finally the fine-tuned model is evaluated. Unsloth, the LoRA (Low-Rank Adaptation) technique, and SFTTrainer (trl) are used.
zhiyu-zhao-ucas
No description available
Nehanth
No description available
No description available
schnappi0723
LLM fine-tuning based on three methods: LLaMA Factory, trl, and unsloth
saurav-14
Fine-tuned LLaMA 2 with LoRA adapters using PEFT, Transformers, and TRL for efficient supervised fine-tuning.
saurav-14
Fine-tuned LLaMA 2 with LoRA adapters using PEFT, Transformers, and TRL for efficient supervised fine-tuning.
harishjan
This Colab notebook fine-tunes Llama on the Hugging Face custom dataset "travel-conversations-finetuning" using Unsloth and Hugging Face TRL's SFTTrainer
winkash
Fine-tune Llama 3 using PyTorch FSDP and QLoRA with the help of Hugging Face TRL, Transformers, PEFT & Datasets.
fmlucero
Hybrid Retrieval + Re-ranking, Semantic Context Chunking, Fine-Tuning Scripts with PEFT and TRL, Automatic Evaluation Metrics, LLaMA Prompt Engineering
Ayanp345
This repository showcases end-to-end supervised fine-tuning of the Llama 3.2B model using Unsloth, Hugging Face’s Transformers, and TRL libraries
h-abid97
🔧 Fine-tune LLaMA 3.2B (4-bit) with Unsloth, LoRA, and TRL on the FineTome-100k dataset — optimized for fast, memory-efficient instruction tuning.
rohitmanurkar
A project demonstrating how to efficiently fine-tune the meta-llama/Llama-2-7b-chat-hf model for financial news sentiment analysis using QLoRA (4-bit quantization and LoRA) with the Hugging Face TRL library.
Rushikesh-Chavan-777
Fine-tuning of Llama-2-7b-chat on a custom dataset (mlabonne/guanaco-llama2-1k) using the SFTTrainer from the trl library in Google Colab.
End-to-end pipeline to fine-tune Llama-3 8B for medical assistant conversations using QLoRA (4-bit + LoRA). Includes dataset formatting, SFT training (Unsloth/TRL).
hasnat23
End-to-end LLM fine-tuning pipeline using RLHF, LoRA, and QLoRA. Fine-tunes LLaMA-2 and Mistral models for instruction following using TRL, PEFT, and Hugging Face.
Fine-tune LLaMA-2 7B for chatbots using LoRA and 4-bit quantization with Hugging Face Transformers and TRL. Efficient low-memory training with GPU support and Hugging Face Hub integration.
SaraTerani
Fine-tuning Llama 2 with QLoRA and BitsAndBytes 4-bit quantization for efficient training on limited GPU resources. Includes dataset loading, LoRA configuration, supervised fine-tuning with trl.SFTTrainer, and evaluation.
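The QLoRA recipe this entry describes — 4-bit NF4 quantization via BitsAndBytes, with LoRA adapters trained on top — usually starts from a loading step like the sketch below. The checkpoint name and dtype choices are illustrative assumptions, not taken from the repo.

```python
# QLoRA loading sketch: quantize the frozen base model to 4-bit NF4,
# then train LoRA adapters on top (placeholder checkpoint name).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
    bnb_4bit_use_double_quant=True,         # nested quantization saves more memory
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",                      # spread layers across available GPUs
)
# From here, wrap with a peft LoraConfig and hand the model to trl.SFTTrainer.
```

Keeping the base weights in 4-bit while computing in bfloat16 is what lets a 7B model train on a single consumer GPU.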
t1sun1012
HumanEval-style SFT + (attempted) PPO to improve pass@1 for Llama-3-8B-Instruct. Synthetic HumanEval-format tasks and body-only solutions are distilled from GPT-5; SFT via LLaMA-Factory (LoRA), with deterministic greedy evaluation using the official HumanEval harness. PPO (TRL + DeepSpeed ZeRO-3) attempted on MBPP→HumanEval-style tasks.
afcoral124
Undergraduate thesis: building an intelligent virtual assistant capable of assessing the maturity of technologies under development, using LLMs such as LLaMa 2 and GPT-3.5 Turbo (CRISP-ML(Q) methodology and NASA's TRL scale)
Shridharpawar77
A lightweight pipeline for fine-tuning Llama 3.2 Vision on custom image–text datasets. Converts local CSV + images into chat-format messages, applies QLoRA adapters, and trains using TRL’s SFTTrainer for high-quality vision-instruction generation.