πŸš€ Starting fine-tuning job...
πŸ”„ Starting fine-tuning pipeline for Qwen2.5 with LoRA...
πŸ“‚ Loading dataset from: expanded_templates.json
βœ… Loaded 473 samples.
βœ… Converted to Hugging Face Dataset.
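
A minimal sketch of the loading step above, assuming expanded_templates.json holds a flat list of records (the exact schema is not shown in the log):

import json
from datasets import Dataset

# Read the raw JSON templates; a list of dicts is assumed.
with open("expanded_templates.json", "r", encoding="utf-8") as f:
    samples = json.load(f)
print(f"Loaded {len(samples)} samples.")

# Wrap the list in a Hugging Face Dataset for downstream mapping.
dataset = Dataset.from_list(samples)
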
🧠 Loading tokenizer and 8-bit quantized model: Qwen/Qwen2.5-7B
βœ… Model and tokenizer loaded successfully.
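
The "8-bit quantized" wording suggests bitsandbytes quantization; a sketch under that assumption (device_map and the other loading arguments are guesses):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the base model in 8-bit to fit a 7B model in modest GPU memory.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
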
βš™οΈ Configuring LoRA adapters...
trainable params: 5,046,272 || all params: 7,620,662,784 || trainable%: 0.0662
βœ… LoRA configuration complete.
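
The logged 5,046,272 trainable parameters is consistent with LoRA rank 16 on q_proj and v_proj across Qwen2.5-7B's 28 layers: 28 Γ— 16 Γ— ((3584+3584) + (3584+512)) = 5,046,272. A sketch on that basis; lora_alpha and lora_dropout are assumptions:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Make the 8-bit base model trainable (gradient checkpointing, norm casting).
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                  # reproduces the logged parameter count
    lora_alpha=32,                         # assumption
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the "trainable params: ..." line above
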
🧩 Tokenizing dataset... (this might take a while)
βœ… Dataset tokenized and ready for training.
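
The tokenization pass was presumably a Dataset.map over a text column; a sketch in which the field name "text", the max_length, and the padding strategy are all assumptions:

def tokenize_fn(example):
    tokens = tokenizer(
        example["text"],            # field name is an assumption
        truncation=True,
        max_length=512,             # assumption
        padding="max_length",
    )
    # Standard causal-LM setup: labels mirror input_ids.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

tokenized_dataset = dataset.map(tokenize_fn, remove_columns=dataset.column_names)
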
πŸ“˜ Setting up training arguments...
βœ… Training arguments configured.
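
Several settings can be read back from the log: 3 epochs, a learning rate decaying linearly from 2e-4 to the final logged 1.11e-6 (= 2e-4/180), logging every 10 steps, and 180 total steps over 473 samples, which implies a per-device batch size of 8. Everything else below is an assumption:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./qwen_lora_adapter",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-4,
    lr_scheduler_type="linear",   # matches the logged decay
    warmup_steps=0,
    logging_steps=10,
    fp16=True,                    # assumption
    report_to="none",             # assumption
)
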
πŸš€ Starting training...
{'loss': 1.2997, 'grad_norm': 1.201093316078186, 'learning_rate': 0.00019, 'epoch': 0.17}
{'loss': 0.235, 'grad_norm': 0.19356293976306915, 'learning_rate': 0.0001788888888888889, 'epoch': 0.34}
{'loss': 0.1779, 'grad_norm': 0.17987337708473206, 'learning_rate': 0.0001677777777777778, 'epoch': 0.5}
{'loss': 0.1548, 'grad_norm': 0.1447778195142746, 'learning_rate': 0.00015666666666666666, 'epoch': 0.67}
{'loss': 0.1412, 'grad_norm': 0.1598495990037918, 'learning_rate': 0.00014555555555555556, 'epoch': 0.84}
{'loss': 0.1253, 'grad_norm': 0.4339612126350403, 'learning_rate': 0.00013444444444444447, 'epoch': 1.0}
{'loss': 0.1195, 'grad_norm': 0.1683579534292221, 'learning_rate': 0.00012333333333333334, 'epoch': 1.17}
{'loss': 0.1107, 'grad_norm': 0.24865905940532684, 'learning_rate': 0.00011222222222222223, 'epoch': 1.34}
{'loss': 0.1026, 'grad_norm': 0.23968133330345154, 'learning_rate': 0.00010111111111111112, 'epoch': 1.5}
{'loss': 0.1006, 'grad_norm': 0.20606115460395813, 'learning_rate': 9e-05, 'epoch': 1.67}
{'loss': 0.0947, 'grad_norm': 0.2228280007839203, 'learning_rate': 7.88888888888889e-05, 'epoch': 1.84}
{'loss': 0.0923, 'grad_norm': 0.5545451045036316, 'learning_rate': 6.777777777777778e-05, 'epoch': 2.0}
{'loss': 0.0863, 'grad_norm': 0.1499655395746231, 'learning_rate': 5.666666666666667e-05, 'epoch': 2.17}
{'loss': 0.0907, 'grad_norm': 0.25048738718032837, 'learning_rate': 4.555555555555556e-05, 'epoch': 2.34}
{'loss': 0.0873, 'grad_norm': 0.17657813429832458, 'learning_rate': 3.444444444444445e-05, 'epoch': 2.5}
{'loss': 0.0854, 'grad_norm': 0.163704514503479, 'learning_rate': 2.3333333333333336e-05, 'epoch': 2.67}
{'loss': 0.0876, 'grad_norm': 0.1979411542415619, 'learning_rate': 1.2222222222222222e-05, 'epoch': 2.84}
{'loss': 0.0852, 'grad_norm': 0.4943380057811737, 'learning_rate': 1.1111111111111112e-06, 'epoch': 3.0}
{'train_runtime': 105.1798, 'train_samples_per_second': 13.491, 'train_steps_per_second': 1.711, 'train_loss': 0.18203519781430563, 'epoch': 3.0}
βœ… Training completed in 1.76 minutes.
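
The run itself is consistent with a stock Trainer over the tokenized dataset; a sketch, with the causal-LM data collator being an assumption:

from transformers import Trainer, DataCollatorForLanguageModeling

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # emits the per-10-step loss/lr dicts shown above
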
πŸ’Ύ Saving LoRA adapter...
βœ… LoRA adapter saved at: ./qwen_lora_adapter
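
With PEFT, saving writes only the adapter weights (a few megabytes), not the 7B base model:

# Persist the LoRA adapter; the tokenizer copy is an optional convenience.
model.save_pretrained("./qwen_lora_adapter")
tokenizer.save_pretrained("./qwen_lora_adapter")
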
🏁 All done! Total pipeline time: 3.45 minutes.
βœ… Fine-tuning completed.