🚀 Starting fine-tuning job...
🔄 Starting fine-tuning pipeline for Qwen2.5 with LoRA...
📂 Loading dataset from: expanded_templates.json
✅ Loaded 473 samples.
✅ Converted to Hugging Face Dataset.
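For reference, the two dataset steps above are typically just a JSON load followed by datasets.Dataset.from_list. A minimal sketch, assuming the file is a flat JSON array of records (the exact schema is not shown in the log):

    import json
    from datasets import Dataset

    # Read the raw training samples (assumed: a JSON array of dicts).
    with open("expanded_templates.json", encoding="utf-8") as f:
        records = json.load(f)

    # Wrap the list of dicts in a Hugging Face Dataset.
    dataset = Dataset.from_list(records)
    print(f"Loaded {len(records)} samples.")  # 473 in this run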
🧠 Loading tokenizer and 8-bit quantized model: Qwen/Qwen2.5-7B
✅ Model and tokenizer loaded successfully.
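The 8-bit load announced above is usually done through bitsandbytes. A sketch of the standard approach; the exact arguments used by this pipeline are an assumption:

    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_name = "Qwen/Qwen2.5-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # 8-bit quantization keeps the 7B base model within a single-GPU memory budget.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )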
βš™οΈ Configuring LoRA adapters...
trainable params: 5,046,272 || all params: 7,620,662,784 || trainable%: 0.0662
✅ LoRA configuration complete.
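The printed count of 5,046,272 trainable parameters matches rank-16 LoRA adapters on q_proj and v_proj exactly, given Qwen2.5-7B's 28 layers, hidden size 3584, and grouped-query key/value width 512: 28 × 16 × ((3584 + 3584) + (3584 + 512)) = 5,046,272. A configuration consistent with that (alpha and dropout are assumptions):

    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Standard prep for training adapters on top of an 8-bit base model.
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=16,                                  # implied by the parameter count above
        lora_alpha=32,                         # assumption
        lora_dropout=0.05,                     # assumption
        target_modules=["q_proj", "v_proj"],   # implied by the parameter count above
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()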
🧩 Tokenizing dataset... (this might take a while)
✅ Dataset tokenized and ready for training.
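Tokenization is presumably a straightforward Dataset.map over the samples. A sketch; the text field name and maximum length are hypothetical, not shown in the log:

    def tokenize_fn(example):
        # "text" and max_length=512 are assumptions.
        return tokenizer(example["text"], truncation=True, max_length=512)

    tokenized_dataset = dataset.map(tokenize_fn, remove_columns=dataset.column_names)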
📘 Setting up training arguments...
✅ Training arguments configured.
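The log lines below pin down most of the schedule: about 180 optimizer steps over 3 epochs (so an effective batch size of 8 for 473 samples), logging every 10 steps (epoch advances by ~0.17 per line), and a linear decay from a peak learning rate of 2e-4 with no warmup (first logged value 1.9e-4 = 2e-4 × 171/180). A TrainingArguments setup consistent with those numbers; the output directory and precision flag are assumptions:

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./qwen_lora_checkpoints",  # hypothetical path
        num_train_epochs=3,
        per_device_train_batch_size=8,         # could also be split via gradient accumulation
        learning_rate=2e-4,
        lr_scheduler_type="linear",
        warmup_steps=0,
        logging_steps=10,
        fp16=True,                             # assumption
        report_to="none",
    )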
🚀 Starting training...
{'loss': 1.5142, 'grad_norm': 1.5487449169158936, 'learning_rate': 0.00019, 'epoch': 0.17}
{'loss': 0.2652, 'grad_norm': 0.2191646546125412, 'learning_rate': 0.0001788888888888889, 'epoch': 0.34}
{'loss': 0.1876, 'grad_norm': 0.1973702609539032, 'learning_rate': 0.0001677777777777778, 'epoch': 0.5}
{'loss': 0.1599, 'grad_norm': 0.13070529699325562, 'learning_rate': 0.00015666666666666666, 'epoch': 0.67}
{'loss': 0.1446, 'grad_norm': 0.13922926783561707, 'learning_rate': 0.00014555555555555556, 'epoch': 0.84}
{'loss': 0.1264, 'grad_norm': 0.38842177391052246, 'learning_rate': 0.00013444444444444447, 'epoch': 1.0}
{'loss': 0.1201, 'grad_norm': 0.1864204704761505, 'learning_rate': 0.00012333333333333334, 'epoch': 1.17}
{'loss': 0.1108, 'grad_norm': 0.16994109749794006, 'learning_rate': 0.00011222222222222223, 'epoch': 1.34}
{'loss': 0.1023, 'grad_norm': 0.27831146121025085, 'learning_rate': 0.00010111111111111112, 'epoch': 1.5}
{'loss': 0.1006, 'grad_norm': 0.22432690858840942, 'learning_rate': 9e-05, 'epoch': 1.67}
{'loss': 0.095, 'grad_norm': 0.22253672778606415, 'learning_rate': 7.88888888888889e-05, 'epoch': 1.84}
{'loss': 0.0926, 'grad_norm': 0.5716226100921631, 'learning_rate': 6.777777777777778e-05, 'epoch': 2.0}
{'loss': 0.0864, 'grad_norm': 0.15866339206695557, 'learning_rate': 5.666666666666667e-05, 'epoch': 2.17}
{'loss': 0.0914, 'grad_norm': 0.19458766281604767, 'learning_rate': 4.555555555555556e-05, 'epoch': 2.34}
{'loss': 0.0877, 'grad_norm': 0.17066717147827148, 'learning_rate': 3.444444444444445e-05, 'epoch': 2.5}
{'loss': 0.086, 'grad_norm': 0.1886722594499588, 'learning_rate': 2.3333333333333336e-05, 'epoch': 2.67}
{'loss': 0.0886, 'grad_norm': 0.18068477511405945, 'learning_rate': 1.2222222222222222e-05, 'epoch': 2.84}
{'loss': 0.0859, 'grad_norm': 0.4594188630580902, 'learning_rate': 1.1111111111111112e-06, 'epoch': 3.0}
{'train_runtime': 73.1295, 'train_samples_per_second': 19.404, 'train_steps_per_second': 2.461, 'train_loss': 0.19696807795100743, 'epoch': 3.0}
✅ Training completed in 1.22 minutes.
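The run above is the standard Trainer loop; a sketch of the driving code, with a causal-LM collator assumed since the collator is not shown in the log:

    from transformers import DataCollatorForLanguageModeling, Trainer

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # produces the loss/learning-rate log lines shown above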
💾 Saving LoRA adapter...
✅ LoRA adapter saved at: ./qwen_lora_adapter
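Saving the adapter writes only the LoRA weights and config (a few megabytes) rather than the full 7B checkpoint. Presumably something like:

    # Persist only the adapter; the quantized base model is not duplicated.
    model.save_pretrained("./qwen_lora_adapter")
    tokenizer.save_pretrained("./qwen_lora_adapter")  # assumption: tokenizer saved alongside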
🏁 All done! Total pipeline time: 1.91 minutes.
✅ Fine-tuning completed.
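To use the result, the adapter is re-attached to the base model at load time. A minimal inference sketch, assuming the standard peft loading path:

    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", device_map="auto")
    model = PeftModel.from_pretrained(base, "./qwen_lora_adapter")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

    inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(out[0], skip_special_tokens=True))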