---
language:
- ttj
- en
license: apache-2.0
library_name: llama.cpp
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- gguf
- rutooro
- ttj
- qwen2.5
- cpu
- lora
- qlora
- instruction-tuning
quantization_config:
  format: gguf
  method: q4_K_M
---

# Qwen2.5-1.5B **TTJ (Rutooro)** — GGUF Quantized

GGUF build of a QLoRA-finetuned **Qwen2.5-1.5B-Instruct** model that *thinks and answers in Rutooro (ttj)*. This artifact is intended for **CPU-only** inference via **llama.cpp** and compatible runtimes.

> **Files**
> - `qwen_ttj_merged.q4_K_M.gguf` — ~0.99 GB (quantized with `llama-quantize q4_K_M`)
> - (optional in repo) `qwen_ttj_merged.q5_K_M.gguf` — larger, slightly higher quality

## ✨ What’s inside

- **Base:** `Qwen/Qwen2.5-1.5B-Instruct`
- **Finetune:** supervised instruction tuning (LoRA/QLoRA) on curated Rutooro ChatML data
- **Context length:** 2048 tokens (training truncation)
- **Merge path:** base + LoRA → merged HF model → GGUF (via `convert_hf_to_gguf.py`)

## ✅ Intended use

- Rutooro (ttj) conversational assistant and instruction following
- Education, local information, and general Q&A in Rutooro
- For non-Rutooro input, pair with an external translator (e.g., NLLB-200)

## ⚠️ Limitations & bias

- May produce mistakes or outdated facts; verify critical outputs.
- Safety refusals/redirects were translated into Rutooro, but edge cases can slip through.
- Not a replacement for professional advice.

---

## 🖥️ Run with **llama.cpp**

Build llama.cpp, then:

```bash
# Interactive prompt (the binary is named 'main' in older llama.cpp builds)
./llama-cli -m qwen_ttj_merged.q4_K_M.gguf -t 8 -c 2048 -n 192 -p "Oteekateeka ki orungi okuhika aha wiiki?"

# HTTP server (CPU only, hence -ngl 0)
./llama-server -m qwen_ttj_merged.q4_K_M.gguf -t 8 -c 2048 -ngl 0
```

> Adjust `-t` to your CPU thread count. For slightly better quality, use the `q5_K_M` file if provided. Python equivalents of both invocations are sketched in the appendix below.

### Example (ChatML-style prompt)

This model was trained with Qwen’s ChatML formatting; a simple single-turn prompt works fine, but you can also format full conversations:

```text
<|im_start|>system
Oli ntekereza ehangirwe ekugamba mu Runyoro/Rutooro. Hondera empabura z'abantu abakukukozesa kandi obe w'engeso nungi.<|im_end|>
<|im_start|>user
Nshoborora kutunga enteekateeka y’omulimo ogw’aha wiiki?<|im_end|>
<|im_start|>assistant
```

---

## 🔧 Repro (high level)

- Load the base in 4-bit (bitsandbytes) and train LoRA (`r=16, α=32, dropout=0.05`) with gradient checkpointing.
- Compute loss on **assistant tokens only** (prompt masked).
- Merge the LoRA into the base, then convert and quantize (Python sketches of these steps are in the appendix below):

```bash
python convert_hf_to_gguf.py --model-name qwen2 --outfile qwen_ttj_merged.gguf /path/to/merged_hf
./llama-quantize qwen_ttj_merged.gguf qwen_ttj_merged.q4_K_M.gguf q4_K_M
```

## 📄 License

- Base model: Apache-2.0 (Qwen2.5 family)
- Finetune & GGUF weights: Apache-2.0

Use responsibly and comply with local laws and the base model’s license.

## 🙌 Acknowledgements

- The Qwen team for the base model.
- The llama.cpp project for GGUF conversion and CPU inference.
- The Runyoro/Rutooro AI project for data preparation and evaluation.
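
---

## 🧪 Appendix: Python sketches

The snippets below are illustrative sketches, not part of the released artifact. They reuse the file names and hyperparameters stated above; anything else (ports, paths, library choices) is an assumption, noted inline.

### Local inference with `llama-cpp-python`

A minimal sketch using the `llama-cpp-python` bindings instead of the `llama-cli` binary; `create_chat_completion` applies the chat template embedded in the GGUF (ChatML for Qwen), so no manual `<|im_start|>` formatting is needed.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load the quantized model; n_ctx matches the 2048-token training truncation,
# and n_threads plays the role of -t above.
llm = Llama(
    model_path="qwen_ttj_merged.q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
)

# The GGUF's built-in ChatML template is applied automatically.
out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Oteekateeka ki orungi okuhika aha wiiki?"},
    ],
    max_tokens=192,
)
print(out["choices"][0]["message"]["content"])
```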
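
### Querying `llama-server`

`llama-server` exposes an OpenAI-compatible chat endpoint. A sketch with `requests`, assuming the server started as shown above on the default port 8080 (adjust if you pass `--port`):

```python
import requests

# POST an OpenAI-style chat request to the running llama-server instance.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Nshoborora kutunga enteekateeka y’omulimo ogw’aha wiiki?"},
        ],
        "max_tokens": 192,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```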
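
### QLoRA setup

A sketch of the 4-bit load plus LoRA configuration from the repro notes. The NF4 quant type, compute dtype, and `target_modules` list are assumptions (typical choices for Qwen2-style models), not confirmed training settings.

```python
# pip install transformers peft bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit base model via bitsandbytes; NF4 + bf16 compute is the common QLoRA recipe (assumed).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)
base.gradient_checkpointing_enable()

# LoRA hyperparameters from the card: r=16, alpha=32, dropout=0.05.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections (assumed)
        "gate_proj", "up_proj", "down_proj",     # MLP projections (assumed)
    ],
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```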
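
### Masking the prompt in the loss

One way to compute loss on assistant tokens only: set every prompt position in `labels` to `-100`, which Hugging Face causal-LM losses ignore. This is a simplified sketch (it tokenizes prompt and completion separately, which can differ slightly from joint tokenization); the exact masking code used for this model is not published here.

```python
def build_example(tokenizer, prompt: str, answer: str, max_len: int = 2048):
    """prompt: the full ChatML prefix up to and including '<|im_start|>assistant' plus a newline;
    answer: the assistant reply followed by '<|im_end|>'."""
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + answer_ids)[:max_len]
    # -100 marks positions excluded from the cross-entropy loss,
    # so gradients flow only through the assistant span.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}
```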
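
### Merging the adapter before GGUF conversion

A sketch of the merge step; `path/to/lora_adapter` and `merged_hf` are placeholder paths. The merged directory is what `convert_hf_to_gguf.py` above consumes.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Reload the base in full precision, attach the adapter, and fold the LoRA
# weights into the base model.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "path/to/lora_adapter").merge_and_unload()

merged.save_pretrained("merged_hf")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct").save_pretrained("merged_hf")
```

After saving, run the `convert_hf_to_gguf.py` and `llama-quantize` commands shown in the repro section.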