geoffmunn committed
Commit 2d8fdf4 · verified · 1 Parent(s): 7330008

Results table updated

Files changed (1)
  1. README.md +58 -22

README.md CHANGED
@@ -45,27 +45,27 @@ Converted for use with `llama.cpp` and compatible tools like OpenWebUI, LM Studi
 
 These variants were built from an **f16** base model to ensure consistency across quant levels.
 
- | Level    | Quality      | Speed     | Size    | Recommendation |
- |----------|--------------|-----------|---------|----------------|
- | Q2_K     | Very Low     | ⚡ Fastest | 3.28 GB | Only on severely memory-constrained systems (<6 GB RAM). Avoid for reasoning. |
- | Q3_K_S   | Low          | ⚡ Fast    | 3.77 GB | Minimal viability; basic completion only. Not recommended. |
- | Q3_K_M   | Low-Medium   | ⚡ Fast    | 4.12 GB | Acceptable for simple chat on older systems. No complex logic. |
- | Q4_K_S   | Medium       | 🚀 Fast    | 4.8 GB  | Good balance for low-end laptops or embedded platforms. |
- | Q4_K_M   | ✅ Balanced  | 🚀 Fast    | 5.85 GB | Best overall for general use on average hardware. Great speed/quality trade-off. |
- | Q5_K_S   | High         | 🐢 Medium  | 5.72 GB | Better reasoning; slightly faster than Q5_K_M. Ideal for coding. |
- | Q5_K_M   | ✅✅ High    | 🐢 Medium  | 5.85 GB | Top pick for deep interactions, logic, and tool use. Recommended for desktops. |
- | Q6_K     | 🔥 Near-FP16 | 🐌 Slow    | 6.73 GB | Excellent fidelity; ideal for RAG, retrieval, and accuracy-critical tasks. |
- | Q8_0     | 🏆 Lossless* | 🐌 Slow    | 8.71 GB | Maximum accuracy; best for research, benchmarking, or archival. |
-
- > 💡 **Recommendations by Use Case**
- >
- > - 💻 **Low-end CPU / Old Laptop**: `Q4_K_M` (best balance under pressure)
- > - 🖥️ **Standard/Mid-tier Laptop (i5/i7/M1/M2)**: `Q5_K_M` (optimal quality)
- > - 🧠 **Reasoning, Coding, Math**: `Q5_K_M` or `Q6_K` (use thinking mode!)
- > - 🤖 **Agent & Tool Integration**: `Q5_K_M` – handles JSON, function calls well
- > - 🔍 **RAG, Retrieval, Precision Tasks**: `Q6_K` or `Q8_0`
- > - 📦 **Storage-Constrained Devices**: `Q4_K_S` or `Q4_K_M`
- > - 🛠️ **Development & Testing**: Test from `Q4_K_M` up to `Q8_0` to assess trade-offs
+ | Level     | Speed     | Size    | Recommendation |
+ |-----------|-----------|---------|----------------|
+ | Q2_K      | ⚡ Fastest | 3.28 GB | Not recommended. Came first in the bat & ball question, but made no other appearances. |
+ | 🥉 Q3_K_S | ⚡ Fast    | 3.77 GB | 🥉 Came first and second in questions at both ends of the temperature spectrum. |
+ | 🥇 Q3_K_M | ⚡ Fast    | 4.12 GB | 🥇 **Best overall model.** A top-3 finisher for every question except the haiku. |
+ | 🥉 Q4_K_S | 🚀 Fast    | 4.8 GB  | 🥉 Came first and second in questions at both ends of the temperature spectrum. |
+ | Q4_K_M    | 🚀 Fast    | 5.85 GB | Came first and second in high-temperature questions. |
+ | 🥈 Q5_K_S | 🐢 Medium  | 5.72 GB | 🥈 A solid second place. Good for all query types. |
+ | Q5_K_M    | 🐢 Medium  | 5.85 GB | Not recommended; no appearances in the top 3 for any question. |
+ | Q6_K      | 🐌 Slow    | 6.73 GB | Showed up in a few results, but not recommended. |
+ | Q8_0      | 🐌 Slow    | 8.71 GB | Not recommended; only one top-3 finish. |
+
+ ## Model analysis and rankings
+
+ There are numerous good candidates: many different models showed up in the top 3 across the questions. However, **Qwen3-8B-f16:Q3_K_M** was a finalist in all but one question, so it is the recommended model. **Qwen3-8B-f16:Q5_K_S** did nearly as well and is worth considering.
+
+ The 'hello' question is the first time that all models got it exactly right. All models in the 8B range did well, so it is mainly a question of which one works best on your hardware.
+
+ You can read the results here: [Qwen3-8b-analysis.md](Qwen3-8b-analysis.md)
+
+ If you find this useful, please give the project a ❤️ like.
 
  ## Usage
 
@@ -73,10 +73,46 @@ Load this model using:
  - [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
  - [LM Studio](https://lmstudio.ai) – desktop app with GPU support and chat templates
  - [GPT4All](https://gpt4all.io) – private, local AI chatbot (offline-first)
- - Or directly via \`llama.cpp\`
+ - Or directly via `llama.cpp`
 
 Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.
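
If you'd rather skip a GUI entirely, the `llama.cpp` option above can be exercised from the command line. Here is a minimal sketch using `llama-cli`; the GGUF filename is illustrative (point `-m` at whichever quant you downloaded), and the sampling flags mirror the Modelfile defaults shown later in this README:

```bash
# Run a single prompt against a local GGUF quant with llama.cpp's CLI.
# The filename below is an example; substitute the file you downloaded.
./llama-cli -m ./Qwen3-8B-f16:Q5_K_M.gguf \
  -p "Why is the sky blue?" \
  -n 256 \
  --temp 0.6 --top-p 0.95 --top-k 20 \
  -c 4096
```
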
+ Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
+ In that case, try these steps:
+
+ 1. `wget https://huggingface.co/geoffmunn/Qwen3-4B/resolve/main/Qwen3-4B-f16%3AQ3_K_M.gguf` (replace the quantised version with the one you want)
+ 2. `nano Modelfile` and enter these details (again, replacing Q3_K_M with the version you want; a rendered example of this template appears after these steps):
+ ```text
+ FROM ./Qwen3-4B-f16:Q3_K_M.gguf
+
+ # Chat template using ChatML (used by Qwen)
+ SYSTEM You are a helpful assistant
+
+ TEMPLATE "{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>{{ end }}<|im_start|>user
+ {{ .Prompt }}<|im_end|>
+ <|im_start|>assistant
+ "
+ PARAMETER stop <|im_start|>
+ PARAMETER stop <|im_end|>
+
+ # Default sampling
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER top_k 20
+ PARAMETER min_p 0.0
+ PARAMETER repeat_penalty 1.1
+ PARAMETER num_ctx 4096
+ ```
+
+ The `num_ctx` value has been reduced to increase speed significantly.
+
+ 3. Then run this command: `ollama create Qwen3-4B-f16:Q3_K_M -f Modelfile`
+
+ You will now see "Qwen3-4B-f16:Q3_K_M" in your Ollama model list.
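
Once the import completes, a quick smoke test confirms the model is registered and responding; these are standard Ollama CLI commands, and the prompt text is arbitrary:

```bash
# Confirm the model is listed, then run a one-off prompt against it
ollama list
ollama run Qwen3-4B-f16:Q3_K_M "Say hello in five words."
```

If the reduced `num_ctx` turns out to be too small for a long session, recent Ollama builds also let you raise it interactively with `/set parameter num_ctx 8192` inside an `ollama run` session, without editing the Modelfile.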
+
+ These import steps are also useful if you want to customise the default parameters or system prompt.
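
If you do customise the system prompt or template, it helps to see what the template above actually renders. With the default `SYSTEM` line and a sample user prompt, the text sent to the model looks like this (illustrative prompt):

```text
<|im_start|>system
You are a helpful assistant<|im_end|><|im_start|>user
Why is the sky blue?<|im_end|>
<|im_start|>assistant
```
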
+
 ## Author
 
 👤 Geoff Munn (@geoffmunn)