These variants were built from an **f16** base model to ensure consistency across quant levels.

| Level     | Speed      | Size    | Recommendation                                                                        |
|-----------|------------|---------|---------------------------------------------------------------------------------------|
| Q2_K      | ⚡ Fastest | 3.28 GB | Not recommended. Came first in the bat & ball question, no other appearances.         |
| 🥇 Q3_K_S | ⚡ Fast    | 3.77 GB | 🥇 Came first and second in questions covering both ends of the temperature spectrum. |
| 🥇 Q3_K_M | ⚡ Fast    | 4.12 GB | 🥇 **Best overall model.** Was a top-3 finisher for all questions except the haiku.   |
| 🥇 Q4_K_S | 🚀 Fast    | 4.8 GB  | 🥇 Came first and second in questions covering both ends of the temperature spectrum. |
| Q4_K_M    | 🚀 Fast    | 5.85 GB | Came first and second in high-temperature questions.                                  |
| 🥈 Q5_K_S | 🐢 Medium  | 5.72 GB | 🥈 A good second place. Good for all query types.                                     |
| Q5_K_M    | 🐢 Medium  | 5.85 GB | Not recommended; no appearances in the top 3 for any question.                        |
| Q6_K      | 🐌 Slow    | 6.73 GB | Showed up in a few results, but not recommended.                                      |
| Q8_0      | 🐌 Slow    | 8.71 GB | Not recommended; only one top-3 finish.                                               |
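
To pull one of these files straight from the Hub, here is a minimal sketch using `huggingface-cli`; the repo id `geoffmunn/Qwen3-8B` and the exact filename are assumptions based on this README's naming pattern, so adjust both to the quant level you pick.

```bash
# Assumed repo id and filename pattern; swap in the quant you chose.
huggingface-cli download geoffmunn/Qwen3-8B "Qwen3-8B-f16:Q3_K_M.gguf" --local-dir .
```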

## Model analysis and rankings

There are numerous good candidates - lots of different models showed up in the top 3 across all the questions. However, **Qwen3-8B-f16:Q3_K_M** was a finalist in all but one question, so it is the recommended model. **Qwen3-8B-f16:Q5_K_S** did nearly as well and is worth considering.

The 'hello' question is the first time that all models got it exactly right. All models in the 8B range did well, and it's mainly a question of which one works best on your hardware.

You can read the results here: [Qwen3-8b-analysis.md](Qwen3-8b-analysis.md)

If you find this useful, please give the project a ❤️ like.

## Usage

Load this model using:

- [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
- [LM Studio](https://lmstudio.ai) – desktop app with GPU support and chat templates
- [GPT4All](https://gpt4all.io) – private, local AI chatbot (offline-first)
- Or directly via `llama.cpp` (see the sketch below)
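
For the `llama.cpp` route, a minimal sketch, assuming a recent build whose chat binary is named `llama-cli` (older builds call it `main`) and a downloaded GGUF file named as in this repo:

```bash
# Sampling flags mirror the MODELFILE defaults shown later
# (temperature 0.6, top_p 0.95, top_k 20).
./llama-cli -m ./Qwen3-8B-f16:Q3_K_M.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 -c 4096 \
  -p "Explain the trade-offs of GGUF quantisation in one paragraph."
```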

Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.

Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value` (Ollama expected JSON but received an HTML page). In this case, try these steps:

1. `wget https://huggingface.co/geoffmunn/Qwen3-4B/resolve/main/Qwen3-4B-f16%3AQ3_K_M.gguf` (replace the quantised version with the one you want)
2. `nano Modelfile` and enter these details (again, replacing Q3_K_M with the version you want):

```text
FROM ./Qwen3-4B-f16:Q3_K_M.gguf

SYSTEM You are a helpful assistant

# Chat template using ChatML (used by Qwen)
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>

# Default sampling
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
```

The `num_ctx` value has been reduced to 4096 to increase speed significantly; raise it if you need a longer context window.

3. Then run this command: `ollama create Qwen3-4B-f16:Q3_K_M -f Modelfile`

You will now see "Qwen3-4B-f16:Q3_K_M" in your Ollama model list.

These import steps are also useful if you want to customise the default parameters or system prompt.
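
As a quick smoke test that the import worked (using the model name created above):

```bash
ollama list                                  # the new model should be listed
ollama run Qwen3-4B-f16:Q3_K_M "Say hello."  # one-shot prompt from the CLI
```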

## Author

👤 Geoff Munn (@geoffmunn)