geoffmunn committed
Commit 2d8fdf4 · verified · 1 Parent(s): 7330008

Results table updated

Files changed (1)
  1. README.md +58 -22

README.md CHANGED
@@ -45,27 +45,27 @@ Converted for use with `llama.cpp` and compatible tools like OpenWebUI, LM Studi
 
 These variants were built from an **f16** base model to ensure consistency across quant levels.
 
- | Level    | Quality      | Speed     | Size    | Recommendation |
- |----------|--------------|-----------|---------|----------------|
- | Q2_K     | Very Low     | ⚡ Fastest | 3.28 GB | Only on severely memory-constrained systems (<6 GB RAM). Avoid for reasoning. |
- | Q3_K_S   | Low          | ⚡ Fast    | 3.77 GB | Minimal viability; basic completion only. Not recommended. |
- | Q3_K_M   | Low-Medium   | ⚡ Fast    | 4.12 GB | Acceptable for simple chat on older systems. No complex logic. |
- | Q4_K_S   | Medium       | 🚀 Fast    | 4.8 GB  | Good balance for low-end laptops or embedded platforms. |
- | Q4_K_M   | ✅ Balanced  | 🚀 Fast    | 5.85 GB | Best overall for general use on average hardware. Great speed/quality trade-off. |
- | Q5_K_S   | High         | 🐢 Medium  | 5.72 GB | Better reasoning; slightly faster than Q5_K_M. Ideal for coding. |
- | Q5_K_M   | ✅✅ High    | 🐢 Medium  | 5.85 GB | Top pick for deep interactions, logic, and tool use. Recommended for desktops. |
- | Q6_K     | 🔥 Near-FP16 | 🐌 Slow    | 6.73 GB | Excellent fidelity; ideal for RAG, retrieval, and accuracy-critical tasks. |
- | Q8_0     | 🏆 Lossless* | 🐌 Slow    | 8.71 GB | Maximum accuracy; best for research, benchmarking, or archival. |
-
- > 💡 **Recommendations by Use Case**
- >
- > - 💻 **Low-end CPU / Old Laptop**: `Q4_K_M` (best balance under pressure)
- > - 🖥️ **Standard/Mid-tier Laptop (i5/i7/M1/M2)**: `Q5_K_M` (optimal quality)
- > - 🧠 **Reasoning, Coding, Math**: `Q5_K_M` or `Q6_K` (use thinking mode!)
- > - 🤖 **Agent & Tool Integration**: `Q5_K_M` – handles JSON, function calls well
- > - 🔍 **RAG, Retrieval, Precision Tasks**: `Q6_K` or `Q8_0`
- > - 📦 **Storage-Constrained Devices**: `Q4_K_S` or `Q4_K_M`
- > - 🛠️ **Development & Testing**: Test from `Q4_K_M` up to `Q8_0` to assess trade-offs
+ | Level     | Speed     | Size    | Recommendation |
+ |-----------|-----------|---------|----------------|
+ | Q2_K      | ⚡ Fastest | 3.28 GB | Not recommended. Came first in the bat & ball question, but made no other appearances. |
+ | 🥉 Q3_K_S | ⚡ Fast    | 3.77 GB | 🥉 Came first and second in questions at both ends of the temperature spectrum. |
+ | 🥇 Q3_K_M | ⚡ Fast    | 4.12 GB | 🥇 **Best overall model.** A top-3 finisher for every question except the haiku. |
+ | 🥉 Q4_K_S | 🚀 Fast    | 4.8 GB  | 🥉 Came first and second in questions at both ends of the temperature spectrum. |
+ | Q4_K_M    | 🚀 Fast    | 5.85 GB | Came first and second in high-temperature questions. |
+ | 🥈 Q5_K_S | 🐢 Medium  | 5.72 GB | 🥈 A solid second place. Good for all query types. |
+ | Q5_K_M    | 🐢 Medium  | 5.85 GB | Not recommended; no appearances in the top 3 for any question. |
+ | Q6_K      | 🐌 Slow    | 6.73 GB | Showed up in a few results, but not recommended. |
+ | Q8_0      | 🐌 Slow    | 8.71 GB | Not recommended; only one top-3 finish. |
+
+ ## Model analysis and rankings
+
+ There are numerous good candidates: many different models showed up in the top 3 across the questions. However, **Qwen3-8B-f16:Q3_K_M** was a finalist in all but one question, so it is the recommended model. **Qwen3-8B-f16:Q5_K_S** did nearly as well and is worth considering.
+
+ The 'hello' question is the first time that all models got it exactly right. All models in the 8B range did well, so it is mainly a question of which one works best on your hardware.
+
+ You can read the results here: [Qwen3-8b-analysis.md](Qwen3-8b-analysis.md)
+
+ If you find this useful, please give the project a ❤️ like.
 
  ## Usage
 
@@ -73,10 +73,46 @@ Load this model using:
  - [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
  - [LM Studio](https://lmstudio.ai) – desktop app with GPU support and chat templates
  - [GPT4All](https://gpt4all.io) – private, local AI chatbot (offline-first)
- - Or directly via \`llama.cpp\`
+ - Or directly via `llama.cpp`
 
 Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.
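
If you'd rather skip a GUI entirely, the `llama.cpp` option above can be exercised from the command line. Here is a minimal sketch using `llama-cli`; the GGUF filename is illustrative (point `-m` at whichever quant you downloaded), and the sampling flags mirror the Modelfile defaults shown later in this README:

```bash
# Run a single prompt against a local GGUF quant with llama.cpp's CLI.
# The filename below is an example; substitute the file you downloaded.
./llama-cli -m ./Qwen3-8B-f16:Q5_K_M.gguf \
  -p "Why is the sky blue?" \
  -n 256 \
  --temp 0.6 --top-p 0.95 --top-k 20 \
  -c 4096
```
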
+ Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
+ In that case, try these steps:
+
+ 1. `wget https://huggingface.co/geoffmunn/Qwen3-4B/resolve/main/Qwen3-4B-f16%3AQ3_K_M.gguf` (replace the quantised version with the one you want)
+ 2. `nano Modelfile` and enter these details (again, replacing Q3_K_M with the version you want; a rendered example of this template appears after these steps):
+ ```text
+ FROM ./Qwen3-4B-f16:Q3_K_M.gguf
+
+ # Chat template using ChatML (used by Qwen)
+ SYSTEM You are a helpful assistant
+
+ TEMPLATE "{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>{{ end }}<|im_start|>user
+ {{ .Prompt }}<|im_end|>
+ <|im_start|>assistant
+ "
+ PARAMETER stop <|im_start|>
+ PARAMETER stop <|im_end|>
+
+ # Default sampling
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER top_k 20
+ PARAMETER min_p 0.0
+ PARAMETER repeat_penalty 1.1
+ PARAMETER num_ctx 4096
+ ```
+
+ The `num_ctx` value has been reduced to increase speed significantly.
+
+ 3. Then run this command: `ollama create Qwen3-4B-f16:Q3_K_M -f Modelfile`
+
+ You will now see "Qwen3-4B-f16:Q3_K_M" in your Ollama model list.
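
Once the import completes, a quick smoke test confirms the model is registered and responding; these are standard Ollama CLI commands, and the prompt text is arbitrary:

```bash
# Confirm the model is listed, then run a one-off prompt against it
ollama list
ollama run Qwen3-4B-f16:Q3_K_M "Say hello in five words."
```

If the reduced `num_ctx` turns out to be too small for a long session, recent Ollama builds also let you raise it interactively with `/set parameter num_ctx 8192` inside an `ollama run` session, without editing the Modelfile.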
+
+ These import steps are also useful if you want to customise the default parameters or system prompt.
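
If you do customise the system prompt or template, it helps to see what the template above actually renders. With the default `SYSTEM` line and a sample user prompt, the text sent to the model looks like this (illustrative prompt):

```text
<|im_start|>system
You are a helpful assistant<|im_end|><|im_start|>user
Why is the sky blue?<|im_end|>
<|im_start|>assistant
```
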
+
 ## Author
 
 👤 Geoff Munn (@geoffmunn)