Transformers · GGUF · English · llama-cpp · gguf-my-repo · conversational
Triangle104 committed (verified) · commit b9364ae · 1 parent: 4465f54

Update README.md

Files changed (1): README.md (+57 −0)

This model was converted to GGUF format from [`allura-org/GLM4-9B-Neon-v2`](https://huggingface.co/allura-org/GLM4-9B-Neon-v2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/allura-org/GLM4-9B-Neon-v2) for more details on the model.

---

RP finetune of GLM-4-9B-0414. Feels nice, lots of personality, if a bit quirky sometimes. Nice prose, not too Claude-ish or Gemini-ish. Doesn't seem to like overly long system prompts or character cards, though. Seems to like JSON-formatted system prompts.

Model was trained by Auri.

## Training notes

Model was trained on a dataset consisting of 77M tokens of synthetic RP and short-story-gen data for one epoch. Training took around 11 hours on a 2xRTX 3090 workstation, generously provided by OwenArli. Went with some sane defaults for the training config: QLoRA plus CCE for a nice chunk of memory-usage optimization; 16k context fit on 48GB nicely with some room to spare. I had a problem with Eval/Loss being broken (not sure why); otherwise it trained smoothly.

Huge thanks to ArliAI for providing compute and collaborating on this run!

## Format

Model responds to GLM4 instruct formatting, exactly like its base model. Backends struggle to add the BOS token automatically, so you'll need to do it yourself. The Jinja template should work for chat completions.

```
[gMASK]<sop><|system|>
{system_prompt}<|user|>
{prompt}<|assistant|>
```
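
For raw text completions, here's a minimal sketch of sending a manually formatted prompt (with `[gMASK]<sop>` prepended by hand, since the backend may not add BOS for you) to a llama.cpp server; the port, system prompt, and message are placeholders:

```bash
# Assumes a llama.cpp server is already running this model on localhost:8080.
# The prompt follows the GLM4 template above, with [gMASK]<sop> added manually.
curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{
  "prompt": "[gMASK]<sop><|system|>\nYou are a helpful roleplay partner.<|user|>\nHi, who are you?<|assistant|>",
  "n_predict": 256
}'
```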

## Recommended Samplers

Nothing special, just classics:

- Temperature: 1
- Min-P: 0.1
- Repetition Penalty: 1.03
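
As a sketch, these settings map directly onto llama.cpp's CLI sampler flags (the GGUF filename is a placeholder for whichever quant you download):

```bash
# -cnv starts interactive chat mode using the model's built-in chat template.
llama-cli -m glm4-9b-neon-v2-q4_k_m.gguf -cnv \
  --temp 1.0 --min-p 0.1 --repeat-penalty 1.03
```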

Example master import for SillyTavern (using the Shingane-v1 system prompt by Steelskull)

## Running on KoboldCPP and other backends

To run GGUFs correctly, you need the most recent version of KoboldCPP, and you need to pass `--overridekv glm4.rope.dimension_count=int:64` to the CLI command, or put `glm4.rope.dimension_count=int:64` into the overridekv box in the GUI (under the Tokens tab at the very bottom).
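
For the CLI route, a sketch (the launcher and model filename are placeholders for however you run KoboldCPP and whichever quant you grabbed):

```bash
# Placeholder paths; the --overridekv flag is the important part.
python koboldcpp.py --model glm4-9b-neon-v2-q4_k_m.gguf \
  --overridekv glm4.rope.dimension_count=int:64
```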

Thanks to DaringDuck and tofumagnate for the info on how to apply this fix.

To run this model on vLLM, you'll need to build vLLM from source from the git repo, since full GLM4 support hasn't reached a release yet.
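
A minimal sketch of the from-source build, following vLLM's standard instructions (exact steps may change, so check the vLLM docs):

```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .  # compiles from source; this can take a while
```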

ExLLaMAv2- and v3-based backends, such as TabbyAPI, should support the model out of the box.

Latest versions of the llama.cpp server should also run the GGUFs out of the box.

---

  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
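
A one-line sketch using the standard Homebrew formula (this mirrors the usual GGUF-my-repo instructions):

```bash
brew install llama.cpp
```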