Cannot run it on llama.cpp + Mac M4
Just tried this model on a MacBook Air M4 with 24 GB of RAM:
./llama-server --port 8888 -ngl 99 -hf unsloth/Seed-OSS-36B-Instruct-GGUF:Q2_K_XL
The model seems to load, but when I send a prompt through the web interface I get nothing back.
{# Copyright 2025-present Unsloth. Apache 2.0 License. #}, example_format: '<seed:bos>system
You are a helpful assistant<seed:eos><seed:bos>user
Hello<seed:eos><seed:bos>assistant
Hi there<seed:eos><seed:bos>user
How are you?<seed:eos><seed:bos>assistant
'
main: server is listening on http://127.0.0.1:8888 - starting the main loop
srv update_slots: all slots are idle
srv params_from_: Chat format: Content-only
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 64
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 64, n_tokens = 64, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 64, n_tokens = 64
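In case it helps with reproduction, the server can also be queried directly on its OpenAI-compatible endpoint, bypassing the web UI entirely (standard llama-server API; adjust the port if you changed it):

# send a chat request straight to the server, no web UI involved
curl http://127.0.0.1:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'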
Here is the interactive chat log (llama-cli):
system_info: n_threads = 4 (n_threads_batch = 4) / 10 | Metal : EMBED_LIBRARY = 1 | BF16 = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | REPACK = 1 |
main: interactive mode on.
sampler seed: 2428926723
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 0
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
- Not using system message. To change it, set a different value via -sys PROMPT
hello
verbose:
embd_inp.size(): 0, n_consumed: 0
waiting for user input
wazzup
buffer: 'wazzup'
formatted: '<seed:bos>user
wazzup<seed:eos><seed:bos>assistant
'
input tokens: [ '<seed:bos>':0, 'user':4169, '
':326, 'w':214, 'azz':12543, 'up':1496, '<seed:eos>':2, '<seed:bos>':0, 'ass':712, 'istant':14300, '
':326 ]
n_remain: -12
embd_inp.size(): 11, n_consumed: 0
eval: [ '<seed:bos>':0, 'user':4169, '
':326, 'w':214, 'azz':12543, 'up':1496, '<seed:eos>':2, '<seed:bos>':0, 'ass':712, 'istant':14300, '
':326 ]
n_past = 11
n_remain: -13
eval: [ '<[PLHD31_never_used]>':31 ]
n_past = 12
n_remain: -14
eval: [ '<[PLHD31_never_used]>':31 ]
n_past = 13
n_remain: -15
eval: [ '<[PLHD31_never_used]>':31 ]
n_past = 14
n_remain: -16
eval: [ '<[PLHD31_never_used]>':31 ]
n_past = 15
n_remain: -17
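So after the prompt, the model just loops on token 31, '<[PLHD31_never_used]>', a placeholder that should never be generated. That looks like a chat-template/tokenizer mismatch rather than a loading problem. One guess (untested on my machine): force the GGUF's embedded Jinja chat template instead of llama.cpp's built-in fallback, e.g.

# same launch command, plus --jinja to use the model's own chat template
./llama-server --port 8888 -ngl 99 --jinja -hf unsloth/Seed-OSS-36B-Instruct-GGUF:Q2_K_XL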
Paste it into ChatGPT; it will save your time and ours too.
Fair enough.
I tried Grok 4 before asking, but GPT-5 actually figured out which flags to use.