Cannot run it on llama.cpp + Mac M4

#2
by danielfl

Just tried this model on a MacBook Air M4 with 24GB of RAM:

./llama-server --port 8888 -ngl 99 -hf unsloth/Seed-OSS-36B-Instruct-GGUF:Q2_K_XL

The model seems to load, but when I send a prompt through the web interface, I get nothing in return.
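That's the stock invocation, with no chat-template flags. One guess (an assumption on my part, not something I've confirmed for this model) is that the GGUF's embedded Jinja chat template needs to be forced with llama-server's --jinja option:

./llama-server --port 8888 -ngl 99 --jinja -hf unsloth/Seed-OSS-36B-Instruct-GGUF:Q2_K_XL

Server log from the original run: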

{# Copyright 2025-present Unsloth. Apache 2.0 License. #}, example_format: '<seed:bos>system
You are a helpful assistant<seed:eos><seed:bos>user
Hello<seed:eos><seed:bos>assistant
Hi there<seed:eos><seed:bos>user
How are you?<seed:eos><seed:bos>assistant
'
main: server is listening on http://127.0.0.1:8888 - starting the main loop
srv update_slots: all slots are idle
srv params_from_: Chat format: Content-only
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 64
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 64, n_tokens = 64, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 64, n_tokens = 64
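For anyone debugging the same thing: the web UI can be ruled out by calling the server's OpenAI-compatible endpoint directly (standard llama-server behavior, same port as the command above):

curl http://127.0.0.1:8888/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'

If the returned content is empty here too, the problem is in the model or template handling, not the interface.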

Same behavior running the CLI chat (llama-cli) directly:

system_info: n_threads = 4 (n_threads_batch = 4) / 10 | Metal : EMBED_LIBRARY = 1 | BF16 = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | REPACK = 1 |

main: interactive mode on.
sampler seed: 2428926723
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 0

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to the AI.
  • To return control without starting a new line, end your input with '/'.
  • If you want to submit another line, end your input with '\'.
  • Not using system message. To change it, set a different value via -sys PROMPT

hello

Verbose output:

embd_inp.size(): 0, n_consumed: 0
waiting for user input

wazzup
buffer: 'wazzup'
formatted: '<seed:bos>user
wazzup<seed:eos><seed:bos>assistant
'
input tokens: [ '<seed:bos>':0, 'user':4169, '
':326, 'w':214, 'azz':12543, 'up':1496, '<seed:eos>':2, '<seed:bos>':0, 'ass':712, 'istant':14300, '
':326 ]
n_remain: -12
embd_inp.size(): 11, n_consumed: 0
eval: [ '<seed:bos>':0, 'user':4169, '
':326, 'w':214, 'azz':12543, 'up':1496, '<seed:eos>':2, '<seed:bos>':0, 'ass':712, 'istant':14300, '
':326 ]
n_past = 11
n_remain: -13
eval: [ '<[PLHD31_never_used]>':31 ]
n_past = 12
n_remain: -14
eval: [ '<[PLHD31_never_used]>':31 ]
n_past = 13
n_remain: -15
eval: [ '<[PLHD31_never_used]>':31 ]
n_past = 14
n_remain: -16
eval: [ '<[PLHD31_never_used]>':31 ]
n_past = 15
n_remain: -17
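So generation is stuck sampling token 31, a reserved placeholder that should never appear in output. Since the formatted prompt above shows the Seed-OSS template is being applied, my guess (unverified) is that the llama.cpp build is too old to handle this architecture correctly, as Seed-OSS support is recent. Rebuilding from the latest master and retrying, with the embedded template forced for good measure, would look like:

git pull && cmake -B build && cmake --build build --config Release
./build/bin/llama-cli -hf unsloth/Seed-OSS-36B-Instruct-GGUF:Q2_K_XL -ngl 99 --jinja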

Paste it into ChatGPT, it will save your time and our time too.

Fair enough.

I tried Grok 4 before asking, but GPT-5 actually figured out which flags to use.
