Qwen3-Embedding-0.6B -> how do I pass instructions to it in llama.cpp?

#42
by bp50 - opened

I am using this embedding model: https://huggingface.co/Qwen/Qwen3-Embedding-0.6B

Questions:

  1. How do I pass instructions to it through the llama.cpp embedding API?
    Is it similar to the TEI instructions? Would I use the same instruction format with llama.cpp?
    https://huggingface.co/Qwen/Qwen3-Embedding-0.6B#text-embeddings-inference-tei-usage
curl http://localhost:8080/embed \
    -X POST \
    -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
    -H "Content-Type: application/json"
  2. Is the "Instruct: X \n Query: X" format important?

  3. Can I just use a freestyle instruction format, like this?
    Generate document embeddings like this:

"Represent this food blog post title for semantic search: Easy Homemade Pizza Recipe"

Generate for user searches like this:

"Encode this search query for retrieving food blog posts: quick pizza dough"

1️⃣ Start llama.cpp server with embeddings

llama-server \
    --model /path/Qwen3-Embedding-0.6B-*.gguf \
    --embedding \
    --pooling last \
    --port 8080

2️⃣ Get embeddings for search queries (use instruction format)

curl http://localhost:8080/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
      "input": [
        "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?",
        "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"
      ]
    }'
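
For scripted use, the query formatting in step 2 can be done in code instead of by hand. The following is a minimal Python sketch, not an official client. It assumes the server from step 1 is listening on localhost:8080 and that the response follows the OpenAI-style shape with a "data" list whose items carry "index" and "embedding" fields. The helper names format_query, build_payload, and embed are made up for illustration and are not part of llama.cpp.

```python
import json
import urllib.request

# Task description copied from the curl example; swap in your own task text.
TASK = "Given a web search query, retrieve relevant passages that answer the query"


def format_query(task: str, query: str) -> str:
    """Wrap a raw query in the 'Instruct: <task>', newline, 'Query: <query>' template."""
    return f"Instruct: {task}\nQuery: {query}"


def build_payload(texts: list[str]) -> bytes:
    """Serialize the texts as the JSON body for /v1/embeddings.

    json.dumps escapes the newline between the instruction and the query,
    so no manual escaping is needed.
    """
    return json.dumps({"input": texts}).encode("utf-8")


def embed(texts: list[str], url: str = "http://localhost:8080/v1/embeddings") -> list[list[float]]:
    """POST the texts to the server and return one vector per input.

    Assumption: the response looks like
    {"data": [{"index": 0, "embedding": [...]}, ...]}.
    """
    request = urllib.request.Request(
        url,
        data=build_payload(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Sort by index so the vectors line up with the input order.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

With the server running, embed([format_query(TASK, "Explain gravity")]) should return a list containing a single vector. Documents would be passed to embed as plain strings, without format_query.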

3️⃣ Get embeddings for documents (no instruction)

curl http://localhost:8080/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
      "input": [
        "The capital of China is Beijing.",
        "Gravity is a force that attracts two bodies towards each other."
      ]
    }'
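
Once you have vectors for both the instructed queries and the plain documents, retrieval comes down to comparing them. The sketch below ranks documents by cosine similarity against one query vector. It is an illustration only: the function names are hypothetical, and the three-dimensional vectors are made-up stand-ins for the real embeddings the server returns.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[tuple[int, float]]:
    """Return (document_index, score) pairs, best match first."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (made-up numbers).
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]

best_index, best_score = rank_documents(query_vec, doc_vecs)[0]
print(best_index)  # prints 1: the second toy document points the same way as the query
```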

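✅ Use the “Instruct: … \nQuery: …” format for queries; that is how Qwen3-Embedding was trained.
✅ No instruction is needed for documents.
✅ Works the same way as TEI: the instruction is simply part of the input string. Only the endpoint and field name differ (/embed with "inputs" for TEI, /v1/embeddings with "input" for llama-server).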
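
On the question of freestyle instructions: if you want to check a custom template on your own data rather than take the recommendation on faith, one option is to measure how often the expected document ranks first under each template. The sketch below is a hypothetical harness, not something from the model card. top1_accuracy and its arguments are illustrative names, and embed_fn is whatever function you use to turn a list of strings into a list of vectors (for example a wrapper around /v1/embeddings).

```python
import math

# Two query templates to compare; "{query}" is filled in per query.
TRAINED = (
    "Instruct: Given a web search query, retrieve relevant passages "
    "that answer the query\nQuery: {query}"
)
FREESTYLE = "Encode this search query for retrieving food blog posts: {query}"


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top1_accuracy(template, labelled_queries, documents, embed_fn):
    """Share of queries whose expected document ranks first.

    template          string containing "{query}"
    labelled_queries  list of (query_text, index_of_expected_document)
    documents         list of document strings, embedded without instruction
    embed_fn          callable: list of strings -> list of vectors
    """
    doc_vecs = embed_fn(documents)
    query_vecs = embed_fn([template.format(query=q) for q, _ in labelled_queries])
    hits = 0
    for (_, expected), q_vec in zip(labelled_queries, query_vecs):
        scores = [cosine(q_vec, d) for d in doc_vecs]
        if scores.index(max(scores)) == expected:
            hits += 1
    return hits / len(labelled_queries)
```

Running top1_accuracy once with TRAINED and once with FREESTYLE over the same labelled queries gives a rough, data-specific comparison. A handful of queries is too few to conclude much, so treat the numbers as a sanity check.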
