Instructions to use Entropicengine/Pinecone-sage-24b with libraries and local apps.

Libraries

How to use Entropicengine/Pinecone-sage-24b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Entropicengine/Pinecone-sage-24b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Entropicengine/Pinecone-sage-24b")
model = AutoModelForCausalLM.from_pretrained("Entropicengine/Pinecone-sage-24b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Entropicengine/Pinecone-sage-24b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Entropicengine/Pinecone-sage-24b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Entropicengine/Pinecone-sage-24b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
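
The same request can also be made from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `ask` are hypothetical and only for illustration.

```python
import json
import urllib.request

# Assumption: the vLLM server from the commands above is running locally on port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Entropicengine/Pinecone-sage-24b"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the model's reply as a string. Passing a different `base_url` points the same helper at another OpenAI-compatible server.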

Use Docker

docker model run hf.co/Entropicengine/Pinecone-sage-24b

SGLang

How to use Entropicengine/Pinecone-sage-24b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Entropicengine/Pinecone-sage-24b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Entropicengine/Pinecone-sage-24b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Entropicengine/Pinecone-sage-24b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Entropicengine/Pinecone-sage-24b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Entropicengine/Pinecone-sage-24b with Docker Model Runner:
```
docker model run hf.co/Entropicengine/Pinecone-sage-24b
```

Model Goal request.

by MrParivir - opened Jun 30, 2025

Discussion

MrParivir

Jun 30, 2025

Hello, I've been enjoying your previous 2 24b models, Trifecta-Max and Dark-Triad, and I plan on giving this a spin too once a q8 gguf is available (I've put in a request to mradermacher.)
I was wondering, would it be possible to get a bit of insight into your idea behind each model? Just a line or two about the intent, to be included in the model card? It's just that, without downloading and using these models for an extended time, there's no real way to see what the intended difference is between, say, Dark-Triad and this model.
No worries if you're not interested, I just figured it couldn't hurt to ask.

Entropicengine

Owner Jun 30, 2025

Hello, really appreciate the kind words and glad you've been enjoying the previous models! To be honest, right now I’m mostly experimenting, trying out different merging techniques and parameter setups using models that are already pretty good. Models like Trifecta-Max, Dark-Triad, and now Pinecone-Sage are quite close in general use, and the differences can be subtle. That said, I’m planning to move into more advanced stuff like SFT, DPO, and QLoRA soon, and as that happens, the intent behind each model will get clearer. For now, it’s a bit of creative exploration and learning through iteration.

Entropicengine changed discussion status to closed Jul 1, 2025

Entropicengine changed discussion status to open Jul 1, 2025

Entropicengine changed discussion status to closed Jul 1, 2025
