Model Goal request.
Hello, I've been enjoying your previous 2 24b models, Trifecta-Max and Dark-Triad, and I plan on giving this a spin too once a q8 gguf is available (I've put in a request to mradermacher.)
I was wondering, would it be possible to get a bit of insight into your idea behind each model? Just a line or two about the intent, to be included in the model card? Without downloading and using these models for an extended time, there's no real way to see what the intended difference is between, say, Dark-Triad and this model.
No worries if you're not interested, I just figured it couldn't hurt to ask.
Hello, really appreciate the kind words and glad you've been enjoying the previous models! To be honest, right now I’m mostly experimenting, trying out different merging techniques and parameter setups using models that are already pretty good. Models like Trifecta-Max, Dark-Triad, and now Pinecone-Sage are quite close in general use, and the differences can be subtle. That said, I’m planning to move into more advanced stuff like SFT, DPO, and QLoRA soon, and as that happens, the intent behind each model will get clearer. For now, it’s a bit of creative exploration and learning through iteration.