How can I use it locally?
#6 opened by Day1Kim
Use `HuggingFacePipeline` instead of `HuggingFaceEndpoint` for local models:

```python
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
import torch

# Load the model and tokenizer from a local path or Hub id
model_path = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"

# The bnb-4bit checkpoint is pre-quantized; this config makes the settings explicit
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)

# Wrap the model in a transformers pipeline and hand it to LangChain
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512)
llm = HuggingFacePipeline(pipeline=pipe)
```
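
Once wrapped in `HuggingFacePipeline`, the model behaves like any other LangChain LLM. A minimal usage sketch (the prompt text is just an example):

```python
# Invoke the local pipeline through the standard LangChain LLM interface
response = llm.invoke("Explain what 4-bit quantization does in one sentence.")
print(response)
```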
Can I use unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF the same way as unsloth/Llama-3.3?
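
For reference, GGUF is a llama.cpp format, so it is usually loaded through a llama.cpp binding rather than `AutoModelForCausalLM`. A minimal sketch, assuming `llama-cpp-python` and `langchain_community` are installed and the GGUF file has already been downloaded (the local file name below is hypothetical):

```python
from langchain_community.llms import LlamaCpp

# Hypothetical path to a locally downloaded GGUF file
llm = LlamaCpp(
    model_path="./Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf",  # assumed file name
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
    n_ctx=8192,       # context window size
    temperature=0.7,
)
print(llm.invoke("Hello"))
```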