How can I use it locally?
#6 opened by Day1Kim
Use `HuggingFacePipeline` instead of `HuggingFaceEndpoint` for local models:

```python
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
import torch

# Load the model and tokenizer from a local path or Hub id
model_path = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"

# The bnb-4bit checkpoint is pre-quantized; this config makes the settings explicit
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)

# Wrap the model in a transformers pipeline and hand it to LangChain
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512)
llm = HuggingFacePipeline(pipeline=pipe)
```
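
Once wrapped in `HuggingFacePipeline`, the model behaves like any other LangChain LLM. A minimal usage sketch (the prompt text is just an example):

```python
# Invoke the local pipeline through the standard LangChain LLM interface
response = llm.invoke("Explain what 4-bit quantization does in one sentence.")
print(response)
```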
Can I use unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF the same way as unsloth/Llama-3.3?
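
For reference, GGUF is a llama.cpp format, so it is usually loaded through a llama.cpp binding rather than `AutoModelForCausalLM`. A minimal sketch, assuming `llama-cpp-python` and `langchain_community` are installed and the GGUF file has already been downloaded (the local file name below is hypothetical):

```python
from langchain_community.llms import LlamaCpp

# Hypothetical path to a locally downloaded GGUF file
llm = LlamaCpp(
    model_path="./Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf",  # assumed file name
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
    n_ctx=8192,       # context window size
    temperature=0.7,
)
print(llm.invoke("Hello"))
```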