Instructions to use google/gemma-2-2b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-2-2b-it with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-2b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt tokens)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-2-2b-it with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-2-2b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-2-2b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/google/gemma-2-2b-it
- SGLang
How to use google/gemma-2-2b-it with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-2-2b-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-2-2b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-2-2b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-2-2b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use google/gemma-2-2b-it with Docker Model Runner:
docker model run hf.co/google/gemma-2-2b-it
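The curl calls shown for vLLM (port 8000) and SGLang (port 30000) can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style chat completion body (`choices[0].message.content`), which is not shown in the text above.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build the same POST request as the curl examples (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_message, timeout=60):
    """Send the request and return the reply text.

    Assumes an OpenAI-style response body; adjust if your server differs.
    """
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
#   vLLM:   chat("http://localhost:8000", "google/gemma-2-2b-it", "What is the capital of France?")
#   SGLang: chat("http://localhost:30000", "google/gemma-2-2b-it", "What is the capital of France?")
```

Only the base URL changes between the two servers, so the request construction is kept separate from the network call and can be inspected without a server running.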
- access (#78, opened 13 days ago by alparslankoc, 1 comment)
- Install & run google/gemma-2-2b-it easily using llmpm (#77, opened 2 months ago by sarthak-saxena, 1 comment)
- Request: DOI (#76, opened 3 months ago by steal44, 1 comment)
- Access , Can you give us access to fine tune your module? (#74, opened 6 months ago by alikayh, 1 comment)
- Recommended Settings (#73, opened 7 months ago by Lef00, 3 comments)
- model_down (#71, opened 9 months ago by rahayadav)
- Request: DOI (#70, opened 10 months ago by metfalan, 1 comment)
- Local (#69, opened 11 months ago by Mingmasherpa)
- text-generation-with-past (#68, opened about 1 year ago by prudhvikumar, 1 comment)
- CUDA error (#66, opened about 1 year ago by katebor, 1 comment)
- how to solve this error (#64, opened about 1 year ago by vinayakarsh, 11 comments)
- Request: DOI (#63, opened over 1 year ago by VibhaB, 1 comment)
- Request: DOI (#62, opened over 1 year ago by Vibha10, 1 comment)
- TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch' (#61, opened over 1 year ago by smkhant, 1 comment)
- TypeError: 'HybridCache' object is not iterable (#60, opened over 1 year ago by kingcreatorpulga, 5 comments)
- GGUF version (#59, opened over 1 year ago by appsforbd, 3 comments)
- TypeError: BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception' (#58, opened over 1 year ago by pofce, 3 comments)
- TypeError: 'NoneType' object is not callable (#57, opened over 1 year ago by pofce, 1 comment)
- unable to access model despite clicking on prompted button to gain access. (#54, opened over 1 year ago by Khyatikhandelwal20, 6 comments)
- Extract output (#49, opened over 1 year ago by tkaintura, 1 comment)
- Update README.md (#48, opened over 1 year ago by ARJ544)
- torch.compile error (#47, opened over 1 year ago by LD-inform, 3 comments)
- Update README.md (#46, opened over 1 year ago by zhongyue030518)
- Update chat tempalte to allow the use of the assistant_mask (#45, opened over 1 year ago by Butanium, 1 comment)
- Slow loading?? (#44, opened over 1 year ago by Cigsdev, 1 comment)
- test (#43, opened over 1 year ago by mehrez1110, 2 comments)
- Unable to access Gemma-2-2b-it despite having access to the repo and logging in (#42, opened over 1 year ago by Kirrmu, 3 comments)
- Using Axolotl question (#41, opened over 1 year ago by mrheinen, 1 comment)
- Problem with 'google/gemma-2-2b-it''s API for Chat completion (#40, opened over 1 year ago by adelamare-blockchain, 4 comments)
- update the link to ARC (AI2 Reasoning Challenge, not Abstraction and Reasoning Corpus) (#39, opened over 1 year ago by tipsybohemian)
- GPU training makes loss=nan (#37, opened over 1 year ago by hidonbush, 👀 1, 5 comments)
- running it on cpu using pretrained (#35, opened over 1 year ago by himanshuyadav62, 1 comment)
- dimension error (#32, opened almost 2 years ago by Yingshu, 1 comment)
- Cannot access Gemma even after approval (#30, opened almost 2 years ago by Daaku-C5, 5 comments)
- Conversion to onnx (#29, opened almost 2 years ago by Parma7876, 3 comments)
- Running Gemma-2b with Torch 2.0.1? (#28, opened almost 2 years ago by insdaguirre, 1 comment)
- Request: DOI (#27, opened almost 2 years ago by taloot, 1 comment)
- version of Torch (#24, opened almost 2 years ago by xxbadarxx, 1 comment)
- Finetuning (#22, opened almost 2 years ago by omarabb315, 3 comments)
- Can i get the API for this so I can make an app with it? (#21, opened almost 2 years ago by riggscodes, 3 comments)
- RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 (#18, opened almost 2 years ago by lcahill, 3 comments)
- My quants and the "silly" experiment. (#13, opened almost 2 years ago by ZeroWw, 2 comments)