Instructions to use bartowski/gemma-2-9b-it-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use bartowski/gemma-2-9b-it-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bartowski/gemma-2-9b-it-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

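# Load model directly (a GGUF-only repo needs an explicit gguf_file; weights are dequantized on load)
from transformers import AutoModelForCausalLM, AutoTokenizer
gguf_file = "gemma-2-9b-it-Q4_K_M.gguf"  # any .gguf file listed in this repo
tokenizer = AutoTokenizer.from_pretrained("bartowski/gemma-2-9b-it-GGUF", gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained("bartowski/gemma-2-9b-it-GGUF", gguf_file=gguf_file, dtype="auto")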

llama-cpp-python

How to use bartowski/gemma-2-9b-it-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="bartowski/gemma-2-9b-it-GGUF",
	filename="gemma-2-9b-it-IQ2_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
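
For a non-streaming call, `create_chat_completion` returns an OpenAI-style response dictionary, so the reply text sits under `choices[0]["message"]["content"]`. A minimal sketch of pulling it out follows; the helper name `first_reply` and the `sample` dictionary are illustrative assumptions, not part of llama-cpp-python:

```python
def first_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hypothetical response, shaped like the dict returned by llm.create_chat_completion(...)
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(first_reply(sample))
```

With a loaded model, the same helper can wrap the call above: `first_reply(llm.create_chat_completion(messages=[...]))`.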

llama.cpp

How to use bartowski/gemma-2-9b-it-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bartowski/gemma-2-9b-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf bartowski/gemma-2-9b-it-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bartowski/gemma-2-9b-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf bartowski/gemma-2-9b-it-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf bartowski/gemma-2-9b-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf bartowski/gemma-2-9b-it-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf bartowski/gemma-2-9b-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf bartowski/gemma-2-9b-it-GGUF:Q4_K_M

Use Docker

docker model run hf.co/bartowski/gemma-2-9b-it-GGUF:Q4_K_M

vLLM

How to use bartowski/gemma-2-9b-it-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bartowski/gemma-2-9b-it-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bartowski/gemma-2-9b-it-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
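
The curl call above can also be issued from Python with only the standard library. This is a sketch under assumptions: the helper names `build_chat_request` and `send` are hypothetical, and the base URL assumes the server is listening on port 8000 as in the curl example (the SGLang examples further down use port 30000 instead).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


req = build_chat_request(
    "http://localhost:8000",
    "bartowski/gemma-2-9b-it-GGUF",
    "What is the capital of France?",
)
# reply = send(req)  # uncomment once the server is listening
# print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the payload inspectable without a live server.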

Use Docker

docker model run hf.co/bartowski/gemma-2-9b-it-GGUF:Q4_K_M

SGLang

How to use bartowski/gemma-2-9b-it-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "bartowski/gemma-2-9b-it-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bartowski/gemma-2-9b-it-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "bartowski/gemma-2-9b-it-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bartowski/gemma-2-9b-it-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use bartowski/gemma-2-9b-it-GGUF with Ollama:
```
ollama run hf.co/bartowski/gemma-2-9b-it-GGUF:Q4_K_M
```

Unsloth Studio

How to use bartowski/gemma-2-9b-it-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bartowski/gemma-2-9b-it-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bartowski/gemma-2-9b-it-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for bartowski/gemma-2-9b-it-GGUF to start chatting

Docker Model Runner
How to use bartowski/gemma-2-9b-it-GGUF with Docker Model Runner:
```
docker model run hf.co/bartowski/gemma-2-9b-it-GGUF:Q4_K_M
```

Lemonade

How to use bartowski/gemma-2-9b-it-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull bartowski/gemma-2-9b-it-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.gemma-2-9b-it-GGUF-Q4_K_M

List all available models

lemonade list

bartowski committed on Jul 2, 2024

Commit 87a3bf8 (verified) · 1 Parent(s): 147d9dc

Llamacpp quants

Files changed (26)

README.md +1 -1
gemma-2-9b-it-IQ2_M.gguf +1 -1
gemma-2-9b-it-IQ2_S.gguf +1 -1
gemma-2-9b-it-IQ2_XS.gguf +1 -1
gemma-2-9b-it-IQ3_M.gguf +1 -1
gemma-2-9b-it-IQ3_XS.gguf +1 -1
gemma-2-9b-it-IQ3_XXS.gguf +1 -1
gemma-2-9b-it-IQ4_XS.gguf +1 -1
gemma-2-9b-it-Q2_K.gguf +1 -1
gemma-2-9b-it-Q2_K_L.gguf +1 -1
gemma-2-9b-it-Q3_K_L.gguf +1 -1
gemma-2-9b-it-Q3_K_M.gguf +1 -1
gemma-2-9b-it-Q3_K_S.gguf +1 -1
gemma-2-9b-it-Q3_K_XL.gguf +1 -1
gemma-2-9b-it-Q4_K_L.gguf +1 -1
gemma-2-9b-it-Q4_K_M.gguf +1 -1
gemma-2-9b-it-Q4_K_S.gguf +1 -1
gemma-2-9b-it-Q5_K_L.gguf +1 -1
gemma-2-9b-it-Q5_K_M.gguf +1 -1
gemma-2-9b-it-Q5_K_S.gguf +1 -1
gemma-2-9b-it-Q6_K.gguf +1 -1
gemma-2-9b-it-Q6_K_L.gguf +1 -1
gemma-2-9b-it-Q8_0.gguf +1 -1
gemma-2-9b-it-Q8_0_L.gguf +1 -1
gemma-2-9b-it-f32.gguf +1 -1
gemma-2-9b-it.imatrix +1 -1

README.md CHANGED

@@ -15,7 +15,7 @@ quantized_by: bartowski
 ## Llamacpp imatrix Quantizations of gemma-2-9b-it
-Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3274">b3274</a> for quantization.
+Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3277">b3277</a> for quantization.
 Original model: https://huggingface.co/google/gemma-2-9b-it

gemma-2-9b-it-IQ2_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3f9886a5f8d9d8e2c1c3d5a3e5da9f96e0ce6f4fdc0480a960c126fa5f18fb36
+oid sha256:db3b6ab5b60cd33b8a864bb94bf4ac848563544e3a2dc45e1707bcd37f57be27
 size 3434669952

gemma-2-9b-it-IQ2_S.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:13cd744736288aa0285b14870800389831c363ba044593d13705a47d7d1c8f88
+oid sha256:0725ca79e7415d9a23d0cdcd3f2376c1b989777dd07cd8c088af23bb4fc1fa28
 size 3211487104

gemma-2-9b-it-IQ2_XS.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fbdeb1d1423204096b4dc28e67d5d7e2ec11b8913b51c9ec02eb01b65b56059a
+oid sha256:010fb98c74fd5b413008adb14f62a0a6886883936dbc5693f45fd389b54bfd40
 size 3067381632

gemma-2-9b-it-IQ3_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:58211a13b5bcb5982f6d43d6fb2e91608330a6da15f0afc4801c52198c2a3d90
+oid sha256:92a9f2ff1ed14de6a6ae30f062aa64c2299f4d7eefaf5afedd16193128b886c7
 size 4494616448

gemma-2-9b-it-IQ3_XS.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:90e57856b6242c7c0d4d58c2d299055e6d15db0248394e44b63429319b1d7c2c
+oid sha256:a12f6a1bdff35de65550312309c08a1e4bd46403cc7a93975fa7a5b2812ad546
 size 4144990080

gemma-2-9b-it-IQ3_XXS.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ad2b0263ebff9cb8d22986d7a1d462ced0c83a95636268282087257504f91f9f
+oid sha256:b749312e171137755d7140719de2768805fc93b183f248828c4fa1373b1287e6
 size 3796739968

gemma-2-9b-it-IQ4_XS.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5db6f9a4f4c63b2d526106ecf3e83892fe4451d85a4fec5066e5136eaf944b76
+oid sha256:c2a4737b6ffa7799fc722c85d1c9a5c49f9c7e9eeb759dfe81ff36682a79a4dd
 size 5183031168

gemma-2-9b-it-Q2_K.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e4fb2ad3d575c3c54f6ba4d8eb4bb0ba4864b37b1e561139bf89068f69439a81
+oid sha256:a2de123f64585b1f0f937ea0c7c3f5be2127cd799d7bef1ca4094df75eff2588
 size 3805398912

gemma-2-9b-it-Q2_K_L.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7723971dbbfe877c397732816422b2433cb5fff3425f43b4cb114860edff5737
+oid sha256:bb12771d34313003fbffaea4a12a96b5f1f3a61fcb7bfbd9b2b8bb8ebecce72c
 size 4887766912

gemma-2-9b-it-Q3_K_L.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9aeb70915c136c0f4a77055198629d958d87ba771589364c0bdea126c1584360
+oid sha256:ad29f86fa78d235ff9a655e5d2d2c3f883456b557f6ab454d5b73c16a68cca80
 size 5132453760

gemma-2-9b-it-Q3_K_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5e824f6f0c9cde71d27e6ca2893c26caff96482adb3d3b48560bc8d9dd84a6d1
+oid sha256:daf6b3cf23e1e5c31f9d7e58b96aa9311331e1ececbaa8fdd89b2390764950d1
 size 4761782144

gemma-2-9b-it-Q3_K_S.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:06cc0f8133631e4ba203214cbca28754646e4af4bfc89d6cb6503259e5dac5ef
+oid sha256:03e8889ca0cc5460433d79d1bca616da0e031a79960f6b77beb77cfba0a4faa8
 size 4337665920

gemma-2-9b-it-Q3_K_XL.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:466fed80bc4a3ede2a1eaedfa4eae1fd2c74229f09fb7debc0e18df3e8a99f73
+oid sha256:d68d8ef9a35e8e46185b4415454aa06c3e1911e5d9da38278e563083eed5f9a9
 size 6214821760

gemma-2-9b-it-Q4_K_L.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e88ec9e7009e4448231855119efd408e8e296b7ff2ee2fb8893fe530b49acc17
+oid sha256:845733875d16e1e1d32703d7fa34c8f10afa0f59af0134499e0d66e479d36c30
 size 6843426688

gemma-2-9b-it-Q4_K_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:69c8e8f0cbd77d37c2861f72e6c11ae33a0da5f5f097f93e7ba90562269c8248
+oid sha256:05390244866abc0e7108a2b1e3db07b82df3cd82f006256a75fc21137054151f
 size 5761058688

gemma-2-9b-it-Q4_K_S.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9af9c9340359c12b28a15fbb6295fc44cbe45e72cd147b39a459190697dafb35
+oid sha256:994bf7d6ee459ec3a277ce5a2c5d2aa532a7260cab85e11d03e005add2c6e3a3
 size 5478926208

gemma-2-9b-it-Q5_K_L.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:65eb6343a9a249ee63a9feb05bfc7949107e9e90a2bab03b61be5d968cc397e1
+oid sha256:dd364f54b7e37823fa8cbb857cfeb2091a9796d2ebbf1c06255040d5e549c851
 size 7729735552

gemma-2-9b-it-Q5_K_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:69acf0e31f3e7c49771af9d443f3d98eb1e0cacaed473b523bd96ddb425cb502
+oid sha256:a3da50620fae5b96b20c6a1dbce013c5f007a0926521bad2e64cc15db3044e51
 size 6647367552

gemma-2-9b-it-Q5_K_S.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c8e8ce55de8f7db08a9de158c3448845b077fb5eb417d4991ed903645564a0cc
+oid sha256:f4694dc90b24f8b177b3589c69cee9e17c838beb42f01173dbde400e28f9b07a
 size 6483593088

gemma-2-9b-it-Q6_K.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:19d6c53fd0a2b3ad0f550ec880f39aa2f1904e377456e84cff735c594e91b169
+oid sha256:936968f307fa9d705ab050b46a3d3c5a121ef22911d0941ffcd1828750d56244
 size 7589070720

gemma-2-9b-it-Q6_K_L.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0c044c1eff0fcaa4b45f4414c9317e56469492932ef20667924446e464312c8d
+oid sha256:79017a2d248eacdd5fef530ceda921077564716c2d42a864832c8a21ac58a5cb
 size 8671438720

gemma-2-9b-it-Q8_0.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:924a339cd17640b1c54fb648835bc33a441eb77231ef74e417b47c839ad2c4dd
+oid sha256:108a675c4ceed14a73dcbf0c43ff1bba273cf2cf515ae20ca667294aff2262a9
 size 9827149696

gemma-2-9b-it-Q8_0_L.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f17c9e0f73a61a85cfaa621a249f9e83ec14cce5f33c7e3b2362881f6a35ec17
+oid sha256:e19cb326ba857bec47383ada7af3b0fd965605a44e0a3b6ab0b1e080a541597a
 size 10687309696

gemma-2-9b-it-f32.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:da30533a73c2d680e7cd2de47d6ca57744eda1a6060e4a1cc7d476cccfb992dd
+oid sha256:d57770bc4d9ed34aaffe5e7afaaec79dcebc8ea40d6694f90f7858c577bfb1f6
 size 36972881536

gemma-2-9b-it.imatrix CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e09ef8351870c9f3778d42176021611d5b8ca52b393352b2a55dfd9c8ab52de3
+oid sha256:cb2a0732cf3a4b668251a51e7bc6b1338119cde0dd213911bf0cf58bc005261b
 size 6116901
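
Every file in this commit keeps its byte size and changes only its `oid sha256:` value, so a stale local download cannot be spotted by size alone. One way to check a local file against a pointer is to compare both the size and a streamed SHA-256 digest. The sketch below assumes the three-line Git LFS pointer layout shown in the diffs; the helper names `parse_lfs_pointer` and `matches_pointer` and the local path are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare the file size first, then stream the file through SHA-256."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algo']}")
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer for gemma-2-9b-it.imatrix as of this commit
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:cb2a0732cf3a4b668251a51e7bc6b1338119cde0dd213911bf0cf58bc005261b\n"
    "size 6116901\n"
)
local = Path("gemma-2-9b-it.imatrix")  # assumed download location
if local.exists():
    print("match" if matches_pointer(local, pointer) else "mismatch")
```

Reading in 1 MiB chunks keeps memory use flat even for the 36972881536-byte f32 file.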