🐻 Gumini-1B (구미니)
Built with Qwen
Model Description
Gumini (구미니) is a bilingual Korean-English base language model created by inheriting the first 10 layers of Qwen 2.5 3B using the Inheritune methodology, followed by continued pretraining on a Korean–English mixed corpus (~393M tokens).
This is a BASE model, not instruction-tuned.
It produces text continuations rather than conversational responses.
What We Modified
The original Qwen 2.5 3B model was modified as follows:
Layer Inheritance (Inheritune)
- Inherited the first 10 transformer layers out of 36
- Reduced model size while preserving early linguistic abilities
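The layer-inheritance step can be sketched as a filter over checkpoint keys. This is a minimal illustration, not the author's actual script: the `model.layers.<idx>.` key pattern assumes Hugging Face Qwen2-style parameter names, the helper `inherit_first_layers` is hypothetical, and the toy dictionary stands in for real weights.

```python
import re

# Assumption: Hugging Face Qwen2-style parameter names, "model.layers.<idx>.<rest>"
LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def inherit_first_layers(state_dict, n_keep):
    """Keep non-layer tensors (embeddings, final norm) plus the first n_keep layers."""
    kept = {}
    for name, tensor in state_dict.items():
        match = LAYER_KEY.match(name)
        if match is None or int(match.group(1)) < n_keep:
            kept[name] = tensor
    return kept


# Toy stand-in for a 36-layer checkpoint; the values are placeholders, not weights.
toy = {"model.embed_tokens.weight": "E", "model.norm.weight": "N"}
for i in range(36):
    toy[f"model.layers.{i}.self_attn.q_proj.weight"] = f"Q{i}"
    toy[f"model.layers.{i}.mlp.up_proj.weight"] = f"U{i}"

small = inherit_first_layers(toy, n_keep=10)
kept_layers = sorted(
    {int(LAYER_KEY.match(k).group(1)) for k in small if LAYER_KEY.match(k)}
)
print(kept_layers)  # indices of the layers that survive the cut
```

With a real checkpoint, the model config's layer count would also need to be set to 10 before the filtered weights are loaded.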
Pretraining
- Trained for 393M tokens on a Korean–English dataset
- Retains base-model behavior (no SFT or instruction tuning applied)
Identity Injection
- Added system-level identity tokens for model conditioning
This model inherits its early layers from Qwen 2.5 3B and is retrained with progressive layer expansion using the Inheritune methodology.
Model Details
| Attribute | Value |
|---|---|
| Researcher | Gumin Kwon (권구민) |
| Base Model | Qwen/Qwen2.5-3B |
| Training Method | Inheritune + Pretraining |
| Parameters | 1.08B |
| Layers | 10 |
| Hidden Size | 2048 |
| Attention Heads | 16 |
| KV Heads | 2 (GQA) |
| Vocab Size | 151,936 |
| Tokens Trained | 393M |
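As a sanity check, the parameter count can be reproduced from the table. The sketch below rests on assumptions that are not stated in this card and are taken from the upstream Qwen2.5-3B configuration: an MLP intermediate size of 11,008, biases on the Q/K/V projections only, and input/output embeddings that share weights.

```python
# Values from the Model Details table
vocab, hidden, layers, heads, kv_heads = 151_936, 2048, 10, 16, 2

# Assumptions (from the upstream Qwen2.5-3B config, not from this card)
intermediate = 11_008      # MLP intermediate size
tied_embeddings = True     # lm_head shares weights with the token embedding

head_dim = hidden // heads          # 128
kv_dim = kv_heads * head_dim        # 256 under grouped-query attention

embed = vocab * hidden
attn = (
    (hidden * hidden + hidden)          # q_proj weight + bias
    + 2 * (hidden * kv_dim + kv_dim)    # k_proj and v_proj weights + biases
    + hidden * hidden                   # o_proj weight (assumed bias-free)
)
mlp = 3 * hidden * intermediate         # gate, up, and down projections
norms = 2 * hidden                      # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
if not tied_embeddings:
    total += vocab * hidden

print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the total is 1,081,936,896, which agrees with the 1.08B listed in the table; the embedding matrix alone accounts for roughly 311M of that.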
Training Data
| Dataset | Language | Weight |
|---|---|---|
| FineWeb-Edu | English | 20% |
| CulturaX-ko | Korean | 50% |
| Wikipedia-ko | Korean | 30% |
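If the Weight column is read as each dataset's share of the 393M training tokens (an assumption; it could equally describe sampling probabilities), the approximate token budget per source works out as follows.

```python
total_tokens = 393_000_000  # "Tokens Trained" from the Model Details table

# Weights from the Training Data table, in percent
weights_pct = {"FineWeb-Edu": 20, "CulturaX-ko": 50, "Wikipedia-ko": 30}
assert sum(weights_pct.values()) == 100

# Integer arithmetic avoids floating-point rounding noise
budget = {name: total_tokens * pct // 100 for name, pct in weights_pct.items()}
korean_pct = weights_pct["CulturaX-ko"] + weights_pct["Wikipedia-ko"]

for name, tokens in budget.items():
    print(f"{name:<13} ~{tokens / 1e6:.1f}M tokens")
print(f"Korean share: {korean_pct}%")
```

Under this reading, about four fifths of the corpus is Korean.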
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "GuminiResearch/Gumini-1B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("GuminiResearch/Gumini-1B-Base")

prompt = "저는 구미니입니다."  # Korean for "I am Gumini."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    repetition_penalty=1.2,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
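
Alternatively, with the high-level pipeline helper:

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="GuminiResearch/Gumini-1B-Base",
)

prompt = "저는 구미니입니다."  # Korean for "I am Gumini."
output = generator(
    prompt,
    max_new_tokens=100,
    do_sample=True,  # required for temperature to take effect
    temperature=0.7,
    repetition_penalty=1.2,
)
print(output[0]["generated_text"])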
Limitations
- Base model: no instruction-tuning or safety alignment
- High repetition risk: use repetition_penalty >= 1.2
- May generate incorrect or outdated information
- Should not be used in sensitive or safety-critical contexts
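To make the repetition advice concrete, here is a minimal sketch of what a repetition penalty does to the next-token scores. It follows the CTRL-style rule that, to my understanding, Transformers applies for `repetition_penalty`: positive logits of already-generated tokens are divided by the penalty and negative ones are multiplied by it. The function name and the toy logits are illustrative only.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of logits with previously generated tokens made less likely."""
    out = list(logits)
    for tok in set(generated_ids):
        score = out[tok]
        # Dividing a positive score or multiplying a negative one both lower it
        out[tok] = score * penalty if score < 0 else score / penalty
    return out


# Toy vocabulary of four tokens; tokens 0 and 1 have already been generated.
logits = [2.4, -1.0, 0.6, 3.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=1.2)
print(penalized)
```

Tokens that have not appeared yet keep their scores, so a penalty of 1.0 is a no-op and larger values push the model harder away from repeating itself.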
License
Qwen Research License (Non-Commercial)
This model is Built with Qwen and derived from Qwen 2.5 3B.
Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT.
Copyright (c) Alibaba Cloud. All Rights Reserved.
This model is for NON-COMMERCIAL / RESEARCH use only.
For commercial use, contact Alibaba Cloud.
Inheritune Paper (CC BY 4.0)
@inproceedings{Sanyal2024inheritune,
title={Inheritune: Training Smaller Yet More Attentive Language Models},
author={Sunny Sanyal and Ravid Shwartz-Ziv and Alexandros G. Dimakis and Sujay Sanghavi},
year={2024},
url={https://arxiv.org/abs/2404.08634}
}
Citation
@misc{gumini2025,
title={Gumini-1B: Bilingual Language Model Built with Qwen via Inheritune},
author={Gumin Kwon},
year={2025},
note={Built with Qwen},
url={https://huggingface.co/GuminiResearch/Gumini-1B-Base}
}
Author
- LinkedIn: linkedin.com/in/devgumin
- HuggingFace: GuminiResearch
Gumini - Small but smart AI