C2S-Scale-Gemma-2-27B GGUF Models

Model Generation Details

This model was generated using llama.cpp at commit 03792ad93.

Quantization Beyond the IMatrix

I've been experimenting with a new quantization approach that selectively elevates the precision of key layers beyond what the default IMatrix configuration provides.

In my testing, standard IMatrix quantization underperforms at lower bit depths, especially with Mixture of Experts (MoE) models. To address this, I'm using the --tensor-type option in llama.cpp to manually "bump" important layers to higher precision. You can see the implementation here:
👉 Layer bumping with llama.cpp

While this does increase model file size, it significantly improves precision for a given quantization level.
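
For readers who have not used the option, here is a minimal sketch of how such a call could be assembled. It assumes the `llama-quantize` binary from llama.cpp is on the PATH and relies on its `--imatrix` and `--tensor-type` options. The file names, the base type, and the choice of tensors to bump (`attn_v`, `ffn_down`) are hypothetical placeholders, not the settings used for these files.

```python
import shlex


def build_quantize_command(src, dst, base_type, imatrix=None, overrides=None):
    """Assemble a llama-quantize argument list with per-tensor type overrides."""
    cmd = ["llama-quantize"]
    if imatrix:
        cmd += ["--imatrix", imatrix]
    # Each override becomes one "--tensor-type NAME=TYPE" pair.
    for tensor_name, quant_type in (overrides or {}).items():
        cmd += ["--tensor-type", f"{tensor_name}={quant_type}"]
    # Positional arguments: input GGUF, output GGUF, base quantization type.
    cmd += [src, dst, base_type]
    return cmd


cmd = build_quantize_command(
    src="model-bf16.gguf",            # hypothetical input file
    dst="model-q3_k_m-bumped.gguf",   # hypothetical output file
    base_type="Q3_K_M",
    imatrix="imatrix.dat",            # hypothetical importance-matrix file
    overrides={"attn_v": "q6_k", "ffn_down": "q5_k"},  # illustrative choices only
)
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` to run the quantization; building it separately makes it easy to inspect which tensors are being bumped before committing to a long run.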

I'd love your feedback—have you tried this? How does it perform for you?

C2S-Scale-Gemma-27B model card

GitHub homepage: Cell2Sentence GitHub

Model documentation: Cell2Sentence Documentation

Resources:

C2S-Scale Paper: Scaling Large Language Models for Next-Generation Single-Cell Analysis
HuggingFace C2S Collection: C2S-Scale Models
GitHub Repository: vandijklab/cell2sentence (for code, tutorials, and discussions)
Google Research Blog Post: Teaching machines the language of biology

Author: van Dijk Lab (Yale), Google Research, Google DeepMind

Model information

This section describes the C2S-Scale model and how to use it.

Description

C2S-Scale-Gemma-27B is a state-of-the-art, open language model built upon the Gemma-2 27B architecture and fine-tuned for single-cell biology. Developed through the Cell2Sentence (C2S) framework, the model processes single-cell RNA sequencing (scRNA-seq) data by treating it as a language: it converts high-dimensional scRNA-seq expression data into "cell sentences" (sequences of gene names ordered by expression level), which enables a wide range of biological analyses.
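
To make the idea of a cell sentence concrete, the following is a minimal sketch of the rank-ordering step. The function name `to_cell_sentence`, the example counts, the handling of ties, and the decision to drop unexpressed genes are all assumptions made for illustration; the Cell2Sentence library has its own preprocessing, which may differ.

```python
def to_cell_sentence(expression, max_genes=None):
    """Return gene names sorted by descending expression, joined by spaces."""
    # Keep only genes that are actually expressed in this cell (an assumption).
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # sorted() is stable, so tied genes keep their input order.
    ranked = sorted(expressed, key=lambda item: item[1], reverse=True)
    genes = [gene for gene, _ in ranked]
    if max_genes is not None:
        genes = genes[:max_genes]
    return " ".join(genes)


# Hypothetical counts for a single cell.
counts = {"ACTB": 35, "MALAT1": 120, "CD3E": 0, "B2M": 80, "TMSB4X": 95}
sentence = to_cell_sentence(counts)
print(sentence)  # MALAT1 TMSB4X B2M ACTB
```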

This work is the result of a collaboration between Yale University, Google Research, and Google DeepMind to scale up C2S models. The C2S-Scale models were trained on Google's TPU v5s, which allowed for a significant increase in model size and capability. These models excel at tasks such as cell type prediction, tissue classification, and generating biologically meaningful cell representations.

Key Features

Versatility: Demonstrates strong performance across a diverse set of single-cell and multi-cell tasks.
Scalability: Trained on a massive dataset of over 57 million cells, showcasing the power of scaling LLMs for biological data.
Generative Power: Capable of generating realistic single-cell gene expression profiles.
Foundation for Fine-tuning: Can serve as a powerful pretrained foundation for specialized, domain-specific single-cell analysis tasks.

Potential Applications

C2S-Scale can be a valuable tool for researchers in the following areas:

In Silico Experiments: Generate cells under specific conditions or predict perturbational changes to form and test new biological hypotheses.
Cell Atlas Annotation: Streamline the process of annotating large-scale single-cell datasets by predicting cell types and tissues.
Biomarker Discovery: Analyze gene patterns within cell sentences to identify potential markers for specific cell states or diseases.

How to use

Below are code snippets to help you get started running the model locally on a GPU. The model can be used for various tasks, further described in the C2S-Scale paper.

Formatting prompts for cell type prediction

To perform cell type prediction, the model expects a prompt containing the cell sentence followed by a query.

# A "cell sentence" is a space-separated string of gene names
# ordered by expression level, from highest to lowest.
cell_sentence = "MALAT1 TMSB4X B2M EEF1A1 H3F3B ACTB FTL RPL13 ..." # Truncated for example purposes
num_genes = 1000
organism = "Homo sapiens"

# Construct the prompt for cell type prediction
prompt = f"""The following is a list of {num_genes} gene names ordered by descending expression level in a {organism} cell. Your task is to give the cell type which this cell belongs to based on its gene expression.
Cell sentence: {cell_sentence}.
The cell type corresponding to these genes is:"""

print(prompt)

The resulting prompt is in the format expected by the model for this task:

The following is a list of 1000 gene names ordered by descending expression level in a Homo sapiens cell. Your task is to give the cell type which this cell belongs to based on its gene expression.
Cell sentence: MALAT1 TMSB4X B2M EEF1A1 H3F3B ACTB FTL RPL13 ... .
The cell type corresponding to these genes is:
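
When the prompt is built for many cells, it can help to wrap the template in a function so that the stated gene count always matches the sentence. The sketch below reuses the exact wording shown above; the helper name `build_cell_type_prompt` is hypothetical, and deriving the count from the list length is an assumption about what the model expects.

```python
def build_cell_type_prompt(genes, organism="Homo sapiens"):
    """Format a cell type prediction prompt from an ordered list of gene names."""
    cell_sentence = " ".join(genes)
    return (
        f"The following is a list of {len(genes)} gene names ordered by descending "
        f"expression level in a {organism} cell. Your task is to give the cell type "
        "which this cell belongs to based on its gene expression.\n"
        f"Cell sentence: {cell_sentence}.\n"
        "The cell type corresponding to these genes is:"
    )


# Three genes only, to keep the example short.
example_prompt = build_cell_type_prompt(["MALAT1", "TMSB4X", "B2M"])
print(example_prompt)
```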

Running the model on predictive tasks

# pip install accelerate transformers sentencepiece
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model directly from Hugging Face Hub
model_id = "vandijklab/C2S-Scale-Gemma-2-27B"

# Load tokenizer; requires sentencepiece to be installed
tokenizer = AutoTokenizer.from_pretrained(model_id)
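# Load weights in bfloat16; a 27B model in float32 needs over 100 GB of memory
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to(device)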

# Format prompt (see previous section)
cell_sentence = "MALAT1 TMSB4X B2M EEF1A1 H3F3B ACTB FTL RPL13 ..." # Truncated for example, use at least 200 genes for inference
num_genes = 1000
organism = "Homo sapiens"

prompt = f"""The following is a list of {num_genes} gene names ordered by descending expression level in a {organism} cell. Your task is to give the cell type which this cell belongs to based on its gene expression.
Cell sentence: {cell_sentence}.
The cell type corresponding to these genes is:"""

# Prepare tokenized inputs (a dict holding input_ids and attention_mask)
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate response
outputs = model.generate(**inputs, max_new_tokens=20)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# The predicted cell type will be the text immediately following the prompt
predicted_cell_type = response.split("The cell type corresponding to these genes is:")[1].strip()
print(f"Predicted Cell Type: {predicted_cell_type}")
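
Indexing the split result with `[1]` raises an error if the marker is missing, and with a fixed token budget the model may keep generating past the label. A slightly more defensive parser could look like the sketch below. The helper name `extract_cell_type` is hypothetical, and keeping only the first line and dropping a trailing period are assumptions about the output format, not documented behavior.

```python
MARKER = "The cell type corresponding to these genes is:"


def extract_cell_type(response, marker=MARKER):
    """Return the text after the last marker, or None if nothing usable follows."""
    _, found, after = response.rpartition(marker)
    if not found:
        return None  # marker absent, e.g. the prompt was truncated
    after = after.strip()
    if not after:
        return None  # the model generated nothing after the prompt
    # Keep only the first line in case generation continued past the label.
    first_line = after.splitlines()[0].strip()
    return first_line.rstrip(".") or None


print(extract_cell_type(MARKER + " T cell.\nCell sentence: ..."))  # T cell
```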

Examples

See the following Colab notebooks in our GitHub repository for examples of how to use C2S-Scale models:

To quickly get started with the model for tasks like cell type prediction and generation: C2S Tutorials
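
The Transformers snippet above loads the original weights. For the GGUF files in this repository, a minimal sketch using the llama-cpp-python bindings might look like the following. It assumes `pip install llama-cpp-python` and a GGUF file already downloaded locally; the function name, the context size, and the sampling settings are illustrative assumptions, not tested recommendations.

```python
def predict_cell_type_gguf(model_path, prompt, max_tokens=20):
    """Run one greedy completion against a local GGUF file."""
    # Imported lazily so this module loads even without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(
        model_path=model_path,  # path to a downloaded .gguf file
        n_ctx=8192,             # assumed context size; long cell sentences need room
        n_gpu_layers=-1,        # offload all layers to the GPU if the build supports it
        verbose=False,
    )
    # temperature=0.0 gives greedy decoding, so repeated runs return the same label.
    result = llm(prompt, max_tokens=max_tokens, temperature=0.0)
    return result["choices"][0]["text"].strip()
```

Call it with the path to whichever quantized file you downloaded and a prompt formatted as described in "Formatting prompts for cell type prediction". The returned string is only the generated continuation, so no prompt stripping is needed.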

Model architecture overview

C2S-Scale is based on the Gemma 2 family of lightweight, state-of-the-art open LLMs, which use a decoder-only transformer architecture.
Base Model: Gemma-2 27B.
Fine-tuning Data: A comprehensive collection of over 800 datasets from CellxGene and the Human Cell Atlas, totaling over 57 million human and mouse cells.
Training Approach: Instruction fine-tuning using the Cell2Sentence framework, which converts scRNA-seq expression data into sequences of gene tokens.

Technical Specifications

Model type: Decoder-only Transformer (based on Gemma-2)
Key publication: Scaling Large Language Models for Next-Generation Single-Cell Analysis

Performance & Validation

The performance of C2S-Scale models was validated on a wide range of single-cell and multi-cell tasks, including advanced downstream tasks such as cluster captioning, question answering, and perturbation prediction. C2S-Scale models demonstrated significant improvements over other open and closed-source models, establishing new state-of-the-art benchmarks for LLMs in single-cell biology. Please see our preprint for a full breakdown of performance metrics.

Inputs and outputs

Input: Text. For best performance, prompts should be structured according to the specific task (e.g., cell type prediction, conditioned generation). Inputs are "cell sentences"—ordered, space-separated lists of gene names.
Output: Text. The model generates text as a response, which can be a predicted label (like a cell type or tissue), a full cell sentence, or a natural language abstract.

Dataset details

Training dataset

CellxGene and Human Cell Atlas: The model was trained on a curated collection of over 800 public scRNA-seq datasets, encompassing more than 57 million cells. This data covers a broad range of tissues, cell types, and experimental conditions from both human and mouse, ensuring the model learns a robust and generalizable representation of cellular states.

Evaluation dataset

Evaluation was performed using held-out datasets and standardized benchmarks designed to test the model's capabilities on the tasks listed above. All evaluation methodologies followed established best practices for splitting data to ensure robust and unbiased assessment.

License

The model weights shared on Hugging Face are licensed under CC BY 4.0.

Implementation information

Software

The model was trained using JAX, leveraging Google's TPU v5 hardware for efficient and large-scale training.

Use and limitations

Intended use

Research in single-cell genomics and computational biology.
As a foundational model for fine-tuning on specific biological domains or datasets.
To aid in the annotation and interpretation of large-scale scRNA-seq experiments.

Benefits

C2S-Scale provides a powerful, versatile, and scalable tool for single-cell analysis. It offers:

State-of-the-art performance on a wide range of scRNA-seq tasks.
A unified framework for handling diverse single-cell analysis challenges.
A foundation for building more specialized models from private or proprietary data.
The ability to perform in silico generation of cellular data to explore biological hypotheses.

Limitations

The model is trained on public data and its knowledge is limited to the genes, cell types, and conditions present in that data.
Performance on out-of-distribution data (e.g., completely novel cell types or technologies) is not guaranteed and requires validation.
Performance of the models on input prompt formats that greatly deviate from training prompt formatting is not guaranteed.

Citation

@article{Rizvi2025.04.14.648850,
    abstract = {Single-cell RNA sequencing has transformed our understanding of cellular diversity, yet current single-cell foundation models (scFMs) remain limited in their scalability, flexibility across diverse tasks, and ability to natively integrate textual information. In this work, we build upon the Cell2Sentence (C2S) framework, which represents scRNA-seq profiles as textual {\textquotedblleft}cell sentences,{\textquotedblright} to train Large Language Models (LLMs) on a corpus comprising over one billion tokens of transcriptomic data, biological text, and metadata. By scaling model size to 27 billion parameters, we observe consistent improvements in predictive and generative capabilities, as well as the capacity for advanced downstream tasks requiring synthesis of information across multicellular contexts. Through targeted fine-tuning supported by modern reinforcement learning techniques, our approach excels in tasks such as perturbation response prediction, natural language interpretation, and complex biological reasoning. By unifying transcriptomic and textual data at unprecedented scales, this approach not only surpasses both specialized single-cell models and general-purpose LLMs, but also establishes a powerful platform for next-generation single-cell analysis, paving the way for the development of {\textquotedblleft}virtual cells.{\textquotedblright}},
    author = {Rizvi, Syed Asad and Levine, Daniel and Patel, Aakash and Zhang, Shiyang and Wang, Eric and He, Sizhuang and Zhang, David and Tang, Cerise and Lyu, Zhuoyang and Darji, Rayyan and Li, Chang and Sun, Emily and Jeong, David and Zhao, Lawrence and Kwan, Jennifer and Braun, David and Hafler, Brian and Ishizuka, Jeffrey and Dhodapkar, Rahul M. and Chung, Hattie and Azizi, Shekoofeh and Perozzi, Bryan and van Dijk, David},
    doi = {10.1101/2025.04.14.648850},
    elocation-id = {2025.04.14.648850},
    eprint = {https://www.biorxiv.org/content/early/2025/04/17/2025.04.14.648850.full.pdf},
    journal = {bioRxiv},
    publisher = {Cold Spring Harbor Laboratory},
    title = {Scaling Large Language Models for Next-Generation Single-Cell Analysis},
    url = {https://www.biorxiv.org/content/early/2025/04/17/2025.04.14.648850},
    year = {2025},
    Bdsk-Url-1 = {https://www.biorxiv.org/content/early/2025/04/17/2025.04.14.648850},
    Bdsk-Url-2 = {https://doi.org/10.1101/2025.04.14.648850}}

C2S-Scale Links

Paper: Scaling Large Language Models for Next-Generation Single-Cell Analysis
Google Research Blog Post: Teaching machines the language of biology: Scaling large language models for next-generation single-cell analysis
GitHub: https://github.com/vandijklab/cell2sentence (Note: the codebase is licensed under CC BY-NC-ND 4.0; only the weights shared on Hugging Face are CC BY 4.0.)

Gemma-2 Links

HuggingFace: https://huggingface.co/google/gemma-2-27b
Gemma-2 Blog Post: Gemma explained: What's new in Gemma 2
Technical report: https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf

🚀 If you find these models useful

Help me test my AI-Powered Quantum Network Monitor Assistant with quantum-ready security checks:

👉 Quantum Network Monitor

The full open-source code for the Quantum Network Monitor service is available in my GitHub repos (those with NetworkMonitor in the name): Source Code Quantum Network Monitor. You will also find the code I use to quantize the models, should you want to do it yourself: GGUFModelBuilder

💬 How to test:
Choose an AI assistant type:

TurboLLM (GPT-4.1-mini)
HugLLM (Hugging Face open-source models)
TestLLM (Experimental CPU-only)

What I’m Testing

I’m pushing the limits of small open-source models for AI network monitoring, specifically:

Function calling against live network services
How small can a model go while still handling:
- Automated Nmap security scans
- Quantum-readiness checks
- Network Monitoring tasks

🟡 TestLLM – Current experimental model (llama.cpp on 2 CPU threads in a Hugging Face Docker Space):

✅ Zero-configuration setup
⏳ 30s load time (slow inference, but no API costs). No token limit, as the cost is low.
🔧 Help wanted! If you’re into edge-device AI, let’s collaborate!

Other Assistants

🟢 TurboLLM – Uses gpt-4.1-mini:

It performs very well, but unfortunately OpenAI charges per token, so token usage is limited.
Create custom cmd processors to run .NET code on Quantum Network Monitor Agents
Real-time network diagnostics and monitoring
Security Audits
Penetration testing (Nmap/Metasploit)

🔵 HugLLM – Latest Open-source models:

🌐 Runs on the Hugging Face Inference API. Performs pretty well using the latest models hosted on Novita.

💡 Example commands you could test:

"Give me info on my website's SSL certificate"
"Check if my server is using quantum-safe encryption for communication"
"Run a comprehensive security audit on my server"
"Create a cmd processor to .. (whatever you want)" Note: you need to install a Quantum Network Monitor Agent to run the .NET code on. This is a very flexible and powerful feature. Use with caution!

Final Word

I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI—all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is open source. Feel free to use whatever you find helpful.

If you appreciate the work, please consider buying me a coffee ☕. Your support helps cover service costs and allows me to raise token limits for everyone.

I'm also open to job opportunities or sponsorship.

Thank you! 😊
