---
license: mit
datasets:
- karpathy/fineweb-edu-100b-shuffle
- HuggingFaceTB/smoltalk
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 27.82
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Easy (25-Shot)
      type: ai2_arc
      config: ARC-Easy
      split: test
    metrics:
    - type: acc_norm
      value: 38.64
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
    metrics:
    - type: acc
      value: 31.66
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 4.55
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
      split: test
    metrics:
    - type: pass@1
      value: 5.49
      name: pass@1
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ChatCORE
      type: chatcore
      split: test
    metrics:
    - type: score
      value: 23.22
      name: ChatCORE metric
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# 🌸 SEA Model series Op.0: Saint Iberis d20 (Parameters: 542M)

![Nanochat_saint_iberis](https://cdn-uploads.huggingface.co/production/uploads/6629ba7d59854b02da014f64/PFxNhrZ3A_UUbE6YfxChl.jpeg)

This repository uses a module called **SLC2**, inspired by **Liquid Time-Constant Networks (LTCs)** and **Liquid Foundation Models (LFM2)**, to speed up training and inference for **nanochat**.
Compared with the original nanochat GPT baseline, the **SEA Model series Op.0: Saint Iberis** achieves comparable performance while cutting training time by more than **30 minutes** (3h51m to 3h15m wall clock) and lowering compute cost by over **$10**.
The model and code are freely available from the repository below.

Repository: [Liquid_Time_nanochat](https://github.com/Rikka-Botan/Liquid_Time_nanochat)

# 🌸 Saint Iberis Architecture

![Saint_Iberis](https://cdn-uploads.huggingface.co/production/uploads/6629ba7d59854b02da014f64/uI6F82n7B6qaGbYSrKbiB.png)


| Property              | Saint Iberis d20              | Remarks                                               |
| --------------------- | ----------------------------- |------------------------------------------------------ |
| **Total parameters**  | 542,035,200 (542M)            | n_layer: 20, n_head: 10, n_kv_head: 10, n_embd: 1280  |
| **Layers**            | 20 (13 slc2 + 7 attn)         | attn layers: 1, 4, 7, 10, 13, 16, 19                  |
| **Vocabulary size**   | 65,536                        | -                                                     |
| **License**           | MIT                           | -                                                    |
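
The layer split in the table can be checked with a short sketch. It assumes the listed attention indices are zero-based and follow the pattern `i % 3 == 1`; the table does not state the indexing convention, and the repository may build the schedule differently.

```python
# Values copied from the architecture table above.
N_LAYER, N_HEAD, N_EMBD = 20, 10, 1280
ATTN_LAYERS = {1, 4, 7, 10, 13, 16, 19}

# Assumption: zero-based layer indices; every layer not listed is an SLC2 layer.
layer_types = ["attn" if i in ATTN_LAYERS else "slc2" for i in range(N_LAYER)]

# The listed indices match "every third layer, starting at index 1".
assert ATTN_LAYERS == {i for i in range(N_LAYER) if i % 3 == 1}

head_dim = N_EMBD // N_HEAD  # per-head width of the attention layers
print(layer_types.count("slc2"), layer_types.count("attn"), head_dim)
```

This reproduces the 13 SLC2 and 7 attention layers from the table, with a per-head width of 128.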

# 🌸 SLC2 Formulation

```markdown
y = B ⋅ ∏ᵢ₌ⱼ⁽ʲ⁺ᵏ⁾ Aᵢ ⋅ xᵢ
```

# 🌸 SLC2 pseudo code

```text
----------------------------------------
Algorithm: SLC2
----------------------------------------
Input: x: (B, S, E)
Output: y: (B, S, E)
    1: alpha, A, B, x₁ <- Linear(x)
    2: x₂: (B, S, E) <- Convolution1D(E, E)(SiLU(alpha)*A*x₁)
    3: x₃: (B, S, E) <- B*SiLU(x₂)
    4: y: (B, S, E) <- Linear(x₃)
    5: return y
----------------------------------------
```
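
The pseudo code above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the repository's implementation: the class name `SLC2Sketch`, the single fused input projection, the depthwise convolution, the kernel size of 3, the causal left-padding, and the absence of biases are all assumptions made here so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SLC2Sketch(nn.Module):
    """Hypothetical sketch of the SLC2 steps; details are assumptions."""

    def __init__(self, n_embd: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # Step 1: one projection producing alpha, A, B and x1 (assumed fused).
        self.in_proj = nn.Linear(n_embd, 4 * n_embd, bias=False)
        # Step 2: E -> E convolution over the sequence (assumed depthwise).
        self.conv = nn.Conv1d(n_embd, n_embd, kernel_size, groups=n_embd, bias=False)
        # Step 4: output projection.
        self.out_proj = nn.Linear(n_embd, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, embd)
        alpha, a_gate, b_gate, x1 = self.in_proj(x).chunk(4, dim=-1)
        h = (F.silu(alpha) * a_gate * x1).transpose(1, 2)  # (batch, embd, seq)
        # Left-pad so position t only sees positions <= t (assumed causal).
        h = F.pad(h, (self.kernel_size - 1, 0))
        x2 = self.conv(h).transpose(1, 2)  # (batch, seq, embd)
        x3 = b_gate * F.silu(x2)  # Step 3: output gate
        return self.out_proj(x3)


layer = SLC2Sketch(n_embd=8)
y = layer(torch.randn(2, 5, 8))
print(y.shape)
```

The gates here are named `a_gate` and `b_gate` to avoid clashing with the batch dimension, which the pseudo code also calls `B`.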

# 🌸 Performance

| Metric          |   BASE     |   MID      |   SFT      |   RL       |
|-----------------|------------|------------|------------|------------|
| CORE            |   0.1796   | -          | -          | -          |
| ARC-Challenge   | -          |   0.2910   |   0.2782   | -          |
| ARC-Easy        | -          |   0.3792   |   0.3864   | -          |
| GSM8K           | -          |   0.0341   |   0.0455   | -          |
| HumanEval       | -          |   0.0732   |   0.0549   | -          |
| MMLU            | -          |   0.3146   |   0.3166   | -          |
| ChatCORE        | -          |   0.2348   |   0.2322   | -          |

**Total wall clock time: 3h15m**

# 🌸 Comparison with nanochat GPT

| Metric                |   GPT([karpathy/nanochat](https://github.com/karpathy/nanochat))|   Saint Iberis                                  |
|-----------------------|------------------------------------------------------------     |-----------------------------------------------  |
| Total wall clock time |   3h51m                                                         |   **3h15m**                                     |
| ARC-Challenge         |   **0.2807**                                                    |   0.2782                                        |
| ARC-Easy              |   **0.3876**                                                    |   0.3864                                        |
| HumanEval             |   **0.0854**                                                    |   0.0549                                        |
| MMLU                  |   0.3151                                                        |   **0.3166**                                    |
| ChatCORE              |   0.0844                                                        |   **0.2322**                                    |
| Task Average          |   0.1998                                                        |   **0.2190**                                    |
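
The headline numbers can be re-derived from the tables with a few lines of arithmetic. The Task Average row appears to be the mean of the six SFT scores in the Performance table, GSM8K included, since that reproduces 0.2190 for Saint Iberis. The hourly price below is an assumption (roughly $24 per hour for an 8xH100 node, as quoted by the nanochat project), not a figure from this card.

```python
def to_minutes(hms: str) -> int:
    """Parse a duration such as '3h51m' into minutes."""
    hours, minutes = hms.rstrip("m").split("h")
    return int(hours) * 60 + int(minutes)


# Wall clock times from the comparison table.
saved_min = to_minutes("3h51m") - to_minutes("3h15m")

# Assumption: about $24/hour for the training node.
ASSUMED_USD_PER_HOUR = 24
saved_cost_usd = saved_min * ASSUMED_USD_PER_HOUR / 60

# SFT column of the Performance table.
sft_scores = {
    "ARC-Challenge": 0.2782,
    "ARC-Easy": 0.3864,
    "GSM8K": 0.0455,
    "HumanEval": 0.0549,
    "MMLU": 0.3166,
    "ChatCORE": 0.2322,
}
task_average = sum(sft_scores.values()) / len(sft_scores)

print(f"saved: {saved_min} min, about ${saved_cost_usd:.2f}")
print(f"task average: {task_average:.4f}")
```

Under that price assumption, the 36 minutes saved correspond to about $14, consistent with the "over $10" claim.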

# 🌸 Training result

## Base Training
- Minimum validation bpb: 0.8287
- Final validation bpb: 0.8287

## Mid Training
- Minimum validation bpb: 0.4116
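
The bpb values above are bits per byte: the summed negative log-likelihood over the validation tokens, converted from nats to bits and divided by the number of bytes those tokens cover, which keeps the metric independent of the tokenizer's vocabulary. A minimal sketch of the conversion follows; the function name and the example totals are hypothetical and are not measurements from this run.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


# Hypothetical totals, for illustration only.
example_nll_nats = 500 * math.log(2)  # 500 bits of total loss
example_bytes = 1000
print(f"{bits_per_byte(example_nll_nats, example_bytes):.4f}")
```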

## SFT Training
- Training loss: 0.5825
- Validation loss: 1.0657

# 🌸 Usage

Clone the repository:

```sh
git clone https://github.com/Rikka-Botan/Liquid_Time_nanochat.git
```

Then, you can run this inference snippet:

```python
import json
import os
import sys

import torch
from huggingface_hub import hf_hub_download

if not os.path.exists("Liquid_Time_nanochat"):
    os.system("git clone https://github.com/Rikka-Botan/Liquid_Time_nanochat")

os.chdir("Liquid_Time_nanochat")
sys.path.append(os.getcwd())

from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

repo_id = "RikkaBotan/nanochat_d20_saint_iberis"
model_file = "model_000700.pt"
meta_file = "meta_000700.json"
tokenizer_file = "tokenizer.pkl"

local_pt_path = hf_hub_download(repo_id=repo_id, filename=model_file)
local_meta_path = hf_hub_download(repo_id=repo_id, filename=meta_file)
local_tokenizer_path = hf_hub_download(repo_id=repo_id, filename=tokenizer_file, local_dir=os.getcwd())

with open(local_meta_path, "r", encoding="utf-8") as f:
    meta_data = json.load(f)

model_config = GPTConfig(**meta_data["model_config"])

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GPT(model_config).to(device)

state_dict = torch.load(local_pt_path, map_location=device)
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True)
model.eval()

tokenizer = RustBPETokenizer.from_directory(os.getcwd())

try:
    tokenizer.bos_token_id = tokenizer.enc.encode_single_token("<|bos|>")
except KeyError:
    tokenizer.bos_token_id = tokenizer.enc.encode_single_token("<|endoftext|>")

tokenizer.user_start_id = tokenizer.enc.encode_single_token("<|user_start|>")
tokenizer.user_end_id = tokenizer.enc.encode_single_token("<|user_end|>")
tokenizer.assistant_start_id = tokenizer.enc.encode_single_token("<|assistant_start|>")
tokenizer.assistant_end_id = tokenizer.enc.encode_single_token("<|assistant_end|>")
tokenizer.stop_tokens = {tokenizer.assistant_end_id, tokenizer.bos_token_id}

def format_conversation(tokenizer, history):
    tokens = [tokenizer.bos_token_id]
    for message in history:
        role = message["role"]
        content = message["content"]
        content_tokens = tokenizer.encode(content)
        if role == "user":
            tokens.extend([tokenizer.user_start_id, *content_tokens, tokenizer.user_end_id])
        elif role == "assistant":
            tokens.extend([tokenizer.assistant_start_id, *content_tokens, tokenizer.assistant_end_id])
    tokens.append(tokenizer.assistant_start_id)
    return tokens

def generate_reply(prompt, conv_history, temperature=0.7, top_k=20, top_p=0.8,
                   repetition_penalty=1.15, max_new_tokens=64):
    conv_history.append({"role": "user", "content": prompt})
    tokens = format_conversation(tokenizer, conv_history)
    input_ids = torch.tensor(tokens, dtype=torch.long).unsqueeze(0).to(device)

    stream = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
    )

    buffer_text = ""
    for token_id in stream:
        # Stop on the token id (assistant end or BOS) instead of matching decoded text.
        if int(token_id) in tokenizer.stop_tokens:
            break
        buffer_text += tokenizer.decode([token_id])
    conv_history.append({"role": "assistant", "content": buffer_text})
    return buffer_text

if __name__ == "__main__":
    print("🌸 NanoChat - Saint Iberis CLI")
    print("Type 'exit' to quit.\n")
    conv_history = []

    while True:
        prompt = input("You: ")
        if prompt.lower() in {"exit", "quit"}:
            print("Goodbye!")
            break

        reply = generate_reply(prompt, conv_history)
        print(f"AI: {reply}\n")
```

# 🌸 Acknowledgments

I thank [Andrej Karpathy](https://huggingface.co/karpathy) for [nanochat](https://github.com/karpathy/nanochat), his full-stack project for building an LLM.

I thank the developers of Python and PyTorch.

I thank all the researchers for their efforts to date.

I thank Japan's high standard of education.

And most of all, thank you for your interest in this repository.

# 🌸 About us

An independent Japanese researcher with a shy and pampered personality. Twin-tail hair is my charm point. Interested in NLP. I usually work in Python and C.

![RikkaBotan_Logo](https://cdn-uploads.huggingface.co/production/uploads/6629ba7d59854b02da014f64/vo4azDEv3SZNVDB6O609i.png)