---
license: mit
datasets:
- karpathy/fineweb-edu-100b-shuffle
- HuggingFaceTB/smoltalk
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 27.82
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Easy (25-Shot)
      type: ai2_arc
      config: ARC-Easy
      split: test
    metrics:
    - type: acc_norm
      value: 38.64
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
    metrics:
    - type: acc
      value: 31.66
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 4.55
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
      split: test
    metrics:
    - type: pass@1
      value: 5.49
      name: pass@1
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ChatCORE
      type: chatcore
      split: test
    metrics:
    - type: score
      value: 23.22
      name: ChatCORE metric
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# 🌸 SEA Model series Op.0: Saint Iberis d20 (Parameters: 542M)

![Nanochat_saint_iberis](https://cdn-uploads.huggingface.co/production/uploads/6629ba7d59854b02da014f64/PFxNhrZ3A_UUbE6YfxChl.jpeg)

This repository employs a module called **SLC2**, inspired by **Liquid Time-Constant Networks (LTCs)** and **Liquid Foundation Models (LFM2)**, to enable faster training and inference for
**nanochat**. Compared with the original nanochat GPT baseline, the **SEA Model series Op.0: Saint Iberis** achieves comparable performance while cutting total training time by more than **30 minutes** (3h51m → 3h15m) and lowering compute costs by over **$10**. You are free to use the model from the repository below.

Repository: [Liquid_Time_nanochat](https://github.com/Rikka-Botan/Liquid_Time_nanochat)

# 🌸 Saint Iberis Architecture

![Saint_Iberis](https://cdn-uploads.huggingface.co/production/uploads/6629ba7d59854b02da014f64/uI6F82n7B6qaGbYSrKbiB.png)

| Property | Saint Iberis d20 | Remarks |
| --------------------- | ----------------------------- | ------------------------------------------------------ |
| **Total parameters** | 542,035,200 (542M) | n_layer: 20, n_head: 10, n_kv_head: 10, n_embd: 1280 |
| **Layers** | 20 (13 SLC2 + 7 attention) | Attention layers: 1, 4, 7, 10, 13, 16, 19 |
| **Vocabulary size** | 65,536 | - |
| **License** | MIT | - |

# 🌸 SLC2 Formulation

$$
y = B \cdot \prod_{i=j}^{j+k} A_i \cdot x_i
$$

# 🌸 SLC2 pseudo code

```text
----------------------------------------
Algorithm: SLC2
----------------------------------------
Input:  x: (B, S, E)    # (batch size, sequence length, embedding dim)
Output: y: (B, S, E)

1: alpha, A, B, x₁ <- Linear(x)          # this B is a gate tensor, not the batch size
2: x₂: (B, S, E) <- Convolution1D(E, E)(SiLU(alpha) * A * x₁)
3: x₃: (B, S, E) <- B * SiLU(x₂)
4: y: (B, S, E) <- Linear(x₃)
5: return y
----------------------------------------
```

# 🌸 Performance

| Metric | BASE | MID | SFT | RL |
|-----------------|------------|------------|------------|------------|
| CORE | 0.1796 | - | - | - |
| ARC-Challenge | - | 0.2910 | 0.2782 | - |
| ARC-Easy | - | 0.3792 | 0.3864 | - |
| GSM8K | - | 0.0341 | 0.0455 | - |
| HumanEval | - | 0.0732 | 0.0549 | - |
| MMLU | - | 0.3146 | 0.3166 | - |
| ChatCORE | - | 0.2348 | 0.2322 | - |

**Total wall clock time: 3h15m**

# 🌸 Comparison with the nanochat GPT baseline

| Metric | GPT ([karpathy/nanochat](https://github.com/karpathy/nanochat)) | Saint Iberis |
|-----------------------|------------------------------------------------------------|-----------------------------------------------|
| Total wall clock time | 3h51m | **3h15m** |
| ARC-Challenge | **0.2807** | 0.2782 |
| ARC-Easy | **0.3876** | 0.3864 |
| HumanEval | **0.0854** | 0.0549 |
| MMLU | 0.3151 | **0.3166** |
| ChatCORE | 0.0844 | **0.2322** |
| Task Average | 0.1998 | **0.2190** |

Saint Iberis scores are from the SFT stage. Task Average is the mean of six evaluations: the five metrics listed above plus GSM8K.

# 🌸 Training result

## Base Training

- Minimum validation bpb (bits per byte): 0.8287
- Final validation bpb: 0.8287

## Mid Training

- Minimum validation bpb: 0.4116

## SFT Training

- Training loss: 0.5825
- Validation loss: 1.0657

# 🌸 Usage

Clone the repository:

```sh
git clone https://github.com/Rikka-Botan/Liquid_Time_nanochat.git
```

Then run this inference snippet (it also clones the repository if it is not already present in the working directory):

```python
import json
import os
import sys

import torch
from huggingface_hub import hf_hub_download

# Clone the model code if needed and make it importable.
if not os.path.exists("Liquid_Time_nanochat"):
    os.system("git clone https://github.com/Rikka-Botan/Liquid_Time_nanochat")
os.chdir("Liquid_Time_nanochat")
sys.path.append(os.getcwd())

from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

repo_id = "RikkaBotan/nanochat_d20_saint_iberis"
model_file = "model_000700.pt"
meta_file = "meta_000700.json"
tokenizer_file = "tokenizer.pkl"

# Download the checkpoint, its config, and the tokenizer. The tokenizer is saved into
# the working directory so that RustBPETokenizer.from_directory() can find it.
local_pt_path = hf_hub_download(repo_id=repo_id, filename=model_file)
local_meta_path = hf_hub_download(repo_id=repo_id, filename=meta_file)
local_tokenizer_path = hf_hub_download(repo_id=repo_id, filename=tokenizer_file, local_dir=os.getcwd())

with open(local_meta_path, "r", encoding="utf-8") as f:
    meta_data = json.load(f)
model_config = GPTConfig(**meta_data["model_config"])

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GPT(model_config).to(device)
state_dict = torch.load(local_pt_path, map_location=device)
# Checkpoints saved from a torch.compile()-wrapped model prefix every key with
# "_orig_mod."; strip it so the keys match the plain GPT module.
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True)
model.eval()

tokenizer = RustBPETokenizer.from_directory(os.getcwd())

# Resolve the special-token ids used by the chat template.
try:
    tokenizer.bos_token_id = tokenizer.enc.encode_single_token("<|bos|>")
except KeyError:
    tokenizer.bos_token_id = tokenizer.enc.encode_single_token("<|endoftext|>")
tokenizer.user_start_id = tokenizer.enc.encode_single_token("<|user_start|>")
tokenizer.user_end_id = tokenizer.enc.encode_single_token("<|user_end|>")
tokenizer.assistant_start_id = tokenizer.enc.encode_single_token("<|assistant_start|>")
tokenizer.assistant_end_id = tokenizer.enc.encode_single_token("<|assistant_end|>")
tokenizer.stop_tokens = {tokenizer.assistant_end_id, tokenizer.bos_token_id}


def format_conversation(tokenizer, history):
    """Render the chat history as token ids, ending with <|assistant_start|> so the model replies next."""
    tokens = [tokenizer.bos_token_id]
    for message in history:
        role = message["role"]
        content = message["content"]
        content_tokens = tokenizer.encode(content)
        if role == "user":
            tokens.extend([tokenizer.user_start_id, *content_tokens, tokenizer.user_end_id])
        elif role == "assistant":
            tokens.extend([tokenizer.assistant_start_id, *content_tokens, tokenizer.assistant_end_id])
    tokens.append(tokenizer.assistant_start_id)
    return tokens


def generate_reply(prompt, conv_history, temperature=0.7, top_k=20, top_p=0.8,
                   repetition_penalty=1.15, max_new_tokens=64):
    """Append the user prompt, stream a reply until <|assistant_end|>, and record it in the history."""
    conv_history.append({"role": "user", "content": prompt})
    tokens = format_conversation(tokenizer, conv_history)
    input_ids = torch.tensor(tokens, dtype=torch.long).unsqueeze(0).to(device)
    stream = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
    )
    buffer_text = ""
    for token_id in stream:
        text_piece = tokenizer.decode([token_id])
        if text_piece == "<|assistant_end|>":
            break
        buffer_text += text_piece
    conv_history.append({"role": "assistant", "content": buffer_text})
    return buffer_text


if __name__ == "__main__":
    print("🌸 NanoChat - Saint Iberis CLI")
    print("Type 'exit' to quit.\n")
    conv_history = []
    while True:
        prompt = input("You: ")
        if prompt.lower() in {"exit", "quit"}:
            print("Goodbye!")
            break
        reply = generate_reply(prompt, conv_history)
        print(f"AI: {reply}\n")
```

# 🌸 Acknowledgments

I thank [Andrej Karpathy](https://huggingface.co/karpathy) for [nanochat](https://github.com/karpathy/nanochat), his full-stack project for building an LLM. I thank the developers of Python and PyTorch. I thank all the researchers for their efforts to date. I thank Japan's high standard of education. And most of all, thank you for your interest in this repository.

# 🌸 About us

A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is my charm point. Interested in NLP. Usually uses Python and C.

![RikkaBotan_Logo](https://cdn-uploads.huggingface.co/production/uploads/6629ba7d59854b02da014f64/vo4azDEv3SZNVDB6O609i.png)
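# 🌸 SLC2 sketch (NumPy)

To make the SLC2 pseudo code more concrete, here is a minimal NumPy sketch of a single SLC2 forward pass with random weights. It is an illustration under stated assumptions, not the implementation in Liquid_Time_nanochat: the single fused `(E, 4E)` input projection split into `alpha, A, B, x₁`, the kernel size `K = 3`, the causal (left-only) padding, the full channel mixing that reads `Convolution1D(E, E)` literally (the real module may be depthwise), and the absence of biases are all assumptions. The names `slc2_forward` and `causal_conv1d` are hypothetical.

```python
import numpy as np


def silu(z):
    """SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))


def causal_conv1d(h, weight):
    """Causal 1D convolution along the sequence axis (hypothetical helper).

    h: (B, S, E) input, weight: (E_out, E_in, K) kernel.
    The sequence is zero-padded on the left only, so output position t sees
    inputs t-K+1 .. t (causality is an assumption for an autoregressive LM).
    """
    batch, seq_len, channels = h.shape
    k_size = weight.shape[-1]
    padded = np.concatenate([np.zeros((batch, k_size - 1, channels)), h], axis=1)
    out = np.zeros((batch, seq_len, weight.shape[0]))
    for k in range(k_size):
        # Tap k multiplies the input at position t - (k_size - 1) + k.
        out += padded[:, k:k + seq_len, :] @ weight[:, :, k].T
    return out


def slc2_forward(x, w_in, conv_w, w_out):
    """One SLC2 block following steps 1-5 of the pseudo code (sketch, assumptions above)."""
    # 1: alpha, A, B, x1 <- Linear(x); assumed to be one (E, 4E) projection split in four
    alpha, a, b, x1 = np.split(x @ w_in, 4, axis=-1)
    # 2: x2 <- Convolution1D(E, E)(SiLU(alpha) * A * x1)
    x2 = causal_conv1d(silu(alpha) * a * x1, conv_w)
    # 3: x3 <- B * SiLU(x2)
    x3 = b * silu(x2)
    # 4-5: y <- Linear(x3)
    return x3 @ w_out


rng = np.random.default_rng(0)
batch, seq_len, n_embd, k_size = 2, 8, 16, 3  # toy sizes; Saint Iberis uses n_embd = 1280
x = rng.standard_normal((batch, seq_len, n_embd))
w_in = rng.standard_normal((n_embd, 4 * n_embd)) / np.sqrt(n_embd)
conv_w = rng.standard_normal((n_embd, n_embd, k_size)) / np.sqrt(n_embd * k_size)
w_out = rng.standard_normal((n_embd, n_embd)) / np.sqrt(n_embd)

y = slc2_forward(x, w_in, conv_w, w_out)
print(y.shape)  # (2, 8, 16)
```

Every step except the convolution acts on each position independently, so in this sketch only the convolution mixes information across the sequence. Changing later tokens of `x` should leave earlier rows of `y` unchanged, which is a quick way to check the assumed causal padding.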