Henyo-153M-CulturaX

Henyo is a 153M-parameter Tagalog language model trained on the MaAIos/culturax-filipino-subset dataset. It uses a custom, efficiency-focused architecture heavily inspired by Llama 2/3 and PaLM.

Architecture Details

This model uses a custom Decoder-Only Transformer architecture built from scratch in PyTorch.

Hyperparameter     Value
-----------------  ------------------------
Parameters         ~153M
Context Window     1024 tokens
Embedding Dim      768
Layers (Depth)     12
Attention Heads    12
KV Heads (GQA)     4
Vocab Size         50,257 (GPT-2 tokenizer)
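
For reference, the table above corresponds to a configuration object along the following lines. This is an illustrative sketch only; the field names are assumptions and may differ from those used in train_henyo.py.

# Illustrative config mirroring the table above. Field names are assumptions
# and may not match train_henyo.py exactly.
from dataclasses import dataclass

@dataclass
class HenyoConfig:
    vocab_size: int = 50_257   # GPT-2 tokenizer vocabulary
    dim: int = 768             # embedding dimension
    n_layers: int = 12         # transformer depth
    n_heads: int = 12          # query heads
    n_kv_heads: int = 4        # shared KV heads (GQA, 3:1 ratio)
    max_seq_len: int = 1024    # context window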

Key Features

  1. SwiGLU Activation: a high-performance gated linear unit activation used in the feed-forward blocks (see the sketch after this list).
  2. Grouped Query Attention (GQA): 12 query heads sharing 4 KV heads (3:1 ratio) for a smaller KV cache and more efficient inference (sketch below).
  3. Rotary Positional Embeddings (RoPE): position is encoded by rotating channel pairs, which generalizes better across sequence lengths (sketch below).
  4. RMSNorm: pre-normalization of each sub-layer input for training stability (sketch below).
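
A minimal PyTorch sketch of the SwiGLU feed-forward block in its common Llama-style formulation; the hidden width and layer names here are assumptions, not necessarily Henyo's exact ones.

# SwiGLU feed-forward block: silu(x W_gate) gated elementwise by (x W_up),
# then projected back down. Layer names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))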
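
A sketch of the GQA mechanism as it is commonly implemented: each of the 4 KV heads is repeated 3x so that all 12 query heads can attend normally. Function and argument names are illustrative, not taken from the repo.

# GQA sketch: 12 query heads share 4 KV heads, so each K head is
# repeated 3x before standard scaled dot-product attention scores.
import torch

def gqa_scores(q: torch.Tensor, k: torch.Tensor,
               n_heads: int = 12, n_kv_heads: int = 4) -> torch.Tensor:
    # q: (batch, n_heads, seq, head_dim); k: (batch, n_kv_heads, seq, head_dim)
    repeats = n_heads // n_kv_heads          # 3:1 ratio
    k = k.repeat_interleave(repeats, dim=1)  # -> (batch, n_heads, seq, head_dim)
    return q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)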
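
A sketch of the standard RoPE formulation, rotating even/odd channel pairs by position-dependent angles; Henyo's exact implementation may differ in channel layout or caching.

# RoPE sketch: rotate each (even, odd) channel pair by an angle that
# grows with position, at a frequency that decays with channel index.
import torch

def rope(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    # x: (..., seq_len, head_dim) with an even head_dim
    seq_len, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out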
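
And a sketch of RMSNorm: the input is normalized by the root mean square of its features (no mean subtraction, unlike LayerNorm) and rescaled by a learned weight.

# RMSNorm sketch: scale by 1/rms(x) plus a learned per-channel weight.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight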

Training Configuration

  • Dataset: MaAIos/culturax-filipino-subset
  • Mode: Streaming (Iterable Dataset)
  • Optimizer: AdamW
  • Scheduler: Cosine Decay
  • Gradient Accumulation: 8 steps (effective batch size ~32, i.e. a micro-batch of 4 accumulated over 8 steps)
  • Precision: Mixed Precision (FP16)
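
Putting the recipe together, the loop below shows how these pieces typically combine: AdamW with cosine decay, 8-step gradient accumulation, and FP16 mixed precision. The learning rate, step count, and data pipeline are placeholders, not the values used for Henyo.

# Training-loop sketch matching the recipe above. Hyperparameter values
# and the (model, loader) interface are placeholders.
import torch
import torch.nn.functional as F

def train(model, loader, total_steps=10_000, accum_steps=8, lr=3e-4):
    device = "cuda"
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps)
    scaler = torch.cuda.amp.GradScaler()  # loss scaling for FP16 stability

    for step, (inputs, targets) in enumerate(loader):
        with torch.cuda.amp.autocast(dtype=torch.float16):
            logits = model(inputs.to(device))
            loss = F.cross_entropy(
                logits.view(-1, logits.size(-1)), targets.to(device).view(-1)
            )
        # Divide so the 8 accumulated micro-batches average correctly.
        scaler.scale(loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            scaler.step(opt)
            scaler.update()
            opt.zero_grad(set_to_none=True)
            sched.step()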

Usage

Since this model uses a custom architecture, the standard AutoModel loaders will not work out of the box: you must include the class definitions (provided in train_henyo.py in this repo) or use the inference script below.

# See inference_henyo.py in the repo files for the full class definitions.
from transformers import AutoTokenizer

model_id = "marcuscedricridia/Henyo-153M-CulturaX"

# The model was trained with the GPT-2 tokenizer (vocab size 50,257).
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Instantiate the custom model class and load the weights; a full,
# hedged sketch of this step follows below.
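
Below is a hedged end-to-end sketch of loading and sampling. It assumes the checkpoint file is named model.safetensors and that inference_henyo.py exposes a HenyoModel class whose forward returns logits of shape (batch, seq_len, vocab_size); adjust to the actual names in the repo.

import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import AutoTokenizer
from inference_henyo import HenyoModel  # hypothetical import; see repo files

model_id = "marcuscedricridia/Henyo-153M-CulturaX"
tokenizer = AutoTokenizer.from_pretrained(model_id)

weights = load_file(hf_hub_download(model_id, "model.safetensors"))  # assumed filename
model = HenyoModel()  # hypothetical constructor; pass the real config here
model.load_state_dict(weights)
model.eval()

# Greedy decoding, assuming forward returns (batch, seq_len, vocab) logits.
ids = tokenizer("Ang wikang Tagalog ay", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(50):
        next_id = model(ids)[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
print(tokenizer.decode(ids[0]))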

Reproducibility

The full training script (train_henyo.py) is included in this repository.
