# Henyo-153M-CulturaX
Henyo is a 153M-parameter Tagalog language model trained on the MaAIos/culturax-filipino-subset dataset. It uses a custom, efficiency-focused architecture heavily inspired by Llama 2/3 and PaLM.
## Architecture Details
This model uses a custom decoder-only Transformer architecture built from scratch in PyTorch; the table below summarizes the main hyperparameters.
| Hyperparameter | Value |
|---|---|
| Parameters | ~153M |
| Context Window | 1024 tokens |
| Embedding Dim | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| KV Heads (GQA) | 4 |
| Vocab Size | 50,257 (GPT-2 tokenizer) |
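For reference, the table above maps onto a configuration object roughly like the sketch below. The class and field names here are illustrative assumptions, not necessarily those used in train_henyo.py.

```python
from dataclasses import dataclass

@dataclass
class HenyoConfig:
    # Illustrative config mirroring the table above; names are assumptions.
    vocab_size: int = 50257      # GPT-2 tokenizer vocabulary
    context_window: int = 1024   # maximum sequence length
    d_model: int = 768           # embedding dimension
    n_layers: int = 12           # transformer decoder blocks
    n_heads: int = 12            # query heads
    n_kv_heads: int = 4          # shared key/value heads (GQA, 3:1 ratio)
```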
## Key Features
- SwiGLU Activation: High-performance gated linear unit activation.
- Grouped Query Attention (GQA): 12 query heads share 4 KV heads (a 3:1 ratio) for more efficient inference; see the sketch after this list.
- Rotary Positional Embeddings (RoPE): For better generalization on sequence lengths.
- RMSNorm: Pre-normalization for training stability.
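The following is a minimal sketch of how a GQA layer with 12 query heads and 4 KV heads can be implemented in PyTorch. It is illustrative only and omits RoPE, caching, and dropout; the actual module in train_henyo.py may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Minimal GQA sketch: 12 query heads attend over 4 shared KV heads."""

    def __init__(self, d_model: int = 768, n_heads: int = 12, n_kv_heads: int = 4):
        super().__init__()
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head 3 times so every query head has a matching KV head.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```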
## Training Configuration
- Dataset: MaAIos/culturax-filipino-subset
- Mode: Streaming (Iterable Dataset)
- Optimizer: AdamW
- Scheduler: Cosine Decay
- Gradient Accumulation: 8 steps (Effective batch size ~32)
- Precision: Mixed precision (FP16); a sketch combining these settings follows below.
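A minimal sketch of how these settings might fit together in a training loop. The learning rate, step count, and the model/dataloader interfaces (`model`, `train_loader`) are placeholder assumptions, not values taken from train_henyo.py.

```python
import torch
import torch.nn.functional as F
from torch.cuda.amp import autocast, GradScaler
from torch.optim.lr_scheduler import CosineAnnealingLR

# Assumed placeholders: `model` is the Henyo model returning next-token logits,
# `train_loader` streams tokenized batches. LR and T_max are illustrative.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=100_000)
scaler = GradScaler()
grad_accum_steps = 8  # micro-batches accumulated per optimizer step

for step, batch in enumerate(train_loader):
    with autocast(dtype=torch.float16):                  # FP16 mixed precision
        logits = model(batch["input_ids"])               # (B, T, vocab)
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            batch["labels"].view(-1),
        ) / grad_accum_steps                             # scale for accumulation
    scaler.scale(loss).backward()
    if (step + 1) % grad_accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
        scheduler.step()
```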
## Usage
Because this model uses a custom architecture, you must include the class definitions (provided in train_henyo.py in this repo) or use the inference script below.
```python
# See inference_henyo.py in the repo files for the full class definitions
from transformers import AutoTokenizer

model_id = "marcuscedricridia/Henyo-153M-CulturaX"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load model using custom class wrapper...
```
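Once the model has been constructed and its weights loaded with the custom classes, text generation can follow a standard autoregressive loop. The sketch below assumes `model(input_ids)` returns logits of shape (batch, seq_len, vocab_size); the actual interface is defined by the scripts in this repo.

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50, temperature: float = 0.8):
    # Assumption: `model(input_ids)` returns logits of shape (batch, seq_len, vocab_size).
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(input_ids[:, -1024:])          # respect the 1024-token context window
        next_logits = logits[:, -1, :] / temperature
        probs = torch.softmax(next_logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

# Example (assuming `model` has been loaded as described above):
# print(generate(model, tokenizer, "Ang Pilipinas ay"))
```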
## Reproducibility
The full training script (train_henyo.py) is included in this repository's file listing.