---
language:
  - tl
datasets:
  - MaAIos/culturax-filipino-subset
library_name: transformers
tags:
  - text-generation
  - pytorch
  - custom-architecture
  - henyo
license: mit
---

# Henyo-153M-CulturaX

Henyo is a 153M-parameter Tagalog language model trained on the MaAIos/culturax-filipino-subset dataset. It uses a custom, efficiency-focused architecture heavily inspired by Llama 2/3 and PaLM.

## Architecture Details

This model uses a custom decoder-only Transformer architecture built from scratch in PyTorch. The key hyperparameters are listed below, followed by an illustrative configuration sketch.

| Hyperparameter | Value |
| --- | --- |
| Parameters | ~153M |
| Context Window | 1024 tokens |
| Embedding Dim | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| KV Heads (GQA) | 4 |
| Vocab Size | 50,257 (GPT-2 tokenizer) |
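
As a quick reference, the hyperparameters above map onto a small configuration object along the lines of the sketch below. `HenyoConfig` and its field names are illustrative assumptions, not the names used in the repository's actual code.

```python
from dataclasses import dataclass

@dataclass
class HenyoConfig:
    # Illustrative config mirroring the table above; the class and field
    # names are assumptions, not those in train_henyo.py.
    vocab_size: int = 50_257   # GPT-2 tokenizer vocabulary
    max_seq_len: int = 1024    # context window
    dim: int = 768             # embedding dimension
    n_layers: int = 12         # transformer depth
    n_heads: int = 12          # query heads
    n_kv_heads: int = 4        # shared KV heads (GQA)
```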

## Key Features

1. **SwiGLU Activation**: High-performance gated linear unit activation in the feed-forward layers.
2. **Grouped Query Attention (GQA)**: 12 query heads share 4 KV heads (a 3:1 ratio) for more efficient inference; see the attention sketch after this list.
3. **Rotary Positional Embeddings (RoPE)**: For better generalization across sequence lengths.
4. **RMSNorm**: Pre-normalization for training stability.
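
The block below is a minimal sketch of how GQA and RoPE fit together in PyTorch, using the head counts from the table above. Function, class, and argument names are illustrative assumptions; the model's actual implementation is in `train_henyo.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rope(x, base=10000.0):
    # Rotary positional embedding over x of shape (batch, heads, seq, head_dim),
    # using the "rotate halves" formulation.
    b, h, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, device=x.device).float() / half)
    angles = torch.arange(t, device=x.device).float()[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()               # (seq, head_dim // 2)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class GQAAttention(nn.Module):
    def __init__(self, dim=768, n_heads=12, n_kv_heads=4):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads                   # 64
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = rope(q), rope(k)
        # Each group of 3 query heads shares one KV head (12 / 4 = 3).
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))
```

In a Llama-style layer, RMSNorm pre-normalization and a SwiGLU feed-forward block would wrap around an attention module like this one.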

## Training Configuration

- **Dataset:** MaAIos/culturax-filipino-subset
- **Mode:** Streaming (iterable dataset)
- **Optimizer:** AdamW
- **Scheduler:** Cosine decay
- **Gradient Accumulation:** 8 steps (effective batch size ~32)
- **Precision:** Mixed precision (FP16); a minimal loop sketch follows this list
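
The sketch below shows that recipe in plain PyTorch, assuming a `model` and a streaming `train_loader` have already been constructed. The learning rate and `total_steps` are placeholders; the authoritative loop is `train_henyo.py`.

```python
import torch
import torch.nn.functional as F

accum_steps = 8
total_steps = 10_000                                          # placeholder value
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)    # lr is illustrative
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
scaler = torch.cuda.amp.GradScaler()                          # FP16 loss scaling

for step, batch in enumerate(train_loader):                   # streaming IterableDataset
    input_ids = batch["input_ids"].cuda()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        logits = model(input_ids)                             # assumed forward signature
        # Standard next-token cross-entropy, averaged over the accumulation window.
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
        ) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
        scheduler.step()
```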

## Usage

Since this model uses a custom architecture, you must include the class definitions (provided in `train_henyo.py` in this repository) or use the inference script below; a hedged loading sketch follows the snippet.

```python
# See inference_henyo.py in the repository files for the full class definitions.
from transformers import AutoTokenizer

model_id = "marcuscedricridia/Henyo-153M-CulturaX"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the model using the custom class wrapper...
```
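
Because this card does not reproduce the custom classes, the snippet below is only a sketch of one way to load the weights manually. The `HenyoModel` class name and the `pytorch_model.bin` filename are assumptions; check the repository's file listing and `inference_henyo.py` for the real names.

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Assumes inference_henyo.py from the repo has been downloaded next to this
# script, and that it defines a `HenyoModel` class (the name is an assumption).
from inference_henyo import HenyoModel

model_id = "marcuscedricridia/Henyo-153M-CulturaX"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Checkpoint filename is an assumption -- check the repository's file listing.
ckpt_path = hf_hub_download(repo_id=model_id, filename="pytorch_model.bin")
model = HenyoModel()  # hypothetical constructor; the real arguments may differ
model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
model.eval()

prompt = "Ang Pilipinas ay"
inputs = tokenizer(prompt, return_tensors="pt")
# Generation depends on the repo's own sampling helper; see inference_henyo.py.
```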

## Reproducibility

The full training script (`train_henyo.py`) is included in the file listing of this repository.