---
language:
  - tl
datasets:
  - MaAIos/culturax-filipino-subset
library_name: transformers
tags:
  - text-generation
  - pytorch
  - custom-architecture
  - henyo
license: mit
---

# Henyo-153M-CulturaX

Henyo is a 153M-parameter Tagalog language model trained on the MaAIos/culturax-filipino-subset dataset. It uses a custom, efficiency-focused architecture heavily inspired by Llama 2/3 and PaLM.

## Architecture Details

This model uses a custom decoder-only Transformer architecture built from scratch in PyTorch. The key hyperparameters are listed below, followed by an illustrative configuration sketch.

| Hyperparameter | Value |
| --- | --- |
| Parameters | ~153M |
| Context Window | 1024 tokens |
| Embedding Dim | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| KV Heads (GQA) | 4 |
| Vocab Size | 50,257 (GPT-2 tokenizer) |
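
As a quick reference, the hyperparameters above map onto a small configuration object along the lines of the sketch below. `HenyoConfig` and its field names are illustrative assumptions, not the names used in the repository's actual code.

```python
from dataclasses import dataclass

@dataclass
class HenyoConfig:
    # Illustrative config mirroring the table above; the class and field
    # names are assumptions, not those in train_henyo.py.
    vocab_size: int = 50_257   # GPT-2 tokenizer vocabulary
    max_seq_len: int = 1024    # context window
    dim: int = 768             # embedding dimension
    n_layers: int = 12         # transformer depth
    n_heads: int = 12          # query heads
    n_kv_heads: int = 4        # shared KV heads (GQA)
```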

## Key Features

1. **SwiGLU Activation**: High-performance gated linear unit activation in the feed-forward layers.
2. **Grouped Query Attention (GQA)**: 12 query heads share 4 KV heads (a 3:1 ratio) for more efficient inference; see the attention sketch after this list.
3. **Rotary Positional Embeddings (RoPE)**: For better generalization across sequence lengths.
4. **RMSNorm**: Pre-normalization for training stability.
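
The block below is a minimal sketch of how GQA and RoPE fit together in PyTorch, using the head counts from the table above. Function, class, and argument names are illustrative assumptions; the model's actual implementation is in `train_henyo.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rope(x, base=10000.0):
    # Rotary positional embedding over x of shape (batch, heads, seq, head_dim),
    # using the "rotate halves" formulation.
    b, h, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, device=x.device).float() / half)
    angles = torch.arange(t, device=x.device).float()[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()               # (seq, head_dim // 2)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class GQAAttention(nn.Module):
    def __init__(self, dim=768, n_heads=12, n_kv_heads=4):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads                   # 64
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = rope(q), rope(k)
        # Each group of 3 query heads shares one KV head (12 / 4 = 3).
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))
```

In a Llama-style layer, RMSNorm pre-normalization and a SwiGLU feed-forward block would wrap around an attention module like this one.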

## Training Configuration

- **Dataset:** MaAIos/culturax-filipino-subset
- **Mode:** Streaming (iterable dataset)
- **Optimizer:** AdamW
- **Scheduler:** Cosine decay
- **Gradient Accumulation:** 8 steps (effective batch size ~32)
- **Precision:** Mixed precision (FP16); a minimal loop sketch follows this list
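
The sketch below shows that recipe in plain PyTorch, assuming a `model` and a streaming `train_loader` have already been constructed. The learning rate and `total_steps` are placeholders; the authoritative loop is `train_henyo.py`.

```python
import torch
import torch.nn.functional as F

accum_steps = 8
total_steps = 10_000                                          # placeholder value
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)    # lr is illustrative
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
scaler = torch.cuda.amp.GradScaler()                          # FP16 loss scaling

for step, batch in enumerate(train_loader):                   # streaming IterableDataset
    input_ids = batch["input_ids"].cuda()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        logits = model(input_ids)                             # assumed forward signature
        # Standard next-token cross-entropy, averaged over the accumulation window.
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
        ) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
        scheduler.step()
```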

## Usage

Since this model uses a custom architecture, you must include the class definitions (provided in `train_henyo.py` in this repository) or use the inference script below; a hedged loading sketch follows the snippet.

```python
# See inference_henyo.py in the repository files for the full class definitions.
from transformers import AutoTokenizer

model_id = "marcuscedricridia/Henyo-153M-CulturaX"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the model using the custom class wrapper...
```
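
Because this card does not reproduce the custom classes, the snippet below is only a sketch of one way to load the weights manually. The `HenyoModel` class name and the `pytorch_model.bin` filename are assumptions; check the repository's file listing and `inference_henyo.py` for the real names.

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Assumes inference_henyo.py from the repo has been downloaded next to this
# script, and that it defines a `HenyoModel` class (the name is an assumption).
from inference_henyo import HenyoModel

model_id = "marcuscedricridia/Henyo-153M-CulturaX"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Checkpoint filename is an assumption -- check the repository's file listing.
ckpt_path = hf_hub_download(repo_id=model_id, filename="pytorch_model.bin")
model = HenyoModel()  # hypothetical constructor; the real arguments may differ
model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
model.eval()

prompt = "Ang Pilipinas ay"
inputs = tokenizer(prompt, return_tensors="pt")
# Generation depends on the repo's own sampling helper; see inference_henyo.py.
```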

## Reproducibility

The full training script (`train_henyo.py`) is included in the file listing of this repository.