wikicmbaV1
wikicmbaV1 is an experimental text generation model based on the Hierarchical Recurrent Memory (HRM) architecture. It was trained from scratch on the WikiText-103 dataset, a large-scale language modeling benchmark derived from high-quality Wikipedia articles.
The model utilizes the HRM structure, consisting of a "Specialist" module for low-level processing and a "Manager" module for high-level abstraction and planning. This architecture aims to handle long-range dependencies more effectively by summarizing information at different temporal scales.
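A minimal PyTorch sketch of this two-timescale recurrence is shown below. The GRU cells, the additive conditioning of the Specialist on the Manager state, and the `manager_period` hyperparameter are illustrative assumptions for exposition, not the released model's exact implementation.

```python
# Sketch of an HRM-style two-timescale recurrence (illustrative, not the
# released model): a Specialist updates every token, a Manager every k tokens.
import torch
import torch.nn as nn


class HRMSketch(nn.Module):
    def __init__(self, vocab_size=32100, d_model=512, manager_period=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.specialist = nn.GRUCell(d_model, d_model)  # low-level, per-token
        self.manager = nn.GRUCell(d_model, d_model)     # high-level, every k tokens
        self.manager_period = manager_period
        self.head = nn.Linear(2 * d_model, vocab_size)  # next-token prediction

    def forward(self, input_ids):
        B, T = input_ids.shape
        d = self.embed.embedding_dim
        h_spec = input_ids.new_zeros(B, d, dtype=torch.float)
        h_mgr = input_ids.new_zeros(B, d, dtype=torch.float)
        logits = []
        for t in range(T):
            x = self.embed(input_ids[:, t])
            # Specialist sees the current token plus the Manager's summary.
            h_spec = self.specialist(x + h_mgr, h_spec)
            # Manager updates only every `manager_period` steps, summarizing
            # the Specialist state at a slower temporal scale.
            if (t + 1) % self.manager_period == 0:
                h_mgr = self.manager(h_spec, h_mgr)
            logits.append(self.head(torch.cat([h_spec, h_mgr], dim=-1)))
        return torch.stack(logits, dim=1)  # (B, T, vocab_size)


if __name__ == "__main__":
    out = HRMSketch()(torch.randint(0, 32100, (2, 16)))
    print(out.shape)  # torch.Size([2, 16, 32100])
```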
Model Description
- Architecture: Hierarchical Recurrent Memory (HRM)
- Training Data: WikiText-103
- Original Paper: Hierarchical Reasoning Model
- Tokenizer: google-t5/t5-small (slow T5 SentencePiece); see the loading example after this list
- Vocab Size: 32100
- Objective: Causal Language Modeling
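The tokenizer can be loaded with the standard transformers API. This assumes the stock t5-small SentencePiece vocabulary from the Hub is used unchanged:

```python
# Load the slow (SentencePiece-based) T5 tokenizer named in the card.
from transformers import T5Tokenizer

tok = T5Tokenizer.from_pretrained("google-t5/t5-small")
print(tok.vocab_size)  # 32100, matching the figure above
ids = tok("WikiText-103 is derived from Wikipedia articles.").input_ids
```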
Latest Performance (Epoch 45)
- Validation Loss: 3.1813
- Validation Perplexity: 24.0788
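For reference, the reported perplexity is simply the exponential of the validation loss:

```python
import math

print(math.exp(3.1813))  # ≈ 24.08, matching the perplexity above up to rounding of the loss
```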