wikicmbaV1

wikicmbaV1 is an experimental text generation model based on the Hierarchical Recurrent Memory (HRM) architecture. It was trained from scratch on the WikiText-103 dataset, a large-scale language modeling benchmark derived from high-quality Wikipedia articles.

The model uses the HRM structure, which pairs a "Specialist" module for low-level processing with a "Manager" module for high-level abstraction and planning. This architecture aims to handle long-range dependencies more effectively by summarizing information at different temporal scales.
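As a rough illustration of that two-timescale idea, here is a minimal sketch of a fast "Specialist" recurrence conditioned on a slower "Manager" recurrence. It assumes a fixed Manager update period and simple additive conditioning; the class name, sizes, and summarization scheme are illustrative, not the model's actual implementation.

```python
import torch
import torch.nn as nn

class HierarchicalRecurrentSketch(nn.Module):
    """Two-timescale recurrence: a fast Specialist, a slow Manager."""

    def __init__(self, vocab_size: int = 32100, d_model: int = 512, manager_period: int = 8):
        super().__init__()
        self.d_model = d_model
        self.k = manager_period
        self.embed = nn.Embedding(vocab_size, d_model)
        self.specialist = nn.GRUCell(d_model, d_model)  # updates every token
        self.manager = nn.GRUCell(d_model, d_model)     # updates every k tokens
        self.to_logits = nn.Linear(2 * d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) of token ids
        batch, seq_len = tokens.shape
        h_spec = torch.zeros(batch, self.d_model, device=tokens.device)
        h_mgr = torch.zeros_like(h_spec)
        logits = []
        for t in range(seq_len):
            x = self.embed(tokens[:, t])
            # The Specialist conditions each step on the Manager's slower summary.
            h_spec = self.specialist(x + h_mgr, h_spec)
            # The Manager periodically abstracts the Specialist's state.
            if (t + 1) % self.k == 0:
                h_mgr = self.manager(h_spec, h_mgr)
            logits.append(self.to_logits(torch.cat([h_spec, h_mgr], dim=-1)))
        return torch.stack(logits, dim=1)  # (batch, seq_len, vocab_size)
```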

Model Description

  • Architecture: Hierarchical Recurrent Memory (HRM)
  • Training Data: WikiText-103
  • Original Paper: Hierarchical Reasoning Model
  • Tokenizer: t5-small (slow T5 SentencePiece)
  • Vocab Size: 32100
  • Objective: Causal Language Modeling
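
The tokenizer listed above can be loaded directly from the Hub. This assumes the `transformers` and `sentencepiece` packages are installed; `use_fast=False` selects the slow SentencePiece implementation named in the model description.

```python
from transformers import AutoTokenizer

# Slow T5 SentencePiece tokenizer, as listed in the model description.
tokenizer = AutoTokenizer.from_pretrained("t5-small", use_fast=False)

ids = tokenizer("The game was released in 1996.").input_ids
print(len(tokenizer))  # 32100, matching the vocab size above
```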

Latest Performance (Epoch 45)

  • Validation Loss: 3.1813
  • Validation Perplexity: 24.08
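
Perplexity is the exponential of the mean cross-entropy loss, so the two numbers above are consistent:

```python
import math

val_loss = 3.1813
print(math.exp(val_loss))  # ~24.08, the reported validation perplexity
```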