TempVerseFormer - Pre-trained Models

Links: Hugging Face Hub · GitHub Code · Shape Dataset Toolbox · WandB Logs

This repository hosts pre-trained models for TempVerseFormer, a novel reversible-transformer architecture introduced in the research article "Temporal Modeling with Reversible Transformers".

These models are designed for memory-efficient temporal sequence prediction, particularly for tasks involving continuous, evolving data streams. They were trained on a synthetic dataset of rotating 2D shapes, constructed to evaluate temporal modeling capabilities in a controlled environment.

Models Included

This repository contains pre-trained weights for the following models, as described in the research article:

  • TempFormer (Vanilla-Transformer): A vanilla Transformer architecture with temporal chaining, serving as a baseline for comparison against TempVerseFormer.
  • TempVerseFormer (Rev-Transformer): The core Reversible Temporal Transformer architecture, leveraging reversible blocks and time-agnostic backpropagation for memory efficiency (see the reversible-block sketch after this list).
  • Standard Transformer (Pipe-Transformer): A standard Transformer model that predicts only the next element at each step.
  • LSTM: A Long Short-Term Memory network, representing a traditional recurrent sequence modeling approach.
  • VAE Models: Variational Autoencoder (VAE) models used for encoding and decoding images to and from a latent space:
    • Vanilla VAE: Standard VAE architecture.
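
For context, the sketch below shows the general RevNet-style coupling that reversible blocks are built on: the inputs can be reconstructed exactly from the outputs, so intermediate activations need not be stored during backpropagation. This is an illustrative reimplementation, not the repository's actual module; the class and sub-layer names are assumptions.

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Illustrative RevNet-style coupling: inputs are recoverable from
    outputs, so activations need not be cached for the backward pass."""

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f = f  # e.g., an attention sub-layer
        self.g = g  # e.g., a feed-forward sub-layer

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1: torch.Tensor, y2: torch.Tensor):
        # Reconstruct the inputs exactly, without cached activations.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```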

Each model checkpoint is provided as a .pt file containing the state_dict of the trained model. For each model, checkpoints are available for different training configurations (e.g., with/without temporal patterns); a minimal loading sketch follows.
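
A minimal loading sketch, assuming a hypothetical checkpoint filename ("rev_transformer.pt"); the model class and its configuration come from the GitHub repository:

```python
import torch

# "rev_transformer.pt" is an illustrative filename; use the checkpoint
# file you downloaded from this repository.
state_dict = torch.load("rev_transformer.pt", map_location="cpu")

# Instantiate the matching model class from the GitHub repository with the
# configuration it was trained under, then restore the weights, e.g.:
# model = RevTransformer(config_rev_transformer)  # hypothetical names
# model.load_state_dict(state_dict)
```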

Intended Use

These pre-trained models are intended for:

  • Research: Facilitating further research in memory-efficient temporal modeling, reversible architectures, and time-agnostic backpropagation.
  • Benchmarking: Providing baselines for comparison with new temporal sequence modeling architectures.
  • Fine-tuning: Serving as a starting point for fine-tuning on new datasets or for related temporal prediction tasks.
  • Demonstration: Illustrating the capabilities of TempVerseFormer and its memory efficiency advantages.

Please note: These models were primarily trained and evaluated on a synthetic dataset of rotating shapes. While they demonstrate promising results in this controlled environment, their performance on real-world datasets may vary and require further evaluation and fine-tuning.

How to Use

  • Configuration: Ensure you use the correct model configuration (e.g., config_rev_transformer, config_vae) that corresponds to the pre-trained checkpoint you are loading. You can find example configurations in the configs/train directory of the GitHub repository.
  • Data Preprocessing: Input data should be preprocessed in the same way as the training data. Refer to the ShapeDataset class in the GitHub repository for details on data loading and preprocessing.
  • Device: Load models and data onto the appropriate device ('cpu' or 'cuda').
  • Evaluation Mode: Remember to set models to .eval() mode for inference; a minimal end-to-end sketch follows this list.
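
Putting these steps together, a minimal inference sketch might look like the following. The stand-in nn.Identity() model, the commented-out checkpoint filename, and the input tensor shape are all assumptions; substitute the repository's model class, its matching configuration, and the preprocessing used by ShapeDataset.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model so the sketch runs as-is; replace with the repository's
# model class built from its matching config (e.g., config_rev_transformer).
model = nn.Identity()

# state_dict = torch.load("rev_transformer.pt", map_location=device)  # hypothetical filename
# model.load_state_dict(state_dict)

model.to(device)
model.eval()  # disable dropout and other training-time behavior

with torch.no_grad():
    # Illustrative input shape (batch, time, channels, height, width);
    # real inputs must match the ShapeDataset preprocessing.
    frames = torch.randn(1, 16, 3, 64, 64, device=device)
    prediction = model(frames)
```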

For more detailed usage examples and specific code for different models and tasks, please refer to the GitHub repository and the train.py, eval.py, and memory_test.py scripts.

Dataset

The models were trained on a synthetic dataset of rotating 2D shapes generated using the Simple Shape Dataset Toolbox. This toolbox allows for procedural generation of customizable shape datasets.
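
The toolbox's own repository documents its actual API. Purely as an illustration of what procedural generation of rotating shapes involves (and not the toolbox's API), a minimal NumPy sketch:

```python
import numpy as np

def rotating_polygon_sequence(n_frames: int = 16, n_vertices: int = 5,
                              deg_per_frame: float = 10.0) -> np.ndarray:
    """Return (n_frames, n_vertices, 2) vertex coordinates of a regular
    polygon rotating about the origin -- an illustrative stand-in for the
    Simple Shape Dataset Toolbox, not its actual API."""
    angles = np.linspace(0, 2 * np.pi, n_vertices, endpoint=False)
    base = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    frames = []
    for t in range(n_frames):
        theta = np.deg2rad(deg_per_frame * t)
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        frames.append(base @ rot.T)
    return np.stack(frames)

sequence = rotating_polygon_sequence()
print(sequence.shape)  # (16, 5, 2)
```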

License

These pre-trained models are released under the MIT license.
