# GPT2-large-Finetuned-WikiText103

## Model Description

This model is a version of GPT2-large fine-tuned on the WikiText-103 language-modeling dataset.
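The model can be loaded with the `transformers` library. The sketch below is a minimal usage example; the prompt and generation settings are illustrative, not part of the release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rubin-Wei/gpt2-large-finetuned-wikitext103"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from an example prompt.
inputs = tokenizer("The history of natural language processing", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```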

## Performance on WikiText-103

| Model                  | Perplexity | Improvement |
|------------------------|------------|-------------|
| GPT2-large (baseline)  | 15.80      | -           |
| GPT2-large-Finetuned   | 10.42      | -5.38       |
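For reference, test-set perplexity can be estimated with the standard sliding-window evaluation recipe for causal language models in `transformers`. This is a sketch, not the authors' evaluation script, and the exact number depends on the window size and stride chosen.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rubin-Wei/gpt2-large-finetuned-wikitext103"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Tokenize the full WikiText-103 test split as one long sequence.
test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
seq_len = encodings.input_ids.size(1)

max_length, stride = 1024, 512  # GPT-2 context size; stride controls window overlap
nlls, n_tokens, prev_end = [], 0, 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # only score tokens not seen in the previous window
    input_ids = encodings.input_ids[:, begin:end]
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the overlapping context tokens
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss * trg_len)
    n_tokens += trg_len
    prev_end = end
    if end == seq_len:
        break

print("perplexity:", torch.exp(torch.stack(nlls).sum() / n_tokens).item())
```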

## Training Details

- Training Data: WikiText-103 (103M tokens)
- Optimizer: AdamW
- Learning Rate: 2e-5 with a cosine schedule (a comparable setup is sketched below)
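A comparable fine-tuning run can be set up with the `transformers` `Trainer`. The learning rate, scheduler, and optimizer below follow the card; the batch size, epoch count, and data preprocessing are placeholders, not the authors' exact configuration.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "gpt2-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Load WikiText-103 and drop empty lines before tokenization.
raw = load_dataset("wikitext", "wikitext-103-raw-v1")
raw = raw.filter(lambda ex: len(ex["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="gpt2-large-finetuned-wikitext103",
    learning_rate=2e-5,              # as stated in the card
    lr_scheduler_type="cosine",      # cosine schedule
    optim="adamw_torch",             # AdamW
    per_device_train_batch_size=4,   # placeholder
    num_train_epochs=1,              # placeholder
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```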

## Citation

This model was released as part of the paper "MLP Memory: A Retriever-Pretrained Memory for Large Language Models".

For more information, see: https://github.com/Binn0/MLPMemory.

If you use this model, please cite:

```bibtex
@inproceedings{Wei2025MLPMA,
  title={MLP Memory: A Retriever-Pretrained Memory for Large Language Models},
  author={Rubin Wei and Jiaqi Cao and Jiarui Wang and Jushi Kai and Qipeng Guo and Bowen Zhou and Zhouhan Lin},
  year={2025},
  url={https://api.semanticscholar.org/CorpusID:281658735}
}
```