Jjzzzz
/

distilgpt2-finetuned-stories

	---
	license: apache-2.0
	base_model: distilgpt2
	tags:
	- generated_from_trainer
	model-index:
	- name: distilgpt2-finetuned-stories
	  results: []
	language:
	- en
	metrics:
	- perplexity
	pipeline_tag: text-generation
	---


	# distilgpt2-finetuned-stories

	This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on the [demelin/understanding_fables](https://huggingface.co/datasets/demelin/understanding_fables) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 3.3089
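
The metadata lists perplexity as the metric, but only the loss is reported. As a rough sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language modelling with the Trainer, though the card does not state it), the evaluation perplexity can be recovered by exponentiating the loss:

```python
import math

# Evaluation loss reported in this model card.
eval_loss = 3.3089

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at about 27.4.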

	## Autoregressive and Prefix Language Modelling

	Language modelling, and text generation in particular, works by predicting the next token from the tokens that precede it.

	Autoregressive modelling is built on this idea: the model predicts the next token (here, roughly a word) from the tokens before it. We model P(w_i \| w_1, ..., w_{i-1}), where w_i is the next token, w_1, ..., w_{i-1} are the tokens preceding it, and P is the probability of generating w_i given those tokens.

	In prefix language modelling, the input is passed to the model as context and conditions the generation of every subsequent token. The model computes the conditional probability of the next token given that context, P(w \| x), where w is the next token, x is the context, and P is the probability of w given x.
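
The difference between the two views can be illustrated with a toy example. This is a minimal sketch, not how distilgpt2 works internally: `TOY_MODEL` and `sequence_log_prob` are hypothetical names, the probabilities are made up, and the toy model looks at only one previous token to stay small.

```python
import math

# Hypothetical toy conditional distributions P(next token | previous token).
# A real model such as distilgpt2 conditions on all preceding tokens;
# a one-token history keeps this sketch small.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"fox": 0.5, "crow": 0.5},
    "a": {"fox": 0.7, "crow": 0.3},
    "fox": {"</s>": 1.0},
    "crow": {"</s>": 1.0},
}


def sequence_log_prob(tokens, context=("<s>",)):
    """Sum log P(token | history) over `tokens`, starting from `context`."""
    history = list(context)
    total = 0.0
    for token in tokens:
        total += math.log(TOY_MODEL[history[-1]][token])
        history.append(token)
    return total


# Autoregressive view: score the whole sequence from the start symbol.
full = sequence_log_prob(["the", "fox", "</s>"])

# Prefix view: treat "the" as given context x and score only the continuation.
continuation = sequence_log_prob(["fox", "</s>"], context=("<s>", "the"))

print(math.exp(full), math.exp(continuation))
```

In the autoregressive case every token contributes a factor to the sequence probability. In the prefix case the context tokens are taken as given, so only the generated tokens are scored.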


	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 3.0
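
To make the `linear` scheduler concrete, here is a small sketch of how the learning rate would decay over the run. It assumes 60 total optimizer steps (taken from the training results table) and zero warmup steps (the card does not list a warmup setting). `linear_lr` is a hypothetical helper written for illustration, not a library function.

```python
def linear_lr(step, base_lr=2e-05, total_steps=60, warmup_steps=0):
    """Learning rate at `step` under a linear warmup and linear decay schedule.

    Assumptions: total_steps=60 comes from the results table and
    warmup_steps=0 is a guess, since the card does not report it.
    """
    if warmup_steps and step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start of training and at the end of each epoch.
for step in (0, 20, 40, 60):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, is halved by step 30, and reaches zero at step 60.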

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| No log        | 1.0   | 20   | 3.4065          |
	| No log        | 2.0   | 40   | 3.3288          |
	| No log        | 3.0   | 60   | 3.3089          |


	### Framework versions

	- Transformers 4.36.2
	- Pytorch 2.1.0+cu121
	- Datasets 2.16.1
	- Tokenizers 0.15.0