Paper: BookSum: A Collection of Datasets for Long-form Narrative Summarization (arXiv:2105.08209)
This is the "latest" version of the model, the one that has been trained the longest; it is currently at 70k steps.
google/bigbird-pegasus-large-bigpatent
An extended example, including a demo of batch summarization, is here.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

# Load the fine-tuned checkpoint; low_cpu_mem_usage lowers peak RAM while loading
model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/bigbird-pegasus-large-K-booksum",
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "pszemraj/bigbird-pegasus-large-K-booksum",
)

# Wrap the model and tokenizer in a summarization pipeline
summarizer = pipeline(
    "summarization",
    model=model,
    tokenizer=tokenizer,
)

wall_of_text = "your text to be summarized goes here."

# Generate a summary; lengths are counted in tokens
result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,  # forbid repeating any 3-gram in the output
    clean_up_tokenization_spaces=True,
)
print(result[0]["summary_text"])
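The extended example mentioned above covers batch summarization, but it is not reproduced here. As a rough illustration only, the sketch below shows one way to summarize several texts in small batches. The helper name summarize_batch and its batch_size parameter are hypothetical and do not come from the model card. It assumes the summarizer argument behaves like a transformers summarization pipeline: called with a list of strings, it returns one result per input, each carrying a "summary_text" key.

```python
from typing import Callable, List


def summarize_batch(
    texts: List[str],
    summarizer: Callable,
    batch_size: int = 2,
    **generate_kwargs,
) -> List[str]:
    """Summarize texts in small batches (hypothetical helper, not from the model card)."""
    summaries: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        # Assumption: the callable accepts a list of strings and returns
        # one result per input, as a summarization pipeline does.
        results = summarizer(batch, **generate_kwargs)
        for item in results:
            # Tolerate results that arrive wrapped in a single-element list.
            if isinstance(item, list):
                item = item[0]
            summaries.append(item["summary_text"])
    return summaries
```

With the summarizer built earlier, a call might look like summarize_batch(list_of_texts, summarizer, min_length=16, max_length=256, no_repeat_ngram_size=3). Keeping batch_size small is a conservative choice for long inputs, since memory use grows with both batch size and sequence length.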