Context size

#3
by aari1995 - opened

Hi, I just came across your models and would first of all like to thank you for the great work! Especially with the tokenizer(s).

I am wondering, do you plan on releasing newer versions? I am especially looking for something with a bigger context window / max position embeddings. 4k would be great! :)
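To illustrate why max position embeddings matter here: models with learned absolute position embeddings (the classic BERT-style design) keep a fixed-size lookup table of position vectors, so tokens beyond that table simply have no position embedding and must be truncated. This is a minimal sketch of that constraint, not the actual GTE implementation; the names and the 512 limit are illustrative assumptions.

```python
# Sketch (not the real model code) of how a fixed position-embedding
# table caps context length: every token needs a position index that
# fits inside the table, so longer inputs are truncated.

MAX_POSITION_EMBEDDINGS = 512  # typical BERT-style limit; 4096 would allow 4k contexts

def positions_for(input_ids, max_positions=MAX_POSITION_EMBEDDINGS):
    """Return position indices for a sequence, truncated to the table size."""
    usable = input_ids[:max_positions]  # tokens past the table are dropped
    return list(range(len(usable)))

long_input = list(range(4096))            # pretend these are 4k token ids
print(len(positions_for(long_input)))     # capped at 512
print(len(positions_for(long_input, max_positions=4096)))  # a 4k table keeps all tokens
```

Raising the limit is therefore not just a config change for such models: the larger table must actually be trained on long inputs, which is why the authors mention training the next version on longer sequences.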

Hi, thank you for the feedback.

The newer (3rd) versions of these models will be released in the fall. They will be trained on longer inputs, but the exact architecture is not decided yet.

looking forward to it!

Hi again, has the decision been made yet?
I have honestly fine-tuned quite a few models, and from a consumer-hardware perspective GTE is the most friendly: the infrastructure is really efficient and the model trains really nicely. :) I would hope to see this continue.

But really happy that you keep pushing boundaries!

Hi Aaron,

Thank you for your interest in our work. No worries, our planned encoder-only models will use FlashAttention (like the model that you posted above).
