Context size

#3
by aari1995 - opened

Hi, I just came across your models and would first of all like to thank you for the great work! Especially with the tokenizer(s).

I am wondering, do you plan on releasing newer versions? I am especially looking for something with a bigger context window / max position embeddings. 4k would be great! :)
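To illustrate why max position embeddings matter here: models with learned absolute position embeddings (the classic BERT-style design) keep a fixed-size lookup table of position vectors, so tokens beyond that table simply have no position embedding and must be truncated. This is a minimal sketch of that constraint, not the actual GTE implementation; the names and the 512 limit are illustrative assumptions.

```python
# Sketch (not the real model code) of how a fixed position-embedding
# table caps context length: every token needs a position index that
# fits inside the table, so longer inputs are truncated.

MAX_POSITION_EMBEDDINGS = 512  # typical BERT-style limit; 4096 would allow 4k contexts

def positions_for(input_ids, max_positions=MAX_POSITION_EMBEDDINGS):
    """Return position indices for a sequence, truncated to the table size."""
    usable = input_ids[:max_positions]  # tokens past the table are dropped
    return list(range(len(usable)))

long_input = list(range(4096))            # pretend these are 4k token ids
print(len(positions_for(long_input)))     # capped at 512
print(len(positions_for(long_input, max_positions=4096)))  # a 4k table keeps all tokens
```

Raising the limit is therefore not just a config change for such models: the larger table must actually be trained on long inputs, which is why the authors mention training the next version on longer sequences.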

Hi, thank you for the feedback.

The newer (3rd) versions of these models will be released in the fall. They will be trained on longer inputs, but the exact architecture is not decided yet.

looking forward to it!

Hi again, has the decision been made yet?
I have honestly fine-tuned quite a few models, and from a consumer-hardware perspective GTE is the most friendly: the infrastructure is really efficient and the model trains really nicely. :) I would hope to see this continue.

But really happy that you keep pushing boundaries!

Hi Aaron,

Thank you for your interest in our work. No worries, our planned encoder-only models will use FlashAttention (like the model that you posted above).
