Model

This is a fine-tuned version of ProtGPT2, a language model that speaks the protein language and can be used for de novo protein design and engineering.


Dataset

The protein dataset was retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).
Only oxidoreductase enzymes (PDB classification OXIDOREDUCTASE) were used.
The JSON-formatted data is available in oxidos.json.


How to Use?

>>> from transformers import pipeline
>>> pipe = pipeline("text-generation", model="zypchn/ProtGPT2-Oxido")
>>> sequences = pipe("", max_length=100, do_sample=True, top_k=950, repetition_penalty=1.2, num_return_sequences=5, eos_token_id=0)
# The input prompt is left blank to encourage diverse outputs
[
  {"generated_text": "SNANQAPQPQTPTRATDAKKGSYGHPADRVGMEDNKYQVGVFYYDGPNPSYAEWNRDTQFWVETAKTAEKGKFDSIFFADTLGIYDSFKGSFEANLRHGAQFPVNDPLVAISAIAGATTKLGLVATASTTYSEPFHIARRFASLDHLSNGRAGWNIVTSYLDSAARNFGRTEQMEHDERYAIAEEYIDVVYKLWEGSWEDNAVIKDKETGLFTDPAKVHQINHEGEHFRVAGPLNIPRSPQGHPVIFQAGTSERGRDFAARHAEAVFTAQLDLEAGREFYEDIKSRAAKLGRDPDDVKILPGISVFVGKTREEAERKFRELQSLIDEEGALTRFSSYTGTDLSTYDPDGPLPELAGIDPTTPIAKLEGLLGKSKMTVREIALKQGGVSLREYQPFVGATAGSALVGGTPEQIADFMQDWFIEGTVDGFNIMPPYLPDGLEDFVDHVVPELQRRGLFRTEYEGTTLREHLGLAKPLEHHHHHH"},
  {"generated_text": "MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGRGSMGPCLICRSTSLKCVFCVRDPNGYKKCSKCDAFFCSRECQTEHWQRHHKFECPAAVAQPQIPPLPKPQQKQLTAAELGMFMEVRNQFALLKTNLERLDYEIFILERNVKLANTVTPPTNRTYFQSTMRYAPNPLRPNMTDAMRQQYLDKNKSSAALEHDLKELIKFKCYLLNDEYVEKEREENPFIWEYFLNKEWRKRNVWGNK\n"},
  {"generated_text": "MGHHHHHHSSGLVPRGSHMTVEQAKKLRAEAEAQAQIQDKAKAIAQTHGKVEVMVDGKHRVVDLDATTRRQLTDGELQAIVVAAQEAAAKQLKAQRQALLEQHQDAELRKLALEGEIV\nAVITGAAQGIGRAIALRLAKDGFRVAVADIDLAAAEAVAAEIEAKGGKALVIEGDVSREEDVKRLVRKAIDQFGRLDYAVNNAGIQGPLAPTEELPLALWNKVIDVNLTGVFLCMKYEIAQMVKQGRGGAIVNTASVAGLSGQPGMVAYCASKHGVVGLTKTVAIEYAKHGIRINAVAPGFIDTPMVQKLPEEKRARIAAAIPMRRLGQPDEIAAVVAFLLSDDASFITGQCIAVDGGFTAGLLA"},
  {"generated_text": "MAASKAADSLAEGAAKLEHHHHHH"},
  {"generated_text": "GSKPQPGVQVEGAKCQVLQAVYDFTVQSASELSFKAGDVICVTGQYDPTLGWWLAEERRTGKSGLVPENYVELLSTGPAQHHHHHH"}
]
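
Some of the generated sequences above contain newline characters (for example a trailing "\n" or a line break in the middle of a sequence), so a small clean-up step is usually needed before passing them to downstream tools. The following is a minimal sketch of one way to do that and write the results in FASTA format. The helper names (`clean_sequence`, `is_standard_protein`, `to_fasta`), the `generated` ID prefix, and the choice to keep only sequences made of the 20 standard amino acids are illustrative assumptions, not part of this model or of the `transformers` API.

```python
import re

# Assumption: keep only sequences built from the 20 standard amino acids.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(generated_text):
    """Strip all whitespace (including newlines) and upper-case the sequence."""
    return re.sub(r"\s+", "", generated_text).upper()


def is_standard_protein(seq):
    """Return True if seq is non-empty and uses only standard residues."""
    return bool(seq) and set(seq) <= VALID_RESIDUES


def to_fasta(records, prefix="generated", width=60):
    """Convert pipeline output (a list of {"generated_text": ...}) to FASTA text.

    `prefix` and `width` are hypothetical parameters chosen for this sketch.
    Records that fail the residue check are skipped.
    """
    entries = []
    for i, rec in enumerate(records, start=1):
        seq = clean_sequence(rec["generated_text"])
        if not is_standard_protein(seq):
            continue
        # Wrap the sequence at `width` characters per line.
        lines = [seq[j:j + width] for j in range(0, len(seq), width)]
        entries.append(f">{prefix}_{i}\n" + "\n".join(lines))
    return "\n".join(entries) + ("\n" if entries else "")


# Small stand-in for the `sequences` list returned by the pipeline call.
example = [
    {"generated_text": "MAASKAADSLAEGAAKLEHHHHHH"},
    {"generated_text": "MGHHHH\nAVITGA"},
]
print(to_fasta(example))
```

With real output, `to_fasta(sequences)` can be written to a file and used as input for alignment or structure-prediction tools. Whether to filter non-standard residues or trim affinity tags such as the poly-histidine stretches depends on the downstream use.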

Model size: 0.8B params (Safetensors), tensor type F16

Base model: nferruz/ProtGPT2