Model
This is a fine-tuned version of ProtGPT2, a language model that speaks the protein language and can be used for de novo protein design and engineering.
Dataset
The protein data set was retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).
Only the OXIDOREDUCTASE enzymes were used.
The JSON-formatted data is available in oxidos.json.
How to Use
>>> from transformers import pipeline
>>> pipe = pipeline("text-generation", model="zypchn/ProtGPT2-Oxido")
>>> sequences = pipe("", max_length=100, do_sample=True, top_k=950, repetition_penalty=1.2, num_return_sequences=5, eos_token_id=0)
# The input prompt is left blank to encourage diverse outputs.
[
{"generated_text": "SNANQAPQPQTPTRATDAKKGSYGHPADRVGMEDNKYQVGVFYYDGPNPSYAEWNRDTQFWVETAKTAEKGKFDSIFFADTLGIYDSFKGSFEANLRHGAQFPVNDPLVAISAIAGATTKLGLVATASTTYSEPFHIARRFASLDHLSNGRAGWNIVTSYLDSAARNFGRTEQMEHDERYAIAEEYIDVVYKLWEGSWEDNAVIKDKETGLFTDPAKVHQINHEGEHFRVAGPLNIPRSPQGHPVIFQAGTSERGRDFAARHAEAVFTAQLDLEAGREFYEDIKSRAAKLGRDPDDVKILPGISVFVGKTREEAERKFRELQSLIDEEGALTRFSSYTGTDLSTYDPDGPLPELAGIDPTTPIAKLEGLLGKSKMTVREIALKQGGVSLREYQPFVGATAGSALVGGTPEQIADFMQDWFIEGTVDGFNIMPPYLPDGLEDFVDHVVPELQRRGLFRTEYEGTTLREHLGLAKPLEHHHHHH"},
{"generated_text": "MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGRGSMGPCLICRSTSLKCVFCVRDPNGYKKCSKCDAFFCSRECQTEHWQRHHKFECPAAVAQPQIPPLPKPQQKQLTAAELGMFMEVRNQFALLKTNLERLDYEIFILERNVKLANTVTPPTNRTYFQSTMRYAPNPLRPNMTDAMRQQYLDKNKSSAALEHDLKELIKFKCYLLNDEYVEKEREENPFIWEYFLNKEWRKRNVWGNK\n"},
{"generated_text": "MGHHHHHHSSGLVPRGSHMTVEQAKKLRAEAEAQAQIQDKAKAIAQTHGKVEVMVDGKHRVVDLDATTRRQLTDGELQAIVVAAQEAAAKQLKAQRQALLEQHQDAELRKLALEGEIV\nAVITGAAQGIGRAIALRLAKDGFRVAVADIDLAAAEAVAAEIEAKGGKALVIEGDVSREEDVKRLVRKAIDQFGRLDYAVNNAGIQGPLAPTEELPLALWNKVIDVNLTGVFLCMKYEIAQMVKQGRGGAIVNTASVAGLSGQPGMVAYCASKHGVVGLTKTVAIEYAKHGIRINAVAPGFIDTPMVQKLPEEKRARIAAAIPMRRLGQPDEIAAVVAFLLSDDASFITGQCIAVDGGFTAGLLA"},
{"generated_text": "MAASKAADSLAEGAAKLEHHHHHH"},
{"generated_text": "GSKPQPGVQVEGAKCQVLQAVYDFTVQSASELSFKAGDVICVTGQYDPTLGWWLAEERRTGKSGLVPENYVELLSTGPAQHHHHHH"}
]
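Some of the generated strings above contain line breaks ("\n"), so the raw output usually needs light cleanup before it is passed to downstream tools. The following is a minimal sketch of one way to do that, not part of this model's repository: the helper names (clean_sequence, is_canonical, to_fasta), the "generated" record prefix, and the choice to keep only the 20 canonical amino acids are all assumptions made for illustration.

```python
# Hypothetical post-processing helpers; names and filtering rule are assumptions.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 canonical amino acids


def clean_sequence(generated_text: str) -> str:
    """Drop all whitespace (including embedded line breaks) and upper-case."""
    return "".join(generated_text.split()).upper()


def is_canonical(sequence: str) -> bool:
    """True if the sequence is non-empty and uses only canonical residues."""
    return bool(sequence) and set(sequence) <= VALID_RESIDUES


def to_fasta(sequences, prefix="generated", width=60):
    """Format pipeline-style output (a list of dicts) as FASTA text."""
    records = []
    for index, item in enumerate(sequences, start=1):
        seq = clean_sequence(item["generated_text"])
        if not is_canonical(seq):
            continue  # skip sequences with non-canonical symbols
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(f">{prefix}_{index}\n" + "\n".join(lines))
    return "\n".join(records)


# Small stand-in for the `sequences` list returned by the pipeline.
example = [
    {"generated_text": "MAASKAADSLAEGAAKLEHHHHHH"},
    {"generated_text": "MGHH\nAVIT"},
]
print(to_fasta(example))
```

With real output, `to_fasta(sequences)` can be called directly on the list returned by the pipeline. Whether to discard or keep sequences containing non-canonical symbols depends on the downstream tool, so the filter here is only one possible choice.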
Model tree for zypchn/ProtGPT2-Oxido
Base model
nferruz/ProtGPT2