---
library_name: transformers
tags:
- biology
- Proteins
license: mit
datasets:
- MichelNivard/UniRef90-GPCR-Proteins
---

# Model Card for Model ID

This is a protein language model trained on the "G-protein coupled receptor" cluster of UniRef90. This is a deliberately crude selection of GPCR proteins, chosen to train a small model purely for training purposes. By focusing on 80k GPCR sequences (which are relatively similar to one another), we are able to train a small model on a MacBook Air, yet still run some follow-up experiments within this particular protein domain.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Michel Nivard
- **Model type:** 30M-parameter protein language model with a ModernBERT architecture
- **Language(s) (NLP):** Protein sequences

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

80k GPCR protein sequences

### Training Procedure

MLM with 15% of amino acids masked
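The masking scheme can be sketched in plain Python. This is an illustrative re-implementation of standard BERT-style MLM masking (15% of positions selected; of those, 80% replaced with a mask token, 10% with a random amino acid, 10% left unchanged), not the actual training code; the `mask_sequence` helper and its parameter names are assumptions, and in practice `transformers`' `DataCollatorForLanguageModeling` with `mlm_probability=0.15` implements this step.

```python
import random

def mask_sequence(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Mask ~15% of amino-acid tokens for MLM training (BERT-style).

    Of the selected positions, 80% become the mask token, 10% a random
    amino acid, and 10% are left unchanged. Labels hold the original
    token at every selected position and -100 elsewhere (-100 is the
    ignore index used by Hugging Face loss functions).
    """
    amino_acids = list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # model is trained to recover this token
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)
            elif r < 0.9:
                masked.append(rng.choice(amino_acids))
            else:
                masked.append(tok)  # selected but left unchanged
        else:
            labels.append(-100)  # position ignored by the loss
            masked.append(tok)
    return masked, labels
```

During actual training, a data collator applies this corruption on the fly to each batch of tokenized sequences, so every epoch sees a different random mask.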