---
license: apache-2.0
datasets:
- gretelai/synthetic_text_to_sql
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
pipeline_tag: text2text-generation
tags:
- text-to-sql
---

# Model Card for Fine-Tuned Mistral 7B for Text-to-SQL Generation

## Model Details

- **Base Model**: mistralai/Mistral-7B-v0.1
- **Library Name**: peft

## Model Description

This model is a fine-tuned version of **Mistral-7B**, adapted specifically for generating SQL queries from natural language descriptions in the **forestry** domain. It transforms user questions into SQL commands by combining a pre-trained large language model with a synthetic text-to-SQL dataset.

- **Developed by**: Srishti Rai
- **Model Type**: Fine-tuned language model
- **Language(s)**: English
- **Finetuned from model**: mistralai/Mistral-7B-v0.1
- **Model Sources**: Fine-tuned on a synthetic text-to-SQL dataset for the forestry domain

## Uses

### Direct Use

This model can be used to generate SQL queries for database interactions from natural language descriptions. It is fine-tuned specifically for queries related to forestry and environmental data, including timber production, wildlife habitat, and carbon sequestration.

### Downstream Use (optional)

This model can also be used in downstream applications where SQL query generation is required, such as:

- Reporting tools that generate SQL queries from user inputs
- Natural language interfaces for database management

### Out-of-Scope Use

The model is not designed for:

- Tasks outside of SQL query generation, particularly those that require deeper contextual understanding
- Use cases with sensitive or highly regulated data (manual validation of queries is recommended)

## Bias, Risks, and Limitations

This model may exhibit bias due to the nature of the synthetic data it was trained on. Users should be aware that the model might generate incomplete or incorrect SQL queries.
Additionally, the model may struggle with queries that deviate from the patterns seen during training.

## Recommendations

Users should ensure that generated queries are manually reviewed, especially in critical or sensitive environments, as the model might not always generate accurate SQL statements.

## How to Get Started with the Model

To get started with the fine-tuned model, use the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path_to_your_model_on_kaggle"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize the natural-language question
input_text = "Your input question here"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the SQL query (greedy decoding; temperature is
# ignored when do_sample=False, so it is omitted here)
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

generated_sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_sql)
```
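Because the Recommendations section advises reviewing generated queries before use, a lightweight syntax check can catch malformed SQL before it ever touches a real database. The sketch below is one possible approach, not part of this model's pipeline: it compiles a candidate query with `EXPLAIN` against an empty in-memory SQLite copy of the schema, so nothing is executed against real data. The `timber` table is a hypothetical forestry example for illustration.

```python
import sqlite3

def is_valid_sql(query: str, schema: str) -> bool:
    """Return True if `query` parses and compiles against an empty
    in-memory copy of `schema`; the query itself is never executed."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)       # build empty tables from the schema
        conn.execute(f"EXPLAIN {query}") # compile-only check of the query
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

# Hypothetical forestry schema for illustration
schema = "CREATE TABLE timber (region TEXT, volume REAL, year INTEGER);"

print(is_valid_sql("SELECT region, SUM(volume) FROM timber GROUP BY region", schema))  # True
print(is_valid_sql("SELEC region FROM timber", schema))  # False: syntax error
```

This only guards against syntactically invalid output; semantically wrong but well-formed queries still require the manual review recommended above.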