---
license: cc-by-nc-nd-4.0
datasets:
- Bapynshngain/English-Khasi-Parallel-Corpus
language:
- en
- kha
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-vi
library_name: transformers
tags:
- Khasi
- Translation
- Khasi-Translation
- Khasi Translation
- English-to-Khasi
pipeline_tag: translation
---

What it is:
- An English-to-Khasi translation model.

More about this model:
- This model is a fine-tuned version of my previous model: Bapynshngain/MarianMT-en-kha.
- Training was conducted on my own curated dataset, which comprises approximately 40,000 high-quality parallel pairs.
- Almost half of the dataset was manually translated and vetted by me.
- The rest was obtained from NIT Silchar and the Tatoeba project.
- I would also like to acknowledge Ahlad from IIIT Guwahati for helping me curate the dataset.

Usage:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Bapynshngain/BarHeli-en-kha")
model = AutoModelForSeq2SeqLM.from_pretrained("Bapynshngain/BarHeli-en-kha")

def translate_to_khasi(text):
    # Tokenize the English input and generate the Khasi translation with beam search
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        translated = model.generate(**inputs, num_beams=4, max_length=512)
    translated_text = tokenizer.decode(translated[0], skip_special_tokens=True)
    return translated_text

if __name__ == "__main__":
    while True:
        english_sentence = input("Enter an English sentence (or type 'q' to quit): ")
        if english_sentence.lower() == 'q':
            break
        khasi_translation = translate_to_khasi(english_sentence)
        print(f"Khasi Translation: {khasi_translation}")
```
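
For quick experiments, the same checkpoint can also be loaded through the high-level `transformers` translation pipeline. The snippet below is a minimal sketch that assumes the model ID shown above; the example sentence and generation settings are illustrative, not part of the card's evaluation setup.

```python
from transformers import pipeline

# Minimal sketch: load the same checkpoint via the translation pipeline
# (model ID taken from this card; the input sentence is illustrative only).
translator = pipeline("translation", model="Bapynshngain/BarHeli-en-kha")

result = translator("Good morning, how are you?", num_beams=4, max_length=512)
print(result[0]["translation_text"])
```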