---
library_name: transformers
license: apache-2.0
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
language:
- en
base_model:
- HuggingFaceTB/SmolLM-135M-Instruct
pipeline_tag: text-generation
---

# Description

This model was created as part of the [NLP course from VK](https://education.vk.company/program/kurs-nlp-yazykovye-modeli-ml). The assignment was to align a model with DPO (the starting point was an instruct model, that is, one that had already gone through SFT). Base model: HuggingFaceTB/SmolLM-135M-Instruct; dataset: HumanLLMs/Human-Like-DPO-Dataset.

# Training

* BATCH_SIZE = 1
* NUM_EPOCHS = 1
* LR = 5e-5
* MAX_SEQ_LEN = 1024 # this can also be adjusted
* MAX_PROMPT_LEN = 256 # this can also be adjusted
* MAX_COMPLETION_LEN = None
* BETA = 1.0

Training ran on Kaggle on a P100 GPU for about 2 hours. BATCH_SIZE = 1 because nothing larger fit in memory, even with bfloat16. In the end bf16 was not used. Gradient accumulation was not used either (it was used in the second version).

# Quality

The model's quality is not great:

1. It constantly gets stuck in loops.
2. It hallucinates, apparently a strong effect of catastrophic forgetting.
3. On the other hand, the answers are much more human-like, with emoji and many emotional phrases.

## Generation example

======== BEFORE TUNING ========

user

What's your morning routine like?

assistant

What a wonderful question! As a conversational AI, I have my own wake-up routine tailored to my needs. Here's my "Awake Morning Routine" for you:

**Morning Routine (10-15 minutes)**

1. **10:00 AM**: Start with a gentle exercise routine to loosen up my muscles and get my blood flowing. This could be a simple yoga pose like downward-facing dog, child's pose, or a standing forward fold.
2. **11:00 AM**: Engage in a gentle stretching exercise to loosen up my muscles and prepare for the next day. This can include chest opens, shoulder rolls, or neck stretches. Hold each stretch for 20-30 seconds and then relaxation.
3. **5:00 AM**: Drink a caffeine-free warm or cold drink to start my day on a positive note. This could be a joe-toap, herbal tea, or a cold beverage like peppermint or ginger.
4. **7:00 AM**: Take a mid-morning coffee shop break or grab a quick snack from a grab-and-go supply like a whole wheat toast or a bag of chips. This will give my body

======== AFTER TUNING ========

user

What's your morning routine like?

assistant

Have a morning! That's a great question! 🌾 for day!! This morning, my morning routine is for the week: going upstairs to the kitchen, then into the shower, and then upstairs!! Let's get some morning already!!! 5-10 minutes each of these doors and your countertops are getting a chilly start!! 10-15 minutes between of these, adding a cup of hot water and a pinch of garlic!! 10-15 minutes) 10-15 minutes between of these 10-15 minutes each of these, adding a pinch of salt, oregano, and oregano!! 10-15 minutes) 10-15 minutes between of these 10-15 minutes each of these, adding a pinch of salt, oregano, and oregano!! 10-15 minutes) 10-15 minutes between of these 10-15 minutes each of these 10-15 minutes each of these 10-15 minutes each of these 10-15 minutes each of these 10-15 minutes each of these 10
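
## A note on BETA

The training settings in this card list BETA = 1.0. As a rough illustration of what that parameter does, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the training code used for this model: the function name `dpo_loss` and all log-probability values are hypothetical, chosen only to show how BETA scales the margin between the chosen and rejected answers relative to the frozen reference model.

```python
import math

BETA = 1.0  # value taken from the training settings in this card


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full answer
    under the trained policy or under the frozen reference model.
    """
    # How much more likely each answer became compared to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # BETA scales how strongly the preference margin is rewarded.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), computed as softplus(-margin) in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical log-probabilities, for illustration only.
    pair = dict(policy_chosen_logp=-8.0, policy_rejected_logp=-12.0,
                ref_chosen_logp=-10.0, ref_rejected_logp=-10.0)
    for beta in (0.1, 1.0):
        print(f"beta={beta}: loss={dpo_loss(**pair, beta=beta):.4f}")
```

For the same pair, a larger BETA drives the loss toward zero faster once the policy prefers the chosen answer, so the same learning rate moves the policy further from the reference model. That may be one reason for the forgetting noted in the Quality section, although this card does not test that.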