Description

This model was created as part of VK's NLP course.

The task was to align a model with DPO (Direct Preference Optimization). The starting point was an instruct model, i.e. one that had already gone through SFT.
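For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines of plain Python. This is an illustrative sketch of the standard DPO loss, not the training code used for this model; the function name `dpo_loss` and its arguments are hypothetical, and the inputs are assumed to be summed log-probabilities of the chosen and rejected completions under the trained policy and the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=1.0):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)). Names are illustrative."""
    # How much more the policy favours each completion than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss equals log(2).
loss_at_init = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy prefers the chosen answer more than the reference does: lower loss.
loss_good = dpo_loss(-5.0, -12.0, -8.0, -8.0)
# Policy prefers the rejected answer: higher loss.
loss_bad = dpo_loss(-12.0, -5.0, -8.0, -8.0)
```

A larger `beta` scales the margin, so the same shift in log-probabilities moves the loss more; it also weights deviation from the reference model more heavily.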

Base model: HuggingFaceTB/SmolLM-135M-Instruct. Dataset: HumanLLMs/Human-Like-DPO-Dataset.

Training

  • BATCH_SIZE = 1
  • NUM_EPOCHS = 1
  • LR = 5e-5
  • MAX_SEQ_LEN = 1024 # can also be adjusted
  • MAX_PROMPT_LEN = 256 # can also be adjusted
  • MAX_COMPLETION_LEN = None
  • BETA = 1.0
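The constants above could be mapped onto trainer arguments roughly as follows. This is a sketch under assumptions: the card does not say which training library was used, and the keys on the right are the argument names as I understand them from TRL's `DPOConfig`, which have changed between releases, so they should be checked against the installed version. The helper `to_dpo_config_kwargs` is hypothetical.

```python
# Values copied from the hyperparameter list above.
HPARAMS = {
    "BATCH_SIZE": 1,
    "NUM_EPOCHS": 1,
    "LR": 5e-5,
    "MAX_SEQ_LEN": 1024,
    "MAX_PROMPT_LEN": 256,
    "MAX_COMPLETION_LEN": None,
    "BETA": 1.0,
}


def to_dpo_config_kwargs(h):
    """Hypothetical helper: rename the constants to assumed DPOConfig argument names."""
    return {
        "per_device_train_batch_size": h["BATCH_SIZE"],
        "num_train_epochs": h["NUM_EPOCHS"],
        "learning_rate": h["LR"],
        "max_length": h["MAX_SEQ_LEN"],                    # prompt + completion tokens
        "max_prompt_length": h["MAX_PROMPT_LEN"],
        "max_completion_length": h["MAX_COMPLETION_LEN"],  # None = no separate cap
        "beta": h["BETA"],
    }


kwargs = to_dpo_config_kwargs(HPARAMS)
# With TRL installed, these would typically be passed along the lines of
# DPOConfig(output_dir="dpo-out", **kwargs), where "dpo-out" is a placeholder path.
```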

Training ran on Kaggle on a P100 GPU and took about 2 hours. BATCH_SIZE = 1 because nothing larger fit in memory, even with bfloat16; in the end bf16 was not used. Gradient accumulation was not used either (it was used in the second version).
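Gradient accumulation is the usual workaround when only a batch size of 1 fits in memory. The toy example below is a sketch on a one-parameter regression, not this model's training loop. It shows that summing per-example gradients, each scaled by 1/accum_steps, reproduces the gradient of the mean loss over the full batch. All function names are illustrative.

```python
def grad_squared_error(w, x, y):
    """Gradient of (w * x - y) ** 2 with respect to w for one example."""
    return 2.0 * (w * x - y) * x


def full_batch_grad(w, batch):
    """Gradient of the mean loss over the whole batch, computed in one pass."""
    return sum(grad_squared_error(w, x, y) for x, y in batch) / len(batch)


def accumulated_grad(w, batch, accum_steps):
    """Micro-batches of size 1; each loss is scaled by 1 / accum_steps."""
    total = 0.0
    for x, y in batch[:accum_steps]:
        total += grad_squared_error(w, x, y) / accum_steps
    # A real loop would call optimizer.step() here, once per accum_steps examples.
    return total


batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 7.5)]
g_full = full_batch_grad(0.5, batch)
g_acc = accumulated_grad(0.5, batch, accum_steps=4)
```

The trade-off is time rather than memory: the effective batch size becomes BATCH_SIZE × accum_steps, while the number of optimizer steps per epoch drops by the same factor.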

Quality

The model's quality is not great:

  1. It constantly gets stuck in loops
  2. It hallucinates, apparently because of strong catastrophic forgetting
  3. However, the responses are much more human-like, with emoji and many emotional phrases
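The looping in item 1 can be quantified with a simple repeated n-gram fraction. This is a hypothetical diagnostic, not something the original evaluation used; it splits on whitespace instead of using the model's tokenizer.

```python
def repeated_ngram_fraction(text, n=3):
    """Share of word n-grams that duplicate an earlier n-gram (0.0 = no repeats)."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


looping = "10-15 minutes each of these " * 5
varied = "the quick brown fox jumps over the lazy dog"

loop_score = repeated_ngram_fraction(looping)
varied_score = repeated_ngram_fraction(varied)
```

Values close to 1.0 indicate output dominated by repeated phrases. At inference time, decoding options such as `repetition_penalty` or `no_repeat_ngram_size` in the transformers `generate` method may reduce looping, though this was not tested for this model.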

Generation example

======== BEFORE TUNING ========
user
What's your morning routine like?
assistant
What a wonderful question! As a conversational AI, I have my own wake-up routine tailored to my needs. Here's my "Awake Morning Routine" for you:

Morning Routine (10-15 minutes)

  1. 10:00 AM: Start with a gentle exercise routine to loosen up my muscles and get my blood flowing. This could be a simple yoga pose like downward-facing dog, child's pose, or a standing forward fold.
  2. 11:00 AM: Engage in a gentle stretching exercise to loosen up my muscles and prepare for the next day. This can include chest opens, shoulder rolls, or neck stretches. Hold each stretch for 20-30 seconds and then relaxation.
  3. 5:00 AM: Drink a caffeine-free warm or cold drink to start my day on a positive note. This could be a joe-toap, herbal tea, or a cold beverage like peppermint or ginger.
  4. 7:00 AM: Take a mid-morning coffee shop break or grab a quick snack from a grab-and-go supply like a whole wheat toast or a bag of chips. This will give my body

======== AFTER TUNING ========
user
What's your morning routine like?
assistant
Have a morning! That's a great question! 🌾 for day!!
This morning, my morning routine is for the week: going upstairs to the kitchen, then into the shower, and then upstairs!!

Let's get some morning already!!!

5-10 minutes each of these doors and your countertops are getting a chilly start!!

10-15 minutes between of these, adding a cup of hot water and a pinch of garlic!!

10-15 minutes)

10-15 minutes between of these

10-15 minutes each of these, adding a pinch of salt, oregano, and oregano!!

10-15 minutes)

10-15 minutes between of these

10-15 minutes each of these, adding a pinch of salt, oregano, and oregano!!

10-15 minutes)

10-15 minutes between of these

10-15 minutes each of these

10-15 minutes each of these

10-15 minutes each of these

10-15 minutes each of these

10-15 minutes each of these

10
