DPO Implementation and PPO Training
Collection
Course #899 (MSU: Modern NLP. Large Language Models)
We aligned HuggingFaceTB/SmolLM-135M-Instruct with PPO, using a reward model that we trained ourselves on the Human-Like-DPO-Dataset.
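The reward model maps a prompt and a response to a single scalar score. The sketch below is not our reward-model training code. It illustrates the standard pairwise (Bradley-Terry) objective commonly used to train reward models on preference data, assuming each Human-Like-DPO-Dataset example provides a preferred and a dispreferred response to the same prompt. It operates on scores that have already been computed, and the function names are illustrative.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(r_chosen, r_rejected):
    """Mean Bradley-Terry loss -log(sigmoid(r_chosen - r_rejected)) over a batch.

    r_chosen[i] and r_rejected[i] are the reward model's scalar scores for the
    preferred and the dispreferred response to the same prompt.
    """
    if len(r_chosen) != len(r_rejected) or not r_chosen:
        raise ValueError("expected two non-empty score lists of equal length")
    # -log(sigmoid(m)) == softplus(-m); softplus avoids overflow for large |m|
    losses = [softplus(-(c - r)) for c, r in zip(r_chosen, r_rejected)]
    return sum(losses) / len(losses)
```

The loss equals log 2 when the two scores are tied and approaches 0 as the chosen response is scored increasingly higher than the rejected one, which pushes the model to rank preferred responses first.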
At this stage we trained two models: a policy model initialized from the base model HuggingFaceTB/SmolLM-135M-Instruct, and a value model initialized from liuhailin0123/trainer_output.
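To show how the policy model, the value model, and the reward model interact in PPO, here is a plain-Python sketch of three standard pieces of the usual RLHF recipe. First, a per-token reward that subtracts a KL penalty against the frozen reference (initial) policy and adds the reward-model score at the last token. Second, Generalized Advantage Estimation (GAE) using the value model's estimates. Third, the clipped surrogate policy loss plus a squared-error value loss. This is an illustration under those assumptions, not our training code, and the coefficients (`beta`, `gamma`, `lam`, `clip_eps`) are assumed values, not the hyperparameters we used.

```python
import math


def shaped_rewards(rm_score, logp_policy, logp_ref, beta=0.05):
    """Per-token rewards: KL penalty on every token, reward-model score on the last one.

    logp_policy / logp_ref are log-probs of the generated tokens under the current
    policy and the frozen reference policy. beta is an assumed KL coefficient.
    """
    rewards = [-beta * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
    rewards[-1] += rm_score
    return rewards


def gae(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation for one generated response.

    values[t] is the value model's estimate for the state before token t;
    the state after the final token is terminal (value 0).
    Returns (advantages, returns); returns are the value model's targets.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_losses(logp_new, logp_old, advantages, values_new, returns, clip_eps=0.2):
    """Clipped surrogate policy loss and squared-error value loss (both minimized)."""
    n = len(advantages)
    policy_loss = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new / pi_old for this token
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Keep the pessimistic (smaller) objective, then negate it to get a loss.
        policy_loss += -min(ratio * adv, clipped * adv)
    value_loss = sum(0.5 * (v - r) ** 2 for v, r in zip(values_new, returns))
    return policy_loss / n, value_loss / n
```

Full implementations typically also whiten advantages and clip the value update. Both are omitted here to keep the core logic visible.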
======== BEFORE TUNING ========
What's your morning routine like?
I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:
**Morning Routine (10-15 minutes)**
1. **Hydrate**: Drink a full glass of water or a herbal tea (e.g., chamomile, peppermint) to start the day.
2. **Eat a nutritious breakfast**: Prepare a healthy breakfast, such as oatmeal with fruit, scrambled eggs with spinach, or Greek yogurt with berries.
3. **Get some morning sunlight**: Spend 10-15 minutes outside in natural light to help regulate your circadian rhythms.
4. **Take a few deep breaths**: Inhale for 1-2 minutes, hold for 2-3 minutes, and exhale for 2-3 minutes.
5. **Stretch or move**: Engage in some light stretching or movement to get your blood flowing and your muscles moving.
**Morning Routine (15-30 minutes)**
1. **Get some morning sunlight**: Spend 10-1
======== AFTER TUNING ========
What's your morning routine like?
You know, I've been thinking a lot about morning routines lately. So, I thought I'd share with you some of my favorite morning routines that I've come up with. Here are a few:
**Morning Affirmations**
* **Self-Care**
* **Mindfulness**
* **Self-Love**
* **Goal Setting**
* **Resilience**
* **Self-Care**
**Morning Journaling**
* **Journaling**
* **Reflection**
* **Self-Care**
* **Goal Setting**
* **Resilience**
* **Self-Care**
**Exercise**
* **Cardio**
* **Yoga**
* **Mindfulness Meditation**
* **Self-Care**
* **Goal Setting**
* **Resilience**
* **Self-Care**
**Healthy Eating**
* **Healthy Eating**
* **Weight Loss**
* **Weight Gain**
* **Self-Care**
* **Goal Setting**
* **Resilience**
* **Self-Care**
**Mindfulness Meditation**
* **Mindfulness Meditation**
* **Self-Care**
We found that the model no longer answers with phrases like "As a digital AI assistant, I don't have personal preferences" and instead starts to reply like a person. The replies can be odd, as the repetitive lists in the sample show, but they read more like a human response.
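This observation is based on reading individual samples. One way to check it over many prompts is to count how often responses contain assistant-style disclaimers. The sketch below is a hypothetical regex heuristic; the pattern list is our assumption and was not part of the training or evaluation described here. It measures only whether such phrases appear, not whether the response is good.

```python
import re

# Hypothetical disclaimer patterns; extend the list for a real evaluation.
DISCLAIMER_PATTERNS = [
    r"\bas an? (?:digital )?ai(?: language model| assistant)?\b",
    r"\bi don't have personal (?:preferences|experiences|opinions)\b",
]
_DISCLAIMER_RE = re.compile("|".join(DISCLAIMER_PATTERNS), re.IGNORECASE)


def disclaimer_rate(responses):
    """Fraction of responses containing at least one AI-assistant disclaimer."""
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if _DISCLAIMER_RE.search(r))
    return hits / len(responses)
```

Applied to the opening sentences of the two samples, the response before tuning scores 1.0 and the response after tuning scores 0.0.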
Base model
HuggingFaceTB/SmolLM-135M