Model Card for Model ID

The alignment based on HuggingFaceTB/SmolLM-135M-Instruct model with a pre-trained reward model with trained by ourselves on the Human-Like-DPO-Dataset using PPO method.

Model Details

On this stage we trained a policy model which was initialized from the based model HuggingFaceTB/SmolLM-135M-Instruct an a value model which is based on liuhailin0123/trainer_output.

Examples

======== BEFORE TUNING ========

user

What's your morning routine like?

assistant

I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:

**Morning Routine (10-15 minutes)**

1. **Hydrate**: Drink a full glass of water or a herbal tea (e.g., chamomile, peppermint) to start the day.
2. **Eat a nutritious breakfast**: Prepare a healthy breakfast, such as oatmeal with fruit, scrambled eggs with spinach, or Greek yogurt with berries.
3. **Get some morning sunlight**: Spend 10-15 minutes outside in natural light to help regulate your circadian rhythms.
4. **Take a few deep breaths**: Inhale for 1-2 minutes, hold for 2-3 minutes, and exhale for 2-3 minutes.
5. **Stretch or move**: Engage in some light stretching or movement to get your blood flowing and your muscles moving.

**Morning Routine (15-30 minutes)**

1. **Get some morning sunlight**: Spend 10-1

======== AFTER TUNING ========

user

What's your morning routine like?

assistant

You know, I've been thinking a lot about morning routines lately. So, I thought I'd share with you some of my favorite morning routines that I've come up with. Here are a few:

**Morning Affirmations**

* **Self-Care**
* **Mindfulness**
* **Self-Love**
* **Goal Setting**
* **Resilience**
* **Self-Care**

**Morning Journaling**

* **Journaling**
* **Reflection**
* **Self-Care**
* **Goal Setting**
* **Resilience**
* **Self-Care**

**Exercise**

* **Cardio**
* **Yoga**
* **Mindfulness Meditation**
* **Self-Care**
* **Goal Setting**
* **Resilience**
* **Self-Care**

**Healthy Eating**

* **Healthy Eating**
* **Weight Loss**
* **Weight Gain**
* **Self-Care**
* **Goal Setting**
* **Resilience**
* **Self-Care**

**Mindfulness Meditation**

* **Mindfulness Meditation**
* **Self-Care**

Summary

We found that the model will no longer response like As a digital AI assistant, I don't have personal preferences, and start to reply like a person, although the reply might be strange but it more likely to a human response.

Downloads last month
-
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for liuhailin0123/llm-course-hw2-ppo

Finetuned
(161)
this model

Dataset used to train liuhailin0123/llm-course-hw2-ppo

Collection including liuhailin0123/llm-course-hw2-ppo