---
library_name: transformers.js
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
license: apache-2.0
datasets:
- Kukedlc/dpo-orpo-spanish-15k
language:
- en
- es
---


[<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67b2f4e49edebc815a3a4739/R1g957j1aBbx8lhZbWmxw.jpeg" width="200"/>](https://huggingface.co/fjmgAI)

## Fine-Tuned Model

**`fjmgAI/b1-R1-1.5B-ONNX`**

## Base Model
**`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`**

## Fine-Tuning Method
Fine-tuning was performed with **[`unsloth`](https://github.com/unslothai/unsloth)**, an efficient fine-tuning framework optimized for low-resource environments, together with Hugging Face's TRL library.
The resulting model weights were then converted to ONNX with ONNX Runtime to make them compatible with Transformers.js.
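
As a minimal, illustrative sketch (not necessarily the exact conversion used for this repository), a fine-tuned checkpoint can be exported to ONNX with [🤗 Optimum](https://huggingface.co/docs/optimum/index); the model ID and output paths below are placeholders:

```python
# Sketch: export a fine-tuned causal LM to ONNX with 🤗 Optimum.
# "path/to/finetuned-model" is a placeholder for the merged fine-tuned weights.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "path/to/finetuned-model"

# export=True converts the PyTorch checkpoint to ONNX on load
ort_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the exported model; for a Transformers.js-ready repo the .onnx files
# are then placed in an `onnx/` subfolder alongside the tokenizer/config files.
ort_model.save_pretrained("b1-R1-1.5B-ONNX")
tokenizer.save_pretrained("b1-R1-1.5B-ONNX")
```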

## Dataset
**[`Kukedlc/dpo-orpo-spanish-15k`](https://huggingface.co/datasets/Kukedlc/dpo-orpo-spanish-15k)**

### Description
A Spanish-language dataset containing **15,000 examples**, designed for **Direct Preference Optimization (DPO)** or **Odds Ratio Preference Optimization (ORPO)**.

### Adaptation
The dataset was adapted to a reasoning-oriented format for **GRPO (Group Relative Policy Optimization)**, so that the preference data could better guide decision-making during fine-tuning and improve alignment with instruction-following tasks in Spanish.
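
The snippet below is a sketch of this kind of adaptation, assuming the dataset exposes `prompt` and `chosen` columns and a `<think>`-style reasoning template (the actual column names and template may differ):

```python
# Sketch of the adaptation described above; column names and the system prompt
# are assumptions, not the documented schema of the source dataset.
from datasets import load_dataset

raw = load_dataset("Kukedlc/dpo-orpo-spanish-15k", split="train")

SYSTEM = (
    "Responde en español. Razona primero dentro de <think>...</think> "
    "y después da la respuesta final."
)

def to_reasoning_format(example):
    # Keep the original question as the GRPO prompt; the preferred answer can
    # serve as a reference when scoring completions with a reward function.
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": example["prompt"]},
        ],
        "reference": example["chosen"],
    }

dataset = raw.map(to_reasoning_format)
```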

## Fine-Tuning Details
- The model was trained with the **GRPO algorithm**, leveraging structured preference data to refine its response generation (a training sketch follows this list).
- The focus was on retaining the model's **instructional abilities** while improving its **understanding and generation** of Spanish text.
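
Below is a minimal GRPO training sketch using TRL's `GRPOTrainer`; the reward function, hyperparameters, and the `dataset` variable (from the adaptation sketch above) are illustrative assumptions, not the exact setup used for this model:

```python
# Minimal GRPO sketch with TRL; reward function and hyperparameters are illustrative.
from trl import GRPOConfig, GRPOTrainer

def spanish_reasoning_reward(completions, **kwargs):
    # Toy reward: encourage completions that contain an explicit <think> block.
    # With conversational prompts, each completion is a list of chat messages.
    return [1.0 if "<think>" in c[0]["content"] else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="b1-R1-1.5B-GRPO",
    num_generations=4,          # completions sampled per prompt
    max_completion_length=512,
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=spanish_reasoning_reward,
    args=training_args,
    train_dataset=dataset,      # adapted dataset from the previous sketch
)
trainer.train()
```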


## Usage (Transformers.js)

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```


**Example:** Text-generation w/ `fjmgAI/b1-R1-1.5B-ONNX`

```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "fjmgAI/b1-R1-1.5B-ONNX",
  { dtype: "q4f16" },
);

// Define the list of messages
const messages = [
  { role: "user", content:  "Resuelve esta ecuación: x^2 - 3x + 2 = 0" },
];

// Create text streamer
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  // callback_function: (text) => { }, // Optional callback function
})

// Generate a response
const output = await generator(messages, { max_new_tokens: 512, do_sample: false, streamer });
console.log(output[0].generated_text.at(-1).content);
```

<details>
<summary>See example output</summary>

```
<think>
To solve the quadratic equation \( x^2 - 3x + 2 = 0 \), I'll start by factoring the left-hand side. I need to find two numbers that multiply to 2 and add up to -3. These numbers are -1 and -2.

Next, I'll rewrite the equation as \( (x - 1)(x - 2) = 0 \). 

Using the zero product property, I'll set each factor equal to zero:
1. \( x - 1 = 0 \) leads to \( x = 1 \).
2. \( x - 2 = 0 \) leads to \( x = 2 \).

Therefore, the solutions to the equation are \( x = 1 \) and \( x = 2 \).
</think>

To solve the quadratic equation:

\[
x^2 - 3x + 2 = 0
\]

**Step 1: Factor the Quadratic**

We look for two numbers that multiply to \( +2 \) and add up to \( -3 \). These numbers are \( -1 \) and \( -2 \).

\[
x^2 - 3x + 2 = (x - 1)(x - 2) = 0
\]

**Step 2: Apply the Zero Product Property**

If the product of two factors is zero, at least one of the factors must be zero.

\[
x - 1 = 0 \quad \text{or} \quad x - 2 = 0
\]

**Step 3: Solve for \( x \)**

\[
x = 1 \quad \text{or} \quad x = 2
\]

**Final Answer:**

\[
\boxed{1 \text{ and } 2}
\]
```
  
</details>

---

## Purpose
This fine-tuned model is intended for **Spanish-language applications** that require an efficient, instruction-following model with a **lightweight reasoning process**.

- **Developed by:** fjmgAI
- **License:** apache-2.0

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)  [<img src="https://camo.githubusercontent.com/9585eb3e70c8138cbc0f73de7e970be4c668e957e45d16fc3ee6687fcc1da905/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d6c69622f646f63756d656e746174696f6e2d696d616765732f7265736f6c76652f6d61696e2f74726c5f62616e6e65725f6461726b2e706e67" width="200"/>](https://github.com/huggingface/trl?tab=readme-ov-file)
[<img src="https://github.com/microsoft/onnxruntime/blob/main/docs/images/ONNX_Runtime_logo_dark.png?raw=true" width="200"/>](https://github.com/microsoft/onnxruntime)
Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).