manmah committed on
Commit 1917064 · verified · 1 Parent(s): 7fa3ffe

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 1024,
+ "pooling_mode_cls_token": true,
+ "pooling_mode_mean_tokens": false,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,727 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:156
8
+ - loss:MatryoshkaLoss
9
+ - loss:MultipleNegativesRankingLoss
10
+ base_model: Snowflake/snowflake-arctic-embed-l
11
+ widget:
12
+ - source_sentence: What challenge related to prompt injection has seen little progress
13
+ since September 2022?
14
+ sentences:
15
+ - 'Except... you can run generated code to see if it’s correct. And with patterns
16
+ like ChatGPT Code Interpreter the LLM can execute the code itself, process the
17
+ error message, then rewrite it and keep trying until it works!
18
+
19
+ So hallucination is a much lesser problem for code generation than for anything
20
+ else. If only we had the equivalent of Code Interpreter for fact-checking natural
21
+ language!
22
+
23
+ How should we feel about this as software engineers?
24
+
25
+ On the one hand, this feels like a threat: who needs a programmer if ChatGPT can
26
+ write code for you?'
27
+ - 'Prompt injection is a natural consequence of this gulibility. I’ve seen precious
28
+ little progress on tackling that problem in 2024, and we’ve been talking about
29
+ it since September 2022.
30
+
31
+ I’m beginning to see the most popular idea of “agents” as dependent on AGI itself.
32
+ A model that’s robust against gulliblity is a very tall order indeed.
33
+
34
+ Evals really matter
35
+
36
+ Anthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):'
37
+ - "Industry’s Tardy Response to the AI Prompt Injection Vulnerability on RedMonk\
38
+ \ Conversations\n\n\nPosted 31st December 2023 at 11:59 pm · Follow me on Mastodon,\
39
+ \ Bluesky, Twitter or subscribe to my newsletter\n\n\nMore recent articles\n\n\
40
+ Qwen 3 offers a case study in how to effectively release a model - 29th April\
41
+ \ 2025\nWatching o3 guess a photo's location is surreal, dystopian and wildly\
42
+ \ entertaining - 26th April 2025\nExploring Promptfoo via Dave Guarino's SNAP\
43
+ \ evals - 24th April 2025\n\n\n \n\n\nThis is Stuff we figured out about AI in\
44
+ \ 2023 by Simon Willison, posted on 31st December 2023.\n\nPart of series LLMs\
45
+ \ annual review\n\nStuff we figured out about AI in 2023 - Dec. 31, 2023, 11:59\
46
+ \ p.m. \nThings we learned about LLMs in 2024 - Dec. 31, 2024, 6:07 p.m."
47
+ - source_sentence: Which company released the QwQ model under an Apache 20 license?
48
+ sentences:
49
+ - 'I also gave a bunch of talks and podcast appearances. I’ve started habitually
50
+ turning my talks into annotated presentations—here are my best from 2023:
51
+
52
+
53
+ Prompt injection explained, with video, slides, and a transcript
54
+
55
+ Catching up on the weird world of LLMs
56
+
57
+ Making Large Language Models work for you
58
+
59
+ Open questions for AI engineering
60
+
61
+ Embeddings: What they are and why they matter
62
+
63
+ Financial sustainability for open source projects at GitHub Universe
64
+
65
+
66
+ And in podcasts:
67
+
68
+
69
+
70
+ What AI can do for you on the Theory of Change
71
+
72
+
73
+ Working in public on Path to Citus Con
74
+
75
+
76
+ LLMs break the internet on the Changelog
77
+
78
+
79
+ Talking Large Language Models on Rooftop Ruby
80
+
81
+
82
+ Thoughts on the OpenAI board situation on Newsroom Robots'
83
+ - 'OpenAI are not the only game in town here. Google released their first entrant
84
+ in the category, gemini-2.0-flash-thinking-exp, on December 19th.
85
+
86
+ Alibaba’s Qwen team released their QwQ model on November 28th—under an Apache
87
+ 2.0 license, and that one I could run on my own machine. They followed that up
88
+ with a vision reasoning model called QvQ on December 24th, which I also ran locally.
89
+
90
+ DeepSeek made their DeepSeek-R1-Lite-Preview model available to try out through
91
+ their chat interface on November 20th.
92
+
93
+ To understand more about inference scaling I recommend Is AI progress slowing
94
+ down? by Arvind Narayanan and Sayash Kapoor.'
95
+ - '“Agents” still haven’t really happened yet
96
+
97
+ I find the term “agents” extremely frustrating. It lacks a single, clear and widely
98
+ understood meaning... but the people who use the term never seem to acknowledge
99
+ that.
100
+
101
+ If you tell me that you are building “agents”, you’ve conveyed almost no information
102
+ to me at all. Without reading your mind I have no way of telling which of the
103
+ dozens of possible definitions you are talking about.'
104
+ - source_sentence: How has Apple’s MLX library impacted the performance of running
105
+ machine learning models on Apple Silicon?
106
+ sentences:
107
+ - 'These abilities are just a few weeks old at this point, and I don’t think their
108
+ impact has been fully felt yet. If you haven’t tried them out yet you really should.
109
+
110
+ Both Gemini and OpenAI offer API access to these features as well. OpenAI started
111
+ with a WebSocket API that was quite challenging to use, but in December they announced
112
+ a new WebRTC API which is much easier to get started with. Building a web app
113
+ that a user can talk to via voice is easy now!
114
+
115
+ Prompt driven app generation is a commodity already
116
+
117
+ This was possible with GPT-4 in 2023, but the value it provides became evident
118
+ in 2024.'
119
+ - 'On paper, a 64GB Mac should be a great machine for running models due to the
120
+ way the CPU and GPU can share the same memory. In practice, many models are released
121
+ as model weights and libraries that reward NVIDIA’s CUDA over other platforms.
122
+
123
+ The llama.cpp ecosystem helped a lot here, but the real breakthrough has been
124
+ Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic.
125
+
126
+ Apple’s mlx-lm Python library supports running a wide range of MLX-compatible
127
+ models on my Mac, with excellent performance. mlx-community on Hugging Face offers
128
+ more than 1,000 models that have been converted to the necessary format.'
129
+ - 'On the one hand, we keep on finding new things that LLMs can do that we didn’t
130
+ expect—and that the people who trained the models didn’t expect either. That’s
131
+ usually really fun!
132
+
133
+ But on the other hand, the things you sometimes have to do to get the models to
134
+ behave are often incredibly dumb.
135
+
136
+ Does ChatGPT get lazy in December, because its hidden system prompt includes the
137
+ current date and its training data shows that people provide less useful answers
138
+ coming up to the holidays?
139
+
140
+ The honest answer is “maybe”! No-one is entirely sure, but if you give it a different
141
+ date its answers may skew slightly longer.'
142
+ - source_sentence: What are some ways to run local, private large language models
143
+ (LLMs) mentioned in the context?
144
+ sentences:
145
+ - 'We don’t yet know how to build GPT-4
146
+
147
+ Frustratingly, despite the enormous leaps ahead we’ve had this year, we are yet
148
+ to see an alternative model that’s better than GPT-4.
149
+
150
+ OpenAI released GPT-4 in March, though it later turned out we had a sneak peak
151
+ of it in February when Microsoft used it as part of the new Bing.
152
+
153
+ This may well change in the next few weeks: Google’s Gemini Ultra has big claims,
154
+ but isn’t yet available for us to try out.
155
+
156
+ The team behind Mistral are working to beat GPT-4 as well, and their track record
157
+ is already extremely strong considering their first public model only came out
158
+ in September, and they’ve released two significant improvements since then.'
159
+ - 'I’m still trying to figure out the best patterns for doing this for my own work.
160
+ Everyone knows that evals are important, but there remains a lack of great guidance
161
+ for how to best implement them—I’m tracking this under my evals tag. My SVG pelican
162
+ riding a bicycle benchmark is a pale imitation of what a real eval suite should
163
+ look like.
164
+
165
+ Apple Intelligence is bad, Apple’s MLX library is excellent
166
+
167
+ As a Mac user I’ve been feeling a lot better about my choice of platform this
168
+ year.
169
+
170
+ Last year it felt like my lack of a Linux/Windows machine with an NVIDIA GPU
171
+ was a huge disadvantage in terms of trying out new models.'
172
+ - 'I run a bunch of them on my laptop. I run Mistral 7B (a surprisingly great model)
173
+ on my iPhone. You can install several different apps to get your own, local, completely
174
+ private LLM. My own LLM project provides a CLI tool for running an array of different
175
+ models via plugins.
176
+
177
+ You can even run them entirely in your browser using WebAssembly and the latest
178
+ Chrome!
179
+
180
+ Hobbyists can build their own fine-tuned models
181
+
182
+ I said earlier that building an LLM was still out of reach of hobbyists. That
183
+ may be true for training from scratch, but fine-tuning one of those models is
184
+ another matter entirely.'
185
+ - source_sentence: What is the most important factor in determining the quality of
186
+ a trained model according to the context?
187
+ sentences:
188
+ - 'Intuitively, one would expect that systems this powerful would take millions
189
+ of lines of complex code. Instead, it turns out a few hundred lines of Python
190
+ is genuinely enough to train a basic version!
191
+
192
+ What matters most is the training data. You need a lot of data to make these
193
+ things work, and the quantity and quality of the training data appears to be the
194
+ most important factor in how good the resulting model is.
195
+
196
+ If you can gather the right data, and afford to pay for the GPUs to train it,
197
+ you can build an LLM.'
198
+ - 'Now add a walrus: Prompt engineering in DALL-E 3
199
+
200
+ 32.8k
201
+
202
+ 41.2k
203
+
204
+
205
+
206
+ Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and
207
+ it’s very impressive
208
+
209
+ 32.5k
210
+
211
+ 38.2k
212
+
213
+
214
+
215
+ ChatGPT can’t access the internet, even though it really looks like it can
216
+
217
+ 30.5k
218
+
219
+ 34.2k
220
+
221
+
222
+
223
+ Stanford Alpaca, and the acceleration of on-device large language model development
224
+
225
+ 29.7k
226
+
227
+ 35.7k
228
+
229
+
230
+
231
+ Run Llama 2 on your own Mac using LLM and Homebrew
232
+
233
+ 27.9k
234
+
235
+ 33.6k
236
+
237
+
238
+
239
+ Midjourney 5.1
240
+
241
+ 26.7k
242
+
243
+ 33.4k
244
+
245
+
246
+
247
+ Think of language models like ChatGPT as a “calculator for words”
248
+
249
+ 25k
250
+
251
+ 31.8k
252
+
253
+
254
+
255
+ Multi-modal prompt injection image attacks against GPT-4V
256
+
257
+ 23.7k
258
+
259
+ 27.4k'
260
+ - 'I think people who complain that LLM improvement has slowed are often missing
261
+ the enormous advances in these multi-modal models. Being able to run prompts against
262
+ images (and audio and video) is a fascinating new way to apply these models.
263
+
264
+ Voice and live camera mode are science fiction come to life
265
+
266
+ The audio and live video modes that have started to emerge deserve a special mention.
267
+
268
+ The ability to talk to ChatGPT first arrived in September 2023, but it was mostly
269
+ an illusion: OpenAI used their excellent Whisper speech-to-text model and a new
270
+ text-to-speech model (creatively named tts-1) to enable conversations with the
271
+ ChatGPT mobile apps, but the actual model just saw text.'
272
+ pipeline_tag: sentence-similarity
273
+ library_name: sentence-transformers
274
+ metrics:
275
+ - cosine_accuracy@1
276
+ - cosine_accuracy@3
277
+ - cosine_accuracy@5
278
+ - cosine_accuracy@10
279
+ - cosine_precision@1
280
+ - cosine_precision@3
281
+ - cosine_precision@5
282
+ - cosine_precision@10
283
+ - cosine_recall@1
284
+ - cosine_recall@3
285
+ - cosine_recall@5
286
+ - cosine_recall@10
287
+ - cosine_ndcg@10
288
+ - cosine_mrr@10
289
+ - cosine_map@100
290
+ model-index:
291
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
292
+ results:
293
+ - task:
294
+ type: information-retrieval
295
+ name: Information Retrieval
296
+ dataset:
297
+ name: Unknown
298
+ type: unknown
299
+ metrics:
300
+ - type: cosine_accuracy@1
301
+ value: 0.9166666666666666
302
+ name: Cosine Accuracy@1
303
+ - type: cosine_accuracy@3
304
+ value: 1.0
305
+ name: Cosine Accuracy@3
306
+ - type: cosine_accuracy@5
307
+ value: 1.0
308
+ name: Cosine Accuracy@5
309
+ - type: cosine_accuracy@10
310
+ value: 1.0
311
+ name: Cosine Accuracy@10
312
+ - type: cosine_precision@1
313
+ value: 0.9166666666666666
314
+ name: Cosine Precision@1
315
+ - type: cosine_precision@3
316
+ value: 0.3333333333333333
317
+ name: Cosine Precision@3
318
+ - type: cosine_precision@5
319
+ value: 0.20000000000000004
320
+ name: Cosine Precision@5
321
+ - type: cosine_precision@10
322
+ value: 0.10000000000000002
323
+ name: Cosine Precision@10
324
+ - type: cosine_recall@1
325
+ value: 0.9166666666666666
326
+ name: Cosine Recall@1
327
+ - type: cosine_recall@3
328
+ value: 1.0
329
+ name: Cosine Recall@3
330
+ - type: cosine_recall@5
331
+ value: 1.0
332
+ name: Cosine Recall@5
333
+ - type: cosine_recall@10
334
+ value: 1.0
335
+ name: Cosine Recall@10
336
+ - type: cosine_ndcg@10
337
+ value: 0.9692441461309548
338
+ name: Cosine Ndcg@10
339
+ - type: cosine_mrr@10
340
+ value: 0.9583333333333334
341
+ name: Cosine Mrr@10
342
+ - type: cosine_map@100
343
+ value: 0.9583333333333334
344
+ name: Cosine Map@100
345
+ ---
346
+
347
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
348
+
349
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
350
+
351
+ ## Model Details
352
+
353
+ ### Model Description
354
+ - **Model Type:** Sentence Transformer
355
+ - **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
356
+ - **Maximum Sequence Length:** 512 tokens
357
+ - **Output Dimensionality:** 1024 dimensions
358
+ - **Similarity Function:** Cosine Similarity
359
+ <!-- - **Training Dataset:** Unknown -->
360
+ <!-- - **Language:** Unknown -->
361
+ <!-- - **License:** Unknown -->
362
+
363
+ ### Model Sources
364
+
365
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
366
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
367
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
368
+
369
+ ### Full Model Architecture
370
+
371
+ ```
372
+ SentenceTransformer(
373
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
374
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
375
+ (2): Normalize()
376
+ )
377
+ ```
378
+
379
+ ## Usage
380
+
381
+ ### Direct Usage (Sentence Transformers)
382
+
383
+ First install the Sentence Transformers library:
384
+
385
+ ```bash
386
+ pip install -U sentence-transformers
387
+ ```
388
+
389
+ Then you can load this model and run inference.
390
+ ```python
391
+ from sentence_transformers import SentenceTransformer
392
+
393
+ # Download from the 🤗 Hub
394
+ model = SentenceTransformer("manmah/legal-ft-717cb2ad-5d19-4d52-ad34-5656c2895fa9")
395
+ # Run inference
396
+ sentences = [
397
+ 'What is the most important factor in determining the quality of a trained model according to the context?',
398
+ 'Intuitively, one would expect that systems this powerful would take millions of lines of complex code. Instead, it turns out a few hundred lines of Python is genuinely enough to train a basic version!\nWhat matters most is the training data. You need a lot of data to make these things work, and the quantity and quality of the training data appears to be the most important factor in how good the resulting model is.\nIf you can gather the right data, and afford to pay for the GPUs to train it, you can build an LLM.',
399
+ 'I think people who complain that LLM improvement has slowed are often missing the enormous advances in these multi-modal models. Being able to run prompts against images (and audio and video) is a fascinating new way to apply these models.\nVoice and live camera mode are science fiction come to life\nThe audio and live video modes that have started to emerge deserve a special mention.\nThe ability to talk to ChatGPT first arrived in September 2023, but it was mostly an illusion: OpenAI used their excellent Whisper speech-to-text model and a new text-to-speech model (creatively named tts-1) to enable conversations with the ChatGPT mobile apps, but the actual model just saw text.',
400
+ ]
401
+ embeddings = model.encode(sentences)
402
+ print(embeddings.shape)
403
+ # [3, 1024]
404
+
405
+ # Get the similarity scores for the embeddings
406
+ similarities = model.similarity(embeddings, embeddings)
407
+ print(similarities.shape)
408
+ # [3, 3]
409
+ ```
410
+
411
+ <!--
412
+ ### Direct Usage (Transformers)
413
+
414
+ <details><summary>Click to see the direct usage in Transformers</summary>
415
+
416
+ </details>
417
+ -->
418
+
419
+ <!--
420
+ ### Downstream Usage (Sentence Transformers)
421
+
422
+ You can finetune this model on your own dataset.
423
+
424
+ <details><summary>Click to expand</summary>
425
+
426
+ </details>
427
+ -->
428
+
429
+ <!--
430
+ ### Out-of-Scope Use
431
+
432
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
433
+ -->
434
+
435
+ ## Evaluation
436
+
437
+ ### Metrics
438
+
439
+ #### Information Retrieval
440
+
441
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
442
+
443
+ | Metric | Value |
444
+ |:--------------------|:-----------|
445
+ | cosine_accuracy@1 | 0.9167 |
446
+ | cosine_accuracy@3 | 1.0 |
447
+ | cosine_accuracy@5 | 1.0 |
448
+ | cosine_accuracy@10 | 1.0 |
449
+ | cosine_precision@1 | 0.9167 |
450
+ | cosine_precision@3 | 0.3333 |
451
+ | cosine_precision@5 | 0.2 |
452
+ | cosine_precision@10 | 0.1 |
453
+ | cosine_recall@1 | 0.9167 |
454
+ | cosine_recall@3 | 1.0 |
455
+ | cosine_recall@5 | 1.0 |
456
+ | cosine_recall@10 | 1.0 |
457
+ | **cosine_ndcg@10** | **0.9692** |
458
+ | cosine_mrr@10 | 0.9583 |
459
+ | cosine_map@100 | 0.9583 |
460
+
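+ 
+ As a rough illustration of how figures like these are produced, the evaluator linked above can be run against a small query/corpus mapping (the IDs and texts below are placeholders, not the actual evaluation set):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+ 
+ model = SentenceTransformer("manmah/legal-ft-717cb2ad-5d19-4d52-ad34-5656c2895fa9")
+ 
+ # Toy example: map query IDs to text, corpus IDs to text, and queries to relevant doc IDs
+ queries = {"q1": "What is the most important factor in how good a trained model is?"}
+ corpus = {
+     "d1": "The quantity and quality of the training data appears to be the most important factor.",
+     "d2": "Apple's MLX library is an array framework for Apple Silicon.",
+ }
+ relevant_docs = {"q1": {"d1"}}
+ 
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="toy-eval")
+ print(evaluator(model))  # dict of cosine_accuracy@k, cosine_ndcg@10, etc.
+ ```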
461
+ <!--
462
+ ## Bias, Risks and Limitations
463
+
464
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
465
+ -->
466
+
467
+ <!--
468
+ ### Recommendations
469
+
470
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
471
+ -->
472
+
473
+ ## Training Details
474
+
475
+ ### Training Dataset
476
+
477
+ #### Unnamed Dataset
478
+
479
+ * Size: 156 training samples
480
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
481
+ * Approximate statistics based on the first 156 samples:
482
+ | | sentence_0 | sentence_1 |
483
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
484
+ | type | string | string |
485
+ | details | <ul><li>min: 12 tokens</li><li>mean: 20.92 tokens</li><li>max: 35 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.28 tokens</li><li>max: 214 tokens</li></ul> |
486
+ * Samples:
487
+ | sentence_0 | sentence_1 |
488
+ |:---------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
489
+ | <code>What are the two main categories of AI agents described in the context?</code> | <code>The two main categories I see are people who think AI agents are obviously things that go and act on your behalf—the travel agent model—and people who think in terms of LLMs that have been given access to tools which they can run in a loop as part of solving a problem. The term “autonomy” is often thrown into the mix too, again without including a clear definition.<br>(I also collected 211 definitions on Twitter a few months ago—here they are in Datasette Lite—and had gemini-exp-1206 attempt to summarize them.)<br>Whatever the term may mean, agents still have that feeling of perpetually “coming soon”.</code> |
490
+ | <code>How is the term "autonomy" treated in discussions about AI agents according to the context?</code> | <code>The two main categories I see are people who think AI agents are obviously things that go and act on your behalf—the travel agent model—and people who think in terms of LLMs that have been given access to tools which they can run in a loop as part of solving a problem. The term “autonomy” is often thrown into the mix too, again without including a clear definition.<br>(I also collected 211 definitions on Twitter a few months ago—here they are in Datasette Lite—and had gemini-exp-1206 attempt to summarize them.)<br>Whatever the term may mean, agents still have that feeling of perpetually “coming soon”.</code> |
491
+ | <code>What colors and patterns are described on the two butterflies positioned in the feeder?</code> | <code>Against this photo of butterflies at the California Academy of Sciences:<br><br><br>A shallow dish, likely a hummingbird or butterfly feeder, is red. Pieces of orange slices of fruit are visible inside the dish.<br>Two butterflies are positioned in the feeder, one is a dark brown/black butterfly with white/cream-colored markings. The other is a large, brown butterfly with patterns of lighter brown, beige, and black markings, including prominent eye spots. The larger brown butterfly appears to be feeding on the fruit.</code> |
492
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
493
+ ```json
494
+ {
495
+ "loss": "MultipleNegativesRankingLoss",
496
+ "matryoshka_dims": [
497
+ 768,
498
+ 512,
499
+ 256,
500
+ 128,
501
+ 64
502
+ ],
503
+ "matryoshka_weights": [
504
+ 1,
505
+ 1,
506
+ 1,
507
+ 1,
508
+ 1
509
+ ],
510
+ "n_dims_per_step": -1
511
+ }
512
+ ```
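+ 
+ Because the model is trained with MatryoshkaLoss, embeddings can also be truncated to one of the smaller dimensions listed above. A minimal sketch (the `truncate_dim` argument is standard Sentence Transformers; the chosen dimension is just an example):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Truncate output embeddings to 256 dimensions, one of the Matryoshka dims above
+ model = SentenceTransformer(
+     "manmah/legal-ft-717cb2ad-5d19-4d52-ad34-5656c2895fa9", truncate_dim=256
+ )
+ embeddings = model.encode(["What matters most is the training data."])
+ print(embeddings.shape)  # (1, 256)
+ ```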
513
+
514
+ ### Training Hyperparameters
515
+ #### Non-Default Hyperparameters
516
+
517
+ - `eval_strategy`: steps
518
+ - `per_device_train_batch_size`: 10
519
+ - `per_device_eval_batch_size`: 10
520
+ - `num_train_epochs`: 10
521
+ - `multi_dataset_batch_sampler`: round_robin
522
+
523
+ #### All Hyperparameters
524
+ <details><summary>Click to expand</summary>
525
+
526
+ - `overwrite_output_dir`: False
527
+ - `do_predict`: False
528
+ - `eval_strategy`: steps
529
+ - `prediction_loss_only`: True
530
+ - `per_device_train_batch_size`: 10
531
+ - `per_device_eval_batch_size`: 10
532
+ - `per_gpu_train_batch_size`: None
533
+ - `per_gpu_eval_batch_size`: None
534
+ - `gradient_accumulation_steps`: 1
535
+ - `eval_accumulation_steps`: None
536
+ - `torch_empty_cache_steps`: None
537
+ - `learning_rate`: 5e-05
538
+ - `weight_decay`: 0.0
539
+ - `adam_beta1`: 0.9
540
+ - `adam_beta2`: 0.999
541
+ - `adam_epsilon`: 1e-08
542
+ - `max_grad_norm`: 1
543
+ - `num_train_epochs`: 10
544
+ - `max_steps`: -1
545
+ - `lr_scheduler_type`: linear
546
+ - `lr_scheduler_kwargs`: {}
547
+ - `warmup_ratio`: 0.0
548
+ - `warmup_steps`: 0
549
+ - `log_level`: passive
550
+ - `log_level_replica`: warning
551
+ - `log_on_each_node`: True
552
+ - `logging_nan_inf_filter`: True
553
+ - `save_safetensors`: True
554
+ - `save_on_each_node`: False
555
+ - `save_only_model`: False
556
+ - `restore_callback_states_from_checkpoint`: False
557
+ - `no_cuda`: False
558
+ - `use_cpu`: False
559
+ - `use_mps_device`: False
560
+ - `seed`: 42
561
+ - `data_seed`: None
562
+ - `jit_mode_eval`: False
563
+ - `use_ipex`: False
564
+ - `bf16`: False
565
+ - `fp16`: False
566
+ - `fp16_opt_level`: O1
567
+ - `half_precision_backend`: auto
568
+ - `bf16_full_eval`: False
569
+ - `fp16_full_eval`: False
570
+ - `tf32`: None
571
+ - `local_rank`: 0
572
+ - `ddp_backend`: None
573
+ - `tpu_num_cores`: None
574
+ - `tpu_metrics_debug`: False
575
+ - `debug`: []
576
+ - `dataloader_drop_last`: False
577
+ - `dataloader_num_workers`: 0
578
+ - `dataloader_prefetch_factor`: None
579
+ - `past_index`: -1
580
+ - `disable_tqdm`: False
581
+ - `remove_unused_columns`: True
582
+ - `label_names`: None
583
+ - `load_best_model_at_end`: False
584
+ - `ignore_data_skip`: False
585
+ - `fsdp`: []
586
+ - `fsdp_min_num_params`: 0
587
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
588
+ - `tp_size`: 0
589
+ - `fsdp_transformer_layer_cls_to_wrap`: None
590
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
591
+ - `deepspeed`: None
592
+ - `label_smoothing_factor`: 0.0
593
+ - `optim`: adamw_torch
594
+ - `optim_args`: None
595
+ - `adafactor`: False
596
+ - `group_by_length`: False
597
+ - `length_column_name`: length
598
+ - `ddp_find_unused_parameters`: None
599
+ - `ddp_bucket_cap_mb`: None
600
+ - `ddp_broadcast_buffers`: False
601
+ - `dataloader_pin_memory`: True
602
+ - `dataloader_persistent_workers`: False
603
+ - `skip_memory_metrics`: True
604
+ - `use_legacy_prediction_loop`: False
605
+ - `push_to_hub`: False
606
+ - `resume_from_checkpoint`: None
607
+ - `hub_model_id`: None
608
+ - `hub_strategy`: every_save
609
+ - `hub_private_repo`: None
610
+ - `hub_always_push`: False
611
+ - `gradient_checkpointing`: False
612
+ - `gradient_checkpointing_kwargs`: None
613
+ - `include_inputs_for_metrics`: False
614
+ - `include_for_metrics`: []
615
+ - `eval_do_concat_batches`: True
616
+ - `fp16_backend`: auto
617
+ - `push_to_hub_model_id`: None
618
+ - `push_to_hub_organization`: None
619
+ - `mp_parameters`:
620
+ - `auto_find_batch_size`: False
621
+ - `full_determinism`: False
622
+ - `torchdynamo`: None
623
+ - `ray_scope`: last
624
+ - `ddp_timeout`: 1800
625
+ - `torch_compile`: False
626
+ - `torch_compile_backend`: None
627
+ - `torch_compile_mode`: None
628
+ - `include_tokens_per_second`: False
629
+ - `include_num_input_tokens_seen`: False
630
+ - `neftune_noise_alpha`: None
631
+ - `optim_target_modules`: None
632
+ - `batch_eval_metrics`: False
633
+ - `eval_on_start`: False
634
+ - `use_liger_kernel`: False
635
+ - `eval_use_gather_object`: False
636
+ - `average_tokens_across_devices`: False
637
+ - `prompts`: None
638
+ - `batch_sampler`: batch_sampler
639
+ - `multi_dataset_batch_sampler`: round_robin
640
+
641
+ </details>
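+ 
+ For reference, a minimal sketch of a training run matching this configuration (the two training pairs shown are illustrative; the actual 156-pair dataset is not included in this repository):
+ 
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+ 
+ model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")
+ 
+ # Placeholder (question, passage) pairs with the same column names as the real dataset
+ train_dataset = Dataset.from_dict({
+     "sentence_0": [
+         "What matters most when training an LLM?",
+         "Which library made running models on Apple Silicon practical?",
+     ],
+     "sentence_1": [
+         "The quantity and quality of the training data appears to be the most important factor.",
+         "Apple's MLX library, an array framework for Apple Silicon, has been the real breakthrough.",
+     ],
+ })
+ 
+ # MultipleNegativesRankingLoss wrapped in MatryoshkaLoss, as in the loss config above
+ loss = MatryoshkaLoss(
+     model,
+     MultipleNegativesRankingLoss(model),
+     matryoshka_dims=[768, 512, 256, 128, 64],
+ )
+ 
+ args = SentenceTransformerTrainingArguments(
+     output_dir="legal-ft",
+     num_train_epochs=10,
+     per_device_train_batch_size=10,
+ )
+ 
+ trainer = SentenceTransformerTrainer(
+     model=model, args=args, train_dataset=train_dataset, loss=loss
+ )
+ trainer.train()
+ ```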
642
+
643
+ ### Training Logs
644
+ | Epoch | Step | cosine_ndcg@10 |
645
+ |:-----:|:----:|:--------------:|
646
+ | 1.0 | 16 | 0.9554 |
647
+ | 2.0 | 32 | 0.9484 |
648
+ | 3.0 | 48 | 0.9692 |
649
+ | 3.125 | 50 | 0.9692 |
650
+ | 4.0 | 64 | 0.9692 |
651
+ | 5.0 | 80 | 0.9692 |
652
+ | 6.0 | 96 | 0.9692 |
653
+ | 6.25 | 100 | 0.9692 |
654
+ | 7.0 | 112 | 0.9692 |
655
+ | 8.0 | 128 | 0.9692 |
656
+ | 9.0 | 144 | 0.9692 |
657
+ | 9.375 | 150 | 0.9692 |
658
+ | 10.0 | 160 | 0.9692 |
659
+
660
+
661
+ ### Framework Versions
662
+ - Python: 3.13.2
663
+ - Sentence Transformers: 4.1.0
664
+ - Transformers: 4.51.3
665
+ - PyTorch: 2.7.0
666
+ - Accelerate: 1.6.0
667
+ - Datasets: 3.5.1
668
+ - Tokenizers: 0.21.1
669
+
670
+ ## Citation
671
+
672
+ ### BibTeX
673
+
674
+ #### Sentence Transformers
675
+ ```bibtex
676
+ @inproceedings{reimers-2019-sentence-bert,
677
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
678
+ author = "Reimers, Nils and Gurevych, Iryna",
679
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
680
+ month = "11",
681
+ year = "2019",
682
+ publisher = "Association for Computational Linguistics",
683
+ url = "https://arxiv.org/abs/1908.10084",
684
+ }
685
+ ```
686
+
687
+ #### MatryoshkaLoss
688
+ ```bibtex
689
+ @misc{kusupati2024matryoshka,
690
+ title={Matryoshka Representation Learning},
691
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
692
+ year={2024},
693
+ eprint={2205.13147},
694
+ archivePrefix={arXiv},
695
+ primaryClass={cs.LG}
696
+ }
697
+ ```
698
+
699
+ #### MultipleNegativesRankingLoss
700
+ ```bibtex
701
+ @misc{henderson2017efficient,
702
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
703
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
704
+ year={2017},
705
+ eprint={1705.00652},
706
+ archivePrefix={arXiv},
707
+ primaryClass={cs.CL}
708
+ }
709
+ ```
710
+
711
+ <!--
712
+ ## Glossary
713
+
714
+ *Clearly define terms in order to be accessible across audiences.*
715
+ -->
716
+
717
+ <!--
718
+ ## Model Card Authors
719
+
720
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
721
+ -->
722
+
723
+ <!--
724
+ ## Model Card Contact
725
+
726
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
727
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 4096,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 24,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.51.3",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "__version__": {
+ "sentence_transformers": "4.1.0",
+ "transformers": "4.51.3",
+ "pytorch": "2.7.0"
+ },
+ "prompts": {
+ "query": "Represent this sentence for searching relevant passages: "
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:245e75fdcbc8fb2ca94b376398c2ed622ef36679a825f381b62f4c95df769196
+ size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "max_length": 512,
+ "model_max_length": 512,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff