AlexanderMaz committed on Sep 14

Commit fa14510 (verified) · 1 parent: c920bee

Upload acta anonymizer adapter - Latest (v20250914_065801)

Files changed (27)

.gitattributes +2 -0
README.md +179 -64
adapter_config.json +2 -2
adapter_model.safetensors +1 -1
checkpoint-8000/README.md +206 -0
checkpoint-8000/adapter_config.json +42 -0
checkpoint-8000/adapter_model.safetensors +3 -0
checkpoint-8000/optimizer.pt +3 -0
checkpoint-8000/rng_state.pth +3 -0
checkpoint-8000/scheduler.pt +3 -0
checkpoint-8000/special_tokens_map.json +51 -0
checkpoint-8000/tokenizer.json +3 -0
checkpoint-8000/tokenizer_config.json +59 -0
checkpoint-8000/trainer_state.json +1346 -0
checkpoint-8000/training_args.bin +3 -0
checkpoint-8716/README.md +206 -0
checkpoint-8716/adapter_config.json +42 -0
checkpoint-8716/adapter_model.safetensors +3 -0
checkpoint-8716/optimizer.pt +3 -0
checkpoint-8716/rng_state.pth +3 -0
checkpoint-8716/scheduler.pt +3 -0
checkpoint-8716/special_tokens_map.json +51 -0
checkpoint-8716/tokenizer.json +3 -0
checkpoint-8716/tokenizer_config.json +59 -0
checkpoint-8716/trainer_state.json +1456 -0
checkpoint-8716/training_args.bin +3 -0
training_args.bin +1 -1

.gitattributes CHANGED

@@ -47,3 +47,5 @@ checkpoint-13074/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 versions/20250914_065801/checkpoint-8000/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 versions/20250914_065801/checkpoint-8716/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 versions/20250914_065801/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-8000/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-8716/tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,91 +1,206 @@
 ---
-license: apache-2.0
-language:
-- ro
 base_model: EvanD/xlm-roberta-base-romanian-ner-ronec
+library_name: peft
 tags:
-- token-classification
-- named-entity-recognition
-- pii-detection
-- romanian
-- moldova
-- financial-pii
-- banking
-- fintech
+- base_model:adapter:EvanD/xlm-roberta-base-romanian-ner-ronec
+- lora
+- transformers
 ---
-# Finguys/acta-anonymizer-financial
-Acta Anonymizer Financial Adapter
-This model is a fine-tuned adapter for Romanian financial text anonymization.
-It's based on XLM-RoBERTa and trained specifically for detecting and anonymizing
-PII in Romanian financial documents from Moldova.
-Key features:
-- Romanian language support
-- Financial domain specialization
-- GDPR compliance focused
-- High accuracy PII detection
-Use cases:
-- Banking document anonymization
-- Financial report processing
-- Compliance data handling
-**Current Version**: 20250914_035417
-## Key Features
-- Romanian language support
-- GDPR compliance focused
-- High accuracy PII detection
-- Domain-specific fine-tuning
-## Use Cases
-- Banking document anonymization
-- Financial report processing
-- Compliance data handling
-## Training Data
-This model was trained on synthetic Moldovan PII data for financial domain anonymization.
-## Usage
-```python
-from peft import PeftModel
-from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
-# Load base model
-model = AutoModelForTokenClassification.from_pretrained("EvanD/xlm-roberta-base-romanian-ner-ronec")
-tokenizer = AutoTokenizer.from_pretrained("EvanD/xlm-roberta-base-romanian-ner-ronec")
-# Load adapter
-model = PeftModel.from_pretrained(model, "Finguys/acta-anonymizer-financial")
-# Create pipeline
-ner_pipeline = pipeline(
-    "token-classification",
-    model=model,
-    tokenizer=tokenizer,
-    aggregation_strategy="simple"
-)
-# Example usage
-text = "Ion Popescu are un cont la Banca Transilvania cu IBAN RO49AAAA1B310075938400000."
-entities = ner_pipeline(text)
-print(entities)
-```
-## Training
-This model was trained using LoRA (Low-Rank Adaptation) on synthetic Moldovan PII data.
-## Versions
-- **Latest**: Root level contains the most recent version
-- **Archived**: Previous versions are stored in `versions/` folder
-- **Version Index**: See `version_history.yaml` for complete version history
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

adapter_config.json CHANGED

@@ -29,9 +29,9 @@
   "revision": null,
   "target_modules": [
     "dense",
-    "key",
     "query",
-    "value"
+    "value",
+    "key"
   ],
   "target_parameters": null,
   "task_type": "TOKEN_CLS",
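
Only the order of `target_modules` changes in this hunk; the targeted module names (`dense`, `key`, `query`, `value`) are the same before and after. PEFT matches these entries against module names, so the reorder on its own should not change which layers receive LoRA weights. A minimal plain-Python check, with both lists copied from the diff:

```python
# target_modules before and after this commit (copied from the adapter_config.json diff).
old_targets = ["dense", "key", "query", "value"]
new_targets = ["dense", "query", "value", "key"]


def same_modules(a: list[str], b: list[str]) -> bool:
    """Return True if both lists name the same modules, ignoring order and repeats."""
    return set(a) == set(b)


print(same_modules(old_targets, new_targets))  # True: same modules
print(old_targets == new_targets)              # False: only the order differs
```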

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ff2623b5e82b5d0e70983ebf344f330bb8f3e1226a105a5aed87766442cec175
+oid sha256:0f6538ee1e725e137fe76752f42a04511d52a963b7bef676ab1c896d99c808d0
 size 10899068

checkpoint-8000/README.md ADDED

@@ -0,0 +1,206 @@
+---
+base_model: EvanD/xlm-roberta-base-romanian-ner-ronec
+library_name: peft
+tags:
+- base_model:adapter:EvanD/xlm-roberta-base-romanian-ner-ronec
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoint-8000/adapter_config.json ADDED

@@ -0,0 +1,42 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "EvanD/xlm-roberta-base-romanian-ner-ronec",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "dense",
+    "query",
+    "value",
+    "key"
+  ],
+  "target_parameters": null,
+  "task_type": "TOKEN_CLS",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
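
With `r` = 16, `lora_alpha` = 32 and `use_rslora` false, each LoRA update is scaled by `lora_alpha / r` = 2. The sketch below also estimates how many LoRA parameters these `target_modules` produce. It assumes the base model keeps the standard xlm-roberta-base shape (12 layers, hidden size 768, feed-forward size 3072) and that `"dense"` matches the attention-output, intermediate and output projections in each layer; the numbers are back-of-the-envelope arithmetic, not read from the checkpoint.

```python
# Rough numbers for the LoRA settings in checkpoint-8000/adapter_config.json.
R = 16            # "r"
LORA_ALPHA = 32   # "lora_alpha"
USE_RSLORA = False  # "use_rslora"

# ASSUMPTION: standard xlm-roberta-base dimensions for the base model.
NUM_LAYERS = 12
HIDDEN = 768
INTERMEDIATE = 3072


def lora_scaling(r: int, alpha: int, use_rslora: bool) -> float:
    """Factor applied to B @ A: alpha / r for plain LoRA, alpha / sqrt(r) for rsLoRA."""
    return alpha / r ** 0.5 if use_rslora else alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in A (r x in_features) plus B (out_features x r)."""
    return r * (in_features + out_features)


# Linear layers per encoder layer assumed to match "query", "key", "value", "dense",
# as (in_features, out_features).
per_layer = [
    (HIDDEN, HIDDEN),        # attention.self.query
    (HIDDEN, HIDDEN),        # attention.self.key
    (HIDDEN, HIDDEN),        # attention.self.value
    (HIDDEN, HIDDEN),        # attention.output.dense
    (HIDDEN, INTERMEDIATE),  # intermediate.dense
    (INTERMEDIATE, HIDDEN),  # output.dense
]

total = NUM_LAYERS * sum(lora_params(i, o, R) for i, o in per_layer)

print(lora_scaling(R, LORA_ALPHA, USE_RSLORA))  # 2.0
print(total)                                    # 2654208 LoRA parameters
print(total * 4)                                # 10616832 bytes if stored as float32
```

Assuming float32 storage, about 10.6 MB of the 10,899,068-byte `adapter_model.safetensors` would be LoRA matrices; the remainder presumably covers the safetensors header and the `classifier` head kept through `modules_to_save`, which we have not inspected.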

checkpoint-8000/adapter_model.safetensors ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0f6538ee1e725e137fe76752f42a04511d52a963b7bef676ab1c896d99c808d0
+size 10899068
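
This pointer has the same `oid` as the new root `adapter_model.safetensors` in this commit, so the root adapter appears to be a byte-for-byte copy of the checkpoint-8000 weights (step 8000 is also `best_global_step` in `checkpoint-8000/trainer_state.json`). Git LFS pointer files are plain `key value` lines, so the comparison is easy to script. A minimal sketch, with both pointers copied from this page:

```python
# Two Git LFS pointers copied from this commit page.
ROOT_ADAPTER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0f6538ee1e725e137fe76752f42a04511d52a963b7bef676ab1c896d99c808d0
size 10899068
"""

CKPT_8000_ADAPTER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0f6538ee1e725e137fe76752f42a04511d52a963b7bef676ab1c896d99c808d0
size 10899068
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


root = parse_lfs_pointer(ROOT_ADAPTER_POINTER)
ckpt = parse_lfs_pointer(CKPT_8000_ADAPTER_POINTER)

print(root["oid"] == ckpt["oid"])  # True: both pointers reference the same object
print(int(root["size"]))           # 10899068
```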

checkpoint-8000/optimizer.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c1833e690858935a83452d84fbad03a711776aaaadcfec3fb7984a68efda8c74
+size 21881739

checkpoint-8000/rng_state.pth ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bd539f5a45211df22b0045edde016f045a350615c710e3630fa74ecc2365b3ce
+size 14645

checkpoint-8000/scheduler.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fca6568f7f9ddae79824057661dc2f909b293d4cff9027842a2246bca50ebddf
+size 1465

checkpoint-8000/special_tokens_map.json ADDED

@@ -0,0 +1,51 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-8000/tokenizer.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8373f9cd3d27591e1924426bcc1c8799bc5a9affc4fc857982c5d66668dd1f41
+size 17082832

checkpoint-8000/tokenizer_config.json ADDED

@@ -0,0 +1,59 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "250001": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "max_length": 512,
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "stride": 0,
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "<unk>"
+}
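
The `learning_rate` values logged in the `checkpoint-8000/trainer_state.json` that follows are consistent with a linear warm-up to a peak of 5e-4 over 500 steps, then linear decay to zero at step 8716 (the final checkpoint). That is the same shape as `get_linear_schedule_with_warmup` in transformers. The sketch below reproduces a few logged values under those inferred settings; `training_args.bin` was not inspected, so treat the constants as a reconstruction from the log. The same log puts step 50 at epoch 0.011473152822395595, i.e. 4358 optimizer steps per epoch, so 8716 steps is exactly two epochs.

```python
# ASSUMPTIONS, inferred from the logged values rather than from training_args.bin:
PEAK_LR = 5e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 8716


def linear_warmup_decay(step: int) -> float:
    """Learning rate after `step` scheduler steps: linear warm-up, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))


def logged_lr(global_step: int) -> float:
    """The value logged at global_step N matches the schedule at N - 1 (the rate for update N)."""
    return linear_warmup_decay(global_step - 1)


# Steps per epoch, from step 50 being logged at epoch 0.011473152822395595.
steps_per_epoch = round(50 / 0.011473152822395595)

# (global_step, learning_rate) pairs copied from log_history.
for step, logged in [
    (50, 4.9000000000000005e-05),
    (550, 0.0004970180136319377),
    (3000, 0.0003479186952288218),
]:
    print(step, logged_lr(step), logged)

print(steps_per_epoch, TOTAL_STEPS / steps_per_epoch)  # 4358 2.0
```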

checkpoint-8000/trainer_state.json ADDED

@@ -0,0 +1,1346 @@
+{
+  "best_global_step": 8000,
+  "best_metric": 0.9692734951155788,
+  "best_model_checkpoint": "./models/financial_adapter_20250914_060658/checkpoint-8000",
+  "epoch": 1.835704451583295,
+  "eval_steps": 500,
+  "global_step": 8000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.011473152822395595,
+      "grad_norm": 6.211507797241211,
+      "learning_rate": 4.9000000000000005e-05,
+      "loss": 3.7315,
+      "step": 50
+    },
+    {
+      "epoch": 0.02294630564479119,
+      "grad_norm": 0.805833637714386,
+      "learning_rate": 9.900000000000001e-05,
+      "loss": 1.2821,
+      "step": 100
+    },
+    {
+      "epoch": 0.03441945846718678,
+      "grad_norm": 0.7927971482276917,
+      "learning_rate": 0.000149,
+      "loss": 0.644,
+      "step": 150
+    },
+    {
+      "epoch": 0.04589261128958238,
+      "grad_norm": 0.95986407995224,
+      "learning_rate": 0.000199,
+      "loss": 0.3733,
+      "step": 200
+    },
+    {
+      "epoch": 0.05736576411197797,
+      "grad_norm": 0.7364535927772522,
+      "learning_rate": 0.000249,
+      "loss": 0.2499,
+      "step": 250
+    },
+    {
+      "epoch": 0.06883891693437356,
+      "grad_norm": 0.8872820734977722,
+      "learning_rate": 0.000299,
+      "loss": 0.1937,
+      "step": 300
+    },
+    {
+      "epoch": 0.08031206975676916,
+      "grad_norm": 0.5113154053688049,
+      "learning_rate": 0.00034899999999999997,
+      "loss": 0.1646,
+      "step": 350
+    },
+    {
+      "epoch": 0.09178522257916476,
+      "grad_norm": 0.7045756578445435,
+      "learning_rate": 0.00039900000000000005,
+      "loss": 0.1466,
+      "step": 400
+    },
+    {
+      "epoch": 0.10325837540156035,
+      "grad_norm": 0.5810624957084656,
+      "learning_rate": 0.000449,
+      "loss": 0.1314,
+      "step": 450
+    },
+    {
+      "epoch": 0.11473152822395594,
+      "grad_norm": 0.5461702346801758,
+      "learning_rate": 0.000499,
+      "loss": 0.1276,
+      "step": 500
+    },
+    {
+      "epoch": 0.11473152822395594,
+      "eval_accuracy": 0.970071396651017,
+      "eval_f1": 0.931817690038444,
+      "eval_loss": 0.1108999103307724,
+      "eval_precision": 0.9314176027125088,
+      "eval_recall": 0.9343327320933421,
+      "eval_runtime": 138.5938,
+      "eval_samples_per_second": 143.729,
+      "eval_steps_per_second": 8.983,
+      "step": 500
+    },
+    {
+      "epoch": 0.12620468104635155,
+      "grad_norm": 0.5111148357391357,
+      "learning_rate": 0.0004970180136319377,
+      "loss": 0.1157,
+      "step": 550
+    },
+    {
+      "epoch": 0.13767783386874713,
+      "grad_norm": 0.5293119549751282,
+      "learning_rate": 0.000493975170399221,
+      "loss": 0.1091,
+      "step": 600
+    },
+    {
+      "epoch": 0.14915098669114274,
+      "grad_norm": 0.5645154714584351,
+      "learning_rate": 0.0004909323271665044,
+      "loss": 0.1054,
+      "step": 650
+    },
+    {
+      "epoch": 0.16062413951353832,
+      "grad_norm": 0.3088572025299072,
+      "learning_rate": 0.0004878894839337877,
+      "loss": 0.1017,
+      "step": 700
+    },
+    {
+      "epoch": 0.1720972923359339,
+      "grad_norm": 0.3965695798397064,
+      "learning_rate": 0.0004848466407010711,
+      "loss": 0.0881,
+      "step": 750
+    },
+    {
+      "epoch": 0.18357044515832951,
+      "grad_norm": 0.44487106800079346,
+      "learning_rate": 0.0004818037974683545,
+      "loss": 0.0908,
+      "step": 800
+    },
+    {
+      "epoch": 0.1950435979807251,
+      "grad_norm": 0.5573059320449829,
+      "learning_rate": 0.00047876095423563783,
+      "loss": 0.0923,
+      "step": 850
+    },
+    {
+      "epoch": 0.2065167508031207,
+      "grad_norm": 0.242904931306839,
+      "learning_rate": 0.00047571811100292113,
+      "loss": 0.0893,
+      "step": 900
+    },
+    {
+      "epoch": 0.2179899036255163,
+      "grad_norm": 0.4724123477935791,
+      "learning_rate": 0.0004726752677702045,
+      "loss": 0.0878,
+      "step": 950
+    },
+    {
+      "epoch": 0.22946305644791187,
+      "grad_norm": 0.32161369919776917,
+      "learning_rate": 0.00046963242453748784,
+      "loss": 0.0816,
+      "step": 1000
+    },
+    {
+      "epoch": 0.22946305644791187,
+      "eval_accuracy": 0.977059938804148,
+      "eval_f1": 0.9485042354538835,
+      "eval_loss": 0.07678939402103424,
+      "eval_precision": 0.948923113654349,
+      "eval_recall": 0.9476371557635869,
+      "eval_runtime": 118.5842,
+      "eval_samples_per_second": 167.982,
+      "eval_steps_per_second": 10.499,
+      "step": 1000
+    },
+    {
+      "epoch": 0.24093620927030748,
+      "grad_norm": 1.1815085411071777,
+      "learning_rate": 0.0004665895813047712,
+      "loss": 0.0837,
+      "step": 1050
+    },
+    {
+      "epoch": 0.2524093620927031,
+      "grad_norm": 0.26893287897109985,
+      "learning_rate": 0.00046354673807205455,
+      "loss": 0.0816,
+      "step": 1100
+    },
+    {
+      "epoch": 0.2638825149150987,
+      "grad_norm": 0.31159281730651855,
+      "learning_rate": 0.0004605038948393379,
+      "loss": 0.082,
+      "step": 1150
+    },
+    {
+      "epoch": 0.27535566773749426,
+      "grad_norm": 0.3065606951713562,
+      "learning_rate": 0.0004574610516066212,
+      "loss": 0.0742,
+      "step": 1200
+    },
+    {
+      "epoch": 0.28682882055988984,
+      "grad_norm": 0.2774142324924469,
+      "learning_rate": 0.00045441820837390456,
+      "loss": 0.0792,
+      "step": 1250
+    },
+    {
+      "epoch": 0.2983019733822855,
+      "grad_norm": 0.23560093343257904,
+      "learning_rate": 0.0004513753651411879,
+      "loss": 0.0719,
+      "step": 1300
+    },
+    {
+      "epoch": 0.30977512620468106,
+      "grad_norm": 0.29983004927635193,
+      "learning_rate": 0.0004483325219084713,
+      "loss": 0.0722,
+      "step": 1350
+    },
+    {
+      "epoch": 0.32124827902707664,
+      "grad_norm": 0.26415759325027466,
+      "learning_rate": 0.00044528967867575467,
+      "loss": 0.0724,
+      "step": 1400
+    },
+    {
+      "epoch": 0.3327214318494722,
+      "grad_norm": 0.34820911288261414,
+      "learning_rate": 0.000442246835443038,
+      "loss": 0.0756,
+      "step": 1450
+    },
+    {
+      "epoch": 0.3441945846718678,
+      "grad_norm": 0.35296013951301575,
+      "learning_rate": 0.0004392039922103213,
+      "loss": 0.0676,
+      "step": 1500
+    },
+    {
+      "epoch": 0.3441945846718678,
+      "eval_accuracy": 0.9795261937874608,
+      "eval_f1": 0.9560398155032158,
+      "eval_loss": 0.06693130731582642,
+      "eval_precision": 0.9546461779958985,
+      "eval_recall": 0.9579011468881853,
+      "eval_runtime": 122.2351,
+      "eval_samples_per_second": 162.965,
+      "eval_steps_per_second": 10.185,
+      "step": 1500
+    },
+    {
+      "epoch": 0.35566773749426345,
+      "grad_norm": 0.4301516115665436,
+      "learning_rate": 0.0004361611489776047,
+      "loss": 0.0719,
+      "step": 1550
+    },
+    {
+      "epoch": 0.36714089031665903,
+      "grad_norm": 0.3780372440814972,
+      "learning_rate": 0.00043311830574488803,
+      "loss": 0.072,
+      "step": 1600
+    },
+    {
+      "epoch": 0.3786140431390546,
+      "grad_norm": 0.3334237337112427,
+      "learning_rate": 0.0004300754625121714,
+      "loss": 0.0664,
+      "step": 1650
+    },
+    {
+      "epoch": 0.3900871959614502,
+      "grad_norm": 0.21125715970993042,
+      "learning_rate": 0.00042703261927945474,
+      "loss": 0.0708,
+      "step": 1700
+    },
+    {
+      "epoch": 0.4015603487838458,
+      "grad_norm": 0.36593177914619446,
+      "learning_rate": 0.0004239897760467381,
+      "loss": 0.0674,
+      "step": 1750
+    },
+    {
+      "epoch": 0.4130335016062414,
+      "grad_norm": 0.5742707252502441,
+      "learning_rate": 0.0004209469328140214,
+      "loss": 0.0709,
+      "step": 1800
+    },
+    {
+      "epoch": 0.424506654428637,
+      "grad_norm": 0.43670088052749634,
+      "learning_rate": 0.00041790408958130475,
+      "loss": 0.0626,
+      "step": 1850
+    },
+    {
+      "epoch": 0.4359798072510326,
+      "grad_norm": 0.3064088225364685,
+      "learning_rate": 0.0004148612463485881,
+      "loss": 0.067,
+      "step": 1900
+    },
+    {
+      "epoch": 0.44745296007342816,
+      "grad_norm": 0.26380443572998047,
+      "learning_rate": 0.0004118184031158715,
+      "loss": 0.0673,
+      "step": 1950
+    },
+    {
+      "epoch": 0.45892611289582375,
+      "grad_norm": 0.2760469913482666,
+      "learning_rate": 0.00040877555988315487,
+      "loss": 0.0686,
+      "step": 2000
+    },
+    {
+      "epoch": 0.45892611289582375,
+      "eval_accuracy": 0.9792398202709974,
+      "eval_f1": 0.9566771424147363,
+      "eval_loss": 0.06424970924854279,
+      "eval_precision": 0.9518499916341474,
+      "eval_recall": 0.9651666524432667,
+      "eval_runtime": 119.1986,
+      "eval_samples_per_second": 167.116,
+      "eval_steps_per_second": 10.445,
+      "step": 2000
+    },
+    {
+      "epoch": 0.4703992657182194,
+      "grad_norm": 0.36512625217437744,
+      "learning_rate": 0.0004057327166504382,
+      "loss": 0.065,
+      "step": 2050
+    },
+    {
+      "epoch": 0.48187241854061497,
+      "grad_norm": 0.3270319402217865,
+      "learning_rate": 0.0004026898734177215,
+      "loss": 0.0663,
+      "step": 2100
+    },
+    {
+      "epoch": 0.49334557136301055,
+      "grad_norm": 0.2962779700756073,
+      "learning_rate": 0.0003996470301850049,
+      "loss": 0.0681,
+      "step": 2150
+    },
+    {
+      "epoch": 0.5048187241854062,
+      "grad_norm": 0.4675407409667969,
+      "learning_rate": 0.00039660418695228823,
+      "loss": 0.0715,
+      "step": 2200
+    },
+    {
+      "epoch": 0.5162918770078018,
+      "grad_norm": 0.27184540033340454,
+      "learning_rate": 0.0003935613437195716,
+      "loss": 0.066,
+      "step": 2250
+    },
+    {
+      "epoch": 0.5277650298301974,
+      "grad_norm": 0.3290219008922577,
+      "learning_rate": 0.00039051850048685494,
+      "loss": 0.0666,
+      "step": 2300
+    },
+    {
+      "epoch": 0.5392381826525929,
+      "grad_norm": 0.18070432543754578,
+      "learning_rate": 0.00038747565725413824,
+      "loss": 0.0626,
+      "step": 2350
+    },
+    {
+      "epoch": 0.5507113354749885,
+      "grad_norm": 0.27334141731262207,
+      "learning_rate": 0.0003844328140214216,
+      "loss": 0.0595,
+      "step": 2400
+    },
+    {
+      "epoch": 0.5621844882973841,
+      "grad_norm": 0.29450327157974243,
+      "learning_rate": 0.00038138997078870495,
+      "loss": 0.0655,
+      "step": 2450
+    },
+    {
+      "epoch": 0.5736576411197797,
+      "grad_norm": 0.49441081285476685,
+      "learning_rate": 0.0003783471275559883,
+      "loss": 0.0644,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5736576411197797,
+      "eval_accuracy": 0.9802089297119565,
+      "eval_f1": 0.9567031119548932,
+      "eval_loss": 0.060597751289606094,
+      "eval_precision": 0.9516312084805371,
+      "eval_recall": 0.9679770998439283,
+      "eval_runtime": 122.2434,
+      "eval_samples_per_second": 162.954,
+      "eval_steps_per_second": 10.185,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5851307939421753,
+      "grad_norm": 0.2936118543148041,
+      "learning_rate": 0.00037530428432327166,
+      "loss": 0.0726,
+      "step": 2550
+    },
+    {
+      "epoch": 0.596603946764571,
+      "grad_norm": 0.29937854409217834,
+      "learning_rate": 0.00037226144109055506,
+      "loss": 0.0646,
+      "step": 2600
+    },
+    {
+      "epoch": 0.6080770995869665,
+      "grad_norm": 0.2445821762084961,
+      "learning_rate": 0.00036921859785783836,
+      "loss": 0.0666,
+      "step": 2650
+    },
+    {
+      "epoch": 0.6195502524093621,
+      "grad_norm": 0.36757129430770874,
+      "learning_rate": 0.0003661757546251217,
+      "loss": 0.0773,
+      "step": 2700
+    },
+    {
+      "epoch": 0.6310234052317577,
+      "grad_norm": 0.16537484526634216,
+      "learning_rate": 0.0003631329113924051,
+      "loss": 0.0605,
+      "step": 2750
+    },
+    {
+      "epoch": 0.6424965580541533,
+      "grad_norm": 0.4437476396560669,
+      "learning_rate": 0.00036009006815968843,
+      "loss": 0.0641,
+      "step": 2800
+    },
+    {
+      "epoch": 0.6539697108765489,
+      "grad_norm": 0.29756319522857666,
+      "learning_rate": 0.0003570472249269718,
+      "loss": 0.0611,
+      "step": 2850
+    },
+    {
+      "epoch": 0.6654428636989445,
+      "grad_norm": 0.22879567742347717,
+      "learning_rate": 0.00035400438169425514,
+      "loss": 0.0591,
+      "step": 2900
+    },
+    {
+      "epoch": 0.67691601652134,
+      "grad_norm": 0.4005909264087677,
+      "learning_rate": 0.00035096153846153844,
+      "loss": 0.0648,
+      "step": 2950
+    },
+    {
+      "epoch": 0.6883891693437356,
+      "grad_norm": 0.4542585611343384,
+      "learning_rate": 0.0003479186952288218,
+      "loss": 0.0574,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6883891693437356,
+      "eval_accuracy": 0.9813785676808373,
+      "eval_f1": 0.9602148371355144,
+      "eval_loss": 0.056131936609745026,
+      "eval_precision": 0.9611373242869105,
+      "eval_recall": 0.9617183217159495,
+      "eval_runtime": 119.7516,
+      "eval_samples_per_second": 166.344,
+      "eval_steps_per_second": 10.397,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6998623221661312,
+      "grad_norm": 0.16639067232608795,
+      "learning_rate": 0.00034487585199610514,
+      "loss": 0.0583,
+      "step": 3050
+    },
+    {
+      "epoch": 0.7113354749885269,
+      "grad_norm": 0.24682307243347168,
+      "learning_rate": 0.0003418330087633885,
+      "loss": 0.0599,
+      "step": 3100
+    },
+    {
+      "epoch": 0.7228086278109225,
+      "grad_norm": 0.8064585328102112,
+      "learning_rate": 0.00033879016553067185,
+      "loss": 0.0572,
+      "step": 3150
+    },
+    {
+      "epoch": 0.7342817806333181,
+      "grad_norm": 0.19956223666667938,
+      "learning_rate": 0.00033574732229795526,
+      "loss": 0.0582,
+      "step": 3200
+    },
+    {
+      "epoch": 0.7457549334557136,
+      "grad_norm": 0.24573862552642822,
+      "learning_rate": 0.00033270447906523856,
+      "loss": 0.0641,
+      "step": 3250
+    },
+    {
+      "epoch": 0.7572280862781092,
+      "grad_norm": 0.2404450923204422,
+      "learning_rate": 0.0003296616358325219,
+      "loss": 0.0662,
+      "step": 3300
+    },
+    {
+      "epoch": 0.7687012391005048,
+      "grad_norm": 0.2951129376888275,
+      "learning_rate": 0.00032661879259980527,
+      "loss": 0.0593,
+      "step": 3350
+    },
+    {
+      "epoch": 0.7801743919229004,
+      "grad_norm": 0.27735939621925354,
+      "learning_rate": 0.0003235759493670886,
+      "loss": 0.0551,
+      "step": 3400
+    },
+    {
+      "epoch": 0.791647544745296,
+      "grad_norm": 0.22863982617855072,
+      "learning_rate": 0.000320533106134372,
+      "loss": 0.0528,
+      "step": 3450
+    },
+    {
+      "epoch": 0.8031206975676916,
+      "grad_norm": 0.15240560472011566,
+      "learning_rate": 0.00031749026290165533,
+      "loss": 0.0607,
+      "step": 3500
+    },
+    {
+      "epoch": 0.8031206975676916,
+      "eval_accuracy": 0.9817161352139049,
+      "eval_f1": 0.9618400997245777,
+      "eval_loss": 0.05566277727484703,
+      "eval_precision": 0.9594136950612108,
+      "eval_recall": 0.9670788559757807,
+      "eval_runtime": 119.7185,
+      "eval_samples_per_second": 166.39,
+      "eval_steps_per_second": 10.399,
+      "step": 3500
+    },
+    {
+      "epoch": 0.8145938503900872,
+      "grad_norm": 0.22435520589351654,
+      "learning_rate": 0.00031444741966893863,
+      "loss": 0.0619,
+      "step": 3550
+    },
+    {
+      "epoch": 0.8260670032124828,
+      "grad_norm": 0.23223020136356354,
+      "learning_rate": 0.000311404576436222,
+      "loss": 0.0563,
+      "step": 3600
+    },
+    {
+      "epoch": 0.8375401560348784,
+      "grad_norm": 0.3050450384616852,
+      "learning_rate": 0.00030836173320350534,
+      "loss": 0.0581,
+      "step": 3650
+    },
+    {
+      "epoch": 0.849013308857274,
+      "grad_norm": 0.2995171546936035,
+      "learning_rate": 0.0003053188899707887,
+      "loss": 0.0539,
+      "step": 3700
+    },
+    {
+      "epoch": 0.8604864616796696,
+      "grad_norm": 0.25285205245018005,
+      "learning_rate": 0.00030227604673807205,
+      "loss": 0.0597,
+      "step": 3750
+    },
+    {
+      "epoch": 0.8719596145020652,
+      "grad_norm": 0.4498445689678192,
+      "learning_rate": 0.00029923320350535546,
+      "loss": 0.0582,
+      "step": 3800
+    },
+    {
+      "epoch": 0.8834327673244607,
+      "grad_norm": 0.24611692130565643,
+      "learning_rate": 0.00029619036027263876,
+      "loss": 0.0568,
+      "step": 3850
+    },
+    {
+      "epoch": 0.8949059201468563,
+      "grad_norm": 0.3124069571495056,
+      "learning_rate": 0.0002931475170399221,
+      "loss": 0.0591,
+      "step": 3900
+    },
+    {
+      "epoch": 0.9063790729692519,
+      "grad_norm": 0.2108747363090515,
+      "learning_rate": 0.00029010467380720547,
+      "loss": 0.0548,
+      "step": 3950
+    },
+    {
+      "epoch": 0.9178522257916475,
+      "grad_norm": 0.22898589074611664,
+      "learning_rate": 0.0002870618305744888,
+      "loss": 0.0603,
+      "step": 4000
+    },
+    {
+      "epoch": 0.9178522257916475,
+      "eval_accuracy": 0.9814983929773434,
+      "eval_f1": 0.9609698369421166,
+      "eval_loss": 0.05435480922460556,
+      "eval_precision": 0.9558103401819693,
+      "eval_recall": 0.9701771464195363,
+      "eval_runtime": 119.6005,
+      "eval_samples_per_second": 166.554,
+      "eval_steps_per_second": 10.41,
+      "step": 4000
+    },
+    {
+      "epoch": 0.9293253786140432,
+      "grad_norm": 0.27442702651023865,
+      "learning_rate": 0.0002840189873417722,
+      "loss": 0.0596,
+      "step": 4050
+    },
+    {
+      "epoch": 0.9407985314364388,
+      "grad_norm": 0.1897002011537552,
+      "learning_rate": 0.00028097614410905553,
+      "loss": 0.0575,
+      "step": 4100
+    },
+    {
+      "epoch": 0.9522716842588343,
+      "grad_norm": 0.31244686245918274,
+      "learning_rate": 0.00027793330087633883,
+      "loss": 0.0569,
+      "step": 4150
+    },
+    {
+      "epoch": 0.9637448370812299,
+      "grad_norm": 0.23371103405952454,
+      "learning_rate": 0.0002748904576436222,
+      "loss": 0.0582,
+      "step": 4200
+    },
+    {
+      "epoch": 0.9752179899036255,
+      "grad_norm": 0.2830590307712555,
+      "learning_rate": 0.00027184761441090554,
+      "loss": 0.0551,
+      "step": 4250
+    },
+    {
+      "epoch": 0.9866911427260211,
+      "grad_norm": 0.17691777646541595,
+      "learning_rate": 0.0002688047711781889,
+      "loss": 0.0556,
+      "step": 4300
+    },
+    {
+      "epoch": 0.9981642955484167,
+      "grad_norm": 0.32038599252700806,
+      "learning_rate": 0.00026576192794547224,
+      "loss": 0.0524,
+      "step": 4350
+    },
+    {
+      "epoch": 1.0096374483708124,
+      "grad_norm": 0.1972804069519043,
+      "learning_rate": 0.00026271908471275565,
+      "loss": 0.0521,
+      "step": 4400
+    },
+    {
+      "epoch": 1.0211106011932078,
+      "grad_norm": 0.35761380195617676,
+      "learning_rate": 0.00025967624148003895,
+      "loss": 0.0572,
+      "step": 4450
+    },
+    {
+      "epoch": 1.0325837540156035,
+      "grad_norm": 0.285580039024353,
+      "learning_rate": 0.0002566333982473223,
+      "loss": 0.0487,
+      "step": 4500
+    },
+    {
+      "epoch": 1.0325837540156035,
+      "eval_accuracy": 0.9825631838117813,
+      "eval_f1": 0.9641975912207242,
+      "eval_loss": 0.05233108997344971,
+      "eval_precision": 0.9613379614792094,
+      "eval_recall": 0.9699254645627606,
+      "eval_runtime": 119.1944,
+      "eval_samples_per_second": 167.122,
+      "eval_steps_per_second": 10.445,
+      "step": 4500
+    },
+    {
+      "epoch": 1.044056906837999,
+      "grad_norm": 0.2022152990102768,
+      "learning_rate": 0.00025359055501460566,
+      "loss": 0.0539,
+      "step": 4550
+    },
+    {
+      "epoch": 1.0555300596603947,
+      "grad_norm": 0.29692327976226807,
+      "learning_rate": 0.000250547711781889,
+      "loss": 0.047,
+      "step": 4600
+    },
+    {
+      "epoch": 1.0670032124827902,
+      "grad_norm": 0.2476482093334198,
+      "learning_rate": 0.0002475048685491723,
+      "loss": 0.053,
+      "step": 4650
+    },
+    {
+      "epoch": 1.0784763653051859,
+      "grad_norm": 0.17114070057868958,
+      "learning_rate": 0.0002444620253164557,
+      "loss": 0.0519,
+      "step": 4700
+    },
+    {
+      "epoch": 1.0899495181275816,
+      "grad_norm": 0.11371100693941116,
+      "learning_rate": 0.00024141918208373905,
+      "loss": 0.0547,
+      "step": 4750
+    },
+    {
+      "epoch": 1.101422670949977,
+      "grad_norm": 0.25711262226104736,
+      "learning_rate": 0.00023837633885102238,
+      "loss": 0.0543,
+      "step": 4800
+    },
+    {
+      "epoch": 1.1128958237723727,
+      "grad_norm": 0.2982866168022156,
+      "learning_rate": 0.00023533349561830576,
+      "loss": 0.0561,
+      "step": 4850
+    },
+    {
+      "epoch": 1.1243689765947682,
+      "grad_norm": 0.3269876539707184,
+      "learning_rate": 0.00023229065238558911,
+      "loss": 0.0494,
+      "step": 4900
+    },
+    {
+      "epoch": 1.135842129417164,
+      "grad_norm": 0.26729336380958557,
+      "learning_rate": 0.00022924780915287244,
+      "loss": 0.047,
+      "step": 4950
+    },
+    {
+      "epoch": 1.1473152822395594,
+      "grad_norm": 0.39272695779800415,
+      "learning_rate": 0.0002262049659201558,
+      "loss": 0.0517,
+      "step": 5000
+    },
+    {
+      "epoch": 1.1473152822395594,
+      "eval_accuracy": 0.9833051617205572,
+      "eval_f1": 0.9649115475893709,
+      "eval_loss": 0.05118980631232262,
+      "eval_precision": 0.96238450457423,
+      "eval_recall": 0.9685961793536983,
+      "eval_runtime": 120.695,
+      "eval_samples_per_second": 165.044,
+      "eval_steps_per_second": 10.315,
+      "step": 5000
+    },
+    {
+      "epoch": 1.158788435061955,
+      "grad_norm": 0.2898092567920685,
+      "learning_rate": 0.00022316212268743915,
+      "loss": 0.0499,
+      "step": 5050
+    },
+    {
+      "epoch": 1.1702615878843505,
+      "grad_norm": 0.20259062945842743,
+      "learning_rate": 0.00022011927945472248,
+      "loss": 0.0506,
+      "step": 5100
+    },
+    {
+      "epoch": 1.1817347407067462,
+      "grad_norm": 0.26172712445259094,
+      "learning_rate": 0.00021707643622200586,
+      "loss": 0.0512,
+      "step": 5150
+    },
+    {
+      "epoch": 1.193207893529142,
+      "grad_norm": 0.26839691400527954,
+      "learning_rate": 0.0002140335929892892,
+      "loss": 0.0524,
+      "step": 5200
+    },
+    {
+      "epoch": 1.2046810463515374,
+      "grad_norm": 0.19788499176502228,
+      "learning_rate": 0.00021099074975657254,
+      "loss": 0.0532,
+      "step": 5250
+    },
+    {
+      "epoch": 1.216154199173933,
+      "grad_norm": 0.22159354388713837,
+      "learning_rate": 0.0002079479065238559,
+      "loss": 0.0539,
+      "step": 5300
+    },
+    {
+      "epoch": 1.2276273519963286,
+      "grad_norm": 0.274666428565979,
+      "learning_rate": 0.00020490506329113925,
+      "loss": 0.0533,
+      "step": 5350
+    },
+    {
+      "epoch": 1.2391005048187242,
+      "grad_norm": 0.2635292410850525,
+      "learning_rate": 0.00020186222005842257,
+      "loss": 0.0511,
+      "step": 5400
+    },
+    {
+      "epoch": 1.2505736576411197,
+      "grad_norm": 0.19532188773155212,
+      "learning_rate": 0.00019881937682570596,
+      "loss": 0.0493,
+      "step": 5450
+    },
+    {
+      "epoch": 1.2620468104635154,
+      "grad_norm": 0.17796900868415833,
+      "learning_rate": 0.0001957765335929893,
+      "loss": 0.049,
+      "step": 5500
+    },
+    {
+      "epoch": 1.2620468104635154,
+      "eval_accuracy": 0.9832237878251686,
+      "eval_f1": 0.9660702602061538,
+      "eval_loss": 0.05031489580869675,
+      "eval_precision": 0.9649079959020276,
+      "eval_recall": 0.9685498930352109,
+      "eval_runtime": 121.0196,
+      "eval_samples_per_second": 164.601,
+      "eval_steps_per_second": 10.288,
+      "step": 5500
+    },
+    {
+      "epoch": 1.2735199632859109,
+      "grad_norm": 0.2736414968967438,
+      "learning_rate": 0.00019273369036027264,
+      "loss": 0.0518,
+      "step": 5550
+    },
+    {
+      "epoch": 1.2849931161083066,
+      "grad_norm": 0.27350395917892456,
+      "learning_rate": 0.000189690847127556,
+      "loss": 0.0463,
+      "step": 5600
+    },
+    {
+      "epoch": 1.2964662689307023,
+      "grad_norm": 0.20141524076461792,
+      "learning_rate": 0.00018664800389483935,
+      "loss": 0.0531,
+      "step": 5650
+    },
+    {
+      "epoch": 1.3079394217530977,
+      "grad_norm": 0.2544547915458679,
+      "learning_rate": 0.00018360516066212267,
+      "loss": 0.0499,
+      "step": 5700
+    },
+    {
+      "epoch": 1.3194125745754932,
+      "grad_norm": 0.15668709576129913,
+      "learning_rate": 0.00018056231742940605,
+      "loss": 0.0489,
+      "step": 5750
+    },
+    {
+      "epoch": 1.330885727397889,
+      "grad_norm": 0.254363089799881,
+      "learning_rate": 0.0001775194741966894,
+      "loss": 0.0514,
+      "step": 5800
+    },
+    {
+      "epoch": 1.3423588802202846,
+      "grad_norm": 0.34548887610435486,
+      "learning_rate": 0.00017447663096397273,
+      "loss": 0.0535,
+      "step": 5850
+    },
+    {
+      "epoch": 1.35383203304268,
+      "grad_norm": 0.2949070334434509,
+      "learning_rate": 0.0001714337877312561,
+      "loss": 0.0483,
+      "step": 5900
+    },
+    {
+      "epoch": 1.3653051858650758,
+      "grad_norm": 0.18696388602256775,
+      "learning_rate": 0.00016839094449853944,
+      "loss": 0.0491,
+      "step": 5950
+    },
+    {
+      "epoch": 1.3767783386874712,
+      "grad_norm": 0.1810847669839859,
+      "learning_rate": 0.00016534810126582277,
+      "loss": 0.045,
+      "step": 6000
+    },
+    {
+      "epoch": 1.3767783386874712,
+      "eval_accuracy": 0.9835904174637322,
+      "eval_f1": 0.9660336726703211,
+      "eval_loss": 0.04917869716882706,
+      "eval_precision": 0.9631038353574605,
+      "eval_recall": 0.9706993139499732,
+      "eval_runtime": 119.0826,
+      "eval_samples_per_second": 167.279,
+      "eval_steps_per_second": 10.455,
+      "step": 6000
+    },
+    {
+      "epoch": 1.388251491509867,
+      "grad_norm": 0.15973134338855743,
+      "learning_rate": 0.00016230525803310615,
+      "loss": 0.0444,
+      "step": 6050
+    },
+    {
+      "epoch": 1.3997246443322626,
+      "grad_norm": 0.24404683709144592,
+      "learning_rate": 0.0001592624148003895,
+      "loss": 0.046,
+      "step": 6100
+    },
+    {
+      "epoch": 1.411197797154658,
+      "grad_norm": 0.3005557656288147,
+      "learning_rate": 0.00015621957156767283,
+      "loss": 0.0525,
+      "step": 6150
+    },
+    {
+      "epoch": 1.4226709499770536,
+      "grad_norm": 0.2723326086997986,
+      "learning_rate": 0.0001531767283349562,
+      "loss": 0.0487,
+      "step": 6200
+    },
+    {
+      "epoch": 1.4341441027994493,
+      "grad_norm": 0.2460734099149704,
+      "learning_rate": 0.00015013388510223954,
+      "loss": 0.0455,
+      "step": 6250
+    },
+    {
+      "epoch": 1.445617255621845,
+      "grad_norm": 0.32472339272499084,
+      "learning_rate": 0.00014709104186952287,
+      "loss": 0.0444,
+      "step": 6300
+    },
+    {
+      "epoch": 1.4570904084442404,
+      "grad_norm": 0.35655590891838074,
+      "learning_rate": 0.00014404819863680625,
+      "loss": 0.053,
+      "step": 6350
+    },
+    {
+      "epoch": 1.4685635612666361,
+      "grad_norm": 0.23245665431022644,
+      "learning_rate": 0.0001410053554040896,
+      "loss": 0.0472,
+      "step": 6400
+    },
+    {
+      "epoch": 1.4800367140890316,
+      "grad_norm": 0.16704195737838745,
+      "learning_rate": 0.00013796251217137293,
+      "loss": 0.0483,
+      "step": 6450
+    },
+    {
+      "epoch": 1.4915098669114273,
+      "grad_norm": 0.16793860495090485,
+      "learning_rate": 0.00013491966893865629,
+      "loss": 0.0453,
+      "step": 6500
+    },
+    {
+      "epoch": 1.4915098669114273,
+      "eval_accuracy": 0.9840196424064407,
+      "eval_f1": 0.9677403990141918,
+      "eval_loss": 0.04730767011642456,
+      "eval_precision": 0.9666379623287346,
+      "eval_recall": 0.9698169810038056,
+      "eval_runtime": 119.877,
+      "eval_samples_per_second": 166.17,
+      "eval_steps_per_second": 10.386,
+      "step": 6500
+    },
+    {
+      "epoch": 1.502983019733823,
+      "grad_norm": 0.24766087532043457,
+      "learning_rate": 0.00013187682570593964,
+      "loss": 0.0433,
+      "step": 6550
+    },
+    {
+      "epoch": 1.5144561725562184,
+      "grad_norm": 0.28402864933013916,
+      "learning_rate": 0.00012883398247322297,
+      "loss": 0.0451,
+      "step": 6600
+    },
+    {
+      "epoch": 1.525929325378614,
+      "grad_norm": 0.23255027830600739,
+      "learning_rate": 0.00012579113924050635,
+      "loss": 0.0494,
+      "step": 6650
+    },
+    {
+      "epoch": 1.5374024782010096,
+      "grad_norm": 0.2084839642047882,
+      "learning_rate": 0.00012274829600778967,
+      "loss": 0.0524,
+      "step": 6700
+    },
+    {
+      "epoch": 1.5488756310234053,
+      "grad_norm": 0.2631557285785675,
+      "learning_rate": 0.00011970545277507303,
+      "loss": 0.0447,
+      "step": 6750
+    },
+    {
+      "epoch": 1.5603487838458008,
+      "grad_norm": 0.203273743391037,
+      "learning_rate": 0.00011666260954235638,
+      "loss": 0.046,
+      "step": 6800
+    },
+    {
+      "epoch": 1.5718219366681965,
+      "grad_norm": 0.17683915793895721,
+      "learning_rate": 0.00011361976630963972,
+      "loss": 0.0488,
+      "step": 6850
+    },
+    {
+      "epoch": 1.583295089490592,
+      "grad_norm": 0.22857971489429474,
+      "learning_rate": 0.00011057692307692308,
+      "loss": 0.047,
+      "step": 6900
+    },
+    {
+      "epoch": 1.5947682423129876,
+      "grad_norm": 0.13033239543437958,
+      "learning_rate": 0.00010753407984420643,
+      "loss": 0.0477,
+      "step": 6950
+    },
+    {
+      "epoch": 1.6062413951353833,
+      "grad_norm": 0.1830282211303711,
+      "learning_rate": 0.00010449123661148977,
+      "loss": 0.0455,
+      "step": 7000
+    },
+    {
+      "epoch": 1.6062413951353833,
+      "eval_accuracy": 0.9839478813613316,
+      "eval_f1": 0.9673812546821595,
+      "eval_loss": 0.04686596244573593,
+      "eval_precision": 0.9634884038843466,
+      "eval_recall": 0.9731322385654713,
+      "eval_runtime": 121.2935,
+      "eval_samples_per_second": 164.23,
+      "eval_steps_per_second": 10.264,
+      "step": 7000
+    },
+    {
+      "epoch": 1.6177145479577788,
+      "grad_norm": 0.262114942073822,
+      "learning_rate": 0.00010144839337877313,
+      "loss": 0.0491,
+      "step": 7050
+    },
+    {
+      "epoch": 1.6291877007801743,
+      "grad_norm": 0.2282840460538864,
+      "learning_rate": 9.840555014605648e-05,
+      "loss": 0.0478,
+      "step": 7100
+    },
+    {
+      "epoch": 1.64066085360257,
+      "grad_norm": 0.3352207839488983,
+      "learning_rate": 9.536270691333982e-05,
+      "loss": 0.047,
+      "step": 7150
+    },
+    {
+      "epoch": 1.6521340064249657,
+      "grad_norm": 0.21367865800857544,
+      "learning_rate": 9.231986368062318e-05,
+      "loss": 0.0438,
+      "step": 7200
+    },
+    {
+      "epoch": 1.6636071592473611,
+      "grad_norm": 0.29630246758461,
+      "learning_rate": 8.927702044790653e-05,
+      "loss": 0.0476,
+      "step": 7250
+    },
+    {
+      "epoch": 1.6750803120697566,
+      "grad_norm": 0.18861912190914154,
+      "learning_rate": 8.623417721518987e-05,
+      "loss": 0.0404,
+      "step": 7300
+    },
+    {
+      "epoch": 1.6865534648921523,
+      "grad_norm": 0.24099300801753998,
+      "learning_rate": 8.319133398247323e-05,
+      "loss": 0.0438,
+      "step": 7350
+    },
+    {
+      "epoch": 1.698026617714548,
+      "grad_norm": 0.4395439326763153,
+      "learning_rate": 8.014849074975658e-05,
+      "loss": 0.0434,
+      "step": 7400
+    },
+    {
+      "epoch": 1.7094997705369437,
+      "grad_norm": 0.30208903551101685,
+      "learning_rate": 7.710564751703992e-05,
+      "loss": 0.0459,
+      "step": 7450
+    },
+    {
+      "epoch": 1.7209729233593392,
+      "grad_norm": 0.2277510166168213,
+      "learning_rate": 7.406280428432327e-05,
+      "loss": 0.0445,
+      "step": 7500
+    },
+    {
+      "epoch": 1.7209729233593392,
+      "eval_accuracy": 0.9839078650776103,
+      "eval_f1": 0.9673493479575279,
+      "eval_loss": 0.04653547704219818,
+      "eval_precision": 0.9639073876576003,
+      "eval_recall": 0.9731134347485857,
+      "eval_runtime": 120.0655,
+      "eval_samples_per_second": 165.909,
+      "eval_steps_per_second": 10.369,
+      "step": 7500
+    },
+    {
+      "epoch": 1.7324460761817346,
+      "grad_norm": 0.26645800471305847,
+      "learning_rate": 7.101996105160661e-05,
+      "loss": 0.0487,
+      "step": 7550
+    },
+    {
+      "epoch": 1.7439192290041303,
+      "grad_norm": 0.254226416349411,
+      "learning_rate": 6.797711781888997e-05,
+      "loss": 0.0476,
+      "step": 7600
+    },
+    {
+      "epoch": 1.755392381826526,
+      "grad_norm": 0.35157519578933716,
+      "learning_rate": 6.493427458617332e-05,
+      "loss": 0.047,
+      "step": 7650
+    },
+    {
+      "epoch": 1.7668655346489215,
+      "grad_norm": 0.28356507420539856,
+      "learning_rate": 6.189143135345668e-05,
+      "loss": 0.0453,
+      "step": 7700
+    },
+    {
+      "epoch": 1.778338687471317,
+      "grad_norm": 0.24292881786823273,
+      "learning_rate": 5.8848588120740025e-05,
+      "loss": 0.043,
+      "step": 7750
+    },
+    {
+      "epoch": 1.7898118402937127,
+      "grad_norm": 0.23205772042274475,
+      "learning_rate": 5.5805744888023366e-05,
+      "loss": 0.047,
+      "step": 7800
+    },
+    {
+      "epoch": 1.8012849931161083,
+      "grad_norm": 0.22511674463748932,
+      "learning_rate": 5.276290165530672e-05,
+      "loss": 0.0485,
+      "step": 7850
+    },
+    {
+      "epoch": 1.812758145938504,
+      "grad_norm": 0.11639175564050674,
+      "learning_rate": 4.9720058422590074e-05,
+      "loss": 0.0468,
+      "step": 7900
+    },
+    {
+      "epoch": 1.8242312987608995,
+      "grad_norm": 0.3817405104637146,
+      "learning_rate": 4.6677215189873415e-05,
+      "loss": 0.0436,
+      "step": 7950
+    },
+    {
+      "epoch": 1.835704451583295,
+      "grad_norm": 0.23685210943222046,
+      "learning_rate": 4.363437195715677e-05,
+      "loss": 0.0485,
+      "step": 8000
+    },
+    {
+      "epoch": 1.835704451583295,
+      "eval_accuracy": 0.9847846464449556,
+      "eval_f1": 0.9692734951155788,
+      "eval_loss": 0.044191647320985794,
+      "eval_precision": 0.9682748217756245,
+      "eval_recall": 0.9715440392623697,
+      "eval_runtime": 119.9631,
+      "eval_samples_per_second": 166.051,
+      "eval_steps_per_second": 10.378,
+      "step": 8000
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 8716,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 1000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.9995964607635416e+16,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
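The `log_history` in this `trainer_state.json` interleaves training entries (`loss`, `learning_rate`, `grad_norm`, logged every 50 steps) with evaluation entries (`eval_*`, every 500 steps here). The following is a minimal sketch for reading it, assuming the file is plain JSON loaded with the standard library. It picks the evaluation with the highest `eval_f1` and extrapolates where the logged learning rate would reach zero. The helper names are hypothetical, not part of `transformers`, and the sample entries are copied from the log above.

```python
import json
from typing import Any


def load_log_history(path: str) -> list[dict[str, Any]]:
    """Read the `log_history` list from a Trainer `trainer_state.json` file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["log_history"]


def best_eval(log_history: list[dict[str, Any]], metric: str = "eval_f1") -> dict[str, Any]:
    """Return the evaluation entry with the highest value of `metric`."""
    evals = [entry for entry in log_history if metric in entry]
    if not evals:
        raise ValueError(f"no entries contain {metric!r}")
    return max(evals, key=lambda entry: entry[metric])


def zero_lr_step(log_history: list[dict[str, Any]]) -> float:
    """Extrapolate the step at which a linearly decaying learning rate
    would hit zero, using the first and last logged training entries."""
    train = [e for e in log_history if "learning_rate" in e]
    first, last = train[0], train[-1]
    slope = (first["learning_rate"] - last["learning_rate"]) / (last["step"] - first["step"])
    return last["step"] + last["learning_rate"] / slope


# Excerpt of entries from the log above (not the full history).
history = [
    {"step": 2250, "learning_rate": 0.0003935613437195716, "loss": 0.066},
    {"step": 2500, "eval_f1": 0.9567031119548932, "eval_loss": 0.060597751289606094},
    {"step": 7500, "eval_f1": 0.9673493479575279, "eval_loss": 0.04653547704219818},
    {"step": 8000, "learning_rate": 4.363437195715677e-05, "loss": 0.0485},
    {"step": 8000, "eval_f1": 0.9692734951155788, "eval_loss": 0.044191647320985794},
]

best = best_eval(history)
print(best["step"], best["eval_f1"])  # 8000 0.9692734951155788
print(round(zero_lr_step(history)))   # 8717
```

The extrapolated zero point (about step 8717) sits next to `max_steps: 8716`, which is consistent with a linear-decay schedule. The scheduler type is not recorded in this file, so treat that as an inference. Likewise, step 8000 is only the best of the evaluations shown in this checkpoint's state.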

checkpoint-8000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c25f499fd0637dd40aacba2ee37a71fbf01438bc7cbb57bc752e6d4140572a35
+size 5841
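The binary files in this commit (`training_args.bin`, `optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `adapter_model.safetensors`, plus `tokenizer.json`) are committed as Git LFS pointers, not as the payload itself. Each pointer has three `key value` lines: the spec version, the SHA-256 object id, and the size in bytes. The sketch below checks a downloaded blob against its pointer. The helper names are hypothetical, and because the real `training_args.bin` is not part of this diff, the round trip uses a synthetic payload.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split a Git LFS pointer file into its `key value` fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(blob: bytes, pointer: dict[str, str]) -> bool:
    """Check a blob against the pointer's size and SHA-256 oid."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(blob) == int(pointer["size"]) and hashlib.sha256(blob).hexdigest() == expected


pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c25f499fd0637dd40aacba2ee37a71fbf01438bc7cbb57bc752e6d4140572a35
size 5841
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["size"])  # 5841

# Synthetic round trip: build a pointer for a known payload and verify it.
blob = b"example payload"
fake = {"oid": "sha256:" + hashlib.sha256(blob).hexdigest(), "size": str(len(blob))}
print(matches_pointer(blob, fake))  # True
```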

checkpoint-8716/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: EvanD/xlm-roberta-base-romanian-ner-ronec
+library_name: peft
+tags:
+- base_model:adapter:EvanD/xlm-roberta-base-romanian-ner-ronec
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1
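Apart from its YAML front matter, this model card is still the unfilled PEFT template. The front matter is the part the Hub reads, and its `base_model` should agree with `base_model_name_or_path` in the `adapter_config.json` added in this commit. The sketch below is a minimal consistency check. It uses hypothetical helper code that only understands the flat `key: value` / `- item` shape shown here. A real tool would use a YAML parser instead.

```python
import json


def front_matter(card: str) -> dict:
    """Parse flat `key: value` and `- item` front matter between `---` lines.
    Handles only the simple shape used in this card."""
    lines = card.strip().splitlines()
    if not lines or lines[0] != "---":
        return {}
    meta: dict = {}
    key = None
    for line in lines[1:]:
        if line == "---":
            break
        if line.startswith("- ") and key is not None:
            meta[key].append(line[2:].strip())  # list item under the last key
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            meta[key] = value.strip() or []  # empty value starts a list
    return meta


card = """---
base_model: EvanD/xlm-roberta-base-romanian-ner-ronec
library_name: peft
tags:
- base_model:adapter:EvanD/xlm-roberta-base-romanian-ner-ronec
- lora
- transformers
---
# Model Card for Model ID
"""
adapter_config = json.loads(
    '{"base_model_name_or_path": "EvanD/xlm-roberta-base-romanian-ner-ronec", "task_type": "TOKEN_CLS"}'
)

meta = front_matter(card)
print(meta["base_model"] == adapter_config["base_model_name_or_path"])  # True
print(meta["tags"])
```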

checkpoint-8716/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "EvanD/xlm-roberta-base-romanian-ner-ronec",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "dense",
+    "query",
+    "value",
+    "key"
+  ],
+  "target_parameters": null,
+  "task_type": "TOKEN_CLS",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
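With `r: 16`, `lora_alpha: 32` and `use_rslora: false`, each targeted weight becomes `W + (alpha / r) * B @ A`, so the scaling factor is 2.0. Each targeted `Linear(in, out)` also gains `r * (in + out)` trainable parameters. The back-of-the-envelope count below rests on several assumptions:

- The standard xlm-roberta-base shapes apply: 12 layers, hidden size 768, intermediate size 3072.
- The suffix `dense` matches the attention-output, intermediate and output projections in every layer.
- The `modules_to_save` classifier head is excluded, because its size depends on the label set.

```python
import math

R = 16
LORA_ALPHA = 32
USE_RSLORA = False

# Assumed xlm-roberta-base dimensions (not stated in adapter_config.json).
NUM_LAYERS = 12
HIDDEN = 768
INTERMEDIATE = 3072


def lora_scaling(alpha: int, r: int, use_rslora: bool) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A has shape (r, in_features) and B has shape (out_features, r)."""
    return r * (in_features + out_features)


per_layer = (
    3 * lora_params(HIDDEN, HIDDEN, R)          # query, key, value
    + lora_params(HIDDEN, HIDDEN, R)            # attention.output.dense
    + lora_params(HIDDEN, INTERMEDIATE, R)      # intermediate.dense
    + lora_params(INTERMEDIATE, HIDDEN, R)      # output.dense
)
total = NUM_LAYERS * per_layer

print(lora_scaling(LORA_ALPHA, R, USE_RSLORA))  # 2.0
print(total)                                    # 2654208
print(total * 4)                                # 10616832 (bytes, if stored as fp32)
```

If the weights are stored in fp32, the LoRA matrices would account for roughly 10.6 MB of the 10,899,068-byte `adapter_model.safetensors`. The remainder would be the classifier head and the safetensors header. This is an estimate from assumed shapes, not a readout of the file.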

checkpoint-8716/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0aa903eab75c938a4c8495d5d2b7bb8390f6f8ea49fb0c40f27161165064bab1
+size 10899068

checkpoint-8716/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:06400082ba0869f4235df57463a777e00331795fad4823bb94770ae8a016db0b
+size 21881739
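`optimizer.pt` (21,881,739 bytes) is almost exactly twice the size of `adapter_model.safetensors` (10,899,068 bytes). That ratio is what an Adam-family optimizer would produce, because it keeps two moment buffers per trainable parameter. The optimizer type is an assumption here: it is recorded in the pickled `training_args.bin`, which this diff does not show. The arithmetic:

```python
ADAPTER_BYTES = 10_899_068    # checkpoint-8716/adapter_model.safetensors
OPTIMIZER_BYTES = 21_881_739  # checkpoint-8716/optimizer.pt


def state_to_weight_ratio(optimizer_bytes: int, weight_bytes: int) -> float:
    """Bytes of optimizer state per byte of saved trainable weights."""
    return optimizer_bytes / weight_bytes


ratio = state_to_weight_ratio(OPTIMIZER_BYTES, ADAPTER_BYTES)
print(f"{ratio:.2f}")  # 2.01
```

Both files carry some fixed overhead, such as the safetensors header and pickled step counters and parameter-group settings, so the ratio is only approximately 2.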

checkpoint-8716/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:658f87d3c9d6d7e83e7fe1e6c105d7d265e1bd84f0ef76d24771b35bc48a2da4
+size 14645

checkpoint-8716/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8ae2b9cfdc80cf085e07da3a22852ba2187f065729233222b1397cc6fe83f7fa
+size 1465

checkpoint-8716/special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
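In this map, `cls_token` and `bos_token` are both `<s>`, and `sep_token` and `eos_token` are both `</s>`. This is the RoBERTa/XLM-RoBERTa convention: a single sequence is framed as `<s> A </s>` and a pair as `<s> A </s></s> B </s>`. Because `<mask>` has `lstrip: true`, it also absorbs any whitespace to its left. The sketch below is a standalone re-implementation of that framing, modeled on `build_inputs_with_special_tokens` in the Hugging Face XLM-RoBERTa tokenizer. The special-token IDs come from `added_tokens_decoder` in `tokenizer_config.json` (`<s>` = 0, `</s>` = 2); the content-token IDs are made up.

```python
from typing import List, Optional

CLS_ID = 0  # "<s>"  (cls_token == bos_token)
SEP_ID = 2  # "</s>" (sep_token == eos_token)


def add_special_tokens(ids_a: List[int], ids_b: Optional[List[int]] = None) -> List[int]:
    """Frame token IDs the XLM-RoBERTa way: <s> A </s>  or  <s> A </s></s> B </s>."""
    if ids_b is None:
        return [CLS_ID] + ids_a + [SEP_ID]
    return [CLS_ID] + ids_a + [SEP_ID, SEP_ID] + ids_b + [SEP_ID]


# Hypothetical content-token IDs, for illustration only.
print(add_special_tokens([100, 101]))        # [0, 100, 101, 2]
print(add_special_tokens([100], [200, 201])) # [0, 100, 2, 2, 200, 201, 2]
```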

checkpoint-8716/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8373f9cd3d27591e1924426bcc1c8799bc5a9affc4fc857982c5d66668dd1f41
+size 17082832

checkpoint-8716/tokenizer_config.json ADDED

	@@ -0,0 +1,59 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "250001": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "max_length": 512,
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "stride": 0,
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "<unk>"
+}
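This tokenizer config sets `model_max_length: 512` and `truncation_side: "right"`. Any tokens beyond 512, special tokens included, are therefore cut off rather than labeled. For token classification on long documents, a common workaround is to split the content tokens into windows of at most 510 tokens, leaving room for `<s>` and `</s>`. Consecutive windows can optionally overlap by `stride` tokens (this config uses `stride: 0`). The helper below is a hypothetical sketch of that idea. It is not taken from this repository's training or inference code.

```python
from typing import List

MODEL_MAX_LENGTH = 512  # from tokenizer_config.json
NUM_SPECIAL = 2         # <s> ... </s> around a single sequence


def window_token_ids(ids: List[int], stride: int = 0,
                     max_length: int = MODEL_MAX_LENGTH) -> List[List[int]]:
    """Split content token IDs into windows that fit max_length once <s>/</s> are added.

    Consecutive windows overlap by `stride` tokens (0 means no overlap).
    """
    size = max_length - NUM_SPECIAL
    if not 0 <= stride < size:
        raise ValueError("stride must be in [0, max_length - 2)")
    windows, start = [], 0
    while True:
        windows.append(ids[start:start + size])
        if start + size >= len(ids):  # last window reaches the end of the input
            return windows
        start += size - stride


print([len(w) for w in window_token_ids(list(range(1200)))])  # [510, 510, 180]
```

With a nonzero stride, tokens near a window boundary get context on both sides. The overlapping predictions must then be merged, for example by keeping the label from the window in which the token sits farther from the edge.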

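The `learning_rate` values in the `log_history` of the trainer_state.json file are consistent with two phases. The rate first rises linearly to a peak of 5e-4 over the first 500 steps, then decays linearly to zero at step 8,716, the final `global_step`. This matches the shape of the schedule from `transformers.get_linear_schedule_with_warmup`. The peak and warmup length are inferred from the logged values rather than read from `training_args.bin`, so treat them as assumptions. Empirically, the value logged at step `s` matches the formula evaluated at `s - 1`. The sketch below checks a few logged entries against that formula.

```python
PEAK_LR = 5e-4      # ASSUMPTION: inferred from the logged learning_rate values
WARMUP_STEPS = 500  # ASSUMPTION: inferred from where the ramp peaks
TOTAL_STEPS = 8716  # final global_step in trainer_state.json


def linear_warmup_decay(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / max(1, WARMUP_STEPS))
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))


# (logged step, logged learning_rate) pairs copied from log_history
LOGGED = [
    (50, 4.9000000000000005e-05),
    (500, 0.000499),
    (550, 0.0004970180136319377),
    (1000, 0.00046963242453748784),
    (7100, 9.840555014605648e-05),
]

for step, logged in LOGGED:
    predicted = linear_warmup_decay(step - 1)  # logged value lags the step counter by one
    print(step, logged, predicted, abs(predicted - logged) < 1e-12)
```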
checkpoint-8716/trainer_state.json ADDED

	@@ -0,0 +1,1456 @@

+{
+  "best_global_step": 8500,
+  "best_metric": 0.9694837693189375,
+  "best_model_checkpoint": "./models/financial_adapter_20250914_060658/checkpoint-8000",
+  "epoch": 2.0,
+  "eval_steps": 500,
+  "global_step": 8716,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.011473152822395595,
+      "grad_norm": 6.211507797241211,
+      "learning_rate": 4.9000000000000005e-05,
+      "loss": 3.7315,
+      "step": 50
+    },
+    {
+      "epoch": 0.02294630564479119,
+      "grad_norm": 0.805833637714386,
+      "learning_rate": 9.900000000000001e-05,
+      "loss": 1.2821,
+      "step": 100
+    },
+    {
+      "epoch": 0.03441945846718678,
+      "grad_norm": 0.7927971482276917,
+      "learning_rate": 0.000149,
+      "loss": 0.644,
+      "step": 150
+    },
+    {
+      "epoch": 0.04589261128958238,
+      "grad_norm": 0.95986407995224,
+      "learning_rate": 0.000199,
+      "loss": 0.3733,
+      "step": 200
+    },
+    {
+      "epoch": 0.05736576411197797,
+      "grad_norm": 0.7364535927772522,
+      "learning_rate": 0.000249,
+      "loss": 0.2499,
+      "step": 250
+    },
+    {
+      "epoch": 0.06883891693437356,
+      "grad_norm": 0.8872820734977722,
+      "learning_rate": 0.000299,
+      "loss": 0.1937,
+      "step": 300
+    },
+    {
+      "epoch": 0.08031206975676916,
+      "grad_norm": 0.5113154053688049,
+      "learning_rate": 0.00034899999999999997,
+      "loss": 0.1646,
+      "step": 350
+    },
+    {
+      "epoch": 0.09178522257916476,
+      "grad_norm": 0.7045756578445435,
+      "learning_rate": 0.00039900000000000005,
+      "loss": 0.1466,
+      "step": 400
+    },
+    {
+      "epoch": 0.10325837540156035,
+      "grad_norm": 0.5810624957084656,
+      "learning_rate": 0.000449,
+      "loss": 0.1314,
+      "step": 450
+    },
+    {
+      "epoch": 0.11473152822395594,
+      "grad_norm": 0.5461702346801758,
+      "learning_rate": 0.000499,
+      "loss": 0.1276,
+      "step": 500
+    },
+    {
+      "epoch": 0.11473152822395594,
+      "eval_accuracy": 0.970071396651017,
+      "eval_f1": 0.931817690038444,
+      "eval_loss": 0.1108999103307724,
+      "eval_precision": 0.9314176027125088,
+      "eval_recall": 0.9343327320933421,
+      "eval_runtime": 138.5938,
+      "eval_samples_per_second": 143.729,
+      "eval_steps_per_second": 8.983,
+      "step": 500
+    },
+    {
+      "epoch": 0.12620468104635155,
+      "grad_norm": 0.5111148357391357,
+      "learning_rate": 0.0004970180136319377,
+      "loss": 0.1157,
+      "step": 550
+    },
+    {
+      "epoch": 0.13767783386874713,
+      "grad_norm": 0.5293119549751282,
+      "learning_rate": 0.000493975170399221,
+      "loss": 0.1091,
+      "step": 600
+    },
+    {
+      "epoch": 0.14915098669114274,
+      "grad_norm": 0.5645154714584351,
+      "learning_rate": 0.0004909323271665044,
+      "loss": 0.1054,
+      "step": 650
+    },
+    {
+      "epoch": 0.16062413951353832,
+      "grad_norm": 0.3088572025299072,
+      "learning_rate": 0.0004878894839337877,
+      "loss": 0.1017,
+      "step": 700
+    },
+    {
+      "epoch": 0.1720972923359339,
+      "grad_norm": 0.3965695798397064,
+      "learning_rate": 0.0004848466407010711,
+      "loss": 0.0881,
+      "step": 750
+    },
+    {
+      "epoch": 0.18357044515832951,
+      "grad_norm": 0.44487106800079346,
+      "learning_rate": 0.0004818037974683545,
+      "loss": 0.0908,
+      "step": 800
+    },
+    {
+      "epoch": 0.1950435979807251,
+      "grad_norm": 0.5573059320449829,
+      "learning_rate": 0.00047876095423563783,
+      "loss": 0.0923,
+      "step": 850
+    },
+    {
+      "epoch": 0.2065167508031207,
+      "grad_norm": 0.242904931306839,
+      "learning_rate": 0.00047571811100292113,
+      "loss": 0.0893,
+      "step": 900
+    },
+    {
+      "epoch": 0.2179899036255163,
+      "grad_norm": 0.4724123477935791,
+      "learning_rate": 0.0004726752677702045,
+      "loss": 0.0878,
+      "step": 950
+    },
+    {
+      "epoch": 0.22946305644791187,
+      "grad_norm": 0.32161369919776917,
+      "learning_rate": 0.00046963242453748784,
+      "loss": 0.0816,
+      "step": 1000
+    },
+    {
+      "epoch": 0.22946305644791187,
+      "eval_accuracy": 0.977059938804148,
+      "eval_f1": 0.9485042354538835,
+      "eval_loss": 0.07678939402103424,
+      "eval_precision": 0.948923113654349,
+      "eval_recall": 0.9476371557635869,
+      "eval_runtime": 118.5842,
+      "eval_samples_per_second": 167.982,
+      "eval_steps_per_second": 10.499,
+      "step": 1000
+    },
+    {
+      "epoch": 0.24093620927030748,
+      "grad_norm": 1.1815085411071777,
+      "learning_rate": 0.0004665895813047712,
+      "loss": 0.0837,
+      "step": 1050
+    },
+    {
+      "epoch": 0.2524093620927031,
+      "grad_norm": 0.26893287897109985,
+      "learning_rate": 0.00046354673807205455,
+      "loss": 0.0816,
+      "step": 1100
+    },
+    {
+      "epoch": 0.2638825149150987,
+      "grad_norm": 0.31159281730651855,
+      "learning_rate": 0.0004605038948393379,
+      "loss": 0.082,
+      "step": 1150
+    },
+    {
+      "epoch": 0.27535566773749426,
+      "grad_norm": 0.3065606951713562,
+      "learning_rate": 0.0004574610516066212,
+      "loss": 0.0742,
+      "step": 1200
+    },
+    {
+      "epoch": 0.28682882055988984,
+      "grad_norm": 0.2774142324924469,
+      "learning_rate": 0.00045441820837390456,
+      "loss": 0.0792,
+      "step": 1250
+    },
+    {
+      "epoch": 0.2983019733822855,
+      "grad_norm": 0.23560093343257904,
+      "learning_rate": 0.0004513753651411879,
+      "loss": 0.0719,
+      "step": 1300
+    },
+    {
+      "epoch": 0.30977512620468106,
+      "grad_norm": 0.29983004927635193,
+      "learning_rate": 0.0004483325219084713,
+      "loss": 0.0722,
+      "step": 1350
+    },
+    {
+      "epoch": 0.32124827902707664,
+      "grad_norm": 0.26415759325027466,
+      "learning_rate": 0.00044528967867575467,
+      "loss": 0.0724,
+      "step": 1400
+    },
+    {
+      "epoch": 0.3327214318494722,
+      "grad_norm": 0.34820911288261414,
+      "learning_rate": 0.000442246835443038,
+      "loss": 0.0756,
+      "step": 1450
+    },
+    {
+      "epoch": 0.3441945846718678,
+      "grad_norm": 0.35296013951301575,
+      "learning_rate": 0.0004392039922103213,
+      "loss": 0.0676,
+      "step": 1500
+    },
+    {
+      "epoch": 0.3441945846718678,
+      "eval_accuracy": 0.9795261937874608,
+      "eval_f1": 0.9560398155032158,
+      "eval_loss": 0.06693130731582642,
+      "eval_precision": 0.9546461779958985,
+      "eval_recall": 0.9579011468881853,
+      "eval_runtime": 122.2351,
+      "eval_samples_per_second": 162.965,
+      "eval_steps_per_second": 10.185,
+      "step": 1500
+    },
+    {
+      "epoch": 0.35566773749426345,
+      "grad_norm": 0.4301516115665436,
+      "learning_rate": 0.0004361611489776047,
+      "loss": 0.0719,
+      "step": 1550
+    },
+    {
+      "epoch": 0.36714089031665903,
+      "grad_norm": 0.3780372440814972,
+      "learning_rate": 0.00043311830574488803,
+      "loss": 0.072,
+      "step": 1600
+    },
+    {
+      "epoch": 0.3786140431390546,
+      "grad_norm": 0.3334237337112427,
+      "learning_rate": 0.0004300754625121714,
+      "loss": 0.0664,
+      "step": 1650
+    },
+    {
+      "epoch": 0.3900871959614502,
+      "grad_norm": 0.21125715970993042,
+      "learning_rate": 0.00042703261927945474,
+      "loss": 0.0708,
+      "step": 1700
+    },
+    {
+      "epoch": 0.4015603487838458,
+      "grad_norm": 0.36593177914619446,
+      "learning_rate": 0.0004239897760467381,
+      "loss": 0.0674,
+      "step": 1750
+    },
+    {
+      "epoch": 0.4130335016062414,
+      "grad_norm": 0.5742707252502441,
+      "learning_rate": 0.0004209469328140214,
+      "loss": 0.0709,
+      "step": 1800
+    },
+    {
+      "epoch": 0.424506654428637,
+      "grad_norm": 0.43670088052749634,
+      "learning_rate": 0.00041790408958130475,
+      "loss": 0.0626,
+      "step": 1850
+    },
+    {
+      "epoch": 0.4359798072510326,
+      "grad_norm": 0.3064088225364685,
+      "learning_rate": 0.0004148612463485881,
+      "loss": 0.067,
+      "step": 1900
+    },
+    {
+      "epoch": 0.44745296007342816,
+      "grad_norm": 0.26380443572998047,
+      "learning_rate": 0.0004118184031158715,
+      "loss": 0.0673,
+      "step": 1950
+    },
+    {
+      "epoch": 0.45892611289582375,
+      "grad_norm": 0.2760469913482666,
+      "learning_rate": 0.00040877555988315487,
+      "loss": 0.0686,
+      "step": 2000
+    },
+    {
+      "epoch": 0.45892611289582375,
+      "eval_accuracy": 0.9792398202709974,
+      "eval_f1": 0.9566771424147363,
+      "eval_loss": 0.06424970924854279,
+      "eval_precision": 0.9518499916341474,
+      "eval_recall": 0.9651666524432667,
+      "eval_runtime": 119.1986,
+      "eval_samples_per_second": 167.116,
+      "eval_steps_per_second": 10.445,
+      "step": 2000
+    },
+    {
+      "epoch": 0.4703992657182194,
+      "grad_norm": 0.36512625217437744,
+      "learning_rate": 0.0004057327166504382,
+      "loss": 0.065,
+      "step": 2050
+    },
+    {
+      "epoch": 0.48187241854061497,
+      "grad_norm": 0.3270319402217865,
+      "learning_rate": 0.0004026898734177215,
+      "loss": 0.0663,
+      "step": 2100
+    },
+    {
+      "epoch": 0.49334557136301055,
+      "grad_norm": 0.2962779700756073,
+      "learning_rate": 0.0003996470301850049,
+      "loss": 0.0681,
+      "step": 2150
+    },
+    {
+      "epoch": 0.5048187241854062,
+      "grad_norm": 0.4675407409667969,
+      "learning_rate": 0.00039660418695228823,
+      "loss": 0.0715,
+      "step": 2200
+    },
+    {
+      "epoch": 0.5162918770078018,
+      "grad_norm": 0.27184540033340454,
+      "learning_rate": 0.0003935613437195716,
+      "loss": 0.066,
+      "step": 2250
+    },
+    {
+      "epoch": 0.5277650298301974,
+      "grad_norm": 0.3290219008922577,
+      "learning_rate": 0.00039051850048685494,
+      "loss": 0.0666,
+      "step": 2300
+    },
+    {
+      "epoch": 0.5392381826525929,
+      "grad_norm": 0.18070432543754578,
+      "learning_rate": 0.00038747565725413824,
+      "loss": 0.0626,
+      "step": 2350
+    },
+    {
+      "epoch": 0.5507113354749885,
+      "grad_norm": 0.27334141731262207,
+      "learning_rate": 0.0003844328140214216,
+      "loss": 0.0595,
+      "step": 2400
+    },
+    {
+      "epoch": 0.5621844882973841,
+      "grad_norm": 0.29450327157974243,
+      "learning_rate": 0.00038138997078870495,
+      "loss": 0.0655,
+      "step": 2450
+    },
+    {
+      "epoch": 0.5736576411197797,
+      "grad_norm": 0.49441081285476685,
+      "learning_rate": 0.0003783471275559883,
+      "loss": 0.0644,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5736576411197797,
+      "eval_accuracy": 0.9802089297119565,
+      "eval_f1": 0.9567031119548932,
+      "eval_loss": 0.060597751289606094,
+      "eval_precision": 0.9516312084805371,
+      "eval_recall": 0.9679770998439283,
+      "eval_runtime": 122.2434,
+      "eval_samples_per_second": 162.954,
+      "eval_steps_per_second": 10.185,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5851307939421753,
+      "grad_norm": 0.2936118543148041,
+      "learning_rate": 0.00037530428432327166,
+      "loss": 0.0726,
+      "step": 2550
+    },
+    {
+      "epoch": 0.596603946764571,
+      "grad_norm": 0.29937854409217834,
+      "learning_rate": 0.00037226144109055506,
+      "loss": 0.0646,
+      "step": 2600
+    },
+    {
+      "epoch": 0.6080770995869665,
+      "grad_norm": 0.2445821762084961,
+      "learning_rate": 0.00036921859785783836,
+      "loss": 0.0666,
+      "step": 2650
+    },
+    {
+      "epoch": 0.6195502524093621,
+      "grad_norm": 0.36757129430770874,
+      "learning_rate": 0.0003661757546251217,
+      "loss": 0.0773,
+      "step": 2700
+    },
+    {
+      "epoch": 0.6310234052317577,
+      "grad_norm": 0.16537484526634216,
+      "learning_rate": 0.0003631329113924051,
+      "loss": 0.0605,
+      "step": 2750
+    },
+    {
+      "epoch": 0.6424965580541533,
+      "grad_norm": 0.4437476396560669,
+      "learning_rate": 0.00036009006815968843,
+      "loss": 0.0641,
+      "step": 2800
+    },
+    {
+      "epoch": 0.6539697108765489,
+      "grad_norm": 0.29756319522857666,
+      "learning_rate": 0.0003570472249269718,
+      "loss": 0.0611,
+      "step": 2850
+    },
+    {
+      "epoch": 0.6654428636989445,
+      "grad_norm": 0.22879567742347717,
+      "learning_rate": 0.00035400438169425514,
+      "loss": 0.0591,
+      "step": 2900
+    },
+    {
+      "epoch": 0.67691601652134,
+      "grad_norm": 0.4005909264087677,
+      "learning_rate": 0.00035096153846153844,
+      "loss": 0.0648,
+      "step": 2950
+    },
+    {
+      "epoch": 0.6883891693437356,
+      "grad_norm": 0.4542585611343384,
+      "learning_rate": 0.0003479186952288218,
+      "loss": 0.0574,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6883891693437356,
+      "eval_accuracy": 0.9813785676808373,
+      "eval_f1": 0.9602148371355144,
+      "eval_loss": 0.056131936609745026,
+      "eval_precision": 0.9611373242869105,
+      "eval_recall": 0.9617183217159495,
+      "eval_runtime": 119.7516,
+      "eval_samples_per_second": 166.344,
+      "eval_steps_per_second": 10.397,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6998623221661312,
+      "grad_norm": 0.16639067232608795,
+      "learning_rate": 0.00034487585199610514,
+      "loss": 0.0583,
+      "step": 3050
+    },
+    {
+      "epoch": 0.7113354749885269,
+      "grad_norm": 0.24682307243347168,
+      "learning_rate": 0.0003418330087633885,
+      "loss": 0.0599,
+      "step": 3100
+    },
+    {
+      "epoch": 0.7228086278109225,
+      "grad_norm": 0.8064585328102112,
+      "learning_rate": 0.00033879016553067185,
+      "loss": 0.0572,
+      "step": 3150
+    },
+    {
+      "epoch": 0.7342817806333181,
+      "grad_norm": 0.19956223666667938,
+      "learning_rate": 0.00033574732229795526,
+      "loss": 0.0582,
+      "step": 3200
+    },
+    {
+      "epoch": 0.7457549334557136,
+      "grad_norm": 0.24573862552642822,
+      "learning_rate": 0.00033270447906523856,
+      "loss": 0.0641,
+      "step": 3250
+    },
+    {
+      "epoch": 0.7572280862781092,
+      "grad_norm": 0.2404450923204422,
+      "learning_rate": 0.0003296616358325219,
+      "loss": 0.0662,
+      "step": 3300
+    },
+    {
+      "epoch": 0.7687012391005048,
+      "grad_norm": 0.2951129376888275,
+      "learning_rate": 0.00032661879259980527,
+      "loss": 0.0593,
+      "step": 3350
+    },
+    {
+      "epoch": 0.7801743919229004,
+      "grad_norm": 0.27735939621925354,
+      "learning_rate": 0.0003235759493670886,
+      "loss": 0.0551,
+      "step": 3400
+    },
+    {
+      "epoch": 0.791647544745296,
+      "grad_norm": 0.22863982617855072,
+      "learning_rate": 0.000320533106134372,
+      "loss": 0.0528,
+      "step": 3450
+    },
+    {
+      "epoch": 0.8031206975676916,
+      "grad_norm": 0.15240560472011566,
+      "learning_rate": 0.00031749026290165533,
+      "loss": 0.0607,
+      "step": 3500
+    },
+    {
+      "epoch": 0.8031206975676916,
+      "eval_accuracy": 0.9817161352139049,
+      "eval_f1": 0.9618400997245777,
+      "eval_loss": 0.05566277727484703,
+      "eval_precision": 0.9594136950612108,
+      "eval_recall": 0.9670788559757807,
+      "eval_runtime": 119.7185,
+      "eval_samples_per_second": 166.39,
+      "eval_steps_per_second": 10.399,
+      "step": 3500
+    },
+    {
+      "epoch": 0.8145938503900872,
+      "grad_norm": 0.22435520589351654,
+      "learning_rate": 0.00031444741966893863,
+      "loss": 0.0619,
+      "step": 3550
+    },
+    {
+      "epoch": 0.8260670032124828,
+      "grad_norm": 0.23223020136356354,
+      "learning_rate": 0.000311404576436222,
+      "loss": 0.0563,
+      "step": 3600
+    },
+    {
+      "epoch": 0.8375401560348784,
+      "grad_norm": 0.3050450384616852,
+      "learning_rate": 0.00030836173320350534,
+      "loss": 0.0581,
+      "step": 3650
+    },
+    {
+      "epoch": 0.849013308857274,
+      "grad_norm": 0.2995171546936035,
+      "learning_rate": 0.0003053188899707887,
+      "loss": 0.0539,
+      "step": 3700
+    },
+    {
+      "epoch": 0.8604864616796696,
+      "grad_norm": 0.25285205245018005,
+      "learning_rate": 0.00030227604673807205,
+      "loss": 0.0597,
+      "step": 3750
+    },
+    {
+      "epoch": 0.8719596145020652,
+      "grad_norm": 0.4498445689678192,
+      "learning_rate": 0.00029923320350535546,
+      "loss": 0.0582,
+      "step": 3800
+    },
+    {
+      "epoch": 0.8834327673244607,
+      "grad_norm": 0.24611692130565643,
+      "learning_rate": 0.00029619036027263876,
+      "loss": 0.0568,
+      "step": 3850
+    },
+    {
+      "epoch": 0.8949059201468563,
+      "grad_norm": 0.3124069571495056,
+      "learning_rate": 0.0002931475170399221,
+      "loss": 0.0591,
+      "step": 3900
+    },
+    {
+      "epoch": 0.9063790729692519,
+      "grad_norm": 0.2108747363090515,
+      "learning_rate": 0.00029010467380720547,
+      "loss": 0.0548,
+      "step": 3950
+    },
+    {
+      "epoch": 0.9178522257916475,
+      "grad_norm": 0.22898589074611664,
+      "learning_rate": 0.0002870618305744888,
+      "loss": 0.0603,
+      "step": 4000
+    },
+    {
+      "epoch": 0.9178522257916475,
+      "eval_accuracy": 0.9814983929773434,
+      "eval_f1": 0.9609698369421166,
+      "eval_loss": 0.05435480922460556,
+      "eval_precision": 0.9558103401819693,
+      "eval_recall": 0.9701771464195363,
+      "eval_runtime": 119.6005,
+      "eval_samples_per_second": 166.554,
+      "eval_steps_per_second": 10.41,
+      "step": 4000
+    },
+    {
+      "epoch": 0.9293253786140432,
+      "grad_norm": 0.27442702651023865,
+      "learning_rate": 0.0002840189873417722,
+      "loss": 0.0596,
+      "step": 4050
+    },
+    {
+      "epoch": 0.9407985314364388,
+      "grad_norm": 0.1897002011537552,
+      "learning_rate": 0.00028097614410905553,
+      "loss": 0.0575,
+      "step": 4100
+    },
+    {
+      "epoch": 0.9522716842588343,
+      "grad_norm": 0.31244686245918274,
+      "learning_rate": 0.00027793330087633883,
+      "loss": 0.0569,
+      "step": 4150
+    },
+    {
+      "epoch": 0.9637448370812299,
+      "grad_norm": 0.23371103405952454,
+      "learning_rate": 0.0002748904576436222,
+      "loss": 0.0582,
+      "step": 4200
+    },
+    {
+      "epoch": 0.9752179899036255,
+      "grad_norm": 0.2830590307712555,
+      "learning_rate": 0.00027184761441090554,
+      "loss": 0.0551,
+      "step": 4250
+    },
+    {
+      "epoch": 0.9866911427260211,
+      "grad_norm": 0.17691777646541595,
+      "learning_rate": 0.0002688047711781889,
+      "loss": 0.0556,
+      "step": 4300
+    },
+    {
+      "epoch": 0.9981642955484167,
+      "grad_norm": 0.32038599252700806,
+      "learning_rate": 0.00026576192794547224,
+      "loss": 0.0524,
+      "step": 4350
+    },
+    {
+      "epoch": 1.0096374483708124,
+      "grad_norm": 0.1972804069519043,
+      "learning_rate": 0.00026271908471275565,
+      "loss": 0.0521,
+      "step": 4400
+    },
+    {
+      "epoch": 1.0211106011932078,
+      "grad_norm": 0.35761380195617676,
+      "learning_rate": 0.00025967624148003895,
+      "loss": 0.0572,
+      "step": 4450
+    },
+    {
+      "epoch": 1.0325837540156035,
+      "grad_norm": 0.285580039024353,
+      "learning_rate": 0.0002566333982473223,
+      "loss": 0.0487,
+      "step": 4500
+    },
+    {
+      "epoch": 1.0325837540156035,
+      "eval_accuracy": 0.9825631838117813,
+      "eval_f1": 0.9641975912207242,
+      "eval_loss": 0.05233108997344971,
+      "eval_precision": 0.9613379614792094,
+      "eval_recall": 0.9699254645627606,
+      "eval_runtime": 119.1944,
+      "eval_samples_per_second": 167.122,
+      "eval_steps_per_second": 10.445,
+      "step": 4500
+    },
+    {
+      "epoch": 1.044056906837999,
+      "grad_norm": 0.2022152990102768,
+      "learning_rate": 0.00025359055501460566,
+      "loss": 0.0539,
+      "step": 4550
+    },
+    {
+      "epoch": 1.0555300596603947,
+      "grad_norm": 0.29692327976226807,
+      "learning_rate": 0.000250547711781889,
+      "loss": 0.047,
+      "step": 4600
+    },
+    {
+      "epoch": 1.0670032124827902,
+      "grad_norm": 0.2476482093334198,
+      "learning_rate": 0.0002475048685491723,
+      "loss": 0.053,
+      "step": 4650
+    },
+    {
+      "epoch": 1.0784763653051859,
+      "grad_norm": 0.17114070057868958,
+      "learning_rate": 0.0002444620253164557,
+      "loss": 0.0519,
+      "step": 4700
+    },
+    {
+      "epoch": 1.0899495181275816,
+      "grad_norm": 0.11371100693941116,
+      "learning_rate": 0.00024141918208373905,
+      "loss": 0.0547,
+      "step": 4750
+    },
+    {
+      "epoch": 1.101422670949977,
+      "grad_norm": 0.25711262226104736,
+      "learning_rate": 0.00023837633885102238,
+      "loss": 0.0543,
+      "step": 4800
+    },
+    {
+      "epoch": 1.1128958237723727,
+      "grad_norm": 0.2982866168022156,
+      "learning_rate": 0.00023533349561830576,
+      "loss": 0.0561,
+      "step": 4850
+    },
+    {
+      "epoch": 1.1243689765947682,
+      "grad_norm": 0.3269876539707184,
+      "learning_rate": 0.00023229065238558911,
+      "loss": 0.0494,
+      "step": 4900
+    },
+    {
+      "epoch": 1.135842129417164,
+      "grad_norm": 0.26729336380958557,
+      "learning_rate": 0.00022924780915287244,
+      "loss": 0.047,
+      "step": 4950
+    },
+    {
+      "epoch": 1.1473152822395594,
+      "grad_norm": 0.39272695779800415,
+      "learning_rate": 0.0002262049659201558,
+      "loss": 0.0517,
+      "step": 5000
+    },
+    {
+      "epoch": 1.1473152822395594,
+      "eval_accuracy": 0.9833051617205572,
+      "eval_f1": 0.9649115475893709,
+      "eval_loss": 0.05118980631232262,
+      "eval_precision": 0.96238450457423,
+      "eval_recall": 0.9685961793536983,
+      "eval_runtime": 120.695,
+      "eval_samples_per_second": 165.044,
+      "eval_steps_per_second": 10.315,
+      "step": 5000
+    },
+    {
+      "epoch": 1.158788435061955,
+      "grad_norm": 0.2898092567920685,
+      "learning_rate": 0.00022316212268743915,
+      "loss": 0.0499,
+      "step": 5050
+    },
+    {
+      "epoch": 1.1702615878843505,
+      "grad_norm": 0.20259062945842743,
+      "learning_rate": 0.00022011927945472248,
+      "loss": 0.0506,
+      "step": 5100
+    },
+    {
+      "epoch": 1.1817347407067462,
+      "grad_norm": 0.26172712445259094,
+      "learning_rate": 0.00021707643622200586,
+      "loss": 0.0512,
+      "step": 5150
+    },
+    {
+      "epoch": 1.193207893529142,
+      "grad_norm": 0.26839691400527954,
+      "learning_rate": 0.0002140335929892892,
+      "loss": 0.0524,
+      "step": 5200
+    },
+    {
+      "epoch": 1.2046810463515374,
+      "grad_norm": 0.19788499176502228,
+      "learning_rate": 0.00021099074975657254,
+      "loss": 0.0532,
+      "step": 5250
+    },
+    {
+      "epoch": 1.216154199173933,
+      "grad_norm": 0.22159354388713837,
+      "learning_rate": 0.0002079479065238559,
+      "loss": 0.0539,
+      "step": 5300
+    },
+    {
+      "epoch": 1.2276273519963286,
+      "grad_norm": 0.274666428565979,
+      "learning_rate": 0.00020490506329113925,
+      "loss": 0.0533,
+      "step": 5350
+    },
+    {
+      "epoch": 1.2391005048187242,
+      "grad_norm": 0.2635292410850525,
+      "learning_rate": 0.00020186222005842257,
+      "loss": 0.0511,
+      "step": 5400
+    },
+    {
+      "epoch": 1.2505736576411197,
+      "grad_norm": 0.19532188773155212,
+      "learning_rate": 0.00019881937682570596,
+      "loss": 0.0493,
+      "step": 5450
+    },
+    {
+      "epoch": 1.2620468104635154,
+      "grad_norm": 0.17796900868415833,
+      "learning_rate": 0.0001957765335929893,
+      "loss": 0.049,
+      "step": 5500
+    },
+    {
+      "epoch": 1.2620468104635154,
+      "eval_accuracy": 0.9832237878251686,
+      "eval_f1": 0.9660702602061538,
+      "eval_loss": 0.05031489580869675,
+      "eval_precision": 0.9649079959020276,
+      "eval_recall": 0.9685498930352109,
+      "eval_runtime": 121.0196,
+      "eval_samples_per_second": 164.601,
+      "eval_steps_per_second": 10.288,
+      "step": 5500
+    },
+    {
+      "epoch": 1.2735199632859109,
+      "grad_norm": 0.2736414968967438,
+      "learning_rate": 0.00019273369036027264,
+      "loss": 0.0518,
+      "step": 5550
+    },
+    {
+      "epoch": 1.2849931161083066,
+      "grad_norm": 0.27350395917892456,
+      "learning_rate": 0.000189690847127556,
+      "loss": 0.0463,
+      "step": 5600
+    },
+    {
+      "epoch": 1.2964662689307023,
+      "grad_norm": 0.20141524076461792,
+      "learning_rate": 0.00018664800389483935,
+      "loss": 0.0531,
+      "step": 5650
+    },
+    {
+      "epoch": 1.3079394217530977,
+      "grad_norm": 0.2544547915458679,
+      "learning_rate": 0.00018360516066212267,
+      "loss": 0.0499,
+      "step": 5700
+    },
+    {
+      "epoch": 1.3194125745754932,
+      "grad_norm": 0.15668709576129913,
+      "learning_rate": 0.00018056231742940605,
+      "loss": 0.0489,
+      "step": 5750
+    },
+    {
+      "epoch": 1.330885727397889,
+      "grad_norm": 0.254363089799881,
+      "learning_rate": 0.0001775194741966894,
+      "loss": 0.0514,
+      "step": 5800
+    },
+    {
+      "epoch": 1.3423588802202846,
+      "grad_norm": 0.34548887610435486,
+      "learning_rate": 0.00017447663096397273,
+      "loss": 0.0535,
+      "step": 5850
+    },
+    {
+      "epoch": 1.35383203304268,
+      "grad_norm": 0.2949070334434509,
+      "learning_rate": 0.0001714337877312561,
+      "loss": 0.0483,
+      "step": 5900
+    },
+    {
+      "epoch": 1.3653051858650758,
+      "grad_norm": 0.18696388602256775,
+      "learning_rate": 0.00016839094449853944,
+      "loss": 0.0491,
+      "step": 5950
+    },
+    {
+      "epoch": 1.3767783386874712,
+      "grad_norm": 0.1810847669839859,
+      "learning_rate": 0.00016534810126582277,
+      "loss": 0.045,
+      "step": 6000
+    },
+    {
+      "epoch": 1.3767783386874712,
+      "eval_accuracy": 0.9835904174637322,
+      "eval_f1": 0.9660336726703211,
+      "eval_loss": 0.04917869716882706,
+      "eval_precision": 0.9631038353574605,
+      "eval_recall": 0.9706993139499732,
+      "eval_runtime": 119.0826,
+      "eval_samples_per_second": 167.279,
+      "eval_steps_per_second": 10.455,
+      "step": 6000
+    },
+    {
+      "epoch": 1.388251491509867,
+      "grad_norm": 0.15973134338855743,
+      "learning_rate": 0.00016230525803310615,
+      "loss": 0.0444,
+      "step": 6050
+    },
+    {
+      "epoch": 1.3997246443322626,
+      "grad_norm": 0.24404683709144592,
+      "learning_rate": 0.0001592624148003895,
+      "loss": 0.046,
+      "step": 6100
+    },
+    {
+      "epoch": 1.411197797154658,
+      "grad_norm": 0.3005557656288147,
+      "learning_rate": 0.00015621957156767283,
+      "loss": 0.0525,
+      "step": 6150
+    },
+    {
+      "epoch": 1.4226709499770536,
+      "grad_norm": 0.2723326086997986,
+      "learning_rate": 0.0001531767283349562,
+      "loss": 0.0487,
+      "step": 6200
+    },
+    {
+      "epoch": 1.4341441027994493,
+      "grad_norm": 0.2460734099149704,
+      "learning_rate": 0.00015013388510223954,
+      "loss": 0.0455,
+      "step": 6250
+    },
+    {
+      "epoch": 1.445617255621845,
+      "grad_norm": 0.32472339272499084,
+      "learning_rate": 0.00014709104186952287,
+      "loss": 0.0444,
+      "step": 6300
+    },
+    {
+      "epoch": 1.4570904084442404,
+      "grad_norm": 0.35655590891838074,
+      "learning_rate": 0.00014404819863680625,
+      "loss": 0.053,
+      "step": 6350
+    },
+    {
+      "epoch": 1.4685635612666361,
+      "grad_norm": 0.23245665431022644,
+      "learning_rate": 0.0001410053554040896,
+      "loss": 0.0472,
+      "step": 6400
+    },
+    {
+      "epoch": 1.4800367140890316,
+      "grad_norm": 0.16704195737838745,
+      "learning_rate": 0.00013796251217137293,
+      "loss": 0.0483,
+      "step": 6450
+    },
+    {
+      "epoch": 1.4915098669114273,
+      "grad_norm": 0.16793860495090485,
+      "learning_rate": 0.00013491966893865629,
+      "loss": 0.0453,
+      "step": 6500
+    },
+    {
+      "epoch": 1.4915098669114273,
+      "eval_accuracy": 0.9840196424064407,
+      "eval_f1": 0.9677403990141918,
+      "eval_loss": 0.04730767011642456,
+      "eval_precision": 0.9666379623287346,
+      "eval_recall": 0.9698169810038056,
+      "eval_runtime": 119.877,
+      "eval_samples_per_second": 166.17,
+      "eval_steps_per_second": 10.386,
+      "step": 6500
+    },
+    {
+      "epoch": 1.502983019733823,
+      "grad_norm": 0.24766087532043457,
+      "learning_rate": 0.00013187682570593964,
+      "loss": 0.0433,
+      "step": 6550
+    },
+    {
+      "epoch": 1.5144561725562184,
+      "grad_norm": 0.28402864933013916,
+      "learning_rate": 0.00012883398247322297,
+      "loss": 0.0451,
+      "step": 6600
+    },
+    {
+      "epoch": 1.525929325378614,
+      "grad_norm": 0.23255027830600739,
+      "learning_rate": 0.00012579113924050635,
+      "loss": 0.0494,
+      "step": 6650
+    },
+    {
+      "epoch": 1.5374024782010096,
+      "grad_norm": 0.2084839642047882,
+      "learning_rate": 0.00012274829600778967,
+      "loss": 0.0524,
+      "step": 6700
+    },
+    {
+      "epoch": 1.5488756310234053,
+      "grad_norm": 0.2631557285785675,
+      "learning_rate": 0.00011970545277507303,
+      "loss": 0.0447,
+      "step": 6750
+    },
+    {
+      "epoch": 1.5603487838458008,
+      "grad_norm": 0.203273743391037,
+      "learning_rate": 0.00011666260954235638,
+      "loss": 0.046,
+      "step": 6800
+    },
+    {
+      "epoch": 1.5718219366681965,
+      "grad_norm": 0.17683915793895721,
+      "learning_rate": 0.00011361976630963972,
+      "loss": 0.0488,
+      "step": 6850
+    },
+    {
+      "epoch": 1.583295089490592,
+      "grad_norm": 0.22857971489429474,
+      "learning_rate": 0.00011057692307692308,
+      "loss": 0.047,
+      "step": 6900
+    },
+    {
+      "epoch": 1.5947682423129876,
+      "grad_norm": 0.13033239543437958,
+      "learning_rate": 0.00010753407984420643,
+      "loss": 0.0477,
+      "step": 6950
+    },
+    {
+      "epoch": 1.6062413951353833,
+      "grad_norm": 0.1830282211303711,
+      "learning_rate": 0.00010449123661148977,
+      "loss": 0.0455,
+      "step": 7000
+    },
+    {
+      "epoch": 1.6062413951353833,
+      "eval_accuracy": 0.9839478813613316,
+      "eval_f1": 0.9673812546821595,
+      "eval_loss": 0.04686596244573593,
+      "eval_precision": 0.9634884038843466,
+      "eval_recall": 0.9731322385654713,
+      "eval_runtime": 121.2935,
+      "eval_samples_per_second": 164.23,
+      "eval_steps_per_second": 10.264,
+      "step": 7000
+    },
+    {
+      "epoch": 1.6177145479577788,
+      "grad_norm": 0.262114942073822,
+      "learning_rate": 0.00010144839337877313,
+      "loss": 0.0491,
+      "step": 7050
+    },
+    {
+      "epoch": 1.6291877007801743,
+      "grad_norm": 0.2282840460538864,
+      "learning_rate": 9.840555014605648e-05,
+      "loss": 0.0478,
+      "step": 7100
+    },
+    {
+      "epoch": 1.64066085360257,
+      "grad_norm": 0.3352207839488983,
+      "learning_rate": 9.536270691333982e-05,
+      "loss": 0.047,
+      "step": 7150
+    },
+    {
+      "epoch": 1.6521340064249657,
+      "grad_norm": 0.21367865800857544,
+      "learning_rate": 9.231986368062318e-05,
+      "loss": 0.0438,
+      "step": 7200
+    },
+    {
+      "epoch": 1.6636071592473611,
+      "grad_norm": 0.29630246758461,
+      "learning_rate": 8.927702044790653e-05,
+      "loss": 0.0476,
+      "step": 7250
+    },
+    {
+      "epoch": 1.6750803120697566,
+      "grad_norm": 0.18861912190914154,
+      "learning_rate": 8.623417721518987e-05,
+      "loss": 0.0404,
+      "step": 7300
+    },
+    {
+      "epoch": 1.6865534648921523,
+      "grad_norm": 0.24099300801753998,
+      "learning_rate": 8.319133398247323e-05,
+      "loss": 0.0438,
+      "step": 7350
+    },
+    {
+      "epoch": 1.698026617714548,
+      "grad_norm": 0.4395439326763153,
+      "learning_rate": 8.014849074975658e-05,
+      "loss": 0.0434,
+      "step": 7400
+    },
+    {
+      "epoch": 1.7094997705369437,
+      "grad_norm": 0.30208903551101685,
+      "learning_rate": 7.710564751703992e-05,
+      "loss": 0.0459,
+      "step": 7450
+    },
+    {
+      "epoch": 1.7209729233593392,
+      "grad_norm": 0.2277510166168213,
+      "learning_rate": 7.406280428432327e-05,
+      "loss": 0.0445,
+      "step": 7500
+    },
+    {
+      "epoch": 1.7209729233593392,
+      "eval_accuracy": 0.9839078650776103,
+      "eval_f1": 0.9673493479575279,
+      "eval_loss": 0.04653547704219818,
+      "eval_precision": 0.9639073876576003,
+      "eval_recall": 0.9731134347485857,
+      "eval_runtime": 120.0655,
+      "eval_samples_per_second": 165.909,
+      "eval_steps_per_second": 10.369,
+      "step": 7500
+    },
+    {
+      "epoch": 1.7324460761817346,
+      "grad_norm": 0.26645800471305847,
+      "learning_rate": 7.101996105160661e-05,
+      "loss": 0.0487,
+      "step": 7550
+    },
+    {
+      "epoch": 1.7439192290041303,
+      "grad_norm": 0.254226416349411,
+      "learning_rate": 6.797711781888997e-05,
+      "loss": 0.0476,
+      "step": 7600
+    },
+    {
+      "epoch": 1.755392381826526,
+      "grad_norm": 0.35157519578933716,
+      "learning_rate": 6.493427458617332e-05,
+      "loss": 0.047,
+      "step": 7650
+    },
+    {
+      "epoch": 1.7668655346489215,
+      "grad_norm": 0.28356507420539856,
+      "learning_rate": 6.189143135345668e-05,
+      "loss": 0.0453,
+      "step": 7700
+    },
+    {
+      "epoch": 1.778338687471317,
+      "grad_norm": 0.24292881786823273,
+      "learning_rate": 5.8848588120740025e-05,
+      "loss": 0.043,
+      "step": 7750
+    },
+    {
+      "epoch": 1.7898118402937127,
+      "grad_norm": 0.23205772042274475,
+      "learning_rate": 5.5805744888023366e-05,
+      "loss": 0.047,
+      "step": 7800
+    },
+    {
+      "epoch": 1.8012849931161083,
+      "grad_norm": 0.22511674463748932,
+      "learning_rate": 5.276290165530672e-05,
+      "loss": 0.0485,
+      "step": 7850
+    },
+    {
+      "epoch": 1.812758145938504,
+      "grad_norm": 0.11639175564050674,
+      "learning_rate": 4.9720058422590074e-05,
+      "loss": 0.0468,
+      "step": 7900
+    },
+    {
+      "epoch": 1.8242312987608995,
+      "grad_norm": 0.3817405104637146,
+      "learning_rate": 4.6677215189873415e-05,
+      "loss": 0.0436,
+      "step": 7950
+    },
+    {
+      "epoch": 1.835704451583295,
+      "grad_norm": 0.23685210943222046,
+      "learning_rate": 4.363437195715677e-05,
+      "loss": 0.0485,
+      "step": 8000
+    },
+    {
+      "epoch": 1.835704451583295,
+      "eval_accuracy": 0.9847846464449556,
+      "eval_f1": 0.9692734951155788,
+      "eval_loss": 0.044191647320985794,
+      "eval_precision": 0.9682748217756245,
+      "eval_recall": 0.9715440392623697,
+      "eval_runtime": 119.9631,
+      "eval_samples_per_second": 166.051,
+      "eval_steps_per_second": 10.378,
+      "step": 8000
+    },
+    {
+      "epoch": 1.8471776044056907,
+      "grad_norm": 0.28396812081336975,
+      "learning_rate": 4.0591528724440116e-05,
+      "loss": 0.0444,
+      "step": 8050
+    },
+    {
+      "epoch": 1.8586507572280864,
+      "grad_norm": 0.21980591118335724,
+      "learning_rate": 3.7548685491723464e-05,
+      "loss": 0.0419,
+      "step": 8100
+    },
+    {
+      "epoch": 1.8701239100504818,
+      "grad_norm": 0.1881207823753357,
+      "learning_rate": 3.450584225900682e-05,
+      "loss": 0.046,
+      "step": 8150
+    },
+    {
+      "epoch": 1.8815970628728773,
+      "grad_norm": 0.18880091607570648,
+      "learning_rate": 3.1462999026290165e-05,
+      "loss": 0.043,
+      "step": 8200
+    },
+    {
+      "epoch": 1.893070215695273,
+      "grad_norm": 0.22343367338180542,
+      "learning_rate": 2.8420155793573516e-05,
+      "loss": 0.051,
+      "step": 8250
+    },
+    {
+      "epoch": 1.9045433685176687,
+      "grad_norm": 0.18008331954479218,
+      "learning_rate": 2.5377312560856864e-05,
+      "loss": 0.0451,
+      "step": 8300
+    },
+    {
+      "epoch": 1.9160165213400644,
+      "grad_norm": 0.4132407009601593,
+      "learning_rate": 2.2334469328140214e-05,
+      "loss": 0.0435,
+      "step": 8350
+    },
+    {
+      "epoch": 1.9274896741624599,
+      "grad_norm": 0.2056247442960739,
+      "learning_rate": 1.9291626095423565e-05,
+      "loss": 0.0471,
+      "step": 8400
+    },
+    {
+      "epoch": 1.9389628269848553,
+      "grad_norm": 0.4490518569946289,
+      "learning_rate": 1.6248782862706913e-05,
+      "loss": 0.0421,
+      "step": 8450
+    },
+    {
+      "epoch": 1.950435979807251,
+      "grad_norm": 0.2667485475540161,
+      "learning_rate": 1.3205939629990263e-05,
+      "loss": 0.0438,
+      "step": 8500
+    },
+    {
+      "epoch": 1.950435979807251,
+      "eval_accuracy": 0.9849789154424627,
+      "eval_f1": 0.9694837693189375,
+      "eval_loss": 0.043701887130737305,
+      "eval_precision": 0.9670561105755402,
+      "eval_recall": 0.9732580794938591,
+      "eval_runtime": 120.742,
+      "eval_samples_per_second": 164.98,
+      "eval_steps_per_second": 10.311,
+      "step": 8500
+    },
+    {
+      "epoch": 1.9619091326296467,
+      "grad_norm": 0.14612677693367004,
+      "learning_rate": 1.0163096397273614e-05,
+      "loss": 0.0512,
+      "step": 8550
+    },
+    {
+      "epoch": 1.9733822854520422,
+      "grad_norm": 0.27156341075897217,
+      "learning_rate": 7.120253164556962e-06,
+      "loss": 0.0442,
+      "step": 8600
+    },
+    {
+      "epoch": 1.9848554382744377,
+      "grad_norm": 0.20725247263908386,
+      "learning_rate": 4.077409931840312e-06,
+      "loss": 0.0425,
+      "step": 8650
+    },
+    {
+      "epoch": 1.9963285910968334,
+      "grad_norm": 0.2892477810382843,
+      "learning_rate": 1.0345666991236611e-06,
+      "loss": 0.0473,
+      "step": 8700
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 8716,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 1000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.1788367731619216e+16,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-8716/training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c25f499fd0637dd40aacba2ee37a71fbf01438bc7cbb57bc752e6d4140572a35
+size 5841
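`checkpoint-8716/training_args.bin` is stored as a Git LFS pointer. The repository records only three things: the spec `version`, the SHA-256 `oid` of the real object, and its `size` in bytes (5841 here). Below is a minimal sketch for parsing such a pointer and checking downloaded bytes against it. The helper names are hypothetical, and the parser handles only the three keys that appear in this pointer.

```python
import hashlib

# Pointer text for checkpoint-8716/training_args.bin, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c25f499fd0637dd40aacba2ee37a71fbf01438bc7cbb57bc752e6d4140572a35
size 5841
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS v1 pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "oid": digest, "size": int(fields["size"])}

def matches_pointer(data, pointer):
    """True if `data` has the byte size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 oids are handled here")
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["oid"])

if __name__ == "__main__":
    ptr = parse_lfs_pointer(POINTER)
    print(ptr["algo"], ptr["oid"][:12], ptr["size"])
```

To check a real download, pass the bytes of the fetched `training_args.bin` to `matches_pointer`. Compare the size first because it is cheaper than hashing.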

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0dd103f98dee7758a7916a783307af9f65932119d66eedade7204a203817a6cc
+oid sha256:c25f499fd0637dd40aacba2ee37a71fbf01438bc7cbb57bc752e6d4140572a35
 size 5841
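In the top-level `training_args.bin`, only the `oid` line changes, and `size` stays at 5841 bytes. The new oid is identical to the one recorded for `checkpoint-8716/training_args.bin`, so the root copy now refers to the same stored object as the final checkpoint's. Equal sizes alone do not imply equal contents. The following sketch makes the comparison explicit. The helper names are hypothetical.

```python
# Pointer texts copied from the diffs above.
OLD_ROOT = """version https://git-lfs.github.com/spec/v1
oid sha256:0dd103f98dee7758a7916a783307af9f65932119d66eedade7204a203817a6cc
size 5841
"""
NEW_ROOT = """version https://git-lfs.github.com/spec/v1
oid sha256:c25f499fd0637dd40aacba2ee37a71fbf01438bc7cbb57bc752e6d4140572a35
size 5841
"""
CKPT_8716 = """version https://git-lfs.github.com/spec/v1
oid sha256:c25f499fd0637dd40aacba2ee37a71fbf01438bc7cbb57bc752e6d4140572a35
size 5841
"""

def pointer_fields(text):
    """Map each 'key value' line of an LFS pointer to a dict entry."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

def changed_keys(old, new):
    """Sorted keys whose values differ between two pointer texts."""
    a, b = pointer_fields(old), pointer_fields(new)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

def same_object(x, y):
    """Two pointers refer to the same content exactly when their oids match."""
    return pointer_fields(x)["oid"] == pointer_fields(y)["oid"]

if __name__ == "__main__":
    print("changed:", changed_keys(OLD_ROOT, NEW_ROOT))
    print("root == checkpoint-8716:", same_object(NEW_ROOT, CKPT_8716))
```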