AlexanderMaz committed on Sep 14

Commit

848d60e

verified

1 Parent(s): f95b4b4

Upload acta anonymizer adapter - Latest (v20250914_035417)

Files changed (27)

.gitattributes +2 -0
README.md +179 -64
adapter_config.json +3 -3
adapter_model.safetensors +1 -1
checkpoint-12000/README.md +206 -0
checkpoint-12000/adapter_config.json +42 -0
checkpoint-12000/adapter_model.safetensors +3 -0
checkpoint-12000/optimizer.pt +3 -0
checkpoint-12000/rng_state.pth +3 -0
checkpoint-12000/scheduler.pt +3 -0
checkpoint-12000/special_tokens_map.json +51 -0
checkpoint-12000/tokenizer.json +3 -0
checkpoint-12000/tokenizer_config.json +59 -0
checkpoint-12000/trainer_state.json +2002 -0
checkpoint-12000/training_args.bin +3 -0
checkpoint-13074/README.md +206 -0
checkpoint-13074/adapter_config.json +42 -0
checkpoint-13074/adapter_model.safetensors +3 -0
checkpoint-13074/optimizer.pt +3 -0
checkpoint-13074/rng_state.pth +3 -0
checkpoint-13074/scheduler.pt +3 -0
checkpoint-13074/special_tokens_map.json +51 -0
checkpoint-13074/tokenizer.json +3 -0
checkpoint-13074/tokenizer_config.json +59 -0
checkpoint-13074/trainer_state.json +2173 -0
checkpoint-13074/training_args.bin +3 -0
training_args.bin +2 -2

.gitattributes CHANGED

@@ -42,3 +42,5 @@ versions/20250914_034323/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 versions/20250914_035417/checkpoint-12000/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 versions/20250914_035417/checkpoint-13074/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 versions/20250914_035417/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-12000/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-13074/tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,91 +1,206 @@
 ---
-license: apache-2.0
-language:
-- ro
 base_model: EvanD/xlm-roberta-base-romanian-ner-ronec
 tags:
-- token-classification
-- named-entity-recognition
-- pii-detection
-- romanian
-- moldova
-- financial-pii
-- banking
-- fintech
 ---
-# Finguys/acta-anonymizer-financial
-Acta Anonymizer Financial Adapter
-This model is a fine-tuned adapter for Romanian financial text anonymization.
-It's based on XLM-RoBERTa and trained specifically for detecting and anonymizing
-PII in Romanian financial documents from Moldova.
-Key features:
-- Romanian language support
-- Financial domain specialization
-- GDPR compliance focused
-- High accuracy PII detection
-Use cases:
-- Banking document anonymization
-- Financial report processing
-- Compliance data handling
-**Current Version**: 20250914_034323
-## Key Features
-- Romanian language support
-- GDPR compliance focused
-- High accuracy PII detection
-- Domain-specific fine-tuning
-## Use Cases
-- Banking document anonymization
-- Financial report processing
-- Compliance data handling
-## Training Data
-This model was trained on synthetic Moldovan PII data for financial domain anonymization.
-## Usage
-```python
-from peft import PeftModel
-from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
-# Load base model
-model = AutoModelForTokenClassification.from_pretrained("EvanD/xlm-roberta-base-romanian-ner-ronec")
-tokenizer = AutoTokenizer.from_pretrained("EvanD/xlm-roberta-base-romanian-ner-ronec")
-# Load adapter
-model = PeftModel.from_pretrained(model, "Finguys/acta-anonymizer-financial")
-# Create pipeline
-ner_pipeline = pipeline(
-    "token-classification",
-    model=model,
-    tokenizer=tokenizer,
-    aggregation_strategy="simple"
-)
-# Example usage
-text = "Ion Popescu are un cont la Banca Transilvania cu IBAN RO49AAAA1B310075938400000."
-entities = ner_pipeline(text)
-print(entities)
-```
-## Training
-This model was trained using LoRA (Low-Rank Adaptation) on synthetic Moldovan PII data.
-## Versions
-- **Latest**: Root level contains the most recent version
-- **Archived**: Previous versions are stored in `versions/` folder
-- **Version Index**: See `version_history.yaml` for complete version history

 ---
 base_model: EvanD/xlm-roberta-base-romanian-ner-ronec
+library_name: peft
 tags:
+- base_model:adapter:EvanD/xlm-roberta-base-romanian-ner-ronec
+- lora
+- transformers
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

adapter_config.json CHANGED

@@ -28,10 +28,10 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "value",
-    "query",
     "dense",
-    "key"
+    "key",
+    "query",
+    "value"
   ],
   "target_parameters": null,
   "task_type": "TOKEN_CLS",
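
The `target_modules` change above only reorders the list: the same four names appear before and after. The sketch below checks this, assuming the list is treated as an unordered collection of module names (the helper name `same_target_modules` is hypothetical, not part of PEFT).

```python
# Old and new "target_modules" lists, copied from the diff above.
old_targets = ["value", "query", "dense", "key"]
new_targets = ["dense", "key", "query", "value"]


def same_target_modules(a, b):
    """Return True if both lists name the same modules, ignoring order."""
    return set(a) == set(b)


# The two configurations select the same modules.
print(same_target_modules(old_targets, new_targets))

# The new list is simply the old one in alphabetical order.
print(sorted(old_targets) == new_targets)
```

If that assumption holds, this hunk does not change which layers receive LoRA weights.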

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8c48e02ea2efdc2f0d2341324d8dc56d3483d77b72650da8190837e34d17ca9b
+oid sha256:ff2623b5e82b5d0e70983ebf344f330bb8f3e1226a105a5aed87766442cec175
 size 10899068

checkpoint-12000/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: EvanD/xlm-roberta-base-romanian-ner-ronec
+library_name: peft
+tags:
+- base_model:adapter:EvanD/xlm-roberta-base-romanian-ner-ronec
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoint-12000/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "EvanD/xlm-roberta-base-romanian-ner-ronec",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "dense",
+    "key",
+    "query",
+    "value"
+  ],
+  "target_parameters": null,
+  "task_type": "TOKEN_CLS",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
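
For orientation, the config above sets `lora_alpha` to 32 and `r` to 16 with `use_rslora` disabled. A minimal sketch of the resulting LoRA scaling factor follows, assuming the usual conventions (`lora_alpha / r` for standard LoRA, `lora_alpha / sqrt(r)` for rank-stabilized LoRA); the function name `lora_scaling` is hypothetical.

```python
import math


def lora_scaling(lora_alpha, r, use_rslora=False):
    """Scaling applied to the low-rank update B @ A before it is added to the frozen weight."""
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


# Values taken from checkpoint-12000/adapter_config.json above.
config = {"lora_alpha": 32, "r": 16, "use_rslora": False}
scaling = lora_scaling(**config)
print(scaling)  # 2.0
```

Under these assumptions the low-rank update is multiplied by 2.0 before being added to each targeted weight.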

checkpoint-12000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ff2623b5e82b5d0e70983ebf344f330bb8f3e1226a105a5aed87766442cec175
+size 10899068
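
The pointer above has the same `oid` and `size` as the updated root-level `adapter_model.safetensors`, which indicates the two files are byte-identical. The sketch below parses both Git LFS pointer texts and compares them; the helper name `parse_lfs_pointer` is hypothetical, and the pointer strings are copied from this diff with the leading `+` removed.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


ROOT_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ff2623b5e82b5d0e70983ebf344f330bb8f3e1226a105a5aed87766442cec175
size 10899068
"""

CHECKPOINT_12000_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ff2623b5e82b5d0e70983ebf344f330bb8f3e1226a105a5aed87766442cec175
size 10899068
"""

root = parse_lfs_pointer(ROOT_POINTER)
checkpoint = parse_lfs_pointer(CHECKPOINT_12000_POINTER)

# Same hash and same size means the same stored object.
same_file = (root["oid"], root["size"]) == (checkpoint["oid"], checkpoint["size"])
print(same_file)
```

A match here would be consistent with the root-level adapter being a copy of the step-12000 checkpoint.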

checkpoint-12000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:74dc3270b391cca521fac3b9b4b5a0ea652ed78f7085fe88a455f8abae47e296
+size 21881739

checkpoint-12000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d327ec768bc9db09d2d40744308223acb0ca55e9716c223edf491867a0d4754a
+size 14645

checkpoint-12000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e5e033dce0b47487f8dd1cbf11ade358cc197177ea116db3c69784d048534456
+size 1465

checkpoint-12000/special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-12000/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8373f9cd3d27591e1924426bcc1c8799bc5a9affc4fc857982c5d66668dd1f41
+size 17082832

checkpoint-12000/tokenizer_config.json ADDED

	@@ -0,0 +1,59 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "250001": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "max_length": 512,
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "stride": 0,
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "<unk>"
+}

checkpoint-12000/trainer_state.json ADDED

	@@ -0,0 +1,2002 @@

+{
+  "best_global_step": 12000,
+  "best_metric": 0.9476076338095303,
+  "best_model_checkpoint": "./models/financial_adapter_20250914_035417/checkpoint-12000",
+  "epoch": 2.7535566773749425,
+  "eval_steps": 500,
+  "global_step": 12000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.011473152822395595,
+      "grad_norm": 5.206177234649658,
+      "learning_rate": 4.9000000000000005e-05,
+      "loss": 3.7097,
+      "step": 50
+    },
+    {
+      "epoch": 0.02294630564479119,
+      "grad_norm": 0.7576159834861755,
+      "learning_rate": 9.900000000000001e-05,
+      "loss": 1.1878,
+      "step": 100
+    },
+    {
+      "epoch": 0.03441945846718678,
+      "grad_norm": 1.0443555116653442,
+      "learning_rate": 0.000149,
+      "loss": 0.5979,
+      "step": 150
+    },
+    {
+      "epoch": 0.04589261128958238,
+      "grad_norm": 0.8121919631958008,
+      "learning_rate": 0.000199,
+      "loss": 0.351,
+      "step": 200
+    },
+    {
+      "epoch": 0.05736576411197797,
+      "grad_norm": 0.5419031381607056,
+      "learning_rate": 0.000249,
+      "loss": 0.2513,
+      "step": 250
+    },
+    {
+      "epoch": 0.06883891693437356,
+      "grad_norm": 0.969489336013794,
+      "learning_rate": 0.000299,
+      "loss": 0.2124,
+      "step": 300
+    },
+    {
+      "epoch": 0.08031206975676916,
+      "grad_norm": 0.7236778140068054,
+      "learning_rate": 0.00034899999999999997,
+      "loss": 0.1806,
+      "step": 350
+    },
+    {
+      "epoch": 0.09178522257916476,
+      "grad_norm": 0.7271482348442078,
+      "learning_rate": 0.00039900000000000005,
+      "loss": 0.1677,
+      "step": 400
+    },
+    {
+      "epoch": 0.10325837540156035,
+      "grad_norm": 0.5921468734741211,
+      "learning_rate": 0.000449,
+      "loss": 0.151,
+      "step": 450
+    },
+    {
+      "epoch": 0.11473152822395594,
+      "grad_norm": 0.7947412729263306,
+      "learning_rate": 0.000499,
+      "loss": 0.146,
+      "step": 500
+    },
+    {
+      "epoch": 0.11473152822395594,
+      "eval_accuracy": 0.9611541265050512,
+      "eval_f1": 0.9108550122870944,
+      "eval_loss": 0.13393332064151764,
+      "eval_precision": 0.9024035519296109,
+      "eval_recall": 0.920942706295809,
+      "eval_runtime": 157.6023,
+      "eval_samples_per_second": 126.394,
+      "eval_steps_per_second": 7.9,
+      "step": 500
+    },
+    {
+      "epoch": 0.12620468104635155,
+      "grad_norm": 0.42396143078804016,
+      "learning_rate": 0.0004980515349133132,
+      "loss": 0.1366,
+      "step": 550
+    },
+    {
+      "epoch": 0.13767783386874713,
+      "grad_norm": 0.4833989441394806,
+      "learning_rate": 0.0004960633052330205,
+      "loss": 0.1306,
+      "step": 600
+    },
+    {
+      "epoch": 0.14915098669114274,
+      "grad_norm": 0.5300698280334473,
+      "learning_rate": 0.0004940750755527279,
+      "loss": 0.1307,
+      "step": 650
+    },
+    {
+      "epoch": 0.16062413951353832,
+      "grad_norm": 0.391825407743454,
+      "learning_rate": 0.0004920868458724352,
+      "loss": 0.1294,
+      "step": 700
+    },
+    {
+      "epoch": 0.1720972923359339,
+      "grad_norm": 0.4168291985988617,
+      "learning_rate": 0.0004900986161921426,
+      "loss": 0.1143,
+      "step": 750
+    },
+    {
+      "epoch": 0.18357044515832951,
+      "grad_norm": 0.48850780725479126,
+      "learning_rate": 0.00048811038651184986,
+      "loss": 0.1164,
+      "step": 800
+    },
+    {
+      "epoch": 0.1950435979807251,
+      "grad_norm": 0.5360251665115356,
+      "learning_rate": 0.0004861221568315572,
+      "loss": 0.1162,
+      "step": 850
+    },
+    {
+      "epoch": 0.2065167508031207,
+      "grad_norm": 0.35034602880477905,
+      "learning_rate": 0.00048413392715126454,
+      "loss": 0.1161,
+      "step": 900
+    },
+    {
+      "epoch": 0.2179899036255163,
+      "grad_norm": 0.5149801969528198,
+      "learning_rate": 0.0004821456974709719,
+      "loss": 0.1193,
+      "step": 950
+    },
+    {
+      "epoch": 0.22946305644791187,
+      "grad_norm": 0.28831130266189575,
+      "learning_rate": 0.00048015746779067923,
+      "loss": 0.1133,
+      "step": 1000
+    },
+    {
+      "epoch": 0.22946305644791187,
+      "eval_accuracy": 0.9673264805199486,
+      "eval_f1": 0.9251527095587492,
+      "eval_loss": 0.10191841423511505,
+      "eval_precision": 0.9235157682187369,
+      "eval_recall": 0.924869302071445,
+      "eval_runtime": 120.0892,
+      "eval_samples_per_second": 165.877,
+      "eval_steps_per_second": 10.367,
+      "step": 1000
+    },
+    {
+      "epoch": 0.24093620927030748,
+      "grad_norm": 0.8704720735549927,
+      "learning_rate": 0.0004781692381103865,
+      "loss": 0.1155,
+      "step": 1050
+    },
+    {
+      "epoch": 0.2524093620927031,
+      "grad_norm": 0.29704341292381287,
+      "learning_rate": 0.00047618100843009386,
+      "loss": 0.1067,
+      "step": 1100
+    },
+    {
+      "epoch": 0.2638825149150987,
+      "grad_norm": 0.3146417438983917,
+      "learning_rate": 0.0004741927787498012,
+      "loss": 0.1112,
+      "step": 1150
+    },
+    {
+      "epoch": 0.27535566773749426,
+      "grad_norm": 0.38349583745002747,
+      "learning_rate": 0.0004722045490695085,
+      "loss": 0.1072,
+      "step": 1200
+    },
+    {
+      "epoch": 0.28682882055988984,
+      "grad_norm": 0.3654622733592987,
+      "learning_rate": 0.00047021631938921584,
+      "loss": 0.1093,
+      "step": 1250
+    },
+    {
+      "epoch": 0.2983019733822855,
+      "grad_norm": 0.32350990176200867,
+      "learning_rate": 0.0004682280897089232,
+      "loss": 0.1003,
+      "step": 1300
+    },
+    {
+      "epoch": 0.30977512620468106,
+      "grad_norm": 0.4420382082462311,
+      "learning_rate": 0.0004662398600286305,
+      "loss": 0.1006,
+      "step": 1350
+    },
+    {
+      "epoch": 0.32124827902707664,
+      "grad_norm": 0.2918410301208496,
+      "learning_rate": 0.0004642516303483378,
+      "loss": 0.1009,
+      "step": 1400
+    },
+    {
+      "epoch": 0.3327214318494722,
+      "grad_norm": 0.5075766444206238,
+      "learning_rate": 0.00046226340066804516,
+      "loss": 0.1014,
+      "step": 1450
+    },
+    {
+      "epoch": 0.3441945846718678,
+      "grad_norm": 0.4021468460559845,
+      "learning_rate": 0.0004602751709877525,
+      "loss": 0.0978,
+      "step": 1500
+    },
+    {
+      "epoch": 0.3441945846718678,
+      "eval_accuracy": 0.9686035999975656,
+      "eval_f1": 0.9285241312222242,
+      "eval_loss": 0.09482518583536148,
+      "eval_precision": 0.9235005355255337,
+      "eval_recall": 0.9350632555342531,
+      "eval_runtime": 117.8493,
+      "eval_samples_per_second": 169.029,
+      "eval_steps_per_second": 10.564,
+      "step": 1500
+    },
+    {
+      "epoch": 0.35566773749426345,
+      "grad_norm": 0.41242897510528564,
+      "learning_rate": 0.00045828694130745984,
+      "loss": 0.0998,
+      "step": 1550
+    },
+    {
+      "epoch": 0.36714089031665903,
+      "grad_norm": 0.252006858587265,
+      "learning_rate": 0.0004562987116271672,
+      "loss": 0.0984,
+      "step": 1600
+    },
+    {
+      "epoch": 0.3786140431390546,
+      "grad_norm": 0.42907676100730896,
+      "learning_rate": 0.00045431048194687453,
+      "loss": 0.0929,
+      "step": 1650
+    },
+    {
+      "epoch": 0.3900871959614502,
+      "grad_norm": 0.3133847117424011,
+      "learning_rate": 0.0004523222522665819,
+      "loss": 0.0995,
+      "step": 1700
+    },
+    {
+      "epoch": 0.4015603487838458,
+      "grad_norm": 0.2857881188392639,
+      "learning_rate": 0.0004503340225862892,
+      "loss": 0.095,
+      "step": 1750
+    },
+    {
+      "epoch": 0.4130335016062414,
+      "grad_norm": 0.4199719727039337,
+      "learning_rate": 0.0004483457929059965,
+      "loss": 0.0953,
+      "step": 1800
+    },
+    {
+      "epoch": 0.424506654428637,
+      "grad_norm": 0.43477049469947815,
+      "learning_rate": 0.00044635756322570385,
+      "loss": 0.0901,
+      "step": 1850
+    },
+    {
+      "epoch": 0.4359798072510326,
+      "grad_norm": 0.35086822509765625,
+      "learning_rate": 0.00044436933354541114,
+      "loss": 0.0958,
+      "step": 1900
+    },
+    {
+      "epoch": 0.44745296007342816,
+      "grad_norm": 0.22967366874217987,
+      "learning_rate": 0.0004423811038651185,
+      "loss": 0.0962,
+      "step": 1950
+    },
+    {
+      "epoch": 0.45892611289582375,
+      "grad_norm": 0.32249829173088074,
+      "learning_rate": 0.0004403928741848258,
+      "loss": 0.0951,
+      "step": 2000
+    },
+    {
+      "epoch": 0.45892611289582375,
+      "eval_accuracy": 0.9690691774610883,
+      "eval_f1": 0.9302497431533381,
+      "eval_loss": 0.09183786809444427,
+      "eval_precision": 0.9194746692093528,
+      "eval_recall": 0.9462869502523432,
+      "eval_runtime": 121.4699,
+      "eval_samples_per_second": 163.991,
+      "eval_steps_per_second": 10.249,
+      "step": 2000
+    },
+    {
+      "epoch": 0.4703992657182194,
+      "grad_norm": 0.3483874797821045,
+      "learning_rate": 0.00043840464450453317,
+      "loss": 0.0934,
+      "step": 2050
+    },
+    {
+      "epoch": 0.48187241854061497,
+      "grad_norm": 0.446117103099823,
+      "learning_rate": 0.0004364164148242405,
+      "loss": 0.0916,
+      "step": 2100
+    },
+    {
+      "epoch": 0.49334557136301055,
+      "grad_norm": 0.26050707697868347,
+      "learning_rate": 0.0004344281851439478,
+      "loss": 0.0912,
+      "step": 2150
+    },
+    {
+      "epoch": 0.5048187241854062,
+      "grad_norm": 0.7050098776817322,
+      "learning_rate": 0.00043243995546365515,
+      "loss": 0.0999,
+      "step": 2200
+    },
+    {
+      "epoch": 0.5162918770078018,
+      "grad_norm": 0.29161128401756287,
+      "learning_rate": 0.0004304517257833625,
+      "loss": 0.0974,
+      "step": 2250
+    },
+    {
+      "epoch": 0.5277650298301974,
+      "grad_norm": 0.24913199245929718,
+      "learning_rate": 0.00042846349610306983,
+      "loss": 0.0943,
+      "step": 2300
+    },
+    {
+      "epoch": 0.5392381826525929,
+      "grad_norm": 0.2685433626174927,
+      "learning_rate": 0.0004264752664227772,
+      "loss": 0.0937,
+      "step": 2350
+    },
+    {
+      "epoch": 0.5507113354749885,
+      "grad_norm": 0.3128316402435303,
+      "learning_rate": 0.0004244870367424845,
+      "loss": 0.0883,
+      "step": 2400
+    },
+    {
+      "epoch": 0.5621844882973841,
+      "grad_norm": 0.35494470596313477,
+      "learning_rate": 0.00042249880706219186,
+      "loss": 0.0946,
+      "step": 2450
+    },
+    {
+      "epoch": 0.5736576411197797,
+      "grad_norm": 0.3994844853878021,
+      "learning_rate": 0.0004205105773818992,
+      "loss": 0.0947,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5736576411197797,
+      "eval_accuracy": 0.9688650630577742,
+      "eval_f1": 0.9293096562713767,
+      "eval_loss": 0.08980941772460938,
+      "eval_precision": 0.9170516031608807,
+      "eval_recall": 0.9498239578921504,
+      "eval_runtime": 118.7417,
+      "eval_samples_per_second": 167.759,
+      "eval_steps_per_second": 10.485,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5851307939421753,
+      "grad_norm": 0.2693902254104614,
+      "learning_rate": 0.0004185223477016065,
+      "loss": 0.0979,
+      "step": 2550
+    },
+    {
+      "epoch": 0.596603946764571,
+      "grad_norm": 0.32620301842689514,
+      "learning_rate": 0.00041653411802131384,
+      "loss": 0.0921,
+      "step": 2600
+    },
+    {
+      "epoch": 0.6080770995869665,
+      "grad_norm": 0.2753586173057556,
+      "learning_rate": 0.00041454588834102113,
+      "loss": 0.0955,
+      "step": 2650
+    },
+    {
+      "epoch": 0.6195502524093621,
+      "grad_norm": 0.23614081740379333,
+      "learning_rate": 0.00041255765866072847,
+      "loss": 0.1073,
+      "step": 2700
+    },
+    {
+      "epoch": 0.6310234052317577,
+      "grad_norm": 0.19146519899368286,
+      "learning_rate": 0.0004105694289804358,
+      "loss": 0.0918,
+      "step": 2750
+    },
+    {
+      "epoch": 0.6424965580541533,
+      "grad_norm": 0.3596530258655548,
+      "learning_rate": 0.00040858119930014316,
+      "loss": 0.0914,
+      "step": 2800
+    },
+    {
+      "epoch": 0.6539697108765489,
+      "grad_norm": 0.3061048090457916,
+      "learning_rate": 0.0004065929696198505,
+      "loss": 0.0892,
+      "step": 2850
+    },
+    {
+      "epoch": 0.6654428636989445,
+      "grad_norm": 0.2256966084241867,
+      "learning_rate": 0.00040460473993955784,
+      "loss": 0.0913,
+      "step": 2900
+    },
+    {
+      "epoch": 0.67691601652134,
+      "grad_norm": 0.3125688135623932,
+      "learning_rate": 0.00040261651025926513,
+      "loss": 0.0901,
+      "step": 2950
+    },
+    {
+      "epoch": 0.6883891693437356,
+      "grad_norm": 0.29263100028038025,
+      "learning_rate": 0.0004006282805789725,
+      "loss": 0.0873,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6883891693437356,
+      "eval_accuracy": 0.9712428085954635,
+      "eval_f1": 0.9363408053418008,
+      "eval_loss": 0.0843813493847847,
+      "eval_precision": 0.9319932598753176,
+      "eval_recall": 0.9405207225324199,
+      "eval_runtime": 118.9982,
+      "eval_samples_per_second": 167.398,
+      "eval_steps_per_second": 10.462,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6998623221661312,
+      "grad_norm": 0.27859926223754883,
+      "learning_rate": 0.0003986400508986798,
+      "loss": 0.0906,
+      "step": 3050
+    },
+    {
+      "epoch": 0.7113354749885269,
+      "grad_norm": 0.3490137755870819,
+      "learning_rate": 0.00039665182121838716,
+      "loss": 0.0902,
+      "step": 3100
+    },
+    {
+      "epoch": 0.7228086278109225,
+      "grad_norm": 0.48625850677490234,
+      "learning_rate": 0.0003946635915380945,
+      "loss": 0.0912,
+      "step": 3150
+    },
+    {
+      "epoch": 0.7342817806333181,
+      "grad_norm": 0.3152211904525757,
+      "learning_rate": 0.00039267536185780185,
+      "loss": 0.0877,
+      "step": 3200
+    },
+    {
+      "epoch": 0.7457549334557136,
+      "grad_norm": 0.39225655794143677,
+      "learning_rate": 0.0003906871321775092,
+      "loss": 0.0935,
+      "step": 3250
+    },
+    {
+      "epoch": 0.7572280862781092,
+      "grad_norm": 0.29573819041252136,
+      "learning_rate": 0.00038869890249721654,
+      "loss": 0.0945,
+      "step": 3300
+    },
+    {
+      "epoch": 0.7687012391005048,
+      "grad_norm": 0.33124953508377075,
+      "learning_rate": 0.00038671067281692377,
+      "loss": 0.0893,
+      "step": 3350
+    },
+    {
+      "epoch": 0.7801743919229004,
+      "grad_norm": 0.32787060737609863,
+      "learning_rate": 0.0003847224431366311,
+      "loss": 0.0851,
+      "step": 3400
+    },
+    {
+      "epoch": 0.791647544745296,
+      "grad_norm": 0.28153640031814575,
+      "learning_rate": 0.00038273421345633846,
+      "loss": 0.0809,
+      "step": 3450
+    },
+    {
+      "epoch": 0.8031206975676916,
+      "grad_norm": 0.16959865391254425,
+      "learning_rate": 0.0003807459837760458,
+      "loss": 0.0863,
+      "step": 3500
+    },
+    {
+      "epoch": 0.8031206975676916,
+      "eval_accuracy": 0.9710864457268696,
+      "eval_f1": 0.93562115593739,
+      "eval_loss": 0.08447056263685226,
+      "eval_precision": 0.9258313586536397,
+      "eval_recall": 0.9499548983029477,
+      "eval_runtime": 118.7686,
+      "eval_samples_per_second": 167.721,
+      "eval_steps_per_second": 10.483,
+      "step": 3500
+    },
+    {
+      "epoch": 0.8145938503900872,
+      "grad_norm": 0.25350427627563477,
+      "learning_rate": 0.00037875775409575315,
+      "loss": 0.0911,
+      "step": 3550
+    },
+    {
+      "epoch": 0.8260670032124828,
+      "grad_norm": 0.265812486410141,
+      "learning_rate": 0.0003767695244154605,
+      "loss": 0.0898,
+      "step": 3600
+    },
+    {
+      "epoch": 0.8375401560348784,
+      "grad_norm": 0.3460964560508728,
+      "learning_rate": 0.00037478129473516783,
+      "loss": 0.0904,
+      "step": 3650
+    },
+    {
+      "epoch": 0.849013308857274,
+      "grad_norm": 0.2548121213912964,
+      "learning_rate": 0.0003727930650548751,
+      "loss": 0.0844,
+      "step": 3700
+    },
+    {
+      "epoch": 0.8604864616796696,
+      "grad_norm": 0.3280368447303772,
+      "learning_rate": 0.00037080483537458247,
+      "loss": 0.0881,
+      "step": 3750
+    },
+    {
+      "epoch": 0.8719596145020652,
+      "grad_norm": 0.3780980706214905,
+      "learning_rate": 0.0003688166056942898,
+      "loss": 0.0869,
+      "step": 3800
+    },
+    {
+      "epoch": 0.8834327673244607,
+      "grad_norm": 0.27135029435157776,
+      "learning_rate": 0.00036682837601399715,
+      "loss": 0.0889,
+      "step": 3850
+    },
+    {
+      "epoch": 0.8949059201468563,
+      "grad_norm": 0.25403034687042236,
+      "learning_rate": 0.0003648401463337045,
+      "loss": 0.091,
+      "step": 3900
+    },
+    {
+      "epoch": 0.9063790729692519,
+      "grad_norm": 0.20978939533233643,
+      "learning_rate": 0.00036285191665341184,
+      "loss": 0.0884,
+      "step": 3950
+    },
+    {
+      "epoch": 0.9178522257916475,
+      "grad_norm": 0.3094377815723419,
+      "learning_rate": 0.0003608636869731192,
+      "loss": 0.0874,
+      "step": 4000
+    },
+    {
+      "epoch": 0.9178522257916475,
+      "eval_accuracy": 0.9705721804240243,
+      "eval_f1": 0.933095606006577,
+      "eval_loss": 0.08402583748102188,
+      "eval_precision": 0.9196705654871365,
+      "eval_recall": 0.9517589661850431,
+      "eval_runtime": 117.7619,
+      "eval_samples_per_second": 169.155,
+      "eval_steps_per_second": 10.572,
+      "step": 4000
+    },
+    {
+      "epoch": 0.9293253786140432,
+      "grad_norm": 0.2958351671695709,
+      "learning_rate": 0.0003588754572928265,
+      "loss": 0.0877,
+      "step": 4050
+    },
+    {
+      "epoch": 0.9407985314364388,
+      "grad_norm": 0.26525843143463135,
+      "learning_rate": 0.00035688722761253376,
+      "loss": 0.0849,
+      "step": 4100
+    },
+    {
+      "epoch": 0.9522716842588343,
+      "grad_norm": 0.3481367826461792,
+      "learning_rate": 0.0003548989979322411,
+      "loss": 0.0874,
+      "step": 4150
+    },
+    {
+      "epoch": 0.9637448370812299,
+      "grad_norm": 0.28561869263648987,
+      "learning_rate": 0.00035291076825194845,
+      "loss": 0.0858,
+      "step": 4200
+    },
+    {
+      "epoch": 0.9752179899036255,
+      "grad_norm": 0.29354673624038696,
+      "learning_rate": 0.0003509225385716558,
+      "loss": 0.0832,
+      "step": 4250
+    },
+    {
+      "epoch": 0.9866911427260211,
+      "grad_norm": 0.2132130116224289,
+      "learning_rate": 0.00034893430889136313,
+      "loss": 0.0848,
+      "step": 4300
+    },
+    {
+      "epoch": 0.9981642955484167,
+      "grad_norm": 0.3773078918457031,
+      "learning_rate": 0.0003469460792110705,
+      "loss": 0.084,
+      "step": 4350
+    },
+    {
+      "epoch": 1.0096374483708124,
+      "grad_norm": 0.2106359899044037,
+      "learning_rate": 0.0003449578495307778,
+      "loss": 0.0847,
+      "step": 4400
+    },
+    {
+      "epoch": 1.0211106011932078,
+      "grad_norm": 0.32030966877937317,
+      "learning_rate": 0.0003429696198504851,
+      "loss": 0.0865,
+      "step": 4450
+    },
+    {
+      "epoch": 1.0325837540156035,
+      "grad_norm": 0.22732515633106232,
+      "learning_rate": 0.00034098139017019245,
+      "loss": 0.0789,
+      "step": 4500
+    },
+    {
+      "epoch": 1.0325837540156035,
+      "eval_accuracy": 0.9716784243117108,
+      "eval_f1": 0.9380768901857637,
+      "eval_loss": 0.08132949471473694,
+      "eval_precision": 0.9318431066870271,
+      "eval_recall": 0.9486988402882629,
+      "eval_runtime": 119.6134,
+      "eval_samples_per_second": 166.536,
+      "eval_steps_per_second": 10.409,
+      "step": 4500
+    },
+    {
+      "epoch": 1.044056906837999,
+      "grad_norm": 0.28138020634651184,
+      "learning_rate": 0.0003389931604898998,
+      "loss": 0.0833,
+      "step": 4550
+    },
+    {
+      "epoch": 1.0555300596603947,
+      "grad_norm": 0.4682922959327698,
+      "learning_rate": 0.00033700493080960714,
+      "loss": 0.0747,
+      "step": 4600
+    },
+    {
+      "epoch": 1.0670032124827902,
+      "grad_norm": 0.2394658625125885,
+      "learning_rate": 0.0003350167011293145,
+      "loss": 0.0901,
+      "step": 4650
+    },
+    {
+      "epoch": 1.0784763653051859,
+      "grad_norm": 0.20465607941150665,
+      "learning_rate": 0.0003330284714490218,
+      "loss": 0.08,
+      "step": 4700
+    },
+    {
+      "epoch": 1.0899495181275816,
+      "grad_norm": 0.1981644332408905,
+      "learning_rate": 0.00033104024176872917,
+      "loss": 0.0882,
+      "step": 4750
+    },
+    {
+      "epoch": 1.101422670949977,
+      "grad_norm": 0.2882890999317169,
+      "learning_rate": 0.00032905201208843646,
+      "loss": 0.0847,
+      "step": 4800
+    },
+    {
+      "epoch": 1.1128958237723727,
+      "grad_norm": 0.32356107234954834,
+      "learning_rate": 0.00032706378240814375,
+      "loss": 0.0904,
+      "step": 4850
+    },
+    {
+      "epoch": 1.1243689765947682,
+      "grad_norm": 0.2824298143386841,
+      "learning_rate": 0.0003250755527278511,
+      "loss": 0.0808,
+      "step": 4900
+    },
+    {
+      "epoch": 1.135842129417164,
+      "grad_norm": 0.4001295268535614,
+      "learning_rate": 0.00032308732304755844,
+      "loss": 0.0772,
+      "step": 4950
+    },
+    {
+      "epoch": 1.1473152822395594,
+      "grad_norm": 0.3055209815502167,
+      "learning_rate": 0.0003210990933672658,
+      "loss": 0.0825,
+      "step": 5000
+    },
+    {
+      "epoch": 1.1473152822395594,
+      "eval_accuracy": 0.9722970875777192,
+      "eval_f1": 0.9379456924855788,
+      "eval_loss": 0.08106915652751923,
+      "eval_precision": 0.9298989900295542,
+      "eval_recall": 0.9480877850378757,
+      "eval_runtime": 117.2311,
+      "eval_samples_per_second": 169.921,
+      "eval_steps_per_second": 10.62,
+      "step": 5000
+    },
+    {
+      "epoch": 1.158788435061955,
+      "grad_norm": 0.21581706404685974,
+      "learning_rate": 0.0003191108636869731,
+      "loss": 0.0785,
+      "step": 5050
+    },
+    {
+      "epoch": 1.1702615878843505,
+      "grad_norm": 0.21193784475326538,
+      "learning_rate": 0.00031712263400668047,
+      "loss": 0.084,
+      "step": 5100
+    },
+    {
+      "epoch": 1.1817347407067462,
+      "grad_norm": 0.22416283190250397,
+      "learning_rate": 0.0003151344043263878,
+      "loss": 0.0803,
+      "step": 5150
+    },
+    {
+      "epoch": 1.193207893529142,
+      "grad_norm": 0.20190711319446564,
+      "learning_rate": 0.00031314617464609515,
+      "loss": 0.0826,
+      "step": 5200
+    },
+    {
+      "epoch": 1.2046810463515374,
+      "grad_norm": 0.27103227376937866,
+      "learning_rate": 0.00031115794496580244,
+      "loss": 0.0865,
+      "step": 5250
+    },
+    {
+      "epoch": 1.216154199173933,
+      "grad_norm": 0.33871927857398987,
+      "learning_rate": 0.0003091697152855098,
+      "loss": 0.0859,
+      "step": 5300
+    },
+    {
+      "epoch": 1.2276273519963286,
+      "grad_norm": 0.3408603370189667,
+      "learning_rate": 0.00030718148560521713,
+      "loss": 0.0812,
+      "step": 5350
+    },
+    {
+      "epoch": 1.2391005048187242,
+      "grad_norm": 0.22194986045360565,
+      "learning_rate": 0.00030519325592492447,
+      "loss": 0.0821,
+      "step": 5400
+    },
+    {
+      "epoch": 1.2505736576411197,
+      "grad_norm": 0.27335065603256226,
+      "learning_rate": 0.0003032050262446318,
+      "loss": 0.0836,
+      "step": 5450
+    },
+    {
+      "epoch": 1.2620468104635154,
+      "grad_norm": 0.23065054416656494,
+      "learning_rate": 0.00030121679656433916,
+      "loss": 0.0816,
+      "step": 5500
+    },
+    {
+      "epoch": 1.2620468104635154,
+      "eval_accuracy": 0.9727914564077644,
+      "eval_f1": 0.9393282739038921,
+      "eval_loss": 0.07892649620771408,
+      "eval_precision": 0.9297129712043323,
+      "eval_recall": 0.9530020918134762,
+      "eval_runtime": 120.0797,
+      "eval_samples_per_second": 165.89,
+      "eval_steps_per_second": 10.368,
+      "step": 5500
+    },
+    {
+      "epoch": 1.2735199632859109,
+      "grad_norm": 0.4883849620819092,
+      "learning_rate": 0.00029922856688404645,
+      "loss": 0.0853,
+      "step": 5550
+    },
+    {
+      "epoch": 1.2849931161083066,
+      "grad_norm": 0.28291383385658264,
+      "learning_rate": 0.00029724033720375374,
+      "loss": 0.0752,
+      "step": 5600
+    },
+    {
+      "epoch": 1.2964662689307023,
+      "grad_norm": 0.2305889129638672,
+      "learning_rate": 0.0002952521075234611,
+      "loss": 0.0845,
+      "step": 5650
+    },
+    {
+      "epoch": 1.3079394217530977,
+      "grad_norm": 0.32855790853500366,
+      "learning_rate": 0.0002932638778431684,
+      "loss": 0.0798,
+      "step": 5700
+    },
+    {
+      "epoch": 1.3194125745754932,
+      "grad_norm": 0.20923027396202087,
+      "learning_rate": 0.00029127564816287577,
+      "loss": 0.0782,
+      "step": 5750
+    },
+    {
+      "epoch": 1.330885727397889,
+      "grad_norm": 0.28620150685310364,
+      "learning_rate": 0.0002892874184825831,
+      "loss": 0.085,
+      "step": 5800
+    },
+    {
+      "epoch": 1.3423588802202846,
+      "grad_norm": 0.2952438294887543,
+      "learning_rate": 0.00028729918880229045,
+      "loss": 0.0837,
+      "step": 5850
+    },
+    {
+      "epoch": 1.35383203304268,
+      "grad_norm": 0.34050026535987854,
+      "learning_rate": 0.0002853109591219978,
+      "loss": 0.0808,
+      "step": 5900
+    },
+    {
+      "epoch": 1.3653051858650758,
+      "grad_norm": 0.2680424451828003,
+      "learning_rate": 0.00028332272944170514,
+      "loss": 0.081,
+      "step": 5950
+    },
+    {
+      "epoch": 1.3767783386874712,
+      "grad_norm": 0.2738819718360901,
+      "learning_rate": 0.00028133449976141243,
+      "loss": 0.0723,
+      "step": 6000
+    },
+    {
+      "epoch": 1.3767783386874712,
+      "eval_accuracy": 0.9722467612053424,
+      "eval_f1": 0.9398005425243338,
+      "eval_loss": 0.08021672815084457,
+      "eval_precision": 0.9293461722001554,
+      "eval_recall": 0.9521032909689914,
+      "eval_runtime": 119.0417,
+      "eval_samples_per_second": 167.336,
+      "eval_steps_per_second": 10.459,
+      "step": 6000
+    },
+    {
+      "epoch": 1.388251491509867,
+      "grad_norm": 0.27307409048080444,
+      "learning_rate": 0.0002793462700811198,
+      "loss": 0.0765,
+      "step": 6050
+    },
+    {
+      "epoch": 1.3997246443322626,
+      "grad_norm": 0.32893654704093933,
+      "learning_rate": 0.0002773580404008271,
+      "loss": 0.0755,
+      "step": 6100
+    },
+    {
+      "epoch": 1.411197797154658,
+      "grad_norm": 0.20027205348014832,
+      "learning_rate": 0.00027536981072053446,
+      "loss": 0.0833,
+      "step": 6150
+    },
+    {
+      "epoch": 1.4226709499770536,
+      "grad_norm": 0.37896206974983215,
+      "learning_rate": 0.0002733815810402418,
+      "loss": 0.0773,
+      "step": 6200
+    },
+    {
+      "epoch": 1.4341441027994493,
+      "grad_norm": 0.3074203431606293,
+      "learning_rate": 0.0002713933513599491,
+      "loss": 0.0773,
+      "step": 6250
+    },
+    {
+      "epoch": 1.445617255621845,
+      "grad_norm": 0.37647494673728943,
+      "learning_rate": 0.00026940512167965644,
+      "loss": 0.0736,
+      "step": 6300
+    },
+    {
+      "epoch": 1.4570904084442404,
+      "grad_norm": 0.28269490599632263,
+      "learning_rate": 0.0002674168919993638,
+      "loss": 0.0846,
+      "step": 6350
+    },
+    {
+      "epoch": 1.4685635612666361,
+      "grad_norm": 0.25752097368240356,
+      "learning_rate": 0.00026542866231907107,
+      "loss": 0.0774,
+      "step": 6400
+    },
+    {
+      "epoch": 1.4800367140890316,
+      "grad_norm": 0.2019287496805191,
+      "learning_rate": 0.0002634404326387784,
+      "loss": 0.0782,
+      "step": 6450
+    },
+    {
+      "epoch": 1.4915098669114273,
+      "grad_norm": 0.29280802607536316,
+      "learning_rate": 0.00026145220295848576,
+      "loss": 0.08,
+      "step": 6500
+    },
+    {
+      "epoch": 1.4915098669114273,
+      "eval_accuracy": 0.9731327394353241,
+      "eval_f1": 0.9410652647060472,
+      "eval_loss": 0.07659982889890671,
+      "eval_precision": 0.9339469742244639,
+      "eval_recall": 0.949541061942897,
+      "eval_runtime": 119.2428,
+      "eval_samples_per_second": 167.054,
+      "eval_steps_per_second": 10.441,
+      "step": 6500
+    },
+    {
+      "epoch": 1.502983019733823,
+      "grad_norm": 0.2708223760128021,
+      "learning_rate": 0.0002594639732781931,
+      "loss": 0.0738,
+      "step": 6550
+    },
+    {
+      "epoch": 1.5144561725562184,
+      "grad_norm": 0.23474043607711792,
+      "learning_rate": 0.00025747574359790044,
+      "loss": 0.0765,
+      "step": 6600
+    },
+    {
+      "epoch": 1.525929325378614,
+      "grad_norm": 0.2765410542488098,
+      "learning_rate": 0.0002554875139176078,
+      "loss": 0.0825,
+      "step": 6650
+    },
+    {
+      "epoch": 1.5374024782010096,
+      "grad_norm": 0.28831180930137634,
+      "learning_rate": 0.00025349928423731513,
+      "loss": 0.085,
+      "step": 6700
+    },
+    {
+      "epoch": 1.5488756310234053,
+      "grad_norm": 0.2184303253889084,
+      "learning_rate": 0.00025151105455702247,
+      "loss": 0.0735,
+      "step": 6750
+    },
+    {
+      "epoch": 1.5603487838458008,
+      "grad_norm": 0.2716689705848694,
+      "learning_rate": 0.00024952282487672976,
+      "loss": 0.0801,
+      "step": 6800
+    },
+    {
+      "epoch": 1.5718219366681965,
+      "grad_norm": 0.21314671635627747,
+      "learning_rate": 0.0002475345951964371,
+      "loss": 0.0786,
+      "step": 6850
+    },
+    {
+      "epoch": 1.583295089490592,
+      "grad_norm": 0.27691009640693665,
+      "learning_rate": 0.00024554636551614445,
+      "loss": 0.0828,
+      "step": 6900
+    },
+    {
+      "epoch": 1.5947682423129876,
+      "grad_norm": 0.17883095145225525,
+      "learning_rate": 0.00024355813583585176,
+      "loss": 0.0774,
+      "step": 6950
+    },
+    {
+      "epoch": 1.6062413951353833,
+      "grad_norm": 0.271129846572876,
+      "learning_rate": 0.0002415699061555591,
+      "loss": 0.0765,
+      "step": 7000
+    },
+    {
+      "epoch": 1.6062413951353833,
+      "eval_accuracy": 0.9728195455458352,
+      "eval_f1": 0.9411493302318824,
+      "eval_loss": 0.07761505246162415,
+      "eval_precision": 0.9323404680064952,
+      "eval_recall": 0.9529729939444102,
+      "eval_runtime": 119.0526,
+      "eval_samples_per_second": 167.321,
+      "eval_steps_per_second": 10.458,
+      "step": 7000
+    },
+    {
+      "epoch": 1.6177145479577788,
+      "grad_norm": 0.291202187538147,
+      "learning_rate": 0.00023958167647526642,
+      "loss": 0.0805,
+      "step": 7050
+    },
+    {
+      "epoch": 1.6291877007801743,
+      "grad_norm": 0.34442946314811707,
+      "learning_rate": 0.00023759344679497377,
+      "loss": 0.0793,
+      "step": 7100
+    },
+    {
+      "epoch": 1.64066085360257,
+      "grad_norm": 0.259473592042923,
+      "learning_rate": 0.00023560521711468108,
+      "loss": 0.0796,
+      "step": 7150
+    },
+    {
+      "epoch": 1.6521340064249657,
+      "grad_norm": 0.2742938697338104,
+      "learning_rate": 0.00023361698743438843,
+      "loss": 0.0736,
+      "step": 7200
+    },
+    {
+      "epoch": 1.6636071592473611,
+      "grad_norm": 0.2515215575695038,
+      "learning_rate": 0.00023162875775409574,
+      "loss": 0.0795,
+      "step": 7250
+    },
+    {
+      "epoch": 1.6750803120697566,
+      "grad_norm": 0.25539901852607727,
+      "learning_rate": 0.0002296405280738031,
+      "loss": 0.0717,
+      "step": 7300
+    },
+    {
+      "epoch": 1.6865534648921523,
+      "grad_norm": 0.3502283990383148,
+      "learning_rate": 0.00022765229839351043,
+      "loss": 0.0747,
+      "step": 7350
+    },
+    {
+      "epoch": 1.698026617714548,
+      "grad_norm": 0.33969902992248535,
+      "learning_rate": 0.00022566406871321777,
+      "loss": 0.0756,
+      "step": 7400
+    },
+    {
+      "epoch": 1.7094997705369437,
+      "grad_norm": 0.25111883878707886,
+      "learning_rate": 0.0002236758390329251,
+      "loss": 0.0773,
+      "step": 7450
+    },
+    {
+      "epoch": 1.7209729233593392,
+      "grad_norm": 0.19171655178070068,
+      "learning_rate": 0.0002216876093526324,
+      "loss": 0.0763,
+      "step": 7500
+    },
+    {
+      "epoch": 1.7209729233593392,
+      "eval_accuracy": 0.9737420396553088,
+      "eval_f1": 0.9419056166125817,
+      "eval_loss": 0.07485274225473404,
+      "eval_precision": 0.9380178663505828,
+      "eval_recall": 0.947572106136094,
+      "eval_runtime": 118.3867,
+      "eval_samples_per_second": 168.262,
+      "eval_steps_per_second": 10.516,
+      "step": 7500
+    },
+    {
+      "epoch": 1.7324460761817346,
+      "grad_norm": 0.29894569516181946,
+      "learning_rate": 0.00021969937967233975,
+      "loss": 0.0801,
+      "step": 7550
+    },
+    {
+      "epoch": 1.7439192290041303,
+      "grad_norm": 0.2702566981315613,
+      "learning_rate": 0.0002177111499920471,
+      "loss": 0.0766,
+      "step": 7600
+    },
+    {
+      "epoch": 1.755392381826526,
+      "grad_norm": 0.380188912153244,
+      "learning_rate": 0.0002157229203117544,
+      "loss": 0.0784,
+      "step": 7650
+    },
+    {
+      "epoch": 1.7668655346489215,
+      "grad_norm": 0.40631330013275146,
+      "learning_rate": 0.00021373469063146175,
+      "loss": 0.0749,
+      "step": 7700
+    },
+    {
+      "epoch": 1.778338687471317,
+      "grad_norm": 0.28044983744621277,
+      "learning_rate": 0.0002117464609511691,
+      "loss": 0.0728,
+      "step": 7750
+    },
+    {
+      "epoch": 1.7898118402937127,
+      "grad_norm": 0.2612505853176117,
+      "learning_rate": 0.00020975823127087644,
+      "loss": 0.0763,
+      "step": 7800
+    },
+    {
+      "epoch": 1.8012849931161083,
+      "grad_norm": 0.3105819523334503,
+      "learning_rate": 0.00020777000159058373,
+      "loss": 0.0796,
+      "step": 7850
+    },
+    {
+      "epoch": 1.812758145938504,
+      "grad_norm": 0.19942106306552887,
+      "learning_rate": 0.00020578177191029107,
+      "loss": 0.078,
+      "step": 7900
+    },
+    {
+      "epoch": 1.8242312987608995,
+      "grad_norm": 0.3573172986507416,
+      "learning_rate": 0.00020379354222999841,
+      "loss": 0.0767,
+      "step": 7950
+    },
+    {
+      "epoch": 1.835704451583295,
+      "grad_norm": 0.28859594464302063,
+      "learning_rate": 0.00020180531254970573,
+      "loss": 0.081,
+      "step": 8000
+    },
+    {
+      "epoch": 1.835704451583295,
+      "eval_accuracy": 0.9739487288962795,
+      "eval_f1": 0.9433389998606067,
+      "eval_loss": 0.07406344264745712,
+      "eval_precision": 0.9362268776515584,
+      "eval_recall": 0.9517735151195761,
+      "eval_runtime": 119.11,
+      "eval_samples_per_second": 167.24,
+      "eval_steps_per_second": 10.453,
+      "step": 8000
+    },
+    {
+      "epoch": 1.8471776044056907,
+      "grad_norm": 0.4359795153141022,
+      "learning_rate": 0.00019981708286941307,
+      "loss": 0.0763,
+      "step": 8050
+    },
+    {
+      "epoch": 1.8586507572280864,
+      "grad_norm": 0.3242838382720947,
+      "learning_rate": 0.00019782885318912042,
+      "loss": 0.0724,
+      "step": 8100
+    },
+    {
+      "epoch": 1.8701239100504818,
+      "grad_norm": 0.2099728137254715,
+      "learning_rate": 0.00019584062350882776,
+      "loss": 0.0786,
+      "step": 8150
+    },
+    {
+      "epoch": 1.8815970628728773,
+      "grad_norm": 0.2423078864812851,
+      "learning_rate": 0.00019385239382853508,
+      "loss": 0.0721,
+      "step": 8200
+    },
+    {
+      "epoch": 1.893070215695273,
+      "grad_norm": 0.3431580364704132,
+      "learning_rate": 0.0001918641641482424,
+      "loss": 0.0832,
+      "step": 8250
+    },
+    {
+      "epoch": 1.9045433685176687,
+      "grad_norm": 0.1967107057571411,
+      "learning_rate": 0.00018987593446794974,
+      "loss": 0.0767,
+      "step": 8300
+    },
+    {
+      "epoch": 1.9160165213400644,
+      "grad_norm": 0.28923261165618896,
+      "learning_rate": 0.00018788770478765708,
+      "loss": 0.0747,
+      "step": 8350
+    },
+    {
+      "epoch": 1.9274896741624599,
+      "grad_norm": 0.2539917826652527,
+      "learning_rate": 0.0001858994751073644,
+      "loss": 0.0805,
+      "step": 8400
+    },
+    {
+      "epoch": 1.9389628269848553,
+      "grad_norm": 0.2369006723165512,
+      "learning_rate": 0.00018391124542707174,
+      "loss": 0.0712,
+      "step": 8450
+    },
+    {
+      "epoch": 1.950435979807251,
+      "grad_norm": 0.31666308641433716,
+      "learning_rate": 0.00018192301574677908,
+      "loss": 0.0763,
+      "step": 8500
+    },
+    {
+      "epoch": 1.950435979807251,
+      "eval_accuracy": 0.9740484453364306,
+      "eval_f1": 0.9422648085299441,
+      "eval_loss": 0.07338932156562805,
+      "eval_precision": 0.9361482440420985,
+      "eval_recall": 0.9501505006450027,
+      "eval_runtime": 118.5531,
+      "eval_samples_per_second": 168.026,
+      "eval_steps_per_second": 10.502,
+      "step": 8500
+    },
+    {
+      "epoch": 1.9619091326296467,
+      "grad_norm": 0.21725589036941528,
+      "learning_rate": 0.00017993478606648643,
+      "loss": 0.0844,
+      "step": 8550
+    },
+    {
+      "epoch": 1.9733822854520422,
+      "grad_norm": 0.2819405198097229,
+      "learning_rate": 0.00017794655638619372,
+      "loss": 0.0759,
+      "step": 8600
+    },
+    {
+      "epoch": 1.9848554382744377,
+      "grad_norm": 0.2227209508419037,
+      "learning_rate": 0.00017595832670590106,
+      "loss": 0.0717,
+      "step": 8650
+    },
+    {
+      "epoch": 1.9963285910968334,
+      "grad_norm": 0.2700563073158264,
+      "learning_rate": 0.0001739700970256084,
+      "loss": 0.0765,
+      "step": 8700
+    },
+    {
+      "epoch": 2.007801743919229,
+      "grad_norm": 0.34059178829193115,
+      "learning_rate": 0.00017198186734531575,
+      "loss": 0.0742,
+      "step": 8750
+    },
+    {
+      "epoch": 2.0192748967416247,
+      "grad_norm": 0.2557748258113861,
+      "learning_rate": 0.00016999363766502306,
+      "loss": 0.0713,
+      "step": 8800
+    },
+    {
+      "epoch": 2.03074804956402,
+      "grad_norm": 0.31445449590682983,
+      "learning_rate": 0.0001680054079847304,
+      "loss": 0.0747,
+      "step": 8850
+    },
+    {
+      "epoch": 2.0422212023864157,
+      "grad_norm": 0.3940103352069855,
+      "learning_rate": 0.00016601717830443775,
+      "loss": 0.0706,
+      "step": 8900
+    },
+    {
+      "epoch": 2.0536943552088114,
+      "grad_norm": 0.27769824862480164,
+      "learning_rate": 0.00016402894862414507,
+      "loss": 0.0718,
+      "step": 8950
+    },
+    {
+      "epoch": 2.065167508031207,
+      "grad_norm": 0.28729698061943054,
+      "learning_rate": 0.00016204071894385238,
+      "loss": 0.073,
+      "step": 9000
+    },
+    {
+      "epoch": 2.065167508031207,
+      "eval_accuracy": 0.974420158263567,
+      "eval_f1": 0.9445772525431679,
+      "eval_loss": 0.07281184196472168,
+      "eval_precision": 0.9397438689480371,
+      "eval_recall": 0.9504560282701964,
+      "eval_runtime": 117.3567,
+      "eval_samples_per_second": 169.739,
+      "eval_steps_per_second": 10.609,
+      "step": 9000
+    },
+    {
+      "epoch": 2.0766406608536028,
+      "grad_norm": 0.267760306596756,
+      "learning_rate": 0.00016005248926355973,
+      "loss": 0.0668,
+      "step": 9050
+    },
+    {
+      "epoch": 2.088113813675998,
+      "grad_norm": 0.16492970287799835,
+      "learning_rate": 0.00015806425958326707,
+      "loss": 0.0676,
+      "step": 9100
+    },
+    {
+      "epoch": 2.0995869664983937,
+      "grad_norm": 0.20092828571796417,
+      "learning_rate": 0.00015607602990297438,
+      "loss": 0.0675,
+      "step": 9150
+    },
+    {
+      "epoch": 2.1110601193207894,
+      "grad_norm": 0.17862418293952942,
+      "learning_rate": 0.00015408780022268173,
+      "loss": 0.07,
+      "step": 9200
+    },
+    {
+      "epoch": 2.122533272143185,
+      "grad_norm": 0.264687716960907,
+      "learning_rate": 0.00015209957054238907,
+      "loss": 0.0721,
+      "step": 9250
+    },
+    {
+      "epoch": 2.1340064249655804,
+      "grad_norm": 0.18594799935817719,
+      "learning_rate": 0.0001501113408620964,
+      "loss": 0.0708,
+      "step": 9300
+    },
+    {
+      "epoch": 2.145479577787976,
+      "grad_norm": 0.23833034932613373,
+      "learning_rate": 0.0001481231111818037,
+      "loss": 0.075,
+      "step": 9350
+    },
+    {
+      "epoch": 2.1569527306103717,
+      "grad_norm": 0.2754266858100891,
+      "learning_rate": 0.00014613488150151105,
+      "loss": 0.0663,
+      "step": 9400
+    },
+    {
+      "epoch": 2.1684258834327674,
+      "grad_norm": 0.24011731147766113,
+      "learning_rate": 0.0001441466518212184,
+      "loss": 0.0711,
+      "step": 9450
+    },
+    {
+      "epoch": 2.179899036255163,
+      "grad_norm": 0.2121812105178833,
+      "learning_rate": 0.00014215842214092573,
+      "loss": 0.0706,
+      "step": 9500
+    },
+    {
+      "epoch": 2.179899036255163,
+      "eval_accuracy": 0.9745226836175251,
+      "eval_f1": 0.944122628074362,
+      "eval_loss": 0.07175323367118835,
+      "eval_precision": 0.938023404048986,
+      "eval_recall": 0.9516652063847191,
+      "eval_runtime": 119.8057,
+      "eval_samples_per_second": 166.269,
+      "eval_steps_per_second": 10.392,
+      "step": 9500
+    },
+    {
+      "epoch": 2.1913721890775584,
+      "grad_norm": 0.22729764878749847,
+      "learning_rate": 0.00014017019246063305,
+      "loss": 0.0693,
+      "step": 9550
+    },
+    {
+      "epoch": 2.202845341899954,
+      "grad_norm": 0.432969868183136,
+      "learning_rate": 0.0001381819627803404,
+      "loss": 0.0727,
+      "step": 9600
+    },
+    {
+      "epoch": 2.2143184947223498,
+      "grad_norm": 0.28356945514678955,
+      "learning_rate": 0.00013619373310004774,
+      "loss": 0.0686,
+      "step": 9650
+    },
+    {
+      "epoch": 2.2257916475447455,
+      "grad_norm": 0.2591719329357147,
+      "learning_rate": 0.00013420550341975505,
+      "loss": 0.0719,
+      "step": 9700
+    },
+    {
+      "epoch": 2.2372648003671407,
+      "grad_norm": 0.18898649513721466,
+      "learning_rate": 0.00013221727373946237,
+      "loss": 0.074,
+      "step": 9750
+    },
+    {
+      "epoch": 2.2487379531895364,
+      "grad_norm": 0.230307474732399,
+      "learning_rate": 0.0001302290440591697,
+      "loss": 0.0636,
+      "step": 9800
+    },
+    {
+      "epoch": 2.260211106011932,
+      "grad_norm": 0.21404670178890228,
+      "learning_rate": 0.00012824081437887706,
+      "loss": 0.0707,
+      "step": 9850
+    },
+    {
+      "epoch": 2.271684258834328,
+      "grad_norm": 0.182530015707016,
+      "learning_rate": 0.0001262525846985844,
+      "loss": 0.0728,
+      "step": 9900
+    },
+    {
+      "epoch": 2.283157411656723,
+      "grad_norm": 0.31098031997680664,
+      "learning_rate": 0.00012426435501829172,
+      "loss": 0.0666,
+      "step": 9950
+    },
+    {
+      "epoch": 2.2946305644791187,
+      "grad_norm": 0.22960515320301056,
+      "learning_rate": 0.00012227612533799906,
+      "loss": 0.0717,
+      "step": 10000
+    },
+    {
+      "epoch": 2.2946305644791187,
+      "eval_accuracy": 0.9739419406879124,
+      "eval_f1": 0.9426748850468422,
+      "eval_loss": 0.07281766831874847,
+      "eval_precision": 0.932625711260672,
+      "eval_recall": 0.9557954872438175,
+      "eval_runtime": 117.3059,
+      "eval_samples_per_second": 169.812,
+      "eval_steps_per_second": 10.613,
+      "step": 10000
+    },
+    {
+      "epoch": 2.3061037173015144,
+      "grad_norm": 0.23993970453739166,
+      "learning_rate": 0.00012028789565770639,
+      "loss": 0.0675,
+      "step": 10050
+    },
+    {
+      "epoch": 2.31757687012391,
+      "grad_norm": 0.23594702780246735,
+      "learning_rate": 0.00011829966597741372,
+      "loss": 0.0671,
+      "step": 10100
+    },
+    {
+      "epoch": 2.329050022946306,
+      "grad_norm": 0.4768570065498352,
+      "learning_rate": 0.00011631143629712105,
+      "loss": 0.0714,
+      "step": 10150
+    },
+    {
+      "epoch": 2.340523175768701,
+      "grad_norm": 0.37876570224761963,
+      "learning_rate": 0.00011432320661682838,
+      "loss": 0.0661,
+      "step": 10200
+    },
+    {
+      "epoch": 2.3519963285910968,
+      "grad_norm": 0.2580972909927368,
+      "learning_rate": 0.00011233497693653571,
+      "loss": 0.0696,
+      "step": 10250
+    },
+    {
+      "epoch": 2.3634694814134924,
+      "grad_norm": 0.20318330824375153,
+      "learning_rate": 0.00011034674725624304,
+      "loss": 0.0688,
+      "step": 10300
+    },
+    {
+      "epoch": 2.374942634235888,
+      "grad_norm": 0.2656238079071045,
+      "learning_rate": 0.00010835851757595037,
+      "loss": 0.0656,
+      "step": 10350
+    },
+    {
+      "epoch": 2.386415787058284,
+      "grad_norm": 0.2967742085456848,
+      "learning_rate": 0.00010637028789565771,
+      "loss": 0.0768,
+      "step": 10400
+    },
+    {
+      "epoch": 2.397888939880679,
+      "grad_norm": 0.22500257194042206,
+      "learning_rate": 0.00010438205821536504,
+      "loss": 0.0671,
+      "step": 10450
+    },
+    {
+      "epoch": 2.4093620927030748,
+      "grad_norm": 0.3866559863090515,
+      "learning_rate": 0.00010239382853507237,
+      "loss": 0.0712,
+      "step": 10500
+    },
+    {
+      "epoch": 2.4093620927030748,
+      "eval_accuracy": 0.9745250243790311,
+      "eval_f1": 0.9449194759736153,
+      "eval_loss": 0.0714457556605339,
+      "eval_precision": 0.9347134332940459,
+      "eval_recall": 0.9574427499426126,
+      "eval_runtime": 113.4582,
+      "eval_samples_per_second": 175.571,
+      "eval_steps_per_second": 10.973,
+      "step": 10500
+    },
+    {
+      "epoch": 2.4208352455254705,
+      "grad_norm": 0.2979605495929718,
+      "learning_rate": 0.0001004055988547797,
+      "loss": 0.0679,
+      "step": 10550
+    },
+    {
+      "epoch": 2.432308398347866,
+      "grad_norm": 0.3229621946811676,
+      "learning_rate": 9.841736917448704e-05,
+      "loss": 0.0707,
+      "step": 10600
+    },
+    {
+      "epoch": 2.4437815511702614,
+      "grad_norm": 0.26730015873908997,
+      "learning_rate": 9.642913949419436e-05,
+      "loss": 0.0655,
+      "step": 10650
+    },
+    {
+      "epoch": 2.455254703992657,
+      "grad_norm": 0.3086176812648773,
+      "learning_rate": 9.44409098139017e-05,
+      "loss": 0.0739,
+      "step": 10700
+    },
+    {
+      "epoch": 2.466727856815053,
+      "grad_norm": 0.3094359040260315,
+      "learning_rate": 9.245268013360903e-05,
+      "loss": 0.0717,
+      "step": 10750
+    },
+    {
+      "epoch": 2.4782010096374485,
+      "grad_norm": 0.20422030985355377,
+      "learning_rate": 9.046445045331638e-05,
+      "loss": 0.0691,
+      "step": 10800
+    },
+    {
+      "epoch": 2.4896741624598437,
+      "grad_norm": 0.32366958260536194,
+      "learning_rate": 8.84762207730237e-05,
+      "loss": 0.068,
+      "step": 10850
+    },
+    {
+      "epoch": 2.5011473152822394,
+      "grad_norm": 0.21282616257667542,
+      "learning_rate": 8.648799109273104e-05,
+      "loss": 0.0747,
+      "step": 10900
+    },
+    {
+      "epoch": 2.512620468104635,
+      "grad_norm": 0.24280066788196564,
+      "learning_rate": 8.449976141243837e-05,
+      "loss": 0.0676,
+      "step": 10950
+    },
+    {
+      "epoch": 2.524093620927031,
+      "grad_norm": 0.25853705406188965,
+      "learning_rate": 8.251153173214571e-05,
+      "loss": 0.0658,
+      "step": 11000
+    },
+    {
+      "epoch": 2.524093620927031,
+      "eval_accuracy": 0.9746734286585049,
+      "eval_f1": 0.9448772381464688,
+      "eval_loss": 0.07095114141702652,
+      "eval_precision": 0.9373394177599341,
+      "eval_recall": 0.9540754798723573,
+      "eval_runtime": 115.6195,
+      "eval_samples_per_second": 172.289,
+      "eval_steps_per_second": 10.768,
+      "step": 11000
+    },
+    {
+      "epoch": 2.5355667737494265,
+      "grad_norm": 0.2284245491027832,
+      "learning_rate": 8.052330205185303e-05,
+      "loss": 0.0743,
+      "step": 11050
+    },
+    {
+      "epoch": 2.5470399265718218,
+      "grad_norm": 0.19337309896945953,
+      "learning_rate": 7.853507237156037e-05,
+      "loss": 0.0676,
+      "step": 11100
+    },
+    {
+      "epoch": 2.5585130793942175,
+      "grad_norm": 0.22750115394592285,
+      "learning_rate": 7.65468426912677e-05,
+      "loss": 0.0723,
+      "step": 11150
+    },
+    {
+      "epoch": 2.569986232216613,
+      "grad_norm": 0.2701912820339203,
+      "learning_rate": 7.455861301097503e-05,
+      "loss": 0.0675,
+      "step": 11200
+    },
+    {
+      "epoch": 2.581459385039009,
+      "grad_norm": 0.22987499833106995,
+      "learning_rate": 7.257038333068236e-05,
+      "loss": 0.065,
+      "step": 11250
+    },
+    {
+      "epoch": 2.5929325378614045,
+      "grad_norm": 0.20396412909030914,
+      "learning_rate": 7.05821536503897e-05,
+      "loss": 0.0665,
+      "step": 11300
+    },
+    {
+      "epoch": 2.6044056906838,
+      "grad_norm": 0.17404744029045105,
+      "learning_rate": 6.859392397009703e-05,
+      "loss": 0.0626,
+      "step": 11350
+    },
+    {
+      "epoch": 2.6158788435061955,
+      "grad_norm": 0.24504683911800385,
+      "learning_rate": 6.660569428980435e-05,
+      "loss": 0.0715,
+      "step": 11400
+    },
+    {
+      "epoch": 2.627351996328591,
+      "grad_norm": 0.29088979959487915,
+      "learning_rate": 6.461746460951169e-05,
+      "loss": 0.0634,
+      "step": 11450
+    },
+    {
+      "epoch": 2.6388251491509864,
+      "grad_norm": 0.24859917163848877,
+      "learning_rate": 6.262923492921902e-05,
+      "loss": 0.0718,
+      "step": 11500
+    },
+    {
+      "epoch": 2.6388251491509864,
+      "eval_accuracy": 0.9750416304433823,
+      "eval_f1": 0.9463313377336133,
+      "eval_loss": 0.06995302438735962,
+      "eval_precision": 0.937154559060786,
+      "eval_recall": 0.9573328246594741,
+      "eval_runtime": 116.2346,
+      "eval_samples_per_second": 171.378,
+      "eval_steps_per_second": 10.711,
+      "step": 11500
+    },
+    {
+      "epoch": 2.650298301973382,
+      "grad_norm": 0.38858455419540405,
+      "learning_rate": 6.064100524892636e-05,
+      "loss": 0.0677,
+      "step": 11550
+    },
+    {
+      "epoch": 2.661771454795778,
+      "grad_norm": 0.15558552742004395,
+      "learning_rate": 5.865277556863369e-05,
+      "loss": 0.0683,
+      "step": 11600
+    },
+    {
+      "epoch": 2.6732446076181735,
+      "grad_norm": 0.2536437511444092,
+      "learning_rate": 5.6664545888341025e-05,
+      "loss": 0.0725,
+      "step": 11650
+    },
+    {
+      "epoch": 2.684717760440569,
+      "grad_norm": 0.22305089235305786,
+      "learning_rate": 5.4676316208048355e-05,
+      "loss": 0.0682,
+      "step": 11700
+    },
+    {
+      "epoch": 2.6961909132629645,
+      "grad_norm": 0.25250253081321716,
+      "learning_rate": 5.268808652775569e-05,
+      "loss": 0.0717,
+      "step": 11750
+    },
+    {
+      "epoch": 2.70766406608536,
+      "grad_norm": 0.21804587543010712,
+      "learning_rate": 5.069985684746302e-05,
+      "loss": 0.0675,
+      "step": 11800
+    },
+    {
+      "epoch": 2.719137218907756,
+      "grad_norm": 0.28288906812667847,
+      "learning_rate": 4.871162716717036e-05,
+      "loss": 0.0639,
+      "step": 11850
+    },
+    {
+      "epoch": 2.7306103717301515,
+      "grad_norm": 0.22967451810836792,
+      "learning_rate": 4.672339748687769e-05,
+      "loss": 0.0674,
+      "step": 11900
+    },
+    {
+      "epoch": 2.7420835245525472,
+      "grad_norm": 0.23140451312065125,
+      "learning_rate": 4.473516780658501e-05,
+      "loss": 0.0671,
+      "step": 11950
+    },
+    {
+      "epoch": 2.7535566773749425,
+      "grad_norm": 0.32377928495407104,
+      "learning_rate": 4.274693812629235e-05,
+      "loss": 0.0637,
+      "step": 12000
+    },
+    {
+      "epoch": 2.7535566773749425,
+      "eval_accuracy": 0.9755069738307545,
+      "eval_f1": 0.9476076338095303,
+      "eval_loss": 0.06893511861562729,
+      "eval_precision": 0.9417873864638041,
+      "eval_recall": 0.9544925493289708,
+      "eval_runtime": 115.4256,
+      "eval_samples_per_second": 172.579,
+      "eval_steps_per_second": 10.786,
+      "step": 12000
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 13074,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 1000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.878898410102747e+16,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
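
A minimal sketch for reading the `log_history` above, using only the standard library. It does two things: picks the evaluation entry with the highest `eval_f1`, and checks the logged learning rates against a linear warmup followed by linear decay. The peak rate of 5e-4, the 500 warmup steps, and the one-step offset are assumptions inferred from the logged numbers (they are not stated in this file), and the helper names `split_log`, `best_eval`, and `linear_schedule_lr` are hypothetical. Only a few entries are copied inline; a real check would load the full `trainer_state.json` from disk.

```python
import json
import math

# A few entries copied from the log_history shown above.
STATE = json.loads("""
{
  "max_steps": 13074,
  "log_history": [
    {"learning_rate": 0.00012227612533799906, "loss": 0.0717, "step": 10000},
    {"eval_f1": 0.9426748850468422, "eval_loss": 0.07281766831874847, "step": 10000},
    {"learning_rate": 8.251153173214571e-05, "loss": 0.0658, "step": 11000},
    {"eval_f1": 0.9448772381464688, "eval_loss": 0.07095114141702652, "step": 11000},
    {"learning_rate": 4.274693812629235e-05, "loss": 0.0637, "step": 12000},
    {"eval_f1": 0.9476076338095303, "eval_loss": 0.06893511861562729, "step": 12000}
  ]
}
""")

# Assumptions inferred from the logged values, not recorded in the file.
PEAK_LR = 5e-4
WARMUP_STEPS = 500


def split_log(log_history):
    """Separate training entries (key "loss") from evaluation entries (key "eval_f1")."""
    train = [e for e in log_history if "loss" in e]
    evals = [e for e in log_history if "eval_f1" in e]
    return train, evals


def best_eval(evals, metric="eval_f1"):
    """Return the evaluation entry with the highest value of `metric`."""
    return max(evals, key=lambda e: e[metric])


def linear_schedule_lr(step, max_steps, peak=PEAK_LR, warmup=WARMUP_STEPS):
    """Learning rate expected in the log at `step`.

    Assumes the logged value reflects `step - 1` scheduler updates.
    """
    s = step - 1
    if s < warmup:
        return peak * s / warmup
    return peak * max(0, max_steps - s) / (max_steps - warmup)


train, evals = split_log(STATE["log_history"])
print("best eval step:", best_eval(evals)["step"])
for entry in train:
    expected = linear_schedule_lr(entry["step"], STATE["max_steps"])
    matches = math.isclose(expected, entry["learning_rate"], rel_tol=1e-6)
    print(entry["step"], entry["learning_rate"], matches)
```

If the assumed schedule is right, every training entry should report a match. A mismatch would suggest a different peak rate, warmup length, or scheduler type.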

checkpoint-12000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0dd103f98dee7758a7916a783307af9f65932119d66eedade7204a203817a6cc
+size 5841

checkpoint-13074/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: EvanD/xlm-roberta-base-romanian-ner-ronec
+library_name: peft
+tags:
+- base_model:adapter:EvanD/xlm-roberta-base-romanian-ner-ronec
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoint-13074/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "EvanD/xlm-roberta-base-romanian-ner-ronec",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "dense",
+    "key",
+    "query",
+    "value"
+  ],
+  "target_parameters": null,
+  "task_type": "TOKEN_CLS",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
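
A rough sketch of what `r`, `lora_alpha`, and `use_rslora` in the config above imply, under the standard LoRA formulation: the low-rank update is scaled by `lora_alpha / r`, or by `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled. The hidden size of 768 is an assumption about the `xlm-roberta-base` encoder and is not stated anywhere in this diff. The helper names `lora_scaling` and `lora_params` are hypothetical.

```python
import math

# Values copied from adapter_config.json above.
R = 16
LORA_ALPHA = 32
USE_RSLORA = False

# Assumption: hidden size of the base encoder (not stated in the diff).
HIDDEN = 768


def lora_scaling(alpha, r, use_rslora=False):
    """Factor applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def lora_params(d_in, d_out, r):
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


print("scaling:", lora_scaling(LORA_ALPHA, R, USE_RSLORA))
print("params per square projection:", lora_params(HIDDEN, HIDDEN, R))
```

The per-layer count applies to each module matched by `target_modules`. The modules listed in `modules_to_save` are stored in full rather than as low-rank factors, so they are not covered by this estimate.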

checkpoint-13074/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c6c760fe7c57a14c48d1c0378241c4f3293589a2a52b782e4b6aad3f4bac1787
+size 10899068

checkpoint-13074/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7861e7e52828953a705a399c9721b9f9efb994471ae0ab42dd782b3de7c8d1f7
+size 21881739

checkpoint-13074/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b8880e9ef17393ddbb046cafcd85975c8879f1fa43a507df36313395ee20382a
+size 14645

checkpoint-13074/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5b8e7fd99ca1ee21d46d64586227d157a8461913f02ab835e124fd93eb2a9476
+size 1465

checkpoint-13074/special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-13074/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8373f9cd3d27591e1924426bcc1c8799bc5a9affc4fc857982c5d66668dd1f41
+size 17082832

checkpoint-13074/tokenizer_config.json ADDED

	@@ -0,0 +1,59 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "250001": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "max_length": 512,
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "stride": 0,
+  "tokenizer_class": "XLMRobertaTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "<unk>"
+}

checkpoint-13074/trainer_state.json ADDED

	@@ -0,0 +1,2173 @@

+{
+  "best_global_step": 12000,
+  "best_metric": 0.9476076338095303,
+  "best_model_checkpoint": "./models/financial_adapter_20250914_035417/checkpoint-12000",
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 13074,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.011473152822395595,
+      "grad_norm": 5.206177234649658,
+      "learning_rate": 4.9000000000000005e-05,
+      "loss": 3.7097,
+      "step": 50
+    },
+    {
+      "epoch": 0.02294630564479119,
+      "grad_norm": 0.7576159834861755,
+      "learning_rate": 9.900000000000001e-05,
+      "loss": 1.1878,
+      "step": 100
+    },
+    {
+      "epoch": 0.03441945846718678,
+      "grad_norm": 1.0443555116653442,
+      "learning_rate": 0.000149,
+      "loss": 0.5979,
+      "step": 150
+    },
+    {
+      "epoch": 0.04589261128958238,
+      "grad_norm": 0.8121919631958008,
+      "learning_rate": 0.000199,
+      "loss": 0.351,
+      "step": 200
+    },
+    {
+      "epoch": 0.05736576411197797,
+      "grad_norm": 0.5419031381607056,
+      "learning_rate": 0.000249,
+      "loss": 0.2513,
+      "step": 250
+    },
+    {
+      "epoch": 0.06883891693437356,
+      "grad_norm": 0.969489336013794,
+      "learning_rate": 0.000299,
+      "loss": 0.2124,
+      "step": 300
+    },
+    {
+      "epoch": 0.08031206975676916,
+      "grad_norm": 0.7236778140068054,
+      "learning_rate": 0.00034899999999999997,
+      "loss": 0.1806,
+      "step": 350
+    },
+    {
+      "epoch": 0.09178522257916476,
+      "grad_norm": 0.7271482348442078,
+      "learning_rate": 0.00039900000000000005,
+      "loss": 0.1677,
+      "step": 400
+    },
+    {
+      "epoch": 0.10325837540156035,
+      "grad_norm": 0.5921468734741211,
+      "learning_rate": 0.000449,
+      "loss": 0.151,
+      "step": 450
+    },
+    {
+      "epoch": 0.11473152822395594,
+      "grad_norm": 0.7947412729263306,
+      "learning_rate": 0.000499,
+      "loss": 0.146,
+      "step": 500
+    },
+    {
+      "epoch": 0.11473152822395594,
+      "eval_accuracy": 0.9611541265050512,
+      "eval_f1": 0.9108550122870944,
+      "eval_loss": 0.13393332064151764,
+      "eval_precision": 0.9024035519296109,
+      "eval_recall": 0.920942706295809,
+      "eval_runtime": 157.6023,
+      "eval_samples_per_second": 126.394,
+      "eval_steps_per_second": 7.9,
+      "step": 500
+    },
+    {
+      "epoch": 0.12620468104635155,
+      "grad_norm": 0.42396143078804016,
+      "learning_rate": 0.0004980515349133132,
+      "loss": 0.1366,
+      "step": 550
+    },
+    {
+      "epoch": 0.13767783386874713,
+      "grad_norm": 0.4833989441394806,
+      "learning_rate": 0.0004960633052330205,
+      "loss": 0.1306,
+      "step": 600
+    },
+    {
+      "epoch": 0.14915098669114274,
+      "grad_norm": 0.5300698280334473,
+      "learning_rate": 0.0004940750755527279,
+      "loss": 0.1307,
+      "step": 650
+    },
+    {
+      "epoch": 0.16062413951353832,
+      "grad_norm": 0.391825407743454,
+      "learning_rate": 0.0004920868458724352,
+      "loss": 0.1294,
+      "step": 700
+    },
+    {
+      "epoch": 0.1720972923359339,
+      "grad_norm": 0.4168291985988617,
+      "learning_rate": 0.0004900986161921426,
+      "loss": 0.1143,
+      "step": 750
+    },
+    {
+      "epoch": 0.18357044515832951,
+      "grad_norm": 0.48850780725479126,
+      "learning_rate": 0.00048811038651184986,
+      "loss": 0.1164,
+      "step": 800
+    },
+    {
+      "epoch": 0.1950435979807251,
+      "grad_norm": 0.5360251665115356,
+      "learning_rate": 0.0004861221568315572,
+      "loss": 0.1162,
+      "step": 850
+    },
+    {
+      "epoch": 0.2065167508031207,
+      "grad_norm": 0.35034602880477905,
+      "learning_rate": 0.00048413392715126454,
+      "loss": 0.1161,
+      "step": 900
+    },
+    {
+      "epoch": 0.2179899036255163,
+      "grad_norm": 0.5149801969528198,
+      "learning_rate": 0.0004821456974709719,
+      "loss": 0.1193,
+      "step": 950
+    },
+    {
+      "epoch": 0.22946305644791187,
+      "grad_norm": 0.28831130266189575,
+      "learning_rate": 0.00048015746779067923,
+      "loss": 0.1133,
+      "step": 1000
+    },
+    {
+      "epoch": 0.22946305644791187,
+      "eval_accuracy": 0.9673264805199486,
+      "eval_f1": 0.9251527095587492,
+      "eval_loss": 0.10191841423511505,
+      "eval_precision": 0.9235157682187369,
+      "eval_recall": 0.924869302071445,
+      "eval_runtime": 120.0892,
+      "eval_samples_per_second": 165.877,
+      "eval_steps_per_second": 10.367,
+      "step": 1000
+    },
+    {
+      "epoch": 0.24093620927030748,
+      "grad_norm": 0.8704720735549927,
+      "learning_rate": 0.0004781692381103865,
+      "loss": 0.1155,
+      "step": 1050
+    },
+    {
+      "epoch": 0.2524093620927031,
+      "grad_norm": 0.29704341292381287,
+      "learning_rate": 0.00047618100843009386,
+      "loss": 0.1067,
+      "step": 1100
+    },
+    {
+      "epoch": 0.2638825149150987,
+      "grad_norm": 0.3146417438983917,
+      "learning_rate": 0.0004741927787498012,
+      "loss": 0.1112,
+      "step": 1150
+    },
+    {
+      "epoch": 0.27535566773749426,
+      "grad_norm": 0.38349583745002747,
+      "learning_rate": 0.0004722045490695085,
+      "loss": 0.1072,
+      "step": 1200
+    },
+    {
+      "epoch": 0.28682882055988984,
+      "grad_norm": 0.3654622733592987,
+      "learning_rate": 0.00047021631938921584,
+      "loss": 0.1093,
+      "step": 1250
+    },
+    {
+      "epoch": 0.2983019733822855,
+      "grad_norm": 0.32350990176200867,
+      "learning_rate": 0.0004682280897089232,
+      "loss": 0.1003,
+      "step": 1300
+    },
+    {
+      "epoch": 0.30977512620468106,
+      "grad_norm": 0.4420382082462311,
+      "learning_rate": 0.0004662398600286305,
+      "loss": 0.1006,
+      "step": 1350
+    },
+    {
+      "epoch": 0.32124827902707664,
+      "grad_norm": 0.2918410301208496,
+      "learning_rate": 0.0004642516303483378,
+      "loss": 0.1009,
+      "step": 1400
+    },
+    {
+      "epoch": 0.3327214318494722,
+      "grad_norm": 0.5075766444206238,
+      "learning_rate": 0.00046226340066804516,
+      "loss": 0.1014,
+      "step": 1450
+    },
+    {
+      "epoch": 0.3441945846718678,
+      "grad_norm": 0.4021468460559845,
+      "learning_rate": 0.0004602751709877525,
+      "loss": 0.0978,
+      "step": 1500
+    },
+    {
+      "epoch": 0.3441945846718678,
+      "eval_accuracy": 0.9686035999975656,
+      "eval_f1": 0.9285241312222242,
+      "eval_loss": 0.09482518583536148,
+      "eval_precision": 0.9235005355255337,
+      "eval_recall": 0.9350632555342531,
+      "eval_runtime": 117.8493,
+      "eval_samples_per_second": 169.029,
+      "eval_steps_per_second": 10.564,
+      "step": 1500
+    },
+    {
+      "epoch": 0.35566773749426345,
+      "grad_norm": 0.41242897510528564,
+      "learning_rate": 0.00045828694130745984,
+      "loss": 0.0998,
+      "step": 1550
+    },
+    {
+      "epoch": 0.36714089031665903,
+      "grad_norm": 0.252006858587265,
+      "learning_rate": 0.0004562987116271672,
+      "loss": 0.0984,
+      "step": 1600
+    },
+    {
+      "epoch": 0.3786140431390546,
+      "grad_norm": 0.42907676100730896,
+      "learning_rate": 0.00045431048194687453,
+      "loss": 0.0929,
+      "step": 1650
+    },
+    {
+      "epoch": 0.3900871959614502,
+      "grad_norm": 0.3133847117424011,
+      "learning_rate": 0.0004523222522665819,
+      "loss": 0.0995,
+      "step": 1700
+    },
+    {
+      "epoch": 0.4015603487838458,
+      "grad_norm": 0.2857881188392639,
+      "learning_rate": 0.0004503340225862892,
+      "loss": 0.095,
+      "step": 1750
+    },
+    {
+      "epoch": 0.4130335016062414,
+      "grad_norm": 0.4199719727039337,
+      "learning_rate": 0.0004483457929059965,
+      "loss": 0.0953,
+      "step": 1800
+    },
+    {
+      "epoch": 0.424506654428637,
+      "grad_norm": 0.43477049469947815,
+      "learning_rate": 0.00044635756322570385,
+      "loss": 0.0901,
+      "step": 1850
+    },
+    {
+      "epoch": 0.4359798072510326,
+      "grad_norm": 0.35086822509765625,
+      "learning_rate": 0.00044436933354541114,
+      "loss": 0.0958,
+      "step": 1900
+    },
+    {
+      "epoch": 0.44745296007342816,
+      "grad_norm": 0.22967366874217987,
+      "learning_rate": 0.0004423811038651185,
+      "loss": 0.0962,
+      "step": 1950
+    },
+    {
+      "epoch": 0.45892611289582375,
+      "grad_norm": 0.32249829173088074,
+      "learning_rate": 0.0004403928741848258,
+      "loss": 0.0951,
+      "step": 2000
+    },
+    {
+      "epoch": 0.45892611289582375,
+      "eval_accuracy": 0.9690691774610883,
+      "eval_f1": 0.9302497431533381,
+      "eval_loss": 0.09183786809444427,
+      "eval_precision": 0.9194746692093528,
+      "eval_recall": 0.9462869502523432,
+      "eval_runtime": 121.4699,
+      "eval_samples_per_second": 163.991,
+      "eval_steps_per_second": 10.249,
+      "step": 2000
+    },
+    {
+      "epoch": 0.4703992657182194,
+      "grad_norm": 0.3483874797821045,
+      "learning_rate": 0.00043840464450453317,
+      "loss": 0.0934,
+      "step": 2050
+    },
+    {
+      "epoch": 0.48187241854061497,
+      "grad_norm": 0.446117103099823,
+      "learning_rate": 0.0004364164148242405,
+      "loss": 0.0916,
+      "step": 2100
+    },
+    {
+      "epoch": 0.49334557136301055,
+      "grad_norm": 0.26050707697868347,
+      "learning_rate": 0.0004344281851439478,
+      "loss": 0.0912,
+      "step": 2150
+    },
+    {
+      "epoch": 0.5048187241854062,
+      "grad_norm": 0.7050098776817322,
+      "learning_rate": 0.00043243995546365515,
+      "loss": 0.0999,
+      "step": 2200
+    },
+    {
+      "epoch": 0.5162918770078018,
+      "grad_norm": 0.29161128401756287,
+      "learning_rate": 0.0004304517257833625,
+      "loss": 0.0974,
+      "step": 2250
+    },
+    {
+      "epoch": 0.5277650298301974,
+      "grad_norm": 0.24913199245929718,
+      "learning_rate": 0.00042846349610306983,
+      "loss": 0.0943,
+      "step": 2300
+    },
+    {
+      "epoch": 0.5392381826525929,
+      "grad_norm": 0.2685433626174927,
+      "learning_rate": 0.0004264752664227772,
+      "loss": 0.0937,
+      "step": 2350
+    },
+    {
+      "epoch": 0.5507113354749885,
+      "grad_norm": 0.3128316402435303,
+      "learning_rate": 0.0004244870367424845,
+      "loss": 0.0883,
+      "step": 2400
+    },
+    {
+      "epoch": 0.5621844882973841,
+      "grad_norm": 0.35494470596313477,
+      "learning_rate": 0.00042249880706219186,
+      "loss": 0.0946,
+      "step": 2450
+    },
+    {
+      "epoch": 0.5736576411197797,
+      "grad_norm": 0.3994844853878021,
+      "learning_rate": 0.0004205105773818992,
+      "loss": 0.0947,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5736576411197797,
+      "eval_accuracy": 0.9688650630577742,
+      "eval_f1": 0.9293096562713767,
+      "eval_loss": 0.08980941772460938,
+      "eval_precision": 0.9170516031608807,
+      "eval_recall": 0.9498239578921504,
+      "eval_runtime": 118.7417,
+      "eval_samples_per_second": 167.759,
+      "eval_steps_per_second": 10.485,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5851307939421753,
+      "grad_norm": 0.2693902254104614,
+      "learning_rate": 0.0004185223477016065,
+      "loss": 0.0979,
+      "step": 2550
+    },
+    {
+      "epoch": 0.596603946764571,
+      "grad_norm": 0.32620301842689514,
+      "learning_rate": 0.00041653411802131384,
+      "loss": 0.0921,
+      "step": 2600
+    },
+    {
+      "epoch": 0.6080770995869665,
+      "grad_norm": 0.2753586173057556,
+      "learning_rate": 0.00041454588834102113,
+      "loss": 0.0955,
+      "step": 2650
+    },
+    {
+      "epoch": 0.6195502524093621,
+      "grad_norm": 0.23614081740379333,
+      "learning_rate": 0.00041255765866072847,
+      "loss": 0.1073,
+      "step": 2700
+    },
+    {
+      "epoch": 0.6310234052317577,
+      "grad_norm": 0.19146519899368286,
+      "learning_rate": 0.0004105694289804358,
+      "loss": 0.0918,
+      "step": 2750
+    },
+    {
+      "epoch": 0.6424965580541533,
+      "grad_norm": 0.3596530258655548,
+      "learning_rate": 0.00040858119930014316,
+      "loss": 0.0914,
+      "step": 2800
+    },
+    {
+      "epoch": 0.6539697108765489,
+      "grad_norm": 0.3061048090457916,
+      "learning_rate": 0.0004065929696198505,
+      "loss": 0.0892,
+      "step": 2850
+    },
+    {
+      "epoch": 0.6654428636989445,
+      "grad_norm": 0.2256966084241867,
+      "learning_rate": 0.00040460473993955784,
+      "loss": 0.0913,
+      "step": 2900
+    },
+    {
+      "epoch": 0.67691601652134,
+      "grad_norm": 0.3125688135623932,
+      "learning_rate": 0.00040261651025926513,
+      "loss": 0.0901,
+      "step": 2950
+    },
+    {
+      "epoch": 0.6883891693437356,
+      "grad_norm": 0.29263100028038025,
+      "learning_rate": 0.0004006282805789725,
+      "loss": 0.0873,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6883891693437356,
+      "eval_accuracy": 0.9712428085954635,
+      "eval_f1": 0.9363408053418008,
+      "eval_loss": 0.0843813493847847,
+      "eval_precision": 0.9319932598753176,
+      "eval_recall": 0.9405207225324199,
+      "eval_runtime": 118.9982,
+      "eval_samples_per_second": 167.398,
+      "eval_steps_per_second": 10.462,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6998623221661312,
+      "grad_norm": 0.27859926223754883,
+      "learning_rate": 0.0003986400508986798,
+      "loss": 0.0906,
+      "step": 3050
+    },
+    {
+      "epoch": 0.7113354749885269,
+      "grad_norm": 0.3490137755870819,
+      "learning_rate": 0.00039665182121838716,
+      "loss": 0.0902,
+      "step": 3100
+    },
+    {
+      "epoch": 0.7228086278109225,
+      "grad_norm": 0.48625850677490234,
+      "learning_rate": 0.0003946635915380945,
+      "loss": 0.0912,
+      "step": 3150
+    },
+    {
+      "epoch": 0.7342817806333181,
+      "grad_norm": 0.3152211904525757,
+      "learning_rate": 0.00039267536185780185,
+      "loss": 0.0877,
+      "step": 3200
+    },
+    {
+      "epoch": 0.7457549334557136,
+      "grad_norm": 0.39225655794143677,
+      "learning_rate": 0.0003906871321775092,
+      "loss": 0.0935,
+      "step": 3250
+    },
+    {
+      "epoch": 0.7572280862781092,
+      "grad_norm": 0.29573819041252136,
+      "learning_rate": 0.00038869890249721654,
+      "loss": 0.0945,
+      "step": 3300
+    },
+    {
+      "epoch": 0.7687012391005048,
+      "grad_norm": 0.33124953508377075,
+      "learning_rate": 0.00038671067281692377,
+      "loss": 0.0893,
+      "step": 3350
+    },
+    {
+      "epoch": 0.7801743919229004,
+      "grad_norm": 0.32787060737609863,
+      "learning_rate": 0.0003847224431366311,
+      "loss": 0.0851,
+      "step": 3400
+    },
+    {
+      "epoch": 0.791647544745296,
+      "grad_norm": 0.28153640031814575,
+      "learning_rate": 0.00038273421345633846,
+      "loss": 0.0809,
+      "step": 3450
+    },
+    {
+      "epoch": 0.8031206975676916,
+      "grad_norm": 0.16959865391254425,
+      "learning_rate": 0.0003807459837760458,
+      "loss": 0.0863,
+      "step": 3500
+    },
+    {
+      "epoch": 0.8031206975676916,
+      "eval_accuracy": 0.9710864457268696,
+      "eval_f1": 0.93562115593739,
+      "eval_loss": 0.08447056263685226,
+      "eval_precision": 0.9258313586536397,
+      "eval_recall": 0.9499548983029477,
+      "eval_runtime": 118.7686,
+      "eval_samples_per_second": 167.721,
+      "eval_steps_per_second": 10.483,
+      "step": 3500
+    },
+    {
+      "epoch": 0.8145938503900872,
+      "grad_norm": 0.25350427627563477,
+      "learning_rate": 0.00037875775409575315,
+      "loss": 0.0911,
+      "step": 3550
+    },
+    {
+      "epoch": 0.8260670032124828,
+      "grad_norm": 0.265812486410141,
+      "learning_rate": 0.0003767695244154605,
+      "loss": 0.0898,
+      "step": 3600
+    },
+    {
+      "epoch": 0.8375401560348784,
+      "grad_norm": 0.3460964560508728,
+      "learning_rate": 0.00037478129473516783,
+      "loss": 0.0904,
+      "step": 3650
+    },
+    {
+      "epoch": 0.849013308857274,
+      "grad_norm": 0.2548121213912964,
+      "learning_rate": 0.0003727930650548751,
+      "loss": 0.0844,
+      "step": 3700
+    },
+    {
+      "epoch": 0.8604864616796696,
+      "grad_norm": 0.3280368447303772,
+      "learning_rate": 0.00037080483537458247,
+      "loss": 0.0881,
+      "step": 3750
+    },
+    {
+      "epoch": 0.8719596145020652,
+      "grad_norm": 0.3780980706214905,
+      "learning_rate": 0.0003688166056942898,
+      "loss": 0.0869,
+      "step": 3800
+    },
+    {
+      "epoch": 0.8834327673244607,
+      "grad_norm": 0.27135029435157776,
+      "learning_rate": 0.00036682837601399715,
+      "loss": 0.0889,
+      "step": 3850
+    },
+    {
+      "epoch": 0.8949059201468563,
+      "grad_norm": 0.25403034687042236,
+      "learning_rate": 0.0003648401463337045,
+      "loss": 0.091,
+      "step": 3900
+    },
+    {
+      "epoch": 0.9063790729692519,
+      "grad_norm": 0.20978939533233643,
+      "learning_rate": 0.00036285191665341184,
+      "loss": 0.0884,
+      "step": 3950
+    },
+    {
+      "epoch": 0.9178522257916475,
+      "grad_norm": 0.3094377815723419,
+      "learning_rate": 0.0003608636869731192,
+      "loss": 0.0874,
+      "step": 4000
+    },
+    {
+      "epoch": 0.9178522257916475,
+      "eval_accuracy": 0.9705721804240243,
+      "eval_f1": 0.933095606006577,
+      "eval_loss": 0.08402583748102188,
+      "eval_precision": 0.9196705654871365,
+      "eval_recall": 0.9517589661850431,
+      "eval_runtime": 117.7619,
+      "eval_samples_per_second": 169.155,
+      "eval_steps_per_second": 10.572,
+      "step": 4000
+    },
+    {
+      "epoch": 0.9293253786140432,
+      "grad_norm": 0.2958351671695709,
+      "learning_rate": 0.0003588754572928265,
+      "loss": 0.0877,
+      "step": 4050
+    },
+    {
+      "epoch": 0.9407985314364388,
+      "grad_norm": 0.26525843143463135,
+      "learning_rate": 0.00035688722761253376,
+      "loss": 0.0849,
+      "step": 4100
+    },
+    {
+      "epoch": 0.9522716842588343,
+      "grad_norm": 0.3481367826461792,
+      "learning_rate": 0.0003548989979322411,
+      "loss": 0.0874,
+      "step": 4150
+    },
+    {
+      "epoch": 0.9637448370812299,
+      "grad_norm": 0.28561869263648987,
+      "learning_rate": 0.00035291076825194845,
+      "loss": 0.0858,
+      "step": 4200
+    },
+    {
+      "epoch": 0.9752179899036255,
+      "grad_norm": 0.29354673624038696,
+      "learning_rate": 0.0003509225385716558,
+      "loss": 0.0832,
+      "step": 4250
+    },
+    {
+      "epoch": 0.9866911427260211,
+      "grad_norm": 0.2132130116224289,
+      "learning_rate": 0.00034893430889136313,
+      "loss": 0.0848,
+      "step": 4300
+    },
+    {
+      "epoch": 0.9981642955484167,
+      "grad_norm": 0.3773078918457031,
+      "learning_rate": 0.0003469460792110705,
+      "loss": 0.084,
+      "step": 4350
+    },
+    {
+      "epoch": 1.0096374483708124,
+      "grad_norm": 0.2106359899044037,
+      "learning_rate": 0.0003449578495307778,
+      "loss": 0.0847,
+      "step": 4400
+    },
+    {
+      "epoch": 1.0211106011932078,
+      "grad_norm": 0.32030966877937317,
+      "learning_rate": 0.0003429696198504851,
+      "loss": 0.0865,
+      "step": 4450
+    },
+    {
+      "epoch": 1.0325837540156035,
+      "grad_norm": 0.22732515633106232,
+      "learning_rate": 0.00034098139017019245,
+      "loss": 0.0789,
+      "step": 4500
+    },
+    {
+      "epoch": 1.0325837540156035,
+      "eval_accuracy": 0.9716784243117108,
+      "eval_f1": 0.9380768901857637,
+      "eval_loss": 0.08132949471473694,
+      "eval_precision": 0.9318431066870271,
+      "eval_recall": 0.9486988402882629,
+      "eval_runtime": 119.6134,
+      "eval_samples_per_second": 166.536,
+      "eval_steps_per_second": 10.409,
+      "step": 4500
+    },
+    {
+      "epoch": 1.044056906837999,
+      "grad_norm": 0.28138020634651184,
+      "learning_rate": 0.0003389931604898998,
+      "loss": 0.0833,
+      "step": 4550
+    },
+    {
+      "epoch": 1.0555300596603947,
+      "grad_norm": 0.4682922959327698,
+      "learning_rate": 0.00033700493080960714,
+      "loss": 0.0747,
+      "step": 4600
+    },
+    {
+      "epoch": 1.0670032124827902,
+      "grad_norm": 0.2394658625125885,
+      "learning_rate": 0.0003350167011293145,
+      "loss": 0.0901,
+      "step": 4650
+    },
+    {
+      "epoch": 1.0784763653051859,
+      "grad_norm": 0.20465607941150665,
+      "learning_rate": 0.0003330284714490218,
+      "loss": 0.08,
+      "step": 4700
+    },
+    {
+      "epoch": 1.0899495181275816,
+      "grad_norm": 0.1981644332408905,
+      "learning_rate": 0.00033104024176872917,
+      "loss": 0.0882,
+      "step": 4750
+    },
+    {
+      "epoch": 1.101422670949977,
+      "grad_norm": 0.2882890999317169,
+      "learning_rate": 0.00032905201208843646,
+      "loss": 0.0847,
+      "step": 4800
+    },
+    {
+      "epoch": 1.1128958237723727,
+      "grad_norm": 0.32356107234954834,
+      "learning_rate": 0.00032706378240814375,
+      "loss": 0.0904,
+      "step": 4850
+    },
+    {
+      "epoch": 1.1243689765947682,
+      "grad_norm": 0.2824298143386841,
+      "learning_rate": 0.0003250755527278511,
+      "loss": 0.0808,
+      "step": 4900
+    },
+    {
+      "epoch": 1.135842129417164,
+      "grad_norm": 0.4001295268535614,
+      "learning_rate": 0.00032308732304755844,
+      "loss": 0.0772,
+      "step": 4950
+    },
+    {
+      "epoch": 1.1473152822395594,
+      "grad_norm": 0.3055209815502167,
+      "learning_rate": 0.0003210990933672658,
+      "loss": 0.0825,
+      "step": 5000
+    },
+    {
+      "epoch": 1.1473152822395594,
+      "eval_accuracy": 0.9722970875777192,
+      "eval_f1": 0.9379456924855788,
+      "eval_loss": 0.08106915652751923,
+      "eval_precision": 0.9298989900295542,
+      "eval_recall": 0.9480877850378757,
+      "eval_runtime": 117.2311,
+      "eval_samples_per_second": 169.921,
+      "eval_steps_per_second": 10.62,
+      "step": 5000
+    },
+    {
+      "epoch": 1.158788435061955,
+      "grad_norm": 0.21581706404685974,
+      "learning_rate": 0.0003191108636869731,
+      "loss": 0.0785,
+      "step": 5050
+    },
+    {
+      "epoch": 1.1702615878843505,
+      "grad_norm": 0.21193784475326538,
+      "learning_rate": 0.00031712263400668047,
+      "loss": 0.084,
+      "step": 5100
+    },
+    {
+      "epoch": 1.1817347407067462,
+      "grad_norm": 0.22416283190250397,
+      "learning_rate": 0.0003151344043263878,
+      "loss": 0.0803,
+      "step": 5150
+    },
+    {
+      "epoch": 1.193207893529142,
+      "grad_norm": 0.20190711319446564,
+      "learning_rate": 0.00031314617464609515,
+      "loss": 0.0826,
+      "step": 5200
+    },
+    {
+      "epoch": 1.2046810463515374,
+      "grad_norm": 0.27103227376937866,
+      "learning_rate": 0.00031115794496580244,
+      "loss": 0.0865,
+      "step": 5250
+    },
+    {
+      "epoch": 1.216154199173933,
+      "grad_norm": 0.33871927857398987,
+      "learning_rate": 0.0003091697152855098,
+      "loss": 0.0859,
+      "step": 5300
+    },
+    {
+      "epoch": 1.2276273519963286,
+      "grad_norm": 0.3408603370189667,
+      "learning_rate": 0.00030718148560521713,
+      "loss": 0.0812,
+      "step": 5350
+    },
+    {
+      "epoch": 1.2391005048187242,
+      "grad_norm": 0.22194986045360565,
+      "learning_rate": 0.00030519325592492447,
+      "loss": 0.0821,
+      "step": 5400
+    },
+    {
+      "epoch": 1.2505736576411197,
+      "grad_norm": 0.27335065603256226,
+      "learning_rate": 0.0003032050262446318,
+      "loss": 0.0836,
+      "step": 5450
+    },
+    {
+      "epoch": 1.2620468104635154,
+      "grad_norm": 0.23065054416656494,
+      "learning_rate": 0.00030121679656433916,
+      "loss": 0.0816,
+      "step": 5500
+    },
+    {
+      "epoch": 1.2620468104635154,
+      "eval_accuracy": 0.9727914564077644,
+      "eval_f1": 0.9393282739038921,
+      "eval_loss": 0.07892649620771408,
+      "eval_precision": 0.9297129712043323,
+      "eval_recall": 0.9530020918134762,
+      "eval_runtime": 120.0797,
+      "eval_samples_per_second": 165.89,
+      "eval_steps_per_second": 10.368,
+      "step": 5500
+    },
+    {
+      "epoch": 1.2735199632859109,
+      "grad_norm": 0.4883849620819092,
+      "learning_rate": 0.00029922856688404645,
+      "loss": 0.0853,
+      "step": 5550
+    },
+    {
+      "epoch": 1.2849931161083066,
+      "grad_norm": 0.28291383385658264,
+      "learning_rate": 0.00029724033720375374,
+      "loss": 0.0752,
+      "step": 5600
+    },
+    {
+      "epoch": 1.2964662689307023,
+      "grad_norm": 0.2305889129638672,
+      "learning_rate": 0.0002952521075234611,
+      "loss": 0.0845,
+      "step": 5650
+    },
+    {
+      "epoch": 1.3079394217530977,
+      "grad_norm": 0.32855790853500366,
+      "learning_rate": 0.0002932638778431684,
+      "loss": 0.0798,
+      "step": 5700
+    },
+    {
+      "epoch": 1.3194125745754932,
+      "grad_norm": 0.20923027396202087,
+      "learning_rate": 0.00029127564816287577,
+      "loss": 0.0782,
+      "step": 5750
+    },
+    {
+      "epoch": 1.330885727397889,
+      "grad_norm": 0.28620150685310364,
+      "learning_rate": 0.0002892874184825831,
+      "loss": 0.085,
+      "step": 5800
+    },
+    {
+      "epoch": 1.3423588802202846,
+      "grad_norm": 0.2952438294887543,
+      "learning_rate": 0.00028729918880229045,
+      "loss": 0.0837,
+      "step": 5850
+    },
+    {
+      "epoch": 1.35383203304268,
+      "grad_norm": 0.34050026535987854,
+      "learning_rate": 0.0002853109591219978,
+      "loss": 0.0808,
+      "step": 5900
+    },
+    {
+      "epoch": 1.3653051858650758,
+      "grad_norm": 0.2680424451828003,
+      "learning_rate": 0.00028332272944170514,
+      "loss": 0.081,
+      "step": 5950
+    },
+    {
+      "epoch": 1.3767783386874712,
+      "grad_norm": 0.2738819718360901,
+      "learning_rate": 0.00028133449976141243,
+      "loss": 0.0723,
+      "step": 6000
+    },
+    {
+      "epoch": 1.3767783386874712,
+      "eval_accuracy": 0.9722467612053424,
+      "eval_f1": 0.9398005425243338,
+      "eval_loss": 0.08021672815084457,
+      "eval_precision": 0.9293461722001554,
+      "eval_recall": 0.9521032909689914,
+      "eval_runtime": 119.0417,
+      "eval_samples_per_second": 167.336,
+      "eval_steps_per_second": 10.459,
+      "step": 6000
+    },
+    {
+      "epoch": 1.388251491509867,
+      "grad_norm": 0.27307409048080444,
+      "learning_rate": 0.0002793462700811198,
+      "loss": 0.0765,
+      "step": 6050
+    },
+    {
+      "epoch": 1.3997246443322626,
+      "grad_norm": 0.32893654704093933,
+      "learning_rate": 0.0002773580404008271,
+      "loss": 0.0755,
+      "step": 6100
+    },
+    {
+      "epoch": 1.411197797154658,
+      "grad_norm": 0.20027205348014832,
+      "learning_rate": 0.00027536981072053446,
+      "loss": 0.0833,
+      "step": 6150
+    },
+    {
+      "epoch": 1.4226709499770536,
+      "grad_norm": 0.37896206974983215,
+      "learning_rate": 0.0002733815810402418,
+      "loss": 0.0773,
+      "step": 6200
+    },
+    {
+      "epoch": 1.4341441027994493,
+      "grad_norm": 0.3074203431606293,
+      "learning_rate": 0.0002713933513599491,
+      "loss": 0.0773,
+      "step": 6250
+    },
+    {
+      "epoch": 1.445617255621845,
+      "grad_norm": 0.37647494673728943,
+      "learning_rate": 0.00026940512167965644,
+      "loss": 0.0736,
+      "step": 6300
+    },
+    {
+      "epoch": 1.4570904084442404,
+      "grad_norm": 0.28269490599632263,
+      "learning_rate": 0.0002674168919993638,
+      "loss": 0.0846,
+      "step": 6350
+    },
+    {
+      "epoch": 1.4685635612666361,
+      "grad_norm": 0.25752097368240356,
+      "learning_rate": 0.00026542866231907107,
+      "loss": 0.0774,
+      "step": 6400
+    },
+    {
+      "epoch": 1.4800367140890316,
+      "grad_norm": 0.2019287496805191,
+      "learning_rate": 0.0002634404326387784,
+      "loss": 0.0782,
+      "step": 6450
+    },
+    {
+      "epoch": 1.4915098669114273,
+      "grad_norm": 0.29280802607536316,
+      "learning_rate": 0.00026145220295848576,
+      "loss": 0.08,
+      "step": 6500
+    },
+    {
+      "epoch": 1.4915098669114273,
+      "eval_accuracy": 0.9731327394353241,
+      "eval_f1": 0.9410652647060472,
+      "eval_loss": 0.07659982889890671,
+      "eval_precision": 0.9339469742244639,
+      "eval_recall": 0.949541061942897,
+      "eval_runtime": 119.2428,
+      "eval_samples_per_second": 167.054,
+      "eval_steps_per_second": 10.441,
+      "step": 6500
+    },
+    {
+      "epoch": 1.502983019733823,
+      "grad_norm": 0.2708223760128021,
+      "learning_rate": 0.0002594639732781931,
+      "loss": 0.0738,
+      "step": 6550
+    },
+    {
+      "epoch": 1.5144561725562184,
+      "grad_norm": 0.23474043607711792,
+      "learning_rate": 0.00025747574359790044,
+      "loss": 0.0765,
+      "step": 6600
+    },
+    {
+      "epoch": 1.525929325378614,
+      "grad_norm": 0.2765410542488098,
+      "learning_rate": 0.0002554875139176078,
+      "loss": 0.0825,
+      "step": 6650
+    },
+    {
+      "epoch": 1.5374024782010096,
+      "grad_norm": 0.28831180930137634,
+      "learning_rate": 0.00025349928423731513,
+      "loss": 0.085,
+      "step": 6700
+    },
+    {
+      "epoch": 1.5488756310234053,
+      "grad_norm": 0.2184303253889084,
+      "learning_rate": 0.00025151105455702247,
+      "loss": 0.0735,
+      "step": 6750
+    },
+    {
+      "epoch": 1.5603487838458008,
+      "grad_norm": 0.2716689705848694,
+      "learning_rate": 0.00024952282487672976,
+      "loss": 0.0801,
+      "step": 6800
+    },
+    {
+      "epoch": 1.5718219366681965,
+      "grad_norm": 0.21314671635627747,
+      "learning_rate": 0.0002475345951964371,
+      "loss": 0.0786,
+      "step": 6850
+    },
+    {
+      "epoch": 1.583295089490592,
+      "grad_norm": 0.27691009640693665,
+      "learning_rate": 0.00024554636551614445,
+      "loss": 0.0828,
+      "step": 6900
+    },
+    {
+      "epoch": 1.5947682423129876,
+      "grad_norm": 0.17883095145225525,
+      "learning_rate": 0.00024355813583585176,
+      "loss": 0.0774,
+      "step": 6950
+    },
+    {
+      "epoch": 1.6062413951353833,
+      "grad_norm": 0.271129846572876,
+      "learning_rate": 0.0002415699061555591,
+      "loss": 0.0765,
+      "step": 7000
+    },
+    {
+      "epoch": 1.6062413951353833,
+      "eval_accuracy": 0.9728195455458352,
+      "eval_f1": 0.9411493302318824,
+      "eval_loss": 0.07761505246162415,
+      "eval_precision": 0.9323404680064952,
+      "eval_recall": 0.9529729939444102,
+      "eval_runtime": 119.0526,
+      "eval_samples_per_second": 167.321,
+      "eval_steps_per_second": 10.458,
+      "step": 7000
+    },
+    {
+      "epoch": 1.6177145479577788,
+      "grad_norm": 0.291202187538147,
+      "learning_rate": 0.00023958167647526642,
+      "loss": 0.0805,
+      "step": 7050
+    },
+    {
+      "epoch": 1.6291877007801743,
+      "grad_norm": 0.34442946314811707,
+      "learning_rate": 0.00023759344679497377,
+      "loss": 0.0793,
+      "step": 7100
+    },
+    {
+      "epoch": 1.64066085360257,
+      "grad_norm": 0.259473592042923,
+      "learning_rate": 0.00023560521711468108,
+      "loss": 0.0796,
+      "step": 7150
+    },
+    {
+      "epoch": 1.6521340064249657,
+      "grad_norm": 0.2742938697338104,
+      "learning_rate": 0.00023361698743438843,
+      "loss": 0.0736,
+      "step": 7200
+    },
+    {
+      "epoch": 1.6636071592473611,
+      "grad_norm": 0.2515215575695038,
+      "learning_rate": 0.00023162875775409574,
+      "loss": 0.0795,
+      "step": 7250
+    },
+    {
+      "epoch": 1.6750803120697566,
+      "grad_norm": 0.25539901852607727,
+      "learning_rate": 0.0002296405280738031,
+      "loss": 0.0717,
+      "step": 7300
+    },
+    {
+      "epoch": 1.6865534648921523,
+      "grad_norm": 0.3502283990383148,
+      "learning_rate": 0.00022765229839351043,
+      "loss": 0.0747,
+      "step": 7350
+    },
+    {
+      "epoch": 1.698026617714548,
+      "grad_norm": 0.33969902992248535,
+      "learning_rate": 0.00022566406871321777,
+      "loss": 0.0756,
+      "step": 7400
+    },
+    {
+      "epoch": 1.7094997705369437,
+      "grad_norm": 0.25111883878707886,
+      "learning_rate": 0.0002236758390329251,
+      "loss": 0.0773,
+      "step": 7450
+    },
+    {
+      "epoch": 1.7209729233593392,
+      "grad_norm": 0.19171655178070068,
+      "learning_rate": 0.0002216876093526324,
+      "loss": 0.0763,
+      "step": 7500
+    },
+    {
+      "epoch": 1.7209729233593392,
+      "eval_accuracy": 0.9737420396553088,
+      "eval_f1": 0.9419056166125817,
+      "eval_loss": 0.07485274225473404,
+      "eval_precision": 0.9380178663505828,
+      "eval_recall": 0.947572106136094,
+      "eval_runtime": 118.3867,
+      "eval_samples_per_second": 168.262,
+      "eval_steps_per_second": 10.516,
+      "step": 7500
+    },
+    {
+      "epoch": 1.7324460761817346,
+      "grad_norm": 0.29894569516181946,
+      "learning_rate": 0.00021969937967233975,
+      "loss": 0.0801,
+      "step": 7550
+    },
+    {
+      "epoch": 1.7439192290041303,
+      "grad_norm": 0.2702566981315613,
+      "learning_rate": 0.0002177111499920471,
+      "loss": 0.0766,
+      "step": 7600
+    },
+    {
+      "epoch": 1.755392381826526,
+      "grad_norm": 0.380188912153244,
+      "learning_rate": 0.0002157229203117544,
+      "loss": 0.0784,
+      "step": 7650
+    },
+    {
+      "epoch": 1.7668655346489215,
+      "grad_norm": 0.40631330013275146,
+      "learning_rate": 0.00021373469063146175,
+      "loss": 0.0749,
+      "step": 7700
+    },
+    {
+      "epoch": 1.778338687471317,
+      "grad_norm": 0.28044983744621277,
+      "learning_rate": 0.0002117464609511691,
+      "loss": 0.0728,
+      "step": 7750
+    },
+    {
+      "epoch": 1.7898118402937127,
+      "grad_norm": 0.2612505853176117,
+      "learning_rate": 0.00020975823127087644,
+      "loss": 0.0763,
+      "step": 7800
+    },
+    {
+      "epoch": 1.8012849931161083,
+      "grad_norm": 0.3105819523334503,
+      "learning_rate": 0.00020777000159058373,
+      "loss": 0.0796,
+      "step": 7850
+    },
+    {
+      "epoch": 1.812758145938504,
+      "grad_norm": 0.19942106306552887,
+      "learning_rate": 0.00020578177191029107,
+      "loss": 0.078,
+      "step": 7900
+    },
+    {
+      "epoch": 1.8242312987608995,
+      "grad_norm": 0.3573172986507416,
+      "learning_rate": 0.00020379354222999841,
+      "loss": 0.0767,
+      "step": 7950
+    },
+    {
+      "epoch": 1.835704451583295,
+      "grad_norm": 0.28859594464302063,
+      "learning_rate": 0.00020180531254970573,
+      "loss": 0.081,
+      "step": 8000
+    },
+    {
+      "epoch": 1.835704451583295,
+      "eval_accuracy": 0.9739487288962795,
+      "eval_f1": 0.9433389998606067,
+      "eval_loss": 0.07406344264745712,
+      "eval_precision": 0.9362268776515584,
+      "eval_recall": 0.9517735151195761,
+      "eval_runtime": 119.11,
+      "eval_samples_per_second": 167.24,
+      "eval_steps_per_second": 10.453,
+      "step": 8000
+    },
+    {
+      "epoch": 1.8471776044056907,
+      "grad_norm": 0.4359795153141022,
+      "learning_rate": 0.00019981708286941307,
+      "loss": 0.0763,
+      "step": 8050
+    },
+    {
+      "epoch": 1.8586507572280864,
+      "grad_norm": 0.3242838382720947,
+      "learning_rate": 0.00019782885318912042,
+      "loss": 0.0724,
+      "step": 8100
+    },
+    {
+      "epoch": 1.8701239100504818,
+      "grad_norm": 0.2099728137254715,
+      "learning_rate": 0.00019584062350882776,
+      "loss": 0.0786,
+      "step": 8150
+    },
+    {
+      "epoch": 1.8815970628728773,
+      "grad_norm": 0.2423078864812851,
+      "learning_rate": 0.00019385239382853508,
+      "loss": 0.0721,
+      "step": 8200
+    },
+    {
+      "epoch": 1.893070215695273,
+      "grad_norm": 0.3431580364704132,
+      "learning_rate": 0.0001918641641482424,
+      "loss": 0.0832,
+      "step": 8250
+    },
+    {
+      "epoch": 1.9045433685176687,
+      "grad_norm": 0.1967107057571411,
+      "learning_rate": 0.00018987593446794974,
+      "loss": 0.0767,
+      "step": 8300
+    },
+    {
+      "epoch": 1.9160165213400644,
+      "grad_norm": 0.28923261165618896,
+      "learning_rate": 0.00018788770478765708,
+      "loss": 0.0747,
+      "step": 8350
+    },
+    {
+      "epoch": 1.9274896741624599,
+      "grad_norm": 0.2539917826652527,
+      "learning_rate": 0.0001858994751073644,
+      "loss": 0.0805,
+      "step": 8400
+    },
+    {
+      "epoch": 1.9389628269848553,
+      "grad_norm": 0.2369006723165512,
+      "learning_rate": 0.00018391124542707174,
+      "loss": 0.0712,
+      "step": 8450
+    },
+    {
+      "epoch": 1.950435979807251,
+      "grad_norm": 0.31666308641433716,
+      "learning_rate": 0.00018192301574677908,
+      "loss": 0.0763,
+      "step": 8500
+    },
+    {
+      "epoch": 1.950435979807251,
+      "eval_accuracy": 0.9740484453364306,
+      "eval_f1": 0.9422648085299441,
+      "eval_loss": 0.07338932156562805,
+      "eval_precision": 0.9361482440420985,
+      "eval_recall": 0.9501505006450027,
+      "eval_runtime": 118.5531,
+      "eval_samples_per_second": 168.026,
+      "eval_steps_per_second": 10.502,
+      "step": 8500
+    },
+    {
+      "epoch": 1.9619091326296467,
+      "grad_norm": 0.21725589036941528,
+      "learning_rate": 0.00017993478606648643,
+      "loss": 0.0844,
+      "step": 8550
+    },
+    {
+      "epoch": 1.9733822854520422,
+      "grad_norm": 0.2819405198097229,
+      "learning_rate": 0.00017794655638619372,
+      "loss": 0.0759,
+      "step": 8600
+    },
+    {
+      "epoch": 1.9848554382744377,
+      "grad_norm": 0.2227209508419037,
+      "learning_rate": 0.00017595832670590106,
+      "loss": 0.0717,
+      "step": 8650
+    },
+    {
+      "epoch": 1.9963285910968334,
+      "grad_norm": 0.2700563073158264,
+      "learning_rate": 0.0001739700970256084,
+      "loss": 0.0765,
+      "step": 8700
+    },
+    {
+      "epoch": 2.007801743919229,
+      "grad_norm": 0.34059178829193115,
+      "learning_rate": 0.00017198186734531575,
+      "loss": 0.0742,
+      "step": 8750
+    },
+    {
+      "epoch": 2.0192748967416247,
+      "grad_norm": 0.2557748258113861,
+      "learning_rate": 0.00016999363766502306,
+      "loss": 0.0713,
+      "step": 8800
+    },
+    {
+      "epoch": 2.03074804956402,
+      "grad_norm": 0.31445449590682983,
+      "learning_rate": 0.0001680054079847304,
+      "loss": 0.0747,
+      "step": 8850
+    },
+    {
+      "epoch": 2.0422212023864157,
+      "grad_norm": 0.3940103352069855,
+      "learning_rate": 0.00016601717830443775,
+      "loss": 0.0706,
+      "step": 8900
+    },
+    {
+      "epoch": 2.0536943552088114,
+      "grad_norm": 0.27769824862480164,
+      "learning_rate": 0.00016402894862414507,
+      "loss": 0.0718,
+      "step": 8950
+    },
+    {
+      "epoch": 2.065167508031207,
+      "grad_norm": 0.28729698061943054,
+      "learning_rate": 0.00016204071894385238,
+      "loss": 0.073,
+      "step": 9000
+    },
+    {
+      "epoch": 2.065167508031207,
+      "eval_accuracy": 0.974420158263567,
+      "eval_f1": 0.9445772525431679,
+      "eval_loss": 0.07281184196472168,
+      "eval_precision": 0.9397438689480371,
+      "eval_recall": 0.9504560282701964,
+      "eval_runtime": 117.3567,
+      "eval_samples_per_second": 169.739,
+      "eval_steps_per_second": 10.609,
+      "step": 9000
+    },
+    {
+      "epoch": 2.0766406608536028,
+      "grad_norm": 0.267760306596756,
+      "learning_rate": 0.00016005248926355973,
+      "loss": 0.0668,
+      "step": 9050
+    },
+    {
+      "epoch": 2.088113813675998,
+      "grad_norm": 0.16492970287799835,
+      "learning_rate": 0.00015806425958326707,
+      "loss": 0.0676,
+      "step": 9100
+    },
+    {
+      "epoch": 2.0995869664983937,
+      "grad_norm": 0.20092828571796417,
+      "learning_rate": 0.00015607602990297438,
+      "loss": 0.0675,
+      "step": 9150
+    },
+    {
+      "epoch": 2.1110601193207894,
+      "grad_norm": 0.17862418293952942,
+      "learning_rate": 0.00015408780022268173,
+      "loss": 0.07,
+      "step": 9200
+    },
+    {
+      "epoch": 2.122533272143185,
+      "grad_norm": 0.264687716960907,
+      "learning_rate": 0.00015209957054238907,
+      "loss": 0.0721,
+      "step": 9250
+    },
+    {
+      "epoch": 2.1340064249655804,
+      "grad_norm": 0.18594799935817719,
+      "learning_rate": 0.0001501113408620964,
+      "loss": 0.0708,
+      "step": 9300
+    },
+    {
+      "epoch": 2.145479577787976,
+      "grad_norm": 0.23833034932613373,
+      "learning_rate": 0.0001481231111818037,
+      "loss": 0.075,
+      "step": 9350
+    },
+    {
+      "epoch": 2.1569527306103717,
+      "grad_norm": 0.2754266858100891,
+      "learning_rate": 0.00014613488150151105,
+      "loss": 0.0663,
+      "step": 9400
+    },
+    {
+      "epoch": 2.1684258834327674,
+      "grad_norm": 0.24011731147766113,
+      "learning_rate": 0.0001441466518212184,
+      "loss": 0.0711,
+      "step": 9450
+    },
+    {
+      "epoch": 2.179899036255163,
+      "grad_norm": 0.2121812105178833,
+      "learning_rate": 0.00014215842214092573,
+      "loss": 0.0706,
+      "step": 9500
+    },
+    {
+      "epoch": 2.179899036255163,
+      "eval_accuracy": 0.9745226836175251,
+      "eval_f1": 0.944122628074362,
+      "eval_loss": 0.07175323367118835,
+      "eval_precision": 0.938023404048986,
+      "eval_recall": 0.9516652063847191,
+      "eval_runtime": 119.8057,
+      "eval_samples_per_second": 166.269,
+      "eval_steps_per_second": 10.392,
+      "step": 9500
+    },
+    {
+      "epoch": 2.1913721890775584,
+      "grad_norm": 0.22729764878749847,
+      "learning_rate": 0.00014017019246063305,
+      "loss": 0.0693,
+      "step": 9550
+    },
+    {
+      "epoch": 2.202845341899954,
+      "grad_norm": 0.432969868183136,
+      "learning_rate": 0.0001381819627803404,
+      "loss": 0.0727,
+      "step": 9600
+    },
+    {
+      "epoch": 2.2143184947223498,
+      "grad_norm": 0.28356945514678955,
+      "learning_rate": 0.00013619373310004774,
+      "loss": 0.0686,
+      "step": 9650
+    },
+    {
+      "epoch": 2.2257916475447455,
+      "grad_norm": 0.2591719329357147,
+      "learning_rate": 0.00013420550341975505,
+      "loss": 0.0719,
+      "step": 9700
+    },
+    {
+      "epoch": 2.2372648003671407,
+      "grad_norm": 0.18898649513721466,
+      "learning_rate": 0.00013221727373946237,
+      "loss": 0.074,
+      "step": 9750
+    },
+    {
+      "epoch": 2.2487379531895364,
+      "grad_norm": 0.230307474732399,
+      "learning_rate": 0.0001302290440591697,
+      "loss": 0.0636,
+      "step": 9800
+    },
+    {
+      "epoch": 2.260211106011932,
+      "grad_norm": 0.21404670178890228,
+      "learning_rate": 0.00012824081437887706,
+      "loss": 0.0707,
+      "step": 9850
+    },
+    {
+      "epoch": 2.271684258834328,
+      "grad_norm": 0.182530015707016,
+      "learning_rate": 0.0001262525846985844,
+      "loss": 0.0728,
+      "step": 9900
+    },
+    {
+      "epoch": 2.283157411656723,
+      "grad_norm": 0.31098031997680664,
+      "learning_rate": 0.00012426435501829172,
+      "loss": 0.0666,
+      "step": 9950
+    },
+    {
+      "epoch": 2.2946305644791187,
+      "grad_norm": 0.22960515320301056,
+      "learning_rate": 0.00012227612533799906,
+      "loss": 0.0717,
+      "step": 10000
+    },
+    {
+      "epoch": 2.2946305644791187,
+      "eval_accuracy": 0.9739419406879124,
+      "eval_f1": 0.9426748850468422,
+      "eval_loss": 0.07281766831874847,
+      "eval_precision": 0.932625711260672,
+      "eval_recall": 0.9557954872438175,
+      "eval_runtime": 117.3059,
+      "eval_samples_per_second": 169.812,
+      "eval_steps_per_second": 10.613,
+      "step": 10000
+    },
+    {
+      "epoch": 2.3061037173015144,
+      "grad_norm": 0.23993970453739166,
+      "learning_rate": 0.00012028789565770639,
+      "loss": 0.0675,
+      "step": 10050
+    },
+    {
+      "epoch": 2.31757687012391,
+      "grad_norm": 0.23594702780246735,
+      "learning_rate": 0.00011829966597741372,
+      "loss": 0.0671,
+      "step": 10100
+    },
+    {
+      "epoch": 2.329050022946306,
+      "grad_norm": 0.4768570065498352,
+      "learning_rate": 0.00011631143629712105,
+      "loss": 0.0714,
+      "step": 10150
+    },
+    {
+      "epoch": 2.340523175768701,
+      "grad_norm": 0.37876570224761963,
+      "learning_rate": 0.00011432320661682838,
+      "loss": 0.0661,
+      "step": 10200
+    },
+    {
+      "epoch": 2.3519963285910968,
+      "grad_norm": 0.2580972909927368,
+      "learning_rate": 0.00011233497693653571,
+      "loss": 0.0696,
+      "step": 10250
+    },
+    {
+      "epoch": 2.3634694814134924,
+      "grad_norm": 0.20318330824375153,
+      "learning_rate": 0.00011034674725624304,
+      "loss": 0.0688,
+      "step": 10300
+    },
+    {
+      "epoch": 2.374942634235888,
+      "grad_norm": 0.2656238079071045,
+      "learning_rate": 0.00010835851757595037,
+      "loss": 0.0656,
+      "step": 10350
+    },
+    {
+      "epoch": 2.386415787058284,
+      "grad_norm": 0.2967742085456848,
+      "learning_rate": 0.00010637028789565771,
+      "loss": 0.0768,
+      "step": 10400
+    },
+    {
+      "epoch": 2.397888939880679,
+      "grad_norm": 0.22500257194042206,
+      "learning_rate": 0.00010438205821536504,
+      "loss": 0.0671,
+      "step": 10450
+    },
+    {
+      "epoch": 2.4093620927030748,
+      "grad_norm": 0.3866559863090515,
+      "learning_rate": 0.00010239382853507237,
+      "loss": 0.0712,
+      "step": 10500
+    },
+    {
+      "epoch": 2.4093620927030748,
+      "eval_accuracy": 0.9745250243790311,
+      "eval_f1": 0.9449194759736153,
+      "eval_loss": 0.0714457556605339,
+      "eval_precision": 0.9347134332940459,
+      "eval_recall": 0.9574427499426126,
+      "eval_runtime": 113.4582,
+      "eval_samples_per_second": 175.571,
+      "eval_steps_per_second": 10.973,
+      "step": 10500
+    },
+    {
+      "epoch": 2.4208352455254705,
+      "grad_norm": 0.2979605495929718,
+      "learning_rate": 0.0001004055988547797,
+      "loss": 0.0679,
+      "step": 10550
+    },
+    {
+      "epoch": 2.432308398347866,
+      "grad_norm": 0.3229621946811676,
+      "learning_rate": 9.841736917448704e-05,
+      "loss": 0.0707,
+      "step": 10600
+    },
+    {
+      "epoch": 2.4437815511702614,
+      "grad_norm": 0.26730015873908997,
+      "learning_rate": 9.642913949419436e-05,
+      "loss": 0.0655,
+      "step": 10650
+    },
+    {
+      "epoch": 2.455254703992657,
+      "grad_norm": 0.3086176812648773,
+      "learning_rate": 9.44409098139017e-05,
+      "loss": 0.0739,
+      "step": 10700
+    },
+    {
+      "epoch": 2.466727856815053,
+      "grad_norm": 0.3094359040260315,
+      "learning_rate": 9.245268013360903e-05,
+      "loss": 0.0717,
+      "step": 10750
+    },
+    {
+      "epoch": 2.4782010096374485,
+      "grad_norm": 0.20422030985355377,
+      "learning_rate": 9.046445045331638e-05,
+      "loss": 0.0691,
+      "step": 10800
+    },
+    {
+      "epoch": 2.4896741624598437,
+      "grad_norm": 0.32366958260536194,
+      "learning_rate": 8.84762207730237e-05,
+      "loss": 0.068,
+      "step": 10850
+    },
+    {
+      "epoch": 2.5011473152822394,
+      "grad_norm": 0.21282616257667542,
+      "learning_rate": 8.648799109273104e-05,
+      "loss": 0.0747,
+      "step": 10900
+    },
+    {
+      "epoch": 2.512620468104635,
+      "grad_norm": 0.24280066788196564,
+      "learning_rate": 8.449976141243837e-05,
+      "loss": 0.0676,
+      "step": 10950
+    },
+    {
+      "epoch": 2.524093620927031,
+      "grad_norm": 0.25853705406188965,
+      "learning_rate": 8.251153173214571e-05,
+      "loss": 0.0658,
+      "step": 11000
+    },
+    {
+      "epoch": 2.524093620927031,
+      "eval_accuracy": 0.9746734286585049,
+      "eval_f1": 0.9448772381464688,
+      "eval_loss": 0.07095114141702652,
+      "eval_precision": 0.9373394177599341,
+      "eval_recall": 0.9540754798723573,
+      "eval_runtime": 115.6195,
+      "eval_samples_per_second": 172.289,
+      "eval_steps_per_second": 10.768,
+      "step": 11000
+    },
+    {
+      "epoch": 2.5355667737494265,
+      "grad_norm": 0.2284245491027832,
+      "learning_rate": 8.052330205185303e-05,
+      "loss": 0.0743,
+      "step": 11050
+    },
+    {
+      "epoch": 2.5470399265718218,
+      "grad_norm": 0.19337309896945953,
+      "learning_rate": 7.853507237156037e-05,
+      "loss": 0.0676,
+      "step": 11100
+    },
+    {
+      "epoch": 2.5585130793942175,
+      "grad_norm": 0.22750115394592285,
+      "learning_rate": 7.65468426912677e-05,
+      "loss": 0.0723,
+      "step": 11150
+    },
+    {
+      "epoch": 2.569986232216613,
+      "grad_norm": 0.2701912820339203,
+      "learning_rate": 7.455861301097503e-05,
+      "loss": 0.0675,
+      "step": 11200
+    },
+    {
+      "epoch": 2.581459385039009,
+      "grad_norm": 0.22987499833106995,
+      "learning_rate": 7.257038333068236e-05,
+      "loss": 0.065,
+      "step": 11250
+    },
+    {
+      "epoch": 2.5929325378614045,
+      "grad_norm": 0.20396412909030914,
+      "learning_rate": 7.05821536503897e-05,
+      "loss": 0.0665,
+      "step": 11300
+    },
+    {
+      "epoch": 2.6044056906838,
+      "grad_norm": 0.17404744029045105,
+      "learning_rate": 6.859392397009703e-05,
+      "loss": 0.0626,
+      "step": 11350
+    },
+    {
+      "epoch": 2.6158788435061955,
+      "grad_norm": 0.24504683911800385,
+      "learning_rate": 6.660569428980435e-05,
+      "loss": 0.0715,
+      "step": 11400
+    },
+    {
+      "epoch": 2.627351996328591,
+      "grad_norm": 0.29088979959487915,
+      "learning_rate": 6.461746460951169e-05,
+      "loss": 0.0634,
+      "step": 11450
+    },
+    {
+      "epoch": 2.6388251491509864,
+      "grad_norm": 0.24859917163848877,
+      "learning_rate": 6.262923492921902e-05,
+      "loss": 0.0718,
+      "step": 11500
+    },
+    {
+      "epoch": 2.6388251491509864,
+      "eval_accuracy": 0.9750416304433823,
+      "eval_f1": 0.9463313377336133,
+      "eval_loss": 0.06995302438735962,
+      "eval_precision": 0.937154559060786,
+      "eval_recall": 0.9573328246594741,
+      "eval_runtime": 116.2346,
+      "eval_samples_per_second": 171.378,
+      "eval_steps_per_second": 10.711,
+      "step": 11500
+    },
+    {
+      "epoch": 2.650298301973382,
+      "grad_norm": 0.38858455419540405,
+      "learning_rate": 6.064100524892636e-05,
+      "loss": 0.0677,
+      "step": 11550
+    },
+    {
+      "epoch": 2.661771454795778,
+      "grad_norm": 0.15558552742004395,
+      "learning_rate": 5.865277556863369e-05,
+      "loss": 0.0683,
+      "step": 11600
+    },
+    {
+      "epoch": 2.6732446076181735,
+      "grad_norm": 0.2536437511444092,
+      "learning_rate": 5.6664545888341025e-05,
+      "loss": 0.0725,
+      "step": 11650
+    },
+    {
+      "epoch": 2.684717760440569,
+      "grad_norm": 0.22305089235305786,
+      "learning_rate": 5.4676316208048355e-05,
+      "loss": 0.0682,
+      "step": 11700
+    },
+    {
+      "epoch": 2.6961909132629645,
+      "grad_norm": 0.25250253081321716,
+      "learning_rate": 5.268808652775569e-05,
+      "loss": 0.0717,
+      "step": 11750
+    },
+    {
+      "epoch": 2.70766406608536,
+      "grad_norm": 0.21804587543010712,
+      "learning_rate": 5.069985684746302e-05,
+      "loss": 0.0675,
+      "step": 11800
+    },
+    {
+      "epoch": 2.719137218907756,
+      "grad_norm": 0.28288906812667847,
+      "learning_rate": 4.871162716717036e-05,
+      "loss": 0.0639,
+      "step": 11850
+    },
+    {
+      "epoch": 2.7306103717301515,
+      "grad_norm": 0.22967451810836792,
+      "learning_rate": 4.672339748687769e-05,
+      "loss": 0.0674,
+      "step": 11900
+    },
+    {
+      "epoch": 2.7420835245525472,
+      "grad_norm": 0.23140451312065125,
+      "learning_rate": 4.473516780658501e-05,
+      "loss": 0.0671,
+      "step": 11950
+    },
+    {
+      "epoch": 2.7535566773749425,
+      "grad_norm": 0.32377928495407104,
+      "learning_rate": 4.274693812629235e-05,
+      "loss": 0.0637,
+      "step": 12000
+    },
+    {
+      "epoch": 2.7535566773749425,
+      "eval_accuracy": 0.9755069738307545,
+      "eval_f1": 0.9476076338095303,
+      "eval_loss": 0.06893511861562729,
+      "eval_precision": 0.9417873864638041,
+      "eval_recall": 0.9544925493289708,
+      "eval_runtime": 115.4256,
+      "eval_samples_per_second": 172.579,
+      "eval_steps_per_second": 10.786,
+      "step": 12000
+    },
+    {
+      "epoch": 2.765029830197338,
+      "grad_norm": 0.24531525373458862,
+      "learning_rate": 4.075870844599968e-05,
+      "loss": 0.0617,
+      "step": 12050
+    },
+    {
+      "epoch": 2.776502983019734,
+      "grad_norm": 0.27357715368270874,
+      "learning_rate": 3.8770478765707014e-05,
+      "loss": 0.0673,
+      "step": 12100
+    },
+    {
+      "epoch": 2.7879761358421296,
+      "grad_norm": 0.28870201110839844,
+      "learning_rate": 3.6782249085414344e-05,
+      "loss": 0.0711,
+      "step": 12150
+    },
+    {
+      "epoch": 2.7994492886645252,
+      "grad_norm": 0.2304583042860031,
+      "learning_rate": 3.479401940512168e-05,
+      "loss": 0.0648,
+      "step": 12200
+    },
+    {
+      "epoch": 2.8109224414869205,
+      "grad_norm": 0.24561718106269836,
+      "learning_rate": 3.280578972482901e-05,
+      "loss": 0.07,
+      "step": 12250
+    },
+    {
+      "epoch": 2.822395594309316,
+      "grad_norm": 0.2914511263370514,
+      "learning_rate": 3.081756004453635e-05,
+      "loss": 0.0669,
+      "step": 12300
+    },
+    {
+      "epoch": 2.833868747131712,
+      "grad_norm": 0.1773723065853119,
+      "learning_rate": 2.8829330364243677e-05,
+      "loss": 0.0725,
+      "step": 12350
+    },
+    {
+      "epoch": 2.845341899954107,
+      "grad_norm": 0.23746098577976227,
+      "learning_rate": 2.684110068395101e-05,
+      "loss": 0.0675,
+      "step": 12400
+    },
+    {
+      "epoch": 2.856815052776503,
+      "grad_norm": 0.30285367369651794,
+      "learning_rate": 2.4852871003658343e-05,
+      "loss": 0.0643,
+      "step": 12450
+    },
+    {
+      "epoch": 2.8682882055988985,
+      "grad_norm": 0.25162991881370544,
+      "learning_rate": 2.2864641323365676e-05,
+      "loss": 0.0692,
+      "step": 12500
+    },
+    {
+      "epoch": 2.8682882055988985,
+      "eval_accuracy": 0.9753543561805701,
+      "eval_f1": 0.9467074104568793,
+      "eval_loss": 0.06883265823125839,
+      "eval_precision": 0.9384069122677029,
+      "eval_recall": 0.9563160157904436,
+      "eval_runtime": 118.2482,
+      "eval_samples_per_second": 168.459,
+      "eval_steps_per_second": 10.529,
+      "step": 12500
+    },
+    {
+      "epoch": 2.879761358421294,
+      "grad_norm": 0.22222265601158142,
+      "learning_rate": 2.087641164307301e-05,
+      "loss": 0.0671,
+      "step": 12550
+    },
+    {
+      "epoch": 2.89123451124369,
+      "grad_norm": 0.30242499709129333,
+      "learning_rate": 1.8888181962780343e-05,
+      "loss": 0.0678,
+      "step": 12600
+    },
+    {
+      "epoch": 2.902707664066085,
+      "grad_norm": 0.42658179998397827,
+      "learning_rate": 1.6899952282487673e-05,
+      "loss": 0.0693,
+      "step": 12650
+    },
+    {
+      "epoch": 2.914180816888481,
+      "grad_norm": 0.2844005823135376,
+      "learning_rate": 1.4911722602195006e-05,
+      "loss": 0.0665,
+      "step": 12700
+    },
+    {
+      "epoch": 2.9256539697108765,
+      "grad_norm": 0.18663829565048218,
+      "learning_rate": 1.2923492921902337e-05,
+      "loss": 0.0662,
+      "step": 12750
+    },
+    {
+      "epoch": 2.9371271225332722,
+      "grad_norm": 0.35708916187286377,
+      "learning_rate": 1.093526324160967e-05,
+      "loss": 0.0684,
+      "step": 12800
+    },
+    {
+      "epoch": 2.948600275355668,
+      "grad_norm": 0.2069770097732544,
+      "learning_rate": 8.947033561317004e-06,
+      "loss": 0.067,
+      "step": 12850
+    },
+    {
+      "epoch": 2.960073428178063,
+      "grad_norm": 0.27548667788505554,
+      "learning_rate": 6.958803881024336e-06,
+      "loss": 0.0663,
+      "step": 12900
+    },
+    {
+      "epoch": 2.971546581000459,
+      "grad_norm": 0.27818095684051514,
+      "learning_rate": 4.970574200731669e-06,
+      "loss": 0.0664,
+      "step": 12950
+    },
+    {
+      "epoch": 2.9830197338228546,
+      "grad_norm": 0.22553269565105438,
+      "learning_rate": 2.982344520439001e-06,
+      "loss": 0.0688,
+      "step": 13000
+    },
+    {
+      "epoch": 2.9830197338228546,
+      "eval_accuracy": 0.9755989657579363,
+      "eval_f1": 0.9475172260787175,
+      "eval_loss": 0.06789490580558777,
+      "eval_precision": 0.9400606792625981,
+      "eval_recall": 0.9562012408624608,
+      "eval_runtime": 114.6472,
+      "eval_samples_per_second": 173.75,
+      "eval_steps_per_second": 10.859,
+      "step": 13000
+    },
+    {
+      "epoch": 2.9944928866452503,
+      "grad_norm": 0.19997993111610413,
+      "learning_rate": 9.941148401463338e-07,
+      "loss": 0.0641,
+      "step": 13050
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 13074,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 1000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.138440975520091e+16,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-13074/training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0dd103f98dee7758a7916a783307af9f65932119d66eedade7204a203817a6cc
+size 5841

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0f778871ce8a609f6f785396c85ddcc7b142bad44db1595a8c5baec5144a16fa
-size 5777
+oid sha256:0dd103f98dee7758a7916a783307af9f65932119d66eedade7204a203817a6cc
+size 5841