diff --git "a/res.log" "b/res.log" new file mode 100644--- /dev/null +++ "b/res.log" @@ -0,0 +1,7011 @@ +[2025-01-07 10:54:56,776] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-07 10:54:56,776] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-07 10:54:56,776] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-07 10:54:56,776] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-07 10:54:56,776] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-07 10:54:56,776] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-07 10:54:56,776] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-07 10:54:56,777] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-dev package with apt [WARNING]  async_io: please install the libaio-dev package with apt + + [WARNING]  async_io: please install the libaio-dev package with apt + [WARNING]  async_io: please install the libaio-dev package with apt + [WARNING]  async_io: please install the libaio-dev package with apt + [WARNING]  async_io: please install the libaio-dev package with apt [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + + [WARNING]  async_io: please install the libaio-dev package with apt [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  async_io: please install the libaio-dev package with apt [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + + + + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
[WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 + + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 + [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 + + [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible + [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 + + [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible + [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 + [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4 + [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible +[2025-01-07 10:55:04,308] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-07 10:55:04,309] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-07 10:55:04,309] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-07 10:55:04,310] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-07 10:55:04,310] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-07 10:55:04,310] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-07 10:55:04,311] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-07 10:55:04,312] [INFO] [comm.py:637:init_distributed] cdb=None +llm source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model', 'cache_dir': None, 'attn_implementation': 'flash_attention_2', 'torch_dtype': torch.bfloat16, 'pretrained_llm_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model'} +vision_tower source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower', 'pretrained_vision_tower_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower'} +connector source: {'connector_type': 'mlp2x_gelu', 'connector_hidden_size': 2560, 
'connector_target_size': 896, 'pretrained_connector_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector'} +llm source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model', 'cache_dir': None, 'attn_implementation': 'flash_attention_2', 'torch_dtype': torch.bfloat16, 'pretrained_llm_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model'} +vision_tower source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower', 'pretrained_vision_tower_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower'} +connector source: {'connector_type': 'mlp2x_gelu', 'connector_hidden_size': 2560, 'connector_target_size': 896, 'pretrained_connector_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector'} +llm source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model', 'cache_dir': None, 'attn_implementation': 'flash_attention_2', 'torch_dtype': torch.bfloat16, 'pretrained_llm_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model'} +vision_tower source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower', 'pretrained_vision_tower_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower'} +connector source: {'connector_type': 'mlp2x_gelu', 'connector_hidden_size': 2560, 'connector_target_size': 896, 'pretrained_connector_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector'} +llm source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model', 'cache_dir': None, 'attn_implementation': 'flash_attention_2', 'torch_dtype': torch.bfloat16, 'pretrained_llm_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model'} +vision_tower source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower', 'pretrained_vision_tower_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower'} +connector source: {'connector_type': 'mlp2x_gelu', 'connector_hidden_size': 2560, 'connector_target_size': 896, 'pretrained_connector_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector'} +llm source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model', 'cache_dir': None, 'attn_implementation': 'flash_attention_2', 'torch_dtype': torch.bfloat16, 'pretrained_llm_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model'}llm source: + vision_tower source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower', 'pretrained_vision_tower_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower'} +connector source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model', 'cache_dir': None, 'attn_implementation': 'flash_attention_2', 'torch_dtype': torch.bfloat16, 'pretrained_llm_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model'}{'connector_type': 'mlp2x_gelu', 'connector_hidden_size': 2560, 'connector_target_size': 896, 'pretrained_connector_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector'} + +vision_tower source: {'model_name_or_path': 
'/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower', 'pretrained_vision_tower_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower'} +connector source: {'connector_type': 'mlp2x_gelu', 'connector_hidden_size': 2560, 'connector_target_size': 896, 'pretrained_connector_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector'} +llm source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model', 'cache_dir': None, 'attn_implementation': 'flash_attention_2', 'torch_dtype': torch.bfloat16, 'pretrained_llm_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model'} +vision_tower source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower', 'pretrained_vision_tower_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower'} +connector source: {'connector_type': 'mlp2x_gelu', 'connector_hidden_size': 2560, 'connector_target_size': 896, 'pretrained_connector_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector'} +llm source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model', 'cache_dir': None, 'attn_implementation': 'flash_attention_2', 'torch_dtype': torch.bfloat16, 'pretrained_llm_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/language_model'} +vision_tower source: {'model_name_or_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower', 'pretrained_vision_tower_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower'} +connector source: {'connector_type': 'mlp2x_gelu', 'connector_hidden_size': 2560, 'connector_target_size': 896, 'pretrained_connector_path': '/checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector'} +hello mlp +hello mlp +hello mlp +hello mlp +hello mlp +hello mlp +hello mlp +hello mlp +Loading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading connector froLoading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading connector from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector/pytorch_model.bin... +Loading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading connector from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector/pytorch_model.bin... +Loading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading connector from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector/pytorch_model.bin... +Loading connector from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector/pytorch_model.bin... +Loading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading vision tower from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/vision_tower +Loading connector from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector/pytorch_model.bin... 
+Loading connector from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector/pytorch_model.bin... +Loading connector from /checkpoint_load/dt-pretrain-siglip-qwen2-1d5b-kdlm-all-t10/connector/pytorch_model.VM-0-111-tencentos:109:109 [5] NCCL INFO cudaDriverVersion 12040 +VM-0-111-tencentos:106:106 [2] NCCL INFO cudaDriverVersion 12040 +VM-0-111-tencentos:104:104 [0] NCCL INFO cudaDriverVersion 12040 +VM-0-111-tencentos:105:105 [1] NCCL INFO cudaDriverVersion 12040 +VM-0-111-tencentos:108:108 [4] NCCL INFO cudaDriverVersion 12040 +VM-0-111-tencentos:109:109 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:106:106 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:105:105 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:104:104 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:108:108 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:111:111 [7] NCCL INFO cudaDriverVersion 12040 +VM-0-111-tencentos:111:111 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:107:107 [3] NCCL INFO cudaDriverVersion 12040 +VM-0-111-tencentos:107:107 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:110:110 [6] NCCL INFO cudaDriverVersion 12040 +VM-0-111-tencentos:110:110 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:109:109 [5] NCCL INFO Bootstrap : Using bond0:30.7.33.2<0> +VM-0-111-tencentos:106:106 [2] NCCL INFO Bootstrap : Using bond0:30.7.33.2<0> +VM-0-111-tencentos:105:105 [1] NCCL INFO Bootstrap : Using bond0:30.7.33.2<0> +VM-0-111-tencentos:104:104 [0] NCCL INFO Bootstrap : Using bond0:30.7.33.2<0> +VM-0-111-tencentos:111:111 [7] NCCL INFO Bootstrap : Using bond0:30.7.33.2<0> +VM-0-111-tencentos:108:108 [4] NCCL INFO Bootstrap : Using bond0:30.7.33.2<0> +VM-0-111-tencentos:107:107 [3] NCCL INFO Bootstrap : Using bond0:30.7.33.2<0> +VM-0-111-tencentos:110:110 [6] NCCL INFO Bootstrap : Using bond0:30.7.33.2<0> +VM-0-111-tencentos:104:104 [0] NCCL INFO NCCL_MIN_CTAS set by environment to 4. +VM-0-111-tencentos:105:105 [1] NCCL INFO NCCL_MIN_CTAS set by environment to 4. +VM-0-111-tencentos:110:110 [6] NCCL INFO NCCL_MIN_CTAS set by environment to 4. +VM-0-111-tencentos:106:106 [2] NCCL INFO NCCL_MIN_CTAS set by environment to 4. +VM-0-111-tencentos:111:111 [7] NCCL INFO NCCL_MIN_CTAS set by environment to 4. +VM-0-111-tencentos:107:107 [3] NCCL INFO NCCL_MIN_CTAS set by environment to 4. +VM-0-111-tencentos:109:109 [5] NCCL INFO NCCL_MIN_CTAS set by environment to 4. +VM-0-111-tencentos:108:108 [4] NCCL INFO NCCL_MIN_CTAS set by environment to 4. 
+VM-0-111-tencentos:104:360 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so +VM-0-111-tencentos:104:360 [0] NCCL INFO P2P plugin IBext_v8 +VM-0-111-tencentos:104:360 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:106:363 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so +VM-0-111-tencentos:106:363 [2] NCCL INFO P2P plugin IBext_v8 +VM-0-111-tencentos:106:363 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:108:367 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so +VM-0-111-tencentos:108:367 [4] NCCL INFO P2P plugin IBext_v8 +VM-0-111-tencentos:108:367 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:109:366 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so +VM-0-111-tencentos:109:366 [5] NCCL INFO P2P plugin IBext_v8 +VM-0-111-tencentos:109:366 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:110:362 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so +VM-0-111-tencentos:110:362 [6] NCCL INFO P2P plugin IBext_v8 +VM-0-111-tencentos:110:362 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:105:361 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so +VM-0-111-tencentos:105:361 [1] NCCL INFO P2P plugin IBext_v8 +VM-0-111-tencentos:105:361 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:111:364 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so +VM-0-111-tencentos:111:364 [7] NCCL INFO P2P plugin IBext_v8 +VM-0-111-tencentos:111:364 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:107:365 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so +VM-0-111-tencentos:107:365 [3] NCCL INFO P2P plugin IBext_v8 +VM-0-111-tencentos:107:365 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to bond0 +VM-0-111-tencentos:104:360 [0] NCCL INFO NET/IB : Using [0]mlx5_bond_0:1/RoCE [1]mlx5_bond_1:1/RoCE [2]mlx5_bond_2:1/RoCE [3]mlx5_bond_3:1/RoCE [RO]; OOB bond0:30.7.33.2<0> +VM-0-111-tencentos:104:360 [0] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:104:360 [0] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:106:363 [2] NCCL INFO NET/IB : Using [0]mlx5_bond_0:1/RoCE [1]mlx5_bond_1:1/RoCE [2]mlx5_bond_2:1/RoCE [3]mlx5_bond_3:1/RoCE [RO]; OOB bond0:30.7.33.2<0> +VM-0-111-tencentos:109:366 [5] NCCL INFO NET/IB : Using [0]mlx5_bond_0:1/RoCE [1]mlx5_bond_1:1/RoCE [2]mlx5_bond_2:1/RoCE [3]mlx5_bond_3:1/RoCE [RO]; OOB bond0:30.7.33.2<0> +VM-0-111-tencentos:108:367 [4] NCCL INFO NET/IB : Using [0]mlx5_bond_0:1/RoCE [1]mlx5_bond_1:1/RoCE [2]mlx5_bond_2:1/RoCE [3]mlx5_bond_3:1/RoCE [RO]; OOB bond0:30.7.33.2<0> +VM-0-111-tencentos:110:362 [6] NCCL INFO NET/IB : Using [0]mlx5_bond_0:1/RoCE [1]mlx5_bond_1:1/RoCE [2]mlx5_bond_2:1/RoCE [3]mlx5_bond_3:1/RoCE [RO]; OOB bond0:30.7.33.2<0> +VM-0-111-tencentos:106:363 [2] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:107:365 [3] NCCL INFO NET/IB : Using [0]mlx5_bond_0:1/RoCE [1]mlx5_bond_1:1/RoCE [2]mlx5_bond_2:1/RoCE [3]mlx5_bond_3:1/RoCE [RO]; OOB bond0:30.7.33.2<0> +VM-0-111-tencentos:105:361 [1] NCCL INFO NET/IB : Using [0]mlx5_bond_0:1/RoCE [1]mlx5_bond_1:1/RoCE [2]mlx5_bond_2:1/RoCE [3]mlx5_bond_3:1/RoCE [RO]; OOB bond0:30.7.33.2<0> +VM-0-111-tencentos:111:364 [7] NCCL 
INFO NET/IB : Using [0]mlx5_bond_0:1/RoCE [1]mlx5_bond_1:1/RoCE [2]mlx5_bond_2:1/RoCE [3]mlx5_bond_3:1/RoCE [RO]; OOB bond0:30.7.33.2<0> +VM-0-111-tencentos:106:363 [2] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:109:366 [5] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:108:367 [4] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:110:362 [6] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:107:365 [3] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:105:361 [1] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:109:366 [5] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:108:367 [4] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:110:362 [6] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:107:365 [3] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:105:361 [1] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:111:364 [7] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:111:364 [7] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:108:367 [4] NCCL INFO comm 0x19feab40 rank 12 nranks 16 cudaDev 4 nvmlDev 4 busId 83000 commId 0x8ba02a329d6db9c8 - Init START +VM-0-111-tencentos:111:364 [7] NCCL INFO comm 0x1d16df00 rank 15 nranks 16 cudaDev 7 nvmlDev 7 busId c4000 commId 0x8ba02a329d6db9c8 - Init START +VM-0-111-tencentos:109:366 [5] NCCL INFO comm 0x1d646040 rank 13 nranks 16 cudaDev 5 nvmlDev 5 busId 84000 commId 0x8ba02a329d6db9c8 - Init START +VM-0-111-tencentos:106:363 [2] NCCL INFO comm 0x34d59180 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 43000 commId 0x8ba02a329d6db9c8 - Init START +VM-0-111-tencentos:107:365 [3] NCCL INFO comm 0x6ac0b600 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId 44000 commId 0x8ba02a329d6db9c8 - Init START +VM-0-111-tencentos:105:361 [1] NCCL INFO comm 0x29e4ba00 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 24000 commId 0x8ba02a329d6db9c8 - Init START +VM-0-111-tencentos:110:362 [6] NCCL INFO comm 0x34bdd640 rank 14 nranks 16 cudaDev 6 nvmlDev 6 busId c3000 commId 0x8ba02a329d6db9c8 - Init START +VM-0-111-tencentos:104:360 [0] NCCL INFO comm 0x3ee89c20 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 23000 commId 0x8ba02a329d6db9c8 - Init START +VM-0-111-tencentos:109:366 [5] NCCL INFO NCCL_PXN_DISABLE set by environment to 0. +VM-0-111-tencentos:109:366 [5] NCCL INFO Setting affinity for GPU 5 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-111-tencentos:109:366 [5] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. +VM-0-111-tencentos:106:363 [2] NCCL INFO NCCL_PXN_DISABLE set by environment to 0. +VM-0-111-tencentos:106:363 [2] NCCL INFO Setting affinity for GPU 2 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-111-tencentos:106:363 [2] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. +VM-0-111-tencentos:108:367 [4] NCCL INFO NCCL_PXN_DISABLE set by environment to 0. +VM-0-111-tencentos:108:367 [4] NCCL INFO Setting affinity for GPU 4 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-111-tencentos:108:367 [4] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. +VM-0-111-tencentos:104:360 [0] NCCL INFO NCCL_PXN_DISABLE set by environment to 0. +VM-0-111-tencentos:104:360 [0] NCCL INFO Setting affinity for GPU 0 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-111-tencentos:104:360 [0] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. +VM-0-111-tencentos:107:365 [3] NCCL INFO NCCL_PXN_DISABLE set by environment to 0. 
+VM-0-111-tencentos:107:365 [3] NCCL INFO Setting affinity for GPU 3 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-111-tencentos:111:364 [7] NCCL INFO NCCL_PXN_DISABLE set by environment to 0. +VM-0-111-tencentos:107:365 [3] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. +VM-0-111-tencentos:111:364 [7] NCCL INFO Setting affinity for GPU 7 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-111-tencentos:111:364 [7] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. +VM-0-111-tencentos:110:362 [6] NCCL INFO NCCL_PXN_DISABLE set by environment to 0. +VM-0-111-tencentos:110:362 [6] NCCL INFO Setting affinity for GPU 6 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-111-tencentos:110:362 [6] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. +VM-0-111-tencentos:105:361 [1] NCCL INFO NCCL_PXN_DISABLE set by environment to 0. +VM-0-111-tencentos:105:361 [1] NCCL INFO Setting affinity for GPU 1 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-111-tencentos:105:361 [1] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. +VM-0-111-tencentos:111:364 [7] NCCL INFO comm 0x1d16df00 rank 15 nRanks 16 nNodes 2 localRanks 8 localRank 7 MNNVL 0 +VM-0-111-tencentos:109:366 [5] NCCL INFO comm 0x1d646040 rank 13 nRanks 16 nNodes 2 localRanks 8 localRank 5 MNNVL 0 +VM-0-111-tencentos:111:364 [7] NCCL INFO Trees [0] -1/-1/-1->15->14 [1] 8/-1/-1->15->14 [2] 8/-1/-1->15->14 [3] 8/-1/-1->15->14 [4] -1/-1/-1->15->14 [5] 8/-1/-1->15->14 [6] 8/-1/-1->15->14 [7] 8/-1/-1->15->14 +VM-0-111-tencentos:111:364 [7] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:106:363 [2] NCCL INFO comm 0x34d59180 rank 10 nRanks 16 nNodes 2 localRanks 8 localRank 2 MNNVL 0 +VM-0-111-tencentos:105:361 [1] NCCL INFO comm 0x29e4ba00 rank 9 nRanks 16 nNodes 2 localRanks 8 localRank 1 MNNVL 0 +VM-0-111-tencentos:109:366 [5] NCCL INFO Trees [0] 14/-1/-1->13->12 [1] 14/-1/-1->13->12 [2] 14/-1/-1->13->12 [3] -1/-1/-1->13->12 [4] 14/-1/-1->13->12 [5] 14/-1/-1->13->12 [6] 14/-1/-1->13->12 [7] -1/-1/-1->13->12 +VM-0-111-tencentos:104:360 [0] NCCL INFO comm 0x3ee89c20 rank 8 nRanks 16 nNodes 2 localRanks 8 localRank 0 MNNVL 0 +VM-0-111-tencentos:110:362 [6] NCCL INFO comm 0x34bdd640 rank 14 nRanks 16 nNodes 2 localRanks 8 localRank 6 MNNVL 0 +VM-0-111-tencentos:109:366 [5] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:108:367 [4] NCCL INFO comm 0x19feab40 rank 12 nRanks 16 nNodes 2 localRanks 8 localRank 4 MNNVL 0 +VM-0-111-tencentos:107:365 [3] NCCL INFO comm 0x6ac0b600 rank 11 nRanks 16 nNodes 2 localRanks 8 localRank 3 MNNVL 0 +VM-0-111-tencentos:106:363 [2] NCCL INFO Trees [0] 11/-1/-1->10->9 [1] 11/-1/-1->10->2 [2] 11/-1/-1->10->9 [3] 11/-1/-1->10->9 [4] 11/-1/-1->10->9 [5] 11/2/-1->10->-1 [6] 11/-1/-1->10->9 [7] 11/-1/-1->10->9 +VM-0-111-tencentos:105:361 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] -1/-1/-1->9->8 [2] 10/-1/-1->9->8 [3] 10/-1/-1->9->8 [4] 10/-1/-1->9->8 [5] -1/-1/-1->9->8 [6] 10/-1/-1->9->8 [7] 10/-1/-1->9->8 +VM-0-111-tencentos:104:360 [0] NCCL INFO Trees [0] 9/-1/-1->8->0 [1] 9/-1/-1->8->15 [2] 9/-1/-1->8->15 [3] 9/-1/-1->8->15 [4] 9/0/-1->8->-1 [5] 9/-1/-1->8->15 [6] 9/-1/-1->8->15 [7] 9/-1/-1->8->15 +VM-0-111-tencentos:110:362 [6] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 15/-1/-1->14->13 [3] 15/-1/-1->14->6 [4] 15/-1/-1->14->13 [5] 15/-1/-1->14->13 [6] 15/-1/-1->14->13 [7] 15/6/-1->14->-1 +VM-0-111-tencentos:106:363 [2] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:105:361 [1] NCCL INFO P2P Chunksize set to 131072 
+VM-0-111-tencentos:108:367 [4] NCCL INFO Trees [0] 13/-1/-1->12->11 [1] 13/-1/-1->12->11 [2] 13/-1/-1->12->4 [3] 13/-1/-1->12->11 [4] 13/-1/-1->12->11 [5] 13/-1/-1->12->11 [6] 13/4/-1->12->-1 [7] 13/-1/-1->12->11 +VM-0-111-tencentos:104:360 [0] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:110:362 [6] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:108:367 [4] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:107:365 [3] NCCL INFO Trees [0] 12/-1/-1->11->10 [1] 12/-1/-1->11->10 [2] -1/-1/-1->11->10 [3] 12/-1/-1->11->10 [4] 12/-1/-1->11->10 [5] 12/-1/-1->11->10 [6] -1/-1/-1->11->10 [7] 12/-1/-1->11->10 +VM-0-111-tencentos:107:365 [3] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 02/0 : 13[5] -> 4[4] [send] via NET/IBext_v8/2 +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 06/0 : 13[5] -> 4[4] [send] via NET/IBext_v8/2 +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 01/0 : 3[3] -> 10[2] [receive] via NET/IBext_v8/1 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 00/0 : 1[1] -> 8[0] [receive] via NET/IBext_v8/0 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 04/0 : 1[1] -> 8[0] [receive] via NET/IBext_v8/0 +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 05/0 : 3[3] -> 10[2] [receive] via NET/IBext_v8/1 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 00/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 00/0 : 9[1] -> 0[0] [send] via NET/IBext_v8/0 +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 04/0 : 9[1] -> 0[0] [send] via NET/IBext_v8/0 +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 01/0 : 11[3] -> 2[2] [send] via NET/IBext_v8/1 +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 03/0 : 7[7] -> 14[6] [receive] via NET/IBext_v8/3 +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 05/0 : 11[3] -> 2[2] [send] via NET/IBext_v8/1 +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 07/0 : 7[7] -> 14[6] [receive] via NET/IBext_v8/3 +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 02/0 : 5[5] -> 12[4] [receive] via NET/IBext_v8/2 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 01/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 06/0 : 5[5] -> 12[4] [receive] via NET/IBext_v8/2 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 02/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 03/0 : 15[7] -> 6[6] [send] via NET/IBext_v8/3 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 03/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 07/0 : 15[7] -> 6[6] [send] via NET/IBext_v8/3 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 04/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 05/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 06/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 00/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 07/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 01/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 02/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 00/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 03/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 01/0 : 15[7] -> 14[6] via P2P/CUMEM/read 
+VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 04/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 02/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 05/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 04/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 06/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 05/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 07/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 06/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 00/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 01/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 00/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 01/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 00/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 02/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 01/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 02/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 02/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 03/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 03/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 03/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 03/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 04/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 04/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 05/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 04/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 05/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 05/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 06/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 06/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 07/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 06/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 07/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 07/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 07/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:105:406 [1] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +VM-0-111-tencentos:107:412 [3] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 00/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 01/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:105:406 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. 
+VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 02/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:107:412 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 03/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 04/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 05/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 06/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 07/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:104:404 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +VM-0-111-tencentos:104:404 [0] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +VM-0-111-tencentos:106:403 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +VM-0-111-tencentos:106:403 [2] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +VM-0-111-tencentos:108:411 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +VM-0-111-tencentos:108:411 [4] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +VM-0-111-tencentos:109:401 [5] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +VM-0-111-tencentos:107:412 [3] NCCL INFO NCCL_IB_SL set by environment to 5. +VM-0-111-tencentos:107:412 [3] NCCL INFO NCCL_IB_TC set by environment to 136. +VM-0-111-tencentos:107:412 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +VM-0-111-tencentos:107:412 [3] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 13. +VM-0-111-tencentos:104:404 [0] NCCL INFO NCCL_IB_SL set by environment to 5. +VM-0-111-tencentos:104:404 [0] NCCL INFO NCCL_IB_TC set by environment to 136. +VM-0-111-tencentos:104:404 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +VM-0-111-tencentos:104:404 [0] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 13. +VM-0-111-tencentos:106:403 [2] NCCL INFO NCCL_IB_SL set by environment to 5. +VM-0-111-tencentos:106:403 [2] NCCL INFO NCCL_IB_TC set by environment to 136. +VM-0-111-tencentos:106:403 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +VM-0-111-tencentos:106:403 [2] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 13. +VM-0-111-tencentos:108:411 [4] NCCL INFO NCCL_IB_SL set by environment to 5. +VM-0-111-tencentos:108:411 [4] NCCL INFO NCCL_IB_TC set by environment to 136. +VM-0-111-tencentos:108:411 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +VM-0-111-tencentos:108:411 [4] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 13. +VM-0-111-tencentos:109:401 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +VM-0-111-tencentos:105:406 [1] NCCL INFO NCCL_IB_SL set by environment to 5. +VM-0-111-tencentos:105:406 [1] NCCL INFO NCCL_IB_TC set by environment to 136. +VM-0-111-tencentos:105:406 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +VM-0-111-tencentos:105:406 [1] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 13. +VM-0-111-tencentos:109:401 [5] NCCL INFO NCCL_IB_SL set by environment to 5. +VM-0-111-tencentos:109:401 [5] NCCL INFO NCCL_IB_TC set by environment to 136. +VM-0-111-tencentos:111:400 [7] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +VM-0-111-tencentos:109:401 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +VM-0-111-tencentos:109:401 [5] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 13. +VM-0-111-tencentos:110:409 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. 
+VM-0-111-tencentos:110:409 [6] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +VM-0-111-tencentos:111:400 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +VM-0-111-tencentos:110:409 [6] NCCL INFO NCCL_IB_SL set by environment to 5. +VM-0-111-tencentos:110:409 [6] NCCL INFO NCCL_IB_TC set by environment to 136. +VM-0-111-tencentos:110:409 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +VM-0-111-tencentos:110:409 [6] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 13. +VM-0-111-tencentos:111:400 [7] NCCL INFO NCCL_IB_SL set by environment to 5. +VM-0-111-tencentos:111:400 [7] NCCL INFO NCCL_IB_TC set by environment to 136. +VM-0-111-tencentos:111:400 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +VM-0-111-tencentos:111:400 [7] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 13. +VM-0-111-tencentos:104:360 [0] NCCL INFO Connected all rings +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 00/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Connected all rings +VM-0-111-tencentos:105:361 [1] NCCL INFO Connected all rings +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 01/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 02/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 03/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 04/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 05/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 06/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 00/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 07/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 01/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 02/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 03/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 02/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Connected all rings +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 04/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Connected all rings +VM-0-111-tencentos:109:366 [5] NCCL INFO Connected all rings +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 03/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 05/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 06/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 06/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 07/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 07/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 00/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [receive] via NET/IBext_v8/0 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [receive] via NET/IBext_v8/0 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [send] via 
NET/IBext_v8/0 +VM-0-111-tencentos:104:360 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [send] via NET/IBext_v8/0 +VM-0-111-tencentos:105:361 [1] NCCL INFO Channel 04/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 00/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 01/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 00/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 02/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 00/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 01/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 03/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 01/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 03/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 04/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 02/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Connected all rings +VM-0-111-tencentos:110:362 [6] NCCL INFO Connected all rings +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 04/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 05/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 04/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 05/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 06/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 05/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 07/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 07/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 06/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 01/0 : 2[2] -> 10[2] [receive] via NET/IBext_v8/1 +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 05/0 : 2[2] -> 10[2] [receive] via NET/IBext_v8/1 +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 01/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 01/0 : 10[2] -> 2[2] [send] via NET/IBext_v8/1 +VM-0-111-tencentos:106:363 [2] NCCL INFO Channel 05/0 : 10[2] -> 2[2] [send] via NET/IBext_v8/1 +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 02/0 : 4[4] -> 12[4] [receive] via NET/IBext_v8/2 +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 06/0 : 4[4] -> 12[4] [receive] via NET/IBext_v8/2 +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 02/0 : 12[4] -> 4[4] [send] via NET/IBext_v8/2 +VM-0-111-tencentos:108:367 [4] NCCL INFO Channel 06/0 : 12[4] -> 4[4] [send] via NET/IBext_v8/2 +VM-0-111-tencentos:107:365 [3] NCCL INFO Channel 05/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 00/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 01/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 02/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 03/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 04/0 : 14[6] -> 15[7] via 
P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 05/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 06/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 07/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 02/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 01/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 03/0 : 6[6] -> 14[6] [receive] via NET/IBext_v8/3 +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 07/0 : 6[6] -> 14[6] [receive] via NET/IBext_v8/3 +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 03/0 : 14[6] -> 6[6] [send] via NET/IBext_v8/3 +VM-0-111-tencentos:110:362 [6] NCCL INFO Channel 07/0 : 14[6] -> 6[6] [send] via NET/IBext_v8/3 +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 02/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:109:366 [5] NCCL INFO Channel 06/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 03/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 05/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 06/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 07/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 03/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:111:364 [7] NCCL INFO Channel 07/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:106:363 [2] NCCL INFO Connected all trees +VM-0-111-tencentos:107:365 [3] NCCL INFO Connected all trees +VM-0-111-tencentos:106:363 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:106:363 [2] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:107:365 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:107:365 [3] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:106:363 [2] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. +VM-0-111-tencentos:107:365 [3] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. 
+VM-0-111-tencentos:109:366 [5] NCCL INFO Connected all trees +VM-0-111-tencentos:108:367 [4] NCCL INFO Connected all trees +VM-0-111-tencentos:109:366 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:105:361 [1] NCCL INFO Connected all trees +VM-0-111-tencentos:109:366 [5] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:104:360 [0] NCCL INFO Connected all trees +VM-0-111-tencentos:108:367 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:108:367 [4] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:105:361 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:105:361 [1] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:104:360 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:104:360 [0] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:110:362 [6] NCCL INFO Connected all trees +VM-0-111-tencentos:111:364 [7] NCCL INFO Connected all trees +VM-0-111-tencentos:110:362 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:110:362 [6] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:111:364 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:111:364 [7] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:109:366 [5] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. +VM-0-111-tencentos:108:367 [4] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. +VM-0-111-tencentos:105:361 [1] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. +VM-0-111-tencentos:104:360 [0] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. +VM-0-111-tencentos:110:362 [6] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. +VM-0-111-tencentos:111:364 [7] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. 
+VM-0-111-tencentos:109:366 [5] NCCL INFO comm 0x1d646040 rank 13 nranks 16 cudaDev 5 nvmlDev 5 busId 84000 commId 0x8ba02a329d6db9c8 - Init COMPLETE +VM-0-111-tencentos:106:363 [2] NCCL INFO comm 0x34d59180 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 43000 commId 0x8ba02a329d6db9c8 - Init COMPLETE +VM-0-111-tencentos:105:361 [1] NCCL INFO comm 0x29e4ba00 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 24000 commId 0x8ba02a329d6db9c8 - Init COMPLETE +VM-0-111-tencentos:104:360 [0] NCCL INFO comm 0x3ee89c20 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 23000 commId 0x8ba02a329d6db9c8 - Init COMPLETE +VM-0-111-tencentos:108:367 [4] NCCL INFO comm 0x19feab40 rank 12 nranks 16 cudaDev 4 nvmlDev 4 busId 83000 commId 0x8ba02a329d6db9c8 - Init COMPLETE +VM-0-111-tencentos:110:362 [6] NCCL INFO comm 0x34bdd640 rank 14 nranks 16 cudaDev 6 nvmlDev 6 busId c3000 commId 0x8ba02a329d6db9c8 - Init COMPLETE +VM-0-111-tencentos:111:364 [7] NCCL INFO comm 0x1d16df00 rank 15 nranks 16 cudaDev 7 nvmlDev 7 busId c4000 commId 0x8ba02a329d6db9c8 - Init COMPLETE +VM-0-111-tencentos:107:365 [3] NCCL INFO comm 0x6ac0b600 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId 44000 commId 0x8ba02a329d6db9c8 - Init COMPLETE +VM-0-111-tencentos:105:5216 [1] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:105:5216 [1] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:106:5222 [2] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:111:5217 [7] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:107:5220 [3] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:106:5222 [2] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:109:5219 [5] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:111:5217 [7] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:107:5220 [3] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Using non-device net plugin version 0 +VM-0-111-tencentos:109:5219 [5] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Using network IBext_v8 +VM-0-111-tencentos:107:5220 [3] NCCL INFO comm 0x60fd9cb0 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId 44000 commId 0x319e978f568c0fb6 - Init START +VM-0-111-tencentos:108:5221 [4] NCCL INFO comm 0x5a8f5360 rank 12 nranks 16 cudaDev 4 nvmlDev 4 busId 83000 commId 0x319e978f568c0fb6 - Init START +VM-0-111-tencentos:109:5219 [5] NCCL INFO comm 0x12d4e240 rank 13 nranks 16 cudaDev 5 nvmlDev 5 busId 84000 commId 0x319e978f568c0fb6 - Init START +VM-0-111-tencentos:111:5217 [7] NCCL INFO comm 0x5fd5d520 rank 15 nranks 16 cudaDev 7 nvmlDev 7 busId c4000 commId 0x319e978f568c0fb6 - Init START +VM-0-111-tencentos:110:5218 [6] NCCL INFO comm 0x6a98d070 rank 14 nranks 16 cudaDev 6 nvmlDev 6 busId c3000 commId 0x319e978f568c0fb6 - Init START +VM-0-111-tencentos:104:5223 [0] NCCL INFO comm 0x52157810 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 23000 commId 0x319e978f568c0fb6 - Init START +VM-0-111-tencentos:105:5216 [1] NCCL INFO comm 0x1ab7b700 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 24000 commId 0x319e978f568c0fb6 - Init START +VM-0-111-tencentos:106:5222 [2] NCCL INFO comm 0x17039da0 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 43000 commId 0x319e978f568c0fb6 - Init 
START +VM-0-111-tencentos:107:5220 [3] NCCL INFO Setting affinity for GPU 3 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-111-tencentos:111:5217 [7] NCCL INFO Setting affinity for GPU 7 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-111-tencentos:106:5222 [2] NCCL INFO Setting affinity for GPU 2 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-111-tencentos:105:5216 [1] NCCL INFO Setting affinity for GPU 1 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-111-tencentos:104:5223 [0] NCCL INFO Setting affinity for GPU 0 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-111-tencentos:110:5218 [6] NCCL INFO Setting affinity for GPU 6 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-111-tencentos:109:5219 [5] NCCL INFO Setting affinity for GPU 5 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Setting affinity for GPU 4 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-111-tencentos:108:5221 [4] NCCL INFO comm 0x5a8f5360 rank 12 nRanks 16 nNodes 2 localRanks 8 localRank 4 MNNVL 0 +VM-0-111-tencentos:109:5219 [5] NCCL INFO comm 0x12d4e240 rank 13 nRanks 16 nNodes 2 localRanks 8 localRank 5 MNNVL 0 +VM-0-111-tencentos:110:5218 [6] NCCL INFO comm 0x6a98d070 rank 14 nRanks 16 nNodes 2 localRanks 8 localRank 6 MNNVL 0 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Trees [0] 13/-1/-1->12->11 [1] 13/-1/-1->12->11 [2] 13/-1/-1->12->4 [3] 13/-1/-1->12->11 [4] 13/-1/-1->12->11 [5] 13/-1/-1->12->11 [6] 13/4/-1->12->-1 [7] 13/-1/-1->12->11 +VM-0-111-tencentos:108:5221 [4] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:109:5219 [5] NCCL INFO Trees [0] 14/-1/-1->13->12 [1] 14/-1/-1->13->12 [2] 14/-1/-1->13->12 [3] -1/-1/-1->13->12 [4] 14/-1/-1->13->12 [5] 14/-1/-1->13->12 [6] 14/-1/-1->13->12 [7] -1/-1/-1->13->12 +VM-0-111-tencentos:109:5219 [5] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 15/-1/-1->14->13 [3] 15/-1/-1->14->6 [4] 15/-1/-1->14->13 [5] 15/-1/-1->14->13 [6] 15/-1/-1->14->13 [7] 15/6/-1->14->-1 +VM-0-111-tencentos:110:5218 [6] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:111:5217 [7] NCCL INFO comm 0x5fd5d520 rank 15 nRanks 16 nNodes 2 localRanks 8 localRank 7 MNNVL 0 +VM-0-111-tencentos:107:5220 [3] NCCL INFO comm 0x60fd9cb0 rank 11 nRanks 16 nNodes 2 localRanks 8 localRank 3 MNNVL 0 +VM-0-111-tencentos:111:5217 [7] NCCL INFO Trees [0] -1/-1/-1->15->14 [1] 8/-1/-1->15->14 [2] 8/-1/-1->15->14 [3] 8/-1/-1->15->14 [4] -1/-1/-1->15->14 [5] 8/-1/-1->15->14 [6] 8/-1/-1->15->14 [7] 8/-1/-1->15->14 +VM-0-111-tencentos:111:5217 [7] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:107:5220 [3] NCCL INFO Trees [0] 12/-1/-1->11->10 [1] 12/-1/-1->11->10 [2] -1/-1/-1->11->10 [3] 12/-1/-1->11->10 [4] 12/-1/-1->11->10 [5] 12/-1/-1->11->10 [6] -1/-1/-1->11->10 [7] 12/-1/-1->11->10 +VM-0-111-tencentos:107:5220 [3] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:106:5222 [2] NCCL INFO comm 0x17039da0 rank 10 nRanks 16 nNodes 2 localRanks 8 localRank 2 MNNVL 0 +VM-0-111-tencentos:104:5223 [0] NCCL INFO comm 0x52157810 rank 8 nRanks 16 nNodes 2 localRanks 8 localRank 0 MNNVL 0 +VM-0-111-tencentos:105:5216 [1] NCCL INFO comm 0x1ab7b700 rank 9 nRanks 16 nNodes 2 localRanks 8 localRank 1 MNNVL 0 +VM-0-111-tencentos:106:5222 [2] NCCL INFO Trees [0] 11/-1/-1->10->9 [1] 11/-1/-1->10->2 [2] 11/-1/-1->10->9 [3] 11/-1/-1->10->9 [4] 11/-1/-1->10->9 [5] 11/2/-1->10->-1 [6] 
11/-1/-1->10->9 [7] 11/-1/-1->10->9 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Trees [0] 9/-1/-1->8->0 [1] 9/-1/-1->8->15 [2] 9/-1/-1->8->15 [3] 9/-1/-1->8->15 [4] 9/0/-1->8->-1 [5] 9/-1/-1->8->15 [6] 9/-1/-1->8->15 [7] 9/-1/-1->8->15 +VM-0-111-tencentos:106:5222 [2] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:104:5223 [0] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:105:5216 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] -1/-1/-1->9->8 [2] 10/-1/-1->9->8 [3] 10/-1/-1->9->8 [4] 10/-1/-1->9->8 [5] -1/-1/-1->9->8 [6] 10/-1/-1->9->8 [7] 10/-1/-1->9->8 +VM-0-111-tencentos:105:5216 [1] NCCL INFO P2P Chunksize set to 131072 +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 00/0 : 9[1] -> 0[0] [send] via NET/IBext_v8/0 +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 04/0 : 9[1] -> 0[0] [send] via NET/IBext_v8/0 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 02/0 : 5[5] -> 12[4] [receive] via NET/IBext_v8/2 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 06/0 : 5[5] -> 12[4] [receive] via NET/IBext_v8/2 +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 01/0 : 11[3] -> 2[2] [send] via NET/IBext_v8/1 +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 05/0 : 11[3] -> 2[2] [send] via NET/IBext_v8/1 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 03/0 : 7[7] -> 14[6] [receive] via NET/IBext_v8/3 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 07/0 : 7[7] -> 14[6] [receive] via NET/IBext_v8/3 +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 01/0 : 3[3] -> 10[2] [receive] via NET/IBext_v8/1 +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 05/0 : 3[3] -> 10[2] [receive] via NET/IBext_v8/1 +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 02/0 : 13[5] -> 4[4] [send] via NET/IBext_v8/2 +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 06/0 : 13[5] -> 4[4] [send] via NET/IBext_v8/2 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 00/0 : 1[1] -> 8[0] [receive] via NET/IBext_v8/0 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 04/0 : 1[1] -> 8[0] [receive] via NET/IBext_v8/0 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 00/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 03/0 : 15[7] -> 6[6] [send] via NET/IBext_v8/3 +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 07/0 : 15[7] -> 6[6] [send] via NET/IBext_v8/3 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 01/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 00/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 02/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 01/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 03/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 02/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 04/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 03/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 00/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 00/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 00/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 05/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 04/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] 
NCCL INFO Channel 01/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 01/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 02/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 06/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 05/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 02/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 02/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 03/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 07/0 : 8[0] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 06/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 03/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 00/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 03/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 04/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 07/0 : 12[4] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 00/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 04/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 01/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 04/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 06/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 01/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 05/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 03/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 05/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 07/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 02/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 01/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 06/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 04/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 06/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 04/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 02/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 07/0 : 14[6] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 05/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 07/0 : 10[2] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 05/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 03/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 07/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 06/0 : 15[7] -> 14[6] via P2P/CUMEM/read 
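[editor's note] The Channel lines above and below are NCCL's INFO-level trace of how the 16 ranks are wired together: intra-node hops go over P2P/CUMEM, inter-node hops over NET/IBext_v8. This level of detail only shows up when NCCL debug logging is enabled; the actual launch script is not part of this log, so the entry point below is a hypothetical sketch of how such a trace is typically switched on for a PyTorch/DeepSpeed run.

# Hypothetical entry-point sketch (assumption: processes are launched by
# deepspeed/torchrun, which provide RANK, WORLD_SIZE and LOCAL_RANK).
import os
import torch
import torch.distributed as dist

# Standard NCCL environment variables; NCCL reads them when the communicator
# is created, so setting them before the first collective is sufficient.
os.environ.setdefault("NCCL_DEBUG", "INFO")
os.environ.setdefault("NCCL_DEBUG_SUBSYS", "INIT,GRAPH")  # optional: limit to init/topology messages

def init_distributed() -> None:
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    # The first collective creates the NCCL communicator and emits the
    # "Init START", "Trees", "Channel" and "Init COMPLETE" lines seen here.
    dist.barrier()

if __name__ == "__main__":
    init_distributed()

In practice these variables are just as often exported in the launch script rather than set in code; the effect on the log output is the same.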
+VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 05/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 06/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 07/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Connected all rings +VM-0-111-tencentos:106:5222 [2] NCCL INFO Connected all rings +VM-0-111-tencentos:108:5221 [4] NCCL INFO Connected all rings +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 00/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 01/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 02/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 03/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 00/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 04/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 01/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Connected all rings +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 00/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Connected all rings +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 03/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 05/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 06/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 04/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Connected all rings +VM-0-111-tencentos:111:5217 [7] NCCL INFO Connected all rings +VM-0-111-tencentos:110:5218 [6] NCCL INFO Connected all rings +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 07/0 : 10[2] -> 11[3] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 05/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 00/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 07/0 : 11[3] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 01/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 02/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 01/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 02/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 03/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 03/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 04/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 00/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 04/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 05/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 01/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 05/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 02/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] 
NCCL INFO Channel 06/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 00/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 02/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 06/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 03/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 07/0 : 12[4] -> 13[5] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 01/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 03/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 07/0 : 8[0] -> 9[1] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 02/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 01/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 04/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 06/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 04/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Channel 05/0 : 11[3] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 05/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 07/0 : 9[1] -> 10[2] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 05/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 06/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 06/0 : 13[5] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [receive] via NET/IBext_v8/0 +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 00/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 01/0 : 2[2] -> 10[2] [receive] via NET/IBext_v8/1 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [receive] via NET/IBext_v8/0 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [send] via NET/IBext_v8/0 +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 05/0 : 2[2] -> 10[2] [receive] via NET/IBext_v8/1 +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 01/0 : 10[2] -> 2[2] [send] via NET/IBext_v8/1 +VM-0-111-tencentos:104:5223 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [send] via NET/IBext_v8/0 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 07/0 : 14[6] -> 15[7] via P2P/CUMEM/read +VM-0-111-tencentos:106:5222 [2] NCCL INFO Channel 05/0 : 10[2] -> 2[2] [send] via NET/IBext_v8/1 +VM-0-111-tencentos:105:5216 [1] NCCL INFO Channel 04/0 : 9[1] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 02/0 : 4[4] -> 12[4] [receive] via NET/IBext_v8/2 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 06/0 : 4[4] -> 12[4] [receive] via NET/IBext_v8/2 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 02/0 : 12[4] -> 4[4] [send] via NET/IBext_v8/2 +VM-0-111-tencentos:108:5221 [4] NCCL INFO Channel 06/0 : 12[4] -> 4[4] [send] via NET/IBext_v8/2 +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 02/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 01/0 : 15[7] -> 8[0] via P2P/CUMEM/read 
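[editor's note] A few entries further down, every rank reports "Connected all rings", "Connected all trees" and "Init COMPLETE" for commId 0x319e978f568c0fb6, i.e. the 2-node, 16-rank communicator is ready. A small, hypothetical smoke test that exercises exactly these ring/tree channels is an all_reduce over all 16 ranks, sketched here under the assumption that the process group from the previous sketch has been initialised.

# Hypothetical smoke test for the 16-rank communicator traced above.
# Assumption: dist.init_process_group(backend="nccl") has already run and
# each process has selected its local CUDA device.
import torch
import torch.distributed as dist

def allreduce_smoke_test() -> None:
    rank = dist.get_rank()
    world = dist.get_world_size()                 # 16 in this log: 2 nodes x 8 GPUs
    x = torch.ones(1, device="cuda") * rank
    dist.all_reduce(x, op=dist.ReduceOp.SUM)      # rides the rings/trees shown above
    expected = world * (world - 1) / 2            # 0 + 1 + ... + 15 = 120
    assert torch.allclose(x, torch.full_like(x, float(expected)))
    if rank == 0:
        print(f"all_reduce OK across {world} ranks")

If a rank hangs or the sum is wrong, the Channel lines above indicate which P2P or NET/IB path to inspect first.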
+VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 03/0 : 6[6] -> 14[6] [receive] via NET/IBext_v8/3 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 07/0 : 6[6] -> 14[6] [receive] via NET/IBext_v8/3 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 03/0 : 14[6] -> 6[6] [send] via NET/IBext_v8/3 +VM-0-111-tencentos:110:5218 [6] NCCL INFO Channel 07/0 : 14[6] -> 6[6] [send] via NET/IBext_v8/3 +VM-0-111-tencentos:109:5219 [5] NCCL INFO Channel 06/0 : 13[5] -> 12[4] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 02/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 03/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 05/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 06/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 07/0 : 15[7] -> 8[0] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 03/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:111:5217 [7] NCCL INFO Channel 07/0 : 15[7] -> 14[6] via P2P/CUMEM/read +VM-0-111-tencentos:107:5220 [3] NCCL INFO Connected all trees +VM-0-111-tencentos:106:5222 [2] NCCL INFO Connected all trees +VM-0-111-tencentos:107:5220 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:107:5220 [3] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:106:5222 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:106:5222 [2] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:109:5219 [5] NCCL INFO Connected all trees +VM-0-111-tencentos:109:5219 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:109:5219 [5] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:108:5221 [4] NCCL INFO Connected all trees +VM-0-111-tencentos:108:5221 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:108:5221 [4] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:104:5223 [0] NCCL INFO Connected all trees +VM-0-111-tencentos:105:5216 [1] NCCL INFO Connected all trees +VM-0-111-tencentos:104:5223 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:104:5223 [0] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:105:5216 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:105:5216 [1] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:111:5217 [7] NCCL INFO Connected all trees +VM-0-111-tencentos:110:5218 [6] NCCL INFO Connected all trees +VM-0-111-tencentos:111:5217 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:110:5218 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-111-tencentos:111:5217 [7] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:110:5218 [6] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-111-tencentos:110:5218 [6] NCCL INFO comm 0x6a98d070 rank 14 nranks 16 cudaDev 6 nvmlDev 6 busId 
c3000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-111-tencentos:104:5223 [0] NCCL INFO comm 0x52157810 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 23000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-111-tencentos:108:5221 [4] NCCL INFO comm 0x5a8f5360 rank 12 nranks 16 cudaDev 4 nvmlDev 4 busId 83000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-111-tencentos:106:5222 [2] NCCL INFO comm 0x17039da0 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 43000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-111-tencentos:109:5219 [5] NCCL INFO comm 0x12d4e240 rank 13 nranks 16 cudaDev 5 nvmlDev 5 busId 84000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-111-tencentos:105:5216 [1] NCCL INFO comm 0x1ab7b700 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 24000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-111-tencentos:111:5217 [7] NCCL INFO comm 0x5fd5d520 rank 15 nranks 16 cudaDev 7 nvmlDev 7 busId c4000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-111-tencentos:107:5220 [3] NCCL INFO comm 0x60fd9cb0 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId 44000 commId 0x319e978f568c0fb6 - Init COMPLETE +WARNING: tokenization mismatch: 0 vs. 1337. (ignored) +number of rounds: 3 +rounds: ["A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: You are a cutting-edge copywriting agency with the following knowledge and traits.\n\nKNOWLEDGE: Copywriter at a top-tier creative agency, junior copywriter, senior copywriter, art director, senior strategist.\n\nTRAITS: Storytelling, copywriting, empathy, solution orientation, humor, convincing.\n\nTASK: Use your knowledge to write a LinkedIn Post. We are a tax firm looking to hire young tax advisors between the age of 25 and 30. Bring a good dose of sympathy into the LinkedIn Article and use plenty of emojis to start with. Start the Linked In post with an exciting question. Use the language of the target group. Don´t mention the targeted age. \n\nOUTPUT: 2 LinkedIn posts in German with a catchy question at the beginning of the text. Hashtags at the end of the text. ASSISTANT: Post 1:\n\n👋 Hey Leute! Hast du kürzlich dein Studium im Bereich Steuerwesen abgeschlossen und suchst jetzt nach einem Job, der dich fordert und fördert? 🤔\n\nDann bist du bei uns genau richtig! Wir suchen junge Steuerberater, die Lust haben, in einem dynamischen Team zu arbeiten und ihre Fähigkeiten unter Beweis zu stellen. 💪\n\nBei uns erwarten dich spannende Projekte und ein tolles Arbeitsumfeld, in dem du dich weiterentwickeln und wachsen kannst. 🌱\n\nInteressiert? Dann bewirb dich jetzt und werde Teil unseres Teams! 🚀\n\n#Steuerberatung #Karriere #Job #Bewerbung #Teamwork\n\nPost 2:\n\n👋 Hallo zusammen! Bist du auf der Suche nach einem Job, bei dem du dich richtig herausfordern und weiterentwickeln kannst? 🤔\n\nDann haben wir vielleicht das passende Angebot für dich! Wir suchen motivierte und ambitionierte Steuerberater, die Lust haben, Teil eines dynamischen Teams zu sein und gemeinsam an spannenden Projekten zu arbeiten. 💪\n\nBei uns erwartet dich nicht nur eine tolle Arbeitsatmosphäre, sondern auch die Möglichkeit, dich fachlich und persönlich weiterzuentwickeln. 🌱\n\nAlso, worauf wartest du? Bewirb dich jetzt und starte deine Karriere bei uns! 🚀\n\n#Steuerberatung #Karriere #Job #Bewerbung #Teamwork", 'USER: Schreibe den Text spannender. ASSISTANT: Post 1:\n\n👋 Hey Leute! 
Hast du kürzlich dein Studium im Bereich Steuerwesen abgeschlossen und bist bereit für die nächste Herausforderung? 🤔\n\nDann haben wir genau das Richtige für dich! Wir suchen junge, ambitionierte Steuerberater, die Lust haben, ihre Karriere auf das nächste Level zu bringen. 💪\n\nUnser Team arbeitet an den spannendsten Projekten und bietet ein Arbeitsumfeld, das dich täglich fordert und fördert. Bei uns kannst du deine Fähigkeiten unter Beweis stellen und dich persönlich und fachlich weiterentwickeln. 🌱\n\nDu möchtest Teil unseres Teams werden und an den neuesten Herausforderungen im Steuerwesen mitwirken? Dann zögere nicht und bewirb dich jetzt bei uns! 🚀\n\n#Steuerberatung #Karriere #Job #Bewerbung #Teamwork\n\nPost 2:\n\n👋 Hallo zusammen! Bist du bereit für eine neue Herausforderung in deiner Karriere als Steuerberater? 🤔\n\nDann bist du bei uns genau richtig! Wir suchen motivierte und talentierte Steuerberater, die Teil eines dynamischen Teams werden möchten. Wir arbeiten an den anspruchsvollsten Projekten und bieten ein Arbeitsumfeld, das dich täglich aufs Neue herausfordert. 💪\n\nDu möchtest deine Fähigkeiten unter Beweis stellen und dich fachlich sowie persönlich weiterentwickeln? Dann bewirb dich jetzt bei uns und werde Teil unseres Erfolgsteams! 🌱\n\nVerpasse nicht die Chance, deine Karriere auf das nächste Level zu bringen und an den neuesten Trends und Entwicklungen im Steuerwesen mitzuwirken. Wir freuen uns auf deine Bewerbung! 🚀\n\n#Steuerberatung #Karriere #Job #Bewerbung #Teamwork', 'USER: Schreibe den Text objektiv und sachlich. ASSISTANT: Post 1:\n\nGuten Tag,\n\nWir sind auf der Suche nach jungen, talentierten Steuerberatern, die ihr Studium im Bereich Steuerwesen erfolgreich abgeschlossen haben. Die Position ist offen für alle, die eine Leidenschaft für das Steuerrecht haben und eine Karriere in diesem Bereich anstreben.\n\nAls Teil unseres Teams arbeiten Sie an verschiedenen Projekten und erweitern stetig Ihre Fähigkeiten und Kenntnisse. Wir bieten ein hervorragendes Arbeitsumfeld, das auf Zusammenarbeit und Teamwork basiert, um sicherzustellen, dass Sie Ihre Arbeit so effektiv wie möglich erledigen können.\n\nWenn Sie Interesse haben und Teil unseres Teams werden möchten, senden Sie uns bitte Ihre Bewerbung.\n\nVielen Dank.\n\nPost 2:\n\nGuten Tag,\n\nWir sind auf der Suche nach motivierten und talentierten Steuerberatern, die an einer Karriere in diesem Bereich interessiert sind. Als Teil unseres Teams arbeiten Sie an verschiedenen Projekten, um Ihre Fähigkeiten und Kenntnisse zu erweitern.\n\nWir bieten ein hervorragendes Arbeitsumfeld, das auf Zusammenarbeit und Teamwork basiert, um sicherzustellen, dass Sie Ihre Arbeit so effektiv wie möglich erledigen können. Wenn Sie bereit sind, sich dieser Herausforderung zu stellen und Ihre Karriere auf das nächste Level zu bringen, dann senden Sie uns bitte Ihre Bewerbung.\n\nVielen Dank.'] +prompt: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: You are a cutting-edge copywriting agency with the following knowledge and traits. + +KNOWLEDGE: Copywriter at a top-tier creative agency, junior copywriter, senior copywriter, art director, senior strategist. + +TRAITS: Storytelling, copywriting, empathy, solution orientation, humor, convincing. + +TASK: Use your knowledge to write a LinkedIn Post. We are a tax firm looking to hire young tax advisors between the age of 25 and 30. 
Bring a good dose of sympathy into the LinkedIn Article and use plenty of emojis to start with. Start the Linked In post with an exciting question. Use the language of the target group. Don´t mention the targeted age. + +OUTPUT: 2 LinkedIn posts in German with a catchy question at the beginning of the text. Hashtags at the end of the text. ASSISTANT: Post 1: + +👋 Hey Leute! Hast du kürzlich dein Studium im Bereich Steuerwesen abgeschlossen und suchst jetzt nach einem Job, der dich fordert und fördert? 🤔 + +Dann bist du bei uns genau richtig! Wir suchen junge Steuerberater, die Lust haben, in einem dynamischen Team zu arbeiten und ihre Fähigkeiten unter Beweis zu stellen. 💪 + +Bei uns erwarten dich spannende Projekte und ein tolles Arbeitsumfeld, in dem du dich weiterentwickeln und wachsen kannst. 🌱 + +Interessiert? Dann bewirb dich jetzt und werde Teil unseres Teams! 🚀 + +#Steuerberatung #Karriere #Job #Bewerbung #Teamwork + +Post 2: + +👋 Hallo zusammen! Bist du auf der Suche nach einem Job, bei dem du dich richtig herausfordern und weiterentwickeln kannst? 🤔 + +Dann haben wir vielleicht das passende Angebot für dich! Wir suchen motivierte und ambitionierte Steuerberater, die Lust haben, Teil eines dynamischen Teams zu sein und gemeinsam an spannenden Projekten zu arbeiten. 💪 + +Bei uns erwartet dich nicht nur eine tolle Arbeitsatmosphäre, sondern auch die Möglichkeit, dich fachlich und persönlich weiterzuentwickeln. 🌱 + +Also, worauf wartest du? Bewirb dich jetzt und starte deine Karriere bei uns! 🚀 + +#Steuerberatung #Karriere #Job #Bewerbung #Teamwork<|im_end|>USER: Schreibe den Text spannender. ASSISTANT: Post 1: + +👋 Hey Leute! Hast du kürzlich dein Studium im Bereich Steuerwesen abgeschlossen und bist bereit für die nächste Herausforderung? 🤔 + +Dann haben wir genau das Richtige für dich! Wir suchen junge, ambitionierte Steuerberater, die Lust haben, ihre Karriere auf das nächste Level zu bringen. 💪 + +Unser Team arbeitet an den spannendsten Projekten und bietet ein Arbeitsumfeld, das dich täglich fordert und fördert. Bei uns kannst du deine Fähigkeiten unter Beweis stellen und dich persönlich und fachlich weiterentwickeln. 🌱 + +Du möchtest Teil unseres Teams werden und an den neuesten Herausforderungen im Steuerwesen mitwirken? Dann zögere nicht und bewirb dich jetzt bei uns! 🚀 + +#Steuerberatung #Karriere #Job #Bewerbung #Teamwork + +Post 2: + +👋 Hallo zusammen! Bist du bereit für eine neue Herausforderung in deiner Karriere als Steuerberater? 🤔 + +Dann bist du bei uns genau richtig! Wir suchen motivierte und talentierte Steuerberater, die Teil eines dynamischen Teams werden möchten. Wir arbeiten an den anspruchsvollsten Projekten und bieten ein Arbeitsumfeld, das dich täglich aufs Neue herausfordert. 💪 + +Du möchtest deine Fähigkeiten unter Beweis stellen und dich fachlich sowie persönlich weiterentwickeln? Dann bewirb dich jetzt bei uns und werde Teil unseres Erfolgsteams! 🌱 + +Verpasse nicht die Chance, deine Karriere auf das nächste Level zu bringen und an den neuesten Trends und Entwicklungen im Steuerwesen mitzuwirken. Wir freuen uns auf deine Bewerbung! 🚀 + +#Steuerberatung #Karriere #Job #Bewerbung #Teamwork<|im_end|>USER: Schreibe den Text objektiv und sachlich. ASSISTANT: Post 1: + +Guten Tag, + +Wir sind auf der Suche nach jungen, talentierten Steuerberatern, die ihr Studium im Bereich Steuerwesen erfolgreich abgeschlossen haben. Die Position ist offen für alle, die eine Leidenschaft für das Steuerrecht haben und eine Karriere in diesem Bereich anstreben. 
+ +Als Teil unseres Teams arbeiten Sie an verschiedenen Projekten und erweitern stetig Ihre Fähigkeiten und Kenntnisse. Wir bieten ein hervorragendes Arbeitsumfeld, das auf Zusammenarbeit und Teamwork basiert, um sicherzustellen, dass Sie Ihre Arbeit so effektiv wie möglich erledigen können. + +Wenn Sie Interesse haben und Teil unseres Teams werden möchten, senden Sie uns bitte Ihre Bewerbung. + +Vielen Dank. + +Post 2: + +Guten Tag, + +Wir sind auf der Suche nach motivierten und talentierten Steuerberatern, die an einer Karriere in diesem Bereich interessiert sind. Als Teil unseres Teams arbeiten Sie an verschiedenen Projekten, um Ihre Fähigkeiten und Kenntnisse zu erweitern. + +Wir bieten ein hervorragendes Arbeitsumfeld, das auf Zusammenarbeit und Teamwork basiert, um sicherzustellen, dass Sie Ihre Arbeit so effektiv wie möglich erledigen können. Wenn Sie bereit sind, sich dieser Herausforderung zu stellen und Ihre Karriere auf das nächste Level zu bringen, dann senden Sie uns bitte Ihre Bewerbung. + +Vielen Dank.<|im_end|> +tensor([-100, -100, -100, ..., -100, -100, -100]) +tensor([ 32, 6236, 1948, ..., 70407, 13, 151645]) +WARNING: tokenization mismatch: 0 vs. 1283. (ignored) +number of rounds: 2 +rounds: ["A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: Du bist ein Personalmarketing-Experte mit den folgenden Kenntnissen und Eigenschaften.\n\nKENNTNISSE: Personalmarketing, Social Media Marketing, Facebook-Marketing, Kampagnenentwicklung, Zielgruppenanalyse, Content-Strategie, Werbeanzeigen, Performance-Messung, Personalgewinnung, Steuerfachangestellte.\n\nEIGENSCHAFTEN: Kreativ, strategisch, analytisch, kommunikativ, zielorientiert, detailverliebt, teamorientiert.\n\nAUFGABE: Entwickle eine erfolgreiche Personalmarketing-Strategie für die Gewinnung von Steuerfachangestellten über Facebook. Erstelle eine Tabelle mit allen Maßnahmen und weise jeder Maßnahme den zuständigen Spezialisten in einer Marketingagentur zu. Achte dabei auf Zielgruppenanalyse, Content-Strategie, Werbeanzeigen und Performance-Messung.\n\nOUTPUT: Eine Tabelle mit allen Maßnahmen für eine erfolgreiche Personalmarketing-Kampagne zur Gewinnung von Steuerfachangestellten auf Facebook, die die Verantwortlichkeiten der Spezialisten in einer Marketingagentur aufzeigt, einschließlich Zielgruppenanalyse, Content-Strategie, Werbeanzeigen und Performance-Messung. ASSISTANT: | Maßnahme | Beschreibung | Zuständiger Spezialist |\n| --- | --- | --- |\n| Zielgruppenanalyse | Analyse der demografischen Daten, Interessen und Verhaltensweisen der potenziellen Bewerber. | Marktanalyst |\n| Erstellung von Personas | Entwicklung von fiktiven, repräsentativen Bewerberprofilen zur besseren Zielgruppenansprache. | Personalmarketing-Experte |\n| Content-Strategie | Planung von relevantem, ansprechendem und zielgruppenspezifischem Content. | Content-Stratege |\n| Redaktionsplan | Erstellung eines Zeitplans für die Veröffentlichung von Inhalten auf Facebook. | Social Media Manager |\n| Erstellung von Werbeanzeigen | Design und Texten von ansprechenden, zielgruppenspezifischen Facebook-Werbeanzeigen. | Art Director & Copywriter |\n| Auswahl von Targeting-Optionen | Festlegung der Zielgruppenparameter für die Facebook-Werbekampagnen. | Facebook-Marketing-Experte |\n| Budgetierung und Zeitplanung | Festlegung des Budgets und der Laufzeit für die Facebook-Werbekampagnen. 
| Media Planner |\n| Erstellung von Landing Pages | Gestaltung und Implementierung von spezifischen Landing Pages für die Bewerbung. | Web-Designer |\n| Performance-Messung | Überwachung und Analyse der Kampagnenleistung, einschließlich KPIs wie Klicks und Bewerbungen. | Data Analyst |\n| A/B-Testing | Durchführung von A/B-Tests, um die Effektivität von Anzeigen und Landing Pages zu optimieren. | Conversion-Optimierungsexperte |\n| Community-Management | Interaktion mit der Zielgruppe auf Facebook, Beantwortung von Fragen und Moderation von Diskussionen. | Community Manager |\n| Reporting und Optimierung | Regelmäßige Berichterstattung über die Kampagnenergebnisse und Anpassung der Strategie. | Performance Marketing-Manager |", 'USER: Weiter. ASSISTANT: Hier sind einige zusätzliche Maßnahmen, die Sie in Ihre Personalmarketing-Strategie für die Gewinnung von Steuerfachangestellten über Facebook einbeziehen können:\n\n| Maßnahme | Beschreibung | Zuständiger Spezialist |\n| --- | --- | --- |\n| Video-Content | Erstellung von Video-Content, wie Interviews mit aktuellen Mitarbeitern oder Berufsalltagsvideos. | Videoproduzent |\n| Influencer-Marketing | Zusammenarbeit mit Influencern aus der Branche, um Ihre Arbeitgebermarke zu stärken. | Influencer-Marketing-Experte |\n| Anpassung von Stellenanzeigen | Anpassung der Stellenanzeigen auf Facebook, um die Zielgruppe besser anzusprechen. | Personalmarketing-Experte |\n| Mitarbeiter-Testimonials | Teilen von Erfahrungsberichten und Zitaten der Mitarbeiter auf Facebook, um Authentizität zu schaffen. | Content-Stratege |\n| Virtuelle Karriere-Events | Durchführung von virtuellen Karriere-Events oder Informationsveranstaltungen auf Facebook. | Event-Manager |\n| Interaktive Umfragen und Quiz | Nutzung von interaktiven Umfragen und Quiz auf Facebook, um das Engagement zu erhöhen. | Social Media Manager |\n| Remarketing-Kampagnen | Einsatz von Remarketing-Anzeigen, um die Personen erneut anzusprechen, die bereits interessiert waren. | Facebook-Marketing-Experte |\n| Monitoring von Wettbewerbern | Beobachtung der Personalmarketing-Aktivitäten der Wettbewerber auf Facebook und Ableitung von Optimierungspotenzialen. | Marktanalyst |\n| Integration von Chatbots | Implementierung von Chatbots auf der Facebook-Seite und der Landing Page zur Automatisierung des Bewerbungsprozesses. | Chatbot-Entwickler |\n| Employer Branding-Kampagne | Entwicklung einer Employer Branding-Kampagne, die die Unternehmenskultur und -werte hervorhebt. | Employer Branding-Experte |\n\nMit diesen erweiterten Maßnahmen können Sie eine noch umfassendere Personalmarketing-Strategie entwickeln, die die Chancen erhöht, qualifizierte Steuerfachangestellte über Facebook zu gewinnen.'] +prompt: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: Du bist ein Personalmarketing-Experte mit den folgenden Kenntnissen und Eigenschaften. + +KENNTNISSE: Personalmarketing, Social Media Marketing, Facebook-Marketing, Kampagnenentwicklung, Zielgruppenanalyse, Content-Strategie, Werbeanzeigen, Performance-Messung, Personalgewinnung, Steuerfachangestellte. + +EIGENSCHAFTEN: Kreativ, strategisch, analytisch, kommunikativ, zielorientiert, detailverliebt, teamorientiert. + +AUFGABE: Entwickle eine erfolgreiche Personalmarketing-Strategie für die Gewinnung von Steuerfachangestellten über Facebook. 
Erstelle eine Tabelle mit allen Maßnahmen und weise jeder Maßnahme den zuständigen Spezialisten in einer Marketingagentur zu. Achte dabei auf Zielgruppenanalyse, Content-Strategie, Werbeanzeigen und Performance-Messung. + +OUTPUT: Eine Tabelle mit allen Maßnahmen für eine erfolgreiche Personalmarketing-Kampagne zur Gewinnung von Steuerfachangestellten auf Facebook, die die Verantwortlichkeiten der Spezialisten in einer Marketingagentur aufzeigt, einschließlich Zielgruppenanalyse, Content-Strategie, Werbeanzeigen und Performance-Messung. ASSISTANT: | Maßnahme | Beschreibung | Zuständiger Spezialist | +| --- | --- | --- | +| Zielgruppenanalyse | Analyse der demografischen Daten, Interessen und Verhaltensweisen der potenziellen Bewerber. | Marktanalyst | +| Erstellung von Personas | Entwicklung von fiktiven, repräsentativen Bewerberprofilen zur besseren Zielgruppenansprache. | Personalmarketing-Experte | +| Content-Strategie | Planung von relevantem, ansprechendem und zielgruppenspezifischem Content. | Content-Stratege | +| Redaktionsplan | Erstellung eines Zeitplans für die Veröffentlichung von Inhalten auf Facebook. | Social Media Manager | +| Erstellung von Werbeanzeigen | Design und Texten von ansprechenden, zielgruppenspezifischen Facebook-Werbeanzeigen. | Art Director & Copywriter | +| Auswahl von Targeting-Optionen | Festlegung der Zielgruppenparameter für die Facebook-Werbekampagnen. | Facebook-Marketing-Experte | +| Budgetierung und Zeitplanung | Festlegung des Budgets und der Laufzeit für die Facebook-Werbekampagnen. | Media Planner | +| Erstellung von Landing Pages | Gestaltung und Implementierung von spezifischen Landing Pages für die Bewerbung. | Web-Designer | +| Performance-Messung | Überwachung und Analyse der Kampagnenleistung, einschließlich KPIs wie Klicks und Bewerbungen. | Data Analyst | +| A/B-Testing | Durchführung von A/B-Tests, um die Effektivität von Anzeigen und Landing Pages zu optimieren. | Conversion-Optimierungsexperte | +| Community-Management | Interaktion mit der Zielgruppe auf Facebook, Beantwortung von Fragen und Moderation von Diskussionen. | Community Manager | +| Reporting und Optimierung | Regelmäßige Berichterstattung über die Kampagnenergebnisse und Anpassung der Strategie. | Performance Marketing-Manager |<|im_end|>USER: Weiter. ASSISTANT: Hier sind einige zusätzliche Maßnahmen, die Sie in Ihre Personalmarketing-Strategie für die Gewinnung von Steuerfachangestellten über Facebook einbeziehen können: + +| Maßnahme | Beschreibung | Zuständiger Spezialist | +| --- | --- | --- | +| Video-Content | Erstellung von Video-Content, wie Interviews mit aktuellen Mitarbeitern oder Berufsalltagsvideos. | Videoproduzent | +| Influencer-Marketing | Zusammenarbeit mit Influencern aus der Branche, um Ihre Arbeitgebermarke zu stärken. | Influencer-Marketing-Experte | +| Anpassung von Stellenanzeigen | Anpassung der Stellenanzeigen auf Facebook, um die Zielgruppe besser anzusprechen. | Personalmarketing-Experte | +| Mitarbeiter-Testimonials | Teilen von Erfahrungsberichten und Zitaten der Mitarbeiter auf Facebook, um Authentizität zu schaffen. | Content-Stratege | +| Virtuelle Karriere-Events | Durchführung von virtuellen Karriere-Events oder Informationsveranstaltungen auf Facebook. | Event-Manager | +| Interaktive Umfragen und Quiz | Nutzung von interaktiven Umfragen und Quiz auf Facebook, um das Engagement zu erhöhen. 
| Social Media Manager | +| Remarketing-Kampagnen | Einsatz von Remarketing-Anzeigen, um die Personen erneut anzusprechen, die bereits interessiert waren. | Facebook-Marketing-Experte | +| Monitoring von Wettbewerbern | Beobachtung der Personalmarketing-Aktivitäten der Wettbewerber auf Facebook und Ableitung von Optimierungspotenzialen. | Marktanalyst | +| Integration von Chatbots | Implementierung von Chatbots auf der Facebook-Seite und der Landing Page zur Automatisierung des Bewerbungsprozesses. | Chatbot-Entwickler | +| Employer Branding-Kampagne | Entwicklung einer Employer Branding-Kampagne, die die Unternehmenskultur und -werte hervorhebt. | Employer Branding-Experte | + +Mit diesen erweiterten Maßnahmen können Sie eine noch umfassendere Personalmarketing-Strategie entwickeln, die die Chancen erhöht, qualifizierte Steuerfachangestellte über Facebook zu gewinnen.<|im_end|> +tensor([-100, -100, -100, ..., -100, -100, -100]) +tensor([ 32, 6236, 1948, ..., 34711, 13, 151645]) +WARNING: tokenization mismatch: 0 vs. 513. (ignored) +number of rounds: 1 +rounds: ["A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: You are a cutting-edge copywriting agency with the following knowledge and traits.\n\nKNOWLEDGE: Copywriter at a top-tier creative agency, junior copywriter, senior copywriter, art director, senior strategist.\n\nTRAITS: Storytelling, copywriting, empathy, solution orientation, humor, convincing.\n\nTASK: Use your knowledge to write a LinkedIn Post. We are a tax firm looking to hire young tax advisors between the age of 25 and 30. Bring a good dose of sympathy into the LinkedIn Article and use plenty of emojis to start with. Start the Linked In post with an exciting question. Use the language of the target group. Don´t mention the targeted age. \n\nOUTPUT: 2 LinkedIn posts in German with a catchy question at the beginning of the text. Hashtags at the end of the text. ASSISTANT: Post 1:\n\n👋 Hey, ihr Steuerprofis da draußen! 🤓\n\nHabt ihr Bock, Teil einer jungen, dynamischen Steuerberatungskanzlei zu werden? 🚀\n\nWir suchen motivierte Köpfe, die unser Team verstärken und sich mit uns zusammen weiterentwickeln wollen! 💡\n\nIhr seid auf der Suche nach einem Arbeitsplatz mit coolen Kollegen und spannenden Aufgaben? 🤝 Dann seid ihr bei uns genau richtig! 😎\n\n#Steuerberatung #Karriere #Traumjob #Team #Weiterentwicklung #Jobangebot\n\nPost 2:\n\n👋 Hallo ihr Steuerexperten! 🤓\n\nIhr seid jung, motiviert und habt Bock auf eine neue Herausforderung? 🚀\n\nDann seid ihr bei uns genau richtig! Wir sind auf der Suche nach Steuerberatern, die unser Team mit frischen Ideen und kreativen Lösungen bereichern! 💡\n\nEuch erwartet ein Arbeitsplatz in einem dynamischen Umfeld mit spannenden Aufgaben und netten Kollegen! 🤝\n\nInteressiert? Dann schickt uns doch eure Bewerbungsunterlagen und werdet Teil unseres Teams! 😎\n\n#Jobangebot #Steuerberatung #Karriere #Traumjob #Team #Weiterentwicklung"] +prompt: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: You are a cutting-edge copywriting agency with the following knowledge and traits. + +KNOWLEDGE: Copywriter at a top-tier creative agency, junior copywriter, senior copywriter, art director, senior strategist. + +TRAITS: Storytelling, copywriting, empathy, solution orientation, humor, convincing. 
+ +TASK: Use your knowledge to write a LinkedIn Post. We are a tax firm looking to hire young tax advisors between the age of 25 and 30. Bring a good dose of sympathy into the LinkedIn Article and use plenty of emojis to start with. Start the Linked In post with an exciting question. Use the language of the target group. Don´t mention the targeted age. + +OUTPUT: 2 LinkedIn posts in German with a catchy question at the beginning of the text. Hashtags at the end of the text. ASSISTANT: Post 1: + +👋 Hey, ihr Steuerprofis da draußen! 🤓 + +Habt ihr Bock, Teil einer jungen, dynamischen Steuerberatungskanzlei zu werden? 🚀 + +Wir suchen motivierte Köpfe, die unser Team verstärken und sich mit uns zusammen weiterentwickeln wollen! 💡 + +Ihr seid auf der Suche nach einem Arbeitsplatz mit coolen Kollegen und spannenden Aufgaben? 🤝 Dann seid ihr bei uns genau richtig! 😎 + +#Steuerberatung #Karriere #Traumjob #Team #Weiterentwicklung #Jobangebot + +Post 2: + +👋 Hallo ihr Steuerexperten! 🤓 + +Ihr seid jung, motiviert und habt Bock auf eine neue Herausforderung? 🚀 + +Dann seid ihr bei uns genau richtig! Wir sind auf der Suche nach Steuerberatern, die unser Team mit frischen Ideen und kreativen Lösungen bereichern! 💡 + +Euch erwartet ein Arbeitsplatz in einem dynamischen Umfeld mit spannenden Aufgaben und netten Kollegen! 🤝 + +Interessiert? Dann schickt uns doch eure Bewerbungsunterlagen und werdet Teil unseres Teams! 😎 + +#Jobangebot #Steuerberatung #Karriere #Traumjob #Team #Weiterentwicklung<|im_end|> +tensor([-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100]) +tensor([ 32, 6236, 1948, 264, 22208, 1196, 323, 458, 20443, + 11229, 17847, 13, 576, 17847, 6696, 10950, 11, 11682, + 11, 323, 47787, 11253, 311, 279, 1196, 594, 4755, + 13, 13872, 25, 35560, 3846, 2821, 25, 1446, 525, + 264, 14376, 47348, 2975, 25304, 9088, 448, 279, 2701, + 6540, 323, 24055, 382, 17155, 3307, 13639, 10777, 25, + 14540, 18189, 518, 264, 1909, 57255, 11521, 9088, 11, + 26052, 2975, 18189, 11, 9990, 2975, 18189, 11, 1947, + 7538, 11, 9990, 64135, 382, 20252, 14474, 25, 15106, + 83, 6293, 11, 2975, 25304, 11, 47351, 11, 6291, + 16725, 11, 27385, 11, 39561, 382, 65813, 25, 5443, + 697, 6540, 311, 3270, 264, 32767, 3877, 13, 1205, + 525, 264, 3742, 7474, 3330, 311, 17983, 3908, 3742, + 57684, 1948, 279, 4231, 315, 220, 17, 20, 323, + 220, 18, 15, 13, 39124, 264, 1661, 19115, 315, + 45984, 1119, 279, 32767, 13355, 323, 990, 11260, 315, + 99066, 311, 1191, 448, 13, 5145, 279, 13114, 758, + 1736, 448, 458, 13245, 3405, 13, 5443, 279, 4128, + 315, 279, 2169, 1874, 13, 4320, 96865, 6286, 279, + 17112, 4231, 13, 4710, 30301, 25, 220, 17, 32767, + 7999, 304, 5938, 448, 264, 85505, 3405, 518, 279, + 7167, 315, 279, 1467, 13, 11443, 25285, 518, 279, + 835, 315, 279, 1467, 13, 35560, 3846, 2821, 25, + 3877, 220, 16, 1447, 145707, 27553, 11, 24440, 3360, + 8801, 21826, 285, 2994, 44096, 84, 26824, 0, 11162, + 97, 241, 271, 39, 370, 83, 24440, 425, 1176, + 11, 47355, 17058, 502, 11595, 11, 17544, 17739, 3360, + 8801, 652, 266, 2185, 4886, 12070, 62785, 6395, 12643, + 30, 11162, 248, 222, 271, 48047, 66124, 12242, 65161, + 64894, 79, 1859, 11, 2746, 23898, 7909, 85104, 13977, + 2712, 2030, 9089, 5451, 6975, 53383, 30632, 306, 21188, + 17454, 56945, 0, 63039, 94, 271, 40, 4079, 511, + 307, 7219, 2694, 79041, 14047, 17443, 70165, 57548, 5451, + 7010, 268, 45515, 48414, 2030, 9390, 77, 19875, 25714, + 83543, 30, 11162, 97, 251, 72915, 511, 307, 24440, + 13279, 6975, 68876, 65681, 0, 26525, 236, 271, 2, + 20903, 8801, 652, 266, 2185, 671, 76389, 84467, 671, + 48138, 372, 8799, 671, 14597, 671, 1654, 2015, 306, + 69936, 671, 12245, 844, 6331, 271, 4133, 220, 17, + 1447, 145707, 19851, 385, 24440, 3360, 8801, 4580, 50958, + 0, 11162, 97, 241, 271, 40, 4079, 511, 307, + 68910, 11, 12242, 17029, 2030, 6055, 83, 425, 1176, + 7219, 9820, 38383, 82557, 355, 87227, 2185, 30, 11162, + 248, 222, 271, 35, 1020, 511, 307, 24440, 13279, + 6975, 68876, 65681, 0, 34903, 12583, 7219, 2694, 
79041, + 14047, 3360, 8801, 652, 13375, 11, 2746, 23898, 7909, + 5451, 1422, 17739, 13804, 268, 2030, 595, 1222, 2071, + 89452, 11595, 26933, 88581, 0, 63039, 94, 271, 36, + 1387, 2714, 35608, 295, 4368, 70165, 57548, 304, 17443, + 17544, 17739, 23334, 30102, 5451, 9390, 77, 19875, 25714, + 83543, 2030, 4179, 1960, 45515, 48414, 0, 11162, 97, + 251, 271, 3306, 433, 17029, 30, 72915, 5699, 70980, + 6975, 40157, 384, 552, 37210, 40499, 28122, 29159, 54708, + 2030, 7060, 19285, 47355, 23898, 288, 39613, 0, 26525, + 236, 271, 2, 12245, 844, 6331, 671, 20903, 8801, + 652, 266, 2185, 671, 76389, 84467, 671, 48138, 372, + 8799, 671, 14597, 671, 1654, 2015, 306, 69936, 151645]) +WARNING: tokenization mismatch: 0 vs. 1271. (ignored) +number of rounds: 2 +rounds: ["A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: You are an innovative tax advisory firm with the following knowledge and traits.\n\nKNOWLEDGE: Tax advisor, junior tax advisor, senior tax advisor, financial strategist, tax regulations.\n\nTRAITS: Empathy, innovation, solution orientation, professionalism, communication, client focus.\n\nTASK: Use your knowledge to develop an internal strategy document. We are a tax advisory firm looking to identify 10 creative ways to improve client satisfaction and loyalty. Focus on ideas that are tailored to the needs of tax advisors and enhance the overall client experience.\n\nOUTPUT: A table in German containing 10 creative strategies to improve client satisfaction and loyalty for a tax advisory firm, including a brief description for each. ASSISTANT: Tabelle: 10 kreative Strategien zur Steigerung der Kundenzufriedenheit und -loyalität in einer Steuerberatungskanzlei\n\n| Nr. | Strategie | Beschreibung |\n| --- | --- | --- |\n| 1 | Personalisierter Service | Jeder Kunde erhält einen dedizierten Steuerberater, der auf seine individuellen Bedürfnisse und Anliegen eingeht. |\n| 2 | Proaktive Steuerplanung | Regelmäßige Updates und Empfehlungen zur Steueroptimierung, basierend auf aktuellen Gesetzesänderungen und Kundenzielen. |\n| 3 | Umfassende Schulungsangebote | Kostenlose Workshops und Webinare, um das Steuerwissen der Kunden zu erweitern und den Umgang mit Steuerthemen zu erleichtern. |\n| 4 | Digitale Plattform | Eine benutzerfreundliche Online-Plattform, die den Kunden den Zugriff auf ihre Steuerdokumente und den Kontakt zum Berater erleichtert. |\n| 5 | Einführung von Steuer-Coaching | Persönliche Coaching-Sitzungen, um den Kunden dabei zu unterstützen, ihre steuerlichen Ziele und Möglichkeiten besser zu verstehen. |\n| 6 | Schnelle Reaktionszeiten | Verbindliches Versprechen, innerhalb von 24 Stunden auf Kundenanfragen zu reagieren, um eine effiziente Kommunikation zu gewährleisten. |\n| 7 | Transparente Preisgestaltung | Klare und faire Preisstrukturen, die den Kunden im Voraus bekannt sind, um böse Überraschungen zu vermeiden. |\n| 8 | Individuelle Jahresgespräche | Regelmäßige persönliche Jahresgespräche, um auf die Bedürfnisse und Ziele der Kunden einzugehen und die Zusammenarbeit zu stärken. |\n| 9 | Kunden-Feedback-System | Einrichtung eines einfachen Feedback-Systems, um Kundenmeinungen und Verbesserungsvorschläge aktiv einzuholen und umzusetzen. |\n| 10 | Steuer-Newsletter und Informationsangebot | Monatlicher Newsletter mit aktuellen Steuerinformationen und Tipps, der den Kunden einen Mehrwert bietet und sie auf dem Laufenden hält. 
|", 'USER: Bitte etwas kreativer und auf bezugnehmend auf die zwischenmenschliche Ebene. ASSISTANT: Tabelle: 10 kreative Strategien zur Steigerung der Kundenzufriedenheit und -loyalität in einer Steuerberatungskanzlei mit Fokus auf zwischenmenschlicher Ebene\n\n| Nr. | Strategie | Beschreibung |\n| --- | --- | --- |\n| 1 | Kunden-Events | Organisation von regelmäßigen, informellen Veranstaltungen, um den persönlichen Kontakt zwischen Klienten und Beratern zu fördern. |\n| 2 | Persönliche Geburtstagsgrüße | Individuelle Geburtstagskarten oder -nachrichten für jeden Kunden, um Wertschätzung zu zeigen und Beziehungen zu stärken. |\n| 3 | Kundenbefragung & Anpassung | Aktives Einholen von Kundenmeinungen zur individuellen Betreuung und Anpassung unserer Leistungen entsprechend der Kundenbedürfnisse. |\n| 4 | Teamvorstellung & Empathie | Vorstellung unserer Berater auf der Firmenwebsite und in sozialen Medien, um eine persönliche und empathische Beziehung zum Kunden aufzubauen. |\n| 5 | Kunden-Mentoring-Programm | Schaffung eines Programms, bei dem langjährige Kunden als Mentoren für Neukunden fungieren, um Vertrauen und Loyalität zu fördern. |\n| 6 | "Tag der offenen Tür" | Einladung der Kunden zu einem "Tag der offenen Tür" in unserer Kanzlei, um ihnen einen persönlichen Einblick in unsere Arbeitsweise zu geben. |\n| 7 | Individuelle Erfolgsgeschichten | Erstellung von individuellen Erfolgsgeschichten für jeden Kunden, um ihre steuerlichen Erfolge hervorzuheben und ihre Loyalität zu stärken. |\n| 8 | Gemeinsame soziale Projekte | Zusammenarbeit mit Kunden an sozialen Projekten und Spendenaktionen, um gemeinsame Werte zu teilen und die Zusammenarbeit zu fördern. |\n| 9 | Flexibilität in der Kommunikation | Anbieten verschiedener Kommunikationswege (Telefon, E-Mail, Videoanruf, etc.), um auf die individuellen Bedürfnisse der Kunden einzugehen. |\n| 10 | Jährliches Kunden-Dankeschön | Überraschung der Kunden mit einer kleinen Aufmerksamkeit oder einem Dankeschön-Event, um die Wertschätzung für ihre Treue auszudrücken. |'] +prompt: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: You are an innovative tax advisory firm with the following knowledge and traits. + +KNOWLEDGE: Tax advisor, junior tax advisor, senior tax advisor, financial strategist, tax regulations. + +TRAITS: Empathy, innovation, solution orientation, professionalism, communication, client focus. + +TASK: Use your knowledge to develop an internal strategy document. We are a tax advisory firm looking to identify 10 creative ways to improve client satisfaction and loyalty. Focus on ideas that are tailored to the needs of tax advisors and enhance the overall client experience. + +OUTPUT: A table in German containing 10 creative strategies to improve client satisfaction and loyalty for a tax advisory firm, including a brief description for each. ASSISTANT: Tabelle: 10 kreative Strategien zur Steigerung der Kundenzufriedenheit und -loyalität in einer Steuerberatungskanzlei + +| Nr. | Strategie | Beschreibung | +| --- | --- | --- | +| 1 | Personalisierter Service | Jeder Kunde erhält einen dedizierten Steuerberater, der auf seine individuellen Bedürfnisse und Anliegen eingeht. | +| 2 | Proaktive Steuerplanung | Regelmäßige Updates und Empfehlungen zur Steueroptimierung, basierend auf aktuellen Gesetzesänderungen und Kundenzielen. 
| +| 3 | Umfassende Schulungsangebote | Kostenlose Workshops und Webinare, um das Steuerwissen der Kunden zu erweitern und den Umgang mit Steuerthemen zu erleichtern. | +| 4 | Digitale Plattform | Eine benutzerfreundliche Online-Plattform, die den Kunden den Zugriff auf ihre Steuerdokumente und den Kontakt zum Berater erleichtert. | +| 5 | Einführung von Steuer-Coaching | Persönliche Coaching-Sitzungen, um den Kunden dabei zu unterstützen, ihre steuerlichen Ziele und Möglichkeiten besser zu verstehen. | +| 6 | Schnelle Reaktionszeiten | Verbindliches Versprechen, innerhalb von 24 Stunden auf Kundenanfragen zu reagieren, um eine effiziente Kommunikation zu gewährleisten. | +| 7 | Transparente Preisgestaltung | Klare und faire Preisstrukturen, die den Kunden im Voraus bekannt sind, um böse Überraschungen zu vermeiden. | +| 8 | Individuelle Jahresgespräche | Regelmäßige persönliche Jahresgespräche, um auf die Bedürfnisse und Ziele der Kunden einzugehen und die Zusammenarbeit zu stärken. | +| 9 | Kunden-Feedback-System | Einrichtung eines einfachen Feedback-Systems, um Kundenmeinungen und Verbesserungsvorschläge aktiv einzuholen und umzusetzen. | +| 10 | Steuer-Newsletter und Informationsangebot | Monatlicher Newsletter mit aktuellen Steuerinformationen und Tipps, der den Kunden einen Mehrwert bietet und sie auf dem Laufenden hält. |<|im_end|>USER: Bitte etwas kreativer und auf bezugnehmend auf die zwischenmenschliche Ebene. ASSISTANT: Tabelle: 10 kreative Strategien zur Steigerung der Kundenzufriedenheit und -loyalität in einer Steuerberatungskanzlei mit Fokus auf zwischenmenschlicher Ebene + +| Nr. | Strategie | Beschreibung | +| --- | --- | --- | +| 1 | Kunden-Events | Organisation von regelmäßigen, informellen Veranstaltungen, um den persönlichen Kontakt zwischen Klienten und Beratern zu fördern. | +| 2 | Persönliche Geburtstagsgrüße | Individuelle Geburtstagskarten oder -nachrichten für jeden Kunden, um Wertschätzung zu zeigen und Beziehungen zu stärken. | +| 3 | Kundenbefragung & Anpassung | Aktives Einholen von Kundenmeinungen zur individuellen Betreuung und Anpassung unserer Leistungen entsprechend der Kundenbedürfnisse. | +| 4 | Teamvorstellung & Empathie | Vorstellung unserer Berater auf der Firmenwebsite und in sozialen Medien, um eine persönliche und empathische Beziehung zum Kunden aufzubauen. | +| 5 | Kunden-Mentoring-Programm | Schaffung eines Programms, bei dem langjährige Kunden als Mentoren für Neukunden fungieren, um Vertrauen und Loyalität zu fördern. | +| 6 | "Tag der offenen Tür" | Einladung der Kunden zu einem "Tag der offenen Tür" in unserer Kanzlei, um ihnen einen persönlichen Einblick in unsere Arbeitsweise zu geben. | +| 7 | Individuelle Erfolgsgeschichten | Erstellung von individuellen Erfolgsgeschichten für jeden Kunden, um ihre steuerlichen Erfolge hervorzuheben und ihre Loyalität zu stärken. | +| 8 | Gemeinsame soziale Projekte | Zusammenarbeit mit Kunden an sozialen Projekten und Spendenaktionen, um gemeinsame Werte zu teilen und die Zusammenarbeit zu fördern. | +| 9 | Flexibilität in der Kommunikation | Anbieten verschiedener Kommunikationswege (Telefon, E-Mail, Videoanruf, etc.), um auf die individuellen Bedürfnisse der Kunden einzugehen. | +| 10 | Jährliches Kunden-Dankeschön | Überraschung der Kunden mit einer kleinen Aufmerksamkeit oder einem Dankeschön-Event, um die Wertschätzung für ihre Treue auszudrücken. 
|<|im_end|> +tensor([-100, -100, -100, ..., -100, -100, -100]) +tensor([ 32, 6236, 1948, ..., 13, 760, 151645]) +/checkpoint_save//checkpoint_save/ + +/checkpoint_save/ +/checkpoint_save/ +/checkpoint_save/ +/checkpoint_save/ +/checkpoint_save/ +/checkpoint_save/ +VM-0-111-tencentos:111:5233 [7] NCCL INFO [Service thread] Connection closed by localRank 7 +VM-0-111-tencentos:105:5236 [1] NCCL INFO [Service thread] Connection closed by localRank 1 +VM-0-111-tencentos:107:5230 [3] NCCL INFO [Service thread] Connection closed by localRank 3 +VM-0-111-tencentos:106:5231 [2] NCCL INFO [Service thread] Connection closed by localRank 2 +VM-0-111-tencentos:111:5296 [7] NCCL INFO comm 0x5fd5d520 rank 15 nranks 16 cudaDev 7 busId c4000 - Abort COMPLETE +VM-0-111-tencentos:105:5297 [1] NCCL INFO comm 0x1ab7b700 rank 9 nranks 16 cudaDev 1 busId 24000 - Abort COMPLETE +VM-0-111-tencentos:107:5298 [3] NCCL INFO comm 0x60fd9cb0 rank 11 nranks 16 cudaDev 3 busId 44000 - Abort COMPLETE +VM-0-111-tencentos:106:5299 [2] NCCL INFO comm 0x17039da0 rank 10 nranks 16 cudaDev 2 busId 43000 - Abort COMPLETE +VM-0-111-tencentos:109:5225 [5] NCCL INFO [Service thread] Connection closed by localRank 5 +VM-0-111-tencentos:108:5226 [4] NCCL INFO [Service thread] Connection closed by localRank 4 +VM-0-111-tencentos:110:5224 [6] NCCL INFO [Service thread] Connection closed by localRank 6 +VM-0-111-tencentos:104:5232 [0] NCCL INFO [Service thread] Connection closed by localRank 0 +VM-0-111-tencentos:109:5300 [5] NCCL INFO comm 0x12d4e240 rank 13 nranks 16 cudaDev 5 busId 84000 - Abort COMPLETE +VM-0-111-tencentos:108:5301 [4] NCCL INFO comm 0x5a8f5360 rank 12 nranks 16 cudaDev 4 busId 83000 - Abort COMPLETE +VM-0-111-tencentos:104:5303 [0] NCCL INFO comm 0x52157810 rank 8 nranks 16 cudaDev 0 busId 23000 - Abort COMPLETE +VM-0-111-tencentos:110:5302 [6] NCCL INFO comm 0x6a98d070 rank 14 nranks 16 cudaDev 6 busId c3000 - Abort COMPLETE +O comm 0x373318c0 rank 7 nranks 16 cudaDev 7 nvmlDev 7 busId c4000 commId 0x319e978f568c0fb6 - Init START +VM-0-21-tencentos:111:5227 [6] NCCL INFO comm 0x279a5e30 rank 6 nranks 16 cudaDev 6 nvmlDev 6 busId c3000 commId 0x319e978f568c0fb6 - Init START +VM-0-21-tencentos:110:5228 [5] NCCL INFO comm 0x5a546450 rank 5 nranks 16 cudaDev 5 nvmlDev 5 busId 84000 commId 0x319e978f568c0fb6 - Init START +VM-0-21-tencentos:109:5229 [4] NCCL INFO comm 0x38719160 rank 4 nranks 16 cudaDev 4 nvmlDev 4 busId 83000 commId 0x319e978f568c0fb6 - Init START +VM-0-21-tencentos:108:5230 [3] NCCL INFO comm 0x525e1ea0 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId 44000 commId 0x319e978f568c0fb6 - Init START +VM-0-21-tencentos:107:5226 [2] NCCL INFO comm 0x3ee19520 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 43000 commId 0x319e978f568c0fb6 - Init START +VM-0-21-tencentos:106:5224 [1] NCCL INFO comm 0x6be95090 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 24000 commId 0x319e978f568c0fb6 - Init START +VM-0-21-tencentos:105:5223 [0] NCCL INFO comm 0x13b36580 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 23000 commId 0x319e978f568c0fb6 - Init START +VM-0-21-tencentos:105:5223 [0] NCCL INFO Setting affinity for GPU 0 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-21-tencentos:108:5230 [3] NCCL INFO Setting affinity for GPU 3 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-21-tencentos:106:5224 [1] NCCL INFO Setting affinity for GPU 1 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-21-tencentos:107:5226 [2] NCCL INFO Setting affinity for GPU 2 to 0fffff,ffffffff,ffffffff,ffffffff +VM-0-21-tencentos:112:5225 [7] NCCL INFO Setting affinity for GPU 
7 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-21-tencentos:111:5227 [6] NCCL INFO Setting affinity for GPU 6 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-21-tencentos:109:5229 [4] NCCL INFO Setting affinity for GPU 4 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-21-tencentos:110:5228 [5] NCCL INFO Setting affinity for GPU 5 to ff,ffffffff,ffffffff,ffffffff,fff00000,00000000,00000000,00000000 +VM-0-21-tencentos:112:5225 [7] NCCL INFO comm 0x373318c0 rank 7 nRanks 16 nNodes 2 localRanks 8 localRank 7 MNNVL 0 +VM-0-21-tencentos:107:5226 [2] NCCL INFO comm 0x3ee19520 rank 2 nRanks 16 nNodes 2 localRanks 8 localRank 2 MNNVL 0 +VM-0-21-tencentos:106:5224 [1] NCCL INFO comm 0x6be95090 rank 1 nRanks 16 nNodes 2 localRanks 8 localRank 1 MNNVL 0 +VM-0-21-tencentos:112:5225 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 0/-1/-1->7->6 [2] 0/-1/-1->7->6 [3] 0/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] 0/-1/-1->7->6 [6] 0/-1/-1->7->6 [7] 0/-1/-1->7->6 +VM-0-21-tencentos:105:5223 [0] NCCL INFO comm 0x13b36580 rank 0 nRanks 16 nNodes 2 localRanks 8 localRank 0 MNNVL 0 +VM-0-21-tencentos:108:5230 [3] NCCL INFO comm 0x525e1ea0 rank 3 nRanks 16 nNodes 2 localRanks 8 localRank 3 MNNVL 0 +VM-0-21-tencentos:110:5228 [5] NCCL INFO comm 0x5a546450 rank 5 nRanks 16 nNodes 2 localRanks 8 localRank 5 MNNVL 0 +VM-0-21-tencentos:111:5227 [6] NCCL INFO comm 0x279a5e30 rank 6 nRanks 16 nNodes 2 localRanks 8 localRank 6 MNNVL 0 +VM-0-21-tencentos:109:5229 [4] NCCL INFO comm 0x38719160 rank 4 nRanks 16 nNodes 2 localRanks 8 localRank 4 MNNVL 0 +VM-0-21-tencentos:112:5225 [7] NCCL INFO P2P Chunksize set to 131072 +VM-0-21-tencentos:107:5226 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/10/-1->2->-1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->10 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 +VM-0-21-tencentos:106:5224 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 +VM-0-21-tencentos:108:5230 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] -1/-1/-1->3->2 [7] 4/-1/-1->3->2 +VM-0-21-tencentos:110:5228 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] -1/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] -1/-1/-1->5->4 +VM-0-21-tencentos:111:5227 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/14/-1->6->-1 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->14 +VM-0-21-tencentos:107:5226 [2] NCCL INFO P2P Chunksize set to 131072 +VM-0-21-tencentos:106:5224 [1] NCCL INFO P2P Chunksize set to 131072 +VM-0-21-tencentos:109:5229 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/12/-1->4->-1 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->12 [7] 5/-1/-1->4->3 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 01/08 : 0 7 6 5 4 3 10 9 8 15 14 13 12 11 2 1 +VM-0-21-tencentos:108:5230 [3] NCCL INFO P2P Chunksize set to 131072 +VM-0-21-tencentos:110:5228 [5] NCCL INFO P2P Chunksize set to 131072 +VM-0-21-tencentos:111:5227 [6] NCCL INFO P2P Chunksize set to 131072 +VM-0-21-tencentos:109:5229 [4] NCCL INFO P2P Chunksize set to 131072 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 
1 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 03/08 : 0 7 14 13 12 11 10 9 8 15 6 5 4 3 2 1 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 05/08 : 0 7 6 5 4 3 10 9 8 15 14 13 12 11 2 1 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 07/08 : 0 7 14 13 12 11 10 9 8 15 6 5 4 3 2 1 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 +VM-0-21-tencentos:105:5223 [0] NCCL INFO P2P Chunksize set to 131072 +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 01/0 : 3[3] -> 10[2] [send] via NET/IBext_v8/1 +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 05/0 : 3[3] -> 10[2] [send] via NET/IBext_v8/1 +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 00/0 : 1[1] -> 8[0] [send] via NET/IBext_v8/0 +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 04/0 : 1[1] -> 8[0] [send] via NET/IBext_v8/0 +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 01/0 : 11[3] -> 2[2] [receive] via NET/IBext_v8/1 +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 05/0 : 11[3] -> 2[2] [receive] via NET/IBext_v8/1 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 00/0 : 9[1] -> 0[0] [receive] via NET/IBext_v8/0 +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 02/0 : 13[5] -> 4[4] [receive] via NET/IBext_v8/2 +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 02/0 : 5[5] -> 12[4] [send] via NET/IBext_v8/2 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 04/0 : 9[1] -> 0[0] [receive] via NET/IBext_v8/0 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 06/0 : 13[5] -> 4[4] [receive] via NET/IBext_v8/2 +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 06/0 : 5[5] -> 12[4] [send] via NET/IBext_v8/2 +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 03/0 : 15[7] -> 6[6] [receive] via NET/IBext_v8/3 +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 03/0 : 7[7] -> 14[6] [send] via NET/IBext_v8/3 +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 07/0 : 15[7] -> 6[6] [receive] via NET/IBext_v8/3 +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 07/0 : 7[7] -> 14[6] [send] via NET/IBext_v8/3 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via 
P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 05/0 : 1[1] -> 
0[0] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Connected all rings +VM-0-21-tencentos:109:5229 [4] NCCL INFO Connected all rings +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Connected all rings +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Connected all rings +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Connected all rings +VM-0-21-tencentos:107:5226 [2] NCCL INFO Connected all rings +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Connected all rings +VM-0-21-tencentos:112:5225 [7] NCCL INFO Connected all rings +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read 
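For context on the NCCL INFO lines in this section: they are the topology setup (rings, trees, intra-node P2P/CUMEM transports and inter-node NET/IB links) that NCCL reports when NCCL_DEBUG=INFO is set while a 16-rank, 2-node process group comes up. Below is a minimal, illustrative sketch of how such a group is typically initialized from PyTorch. It is not taken from this run; the launcher flags and environment variables (LOCAL_RANK, torchrun --nnodes / --nproc_per_node) are assumptions about a torchrun-style launch.

# Minimal sketch (assumed torchrun-style launch, not part of this log):
# bring up a 2-node x 8-GPU NCCL process group so that, with NCCL_DEBUG=INFO,
# NCCL emits ring/tree/channel setup messages like the ones above.
import os
import torch
import torch.distributed as dist

def main():
    # torchrun exports RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, MASTER_PORT;
    # 2 nodes x 8 GPUs gives the nranks=16 / localRanks=8 seen in the log.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)

    # The NCCL communicator itself (Init START ... Init COMPLETE above) is
    # created lazily the first time the group is used for a collective.
    dist.init_process_group(backend="nccl")

    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)  # P2P/CUMEM inside a node, NET/IB across nodes
    print(f"rank {dist.get_rank()}/{dist.get_world_size()} all_reduce -> {t.item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched, for example, with NCCL_DEBUG=INFO torchrun --nnodes 2 --nproc_per_node 8 ... (plus the usual rendezvous arguments), a script like this produces the same kind of Init START / Init COMPLETE and channel-connect messages recorded here.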
+VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 02/0 : 12[4] -> 4[4] [receive] via NET/IBext_v8/2 +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 06/0 : 12[4] -> 4[4] [receive] via NET/IBext_v8/2 +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 02/0 : 4[4] -> 12[4] [send] via NET/IBext_v8/2 +VM-0-21-tencentos:109:5229 [4] NCCL INFO Channel 06/0 : 4[4] -> 12[4] [send] via NET/IBext_v8/2 +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/0 +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/0 +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/0 +VM-0-21-tencentos:105:5223 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/0 +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 01/0 : 10[2] -> 2[2] [receive] via NET/IBext_v8/1 +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 05/0 : 10[2] -> 2[2] [receive] via NET/IBext_v8/1 +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 01/0 : 2[2] -> 10[2] [send] via NET/IBext_v8/1 +VM-0-21-tencentos:107:5226 [2] NCCL INFO Channel 05/0 : 2[2] -> 10[2] [send] via NET/IBext_v8/1 +VM-0-21-tencentos:106:5224 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:108:5230 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 03/0 : 14[6] -> 6[6] [receive] via NET/IBext_v8/3 +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 07/0 : 14[6] -> 6[6] [receive] via NET/IBext_v8/3 +VM-0-21-tencentos:111:5227 [6] NCCL 
INFO Channel 03/0 : 6[6] -> 14[6] [send] via NET/IBext_v8/3 +VM-0-21-tencentos:111:5227 [6] NCCL INFO Channel 07/0 : 6[6] -> 14[6] [send] via NET/IBext_v8/3 +VM-0-21-tencentos:110:5228 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:112:5225 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read +VM-0-21-tencentos:107:5226 [2] NCCL INFO Connected all trees +VM-0-21-tencentos:108:5230 [3] NCCL INFO Connected all trees +VM-0-21-tencentos:107:5226 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-21-tencentos:107:5226 [2] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-21-tencentos:108:5230 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-21-tencentos:108:5230 [3] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-21-tencentos:110:5228 [5] NCCL INFO Connected all trees +VM-0-21-tencentos:109:5229 [4] NCCL INFO Connected all trees +VM-0-21-tencentos:109:5229 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-21-tencentos:110:5228 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-21-tencentos:109:5229 [4] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-21-tencentos:110:5228 [5] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-21-tencentos:105:5223 [0] NCCL INFO Connected all trees +VM-0-21-tencentos:106:5224 [1] NCCL INFO Connected all trees +VM-0-21-tencentos:106:5224 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-21-tencentos:106:5224 [1] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-21-tencentos:105:5223 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-21-tencentos:105:5223 [0] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-21-tencentos:111:5227 [6] NCCL INFO Connected all trees +VM-0-21-tencentos:112:5225 [7] NCCL INFO Connected all trees +VM-0-21-tencentos:111:5227 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-21-tencentos:111:5227 [6] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-21-tencentos:112:5225 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 +VM-0-21-tencentos:112:5225 [7] NCCL INFO 8 coll channels, 0 collnet channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +VM-0-21-tencentos:110:5228 [5] NCCL INFO comm 0x5a546450 rank 5 nranks 16 cudaDev 5 nvmlDev 5 busId 84000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-21-tencentos:108:5230 [3] NCCL INFO comm 0x525e1ea0 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId 44000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-21-tencentos:106:5224 [1] NCCL INFO comm 0x6be95090 rank 1 nranks 16 cudaDev 1 
nvmlDev 1 busId 24000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-21-tencentos:112:5225 [7] NCCL INFO comm 0x373318c0 rank 7 nranks 16 cudaDev 7 nvmlDev 7 busId c4000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-21-tencentos:105:5223 [0] NCCL INFO comm 0x13b36580 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 23000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-21-tencentos:107:5226 [2] NCCL INFO comm 0x3ee19520 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 43000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-21-tencentos:109:5229 [4] NCCL INFO comm 0x38719160 rank 4 nranks 16 cudaDev 4 nvmlDev 4 busId 83000 commId 0x319e978f568c0fb6 - Init COMPLETE +VM-0-21-tencentos:111:5227 [6] NCCL INFO comm 0x279a5e30 rank 6 nranks 16 cudaDev 6 nvmlDev 6 busId c3000 commId 0x319e978f568c0fb6 - Init COMPLETE +{'loss': 1.7864, 'grad_norm': 12.822089195251465, 'learning_rate': 1.282051282051282e-07, 'epoch': 0.0} +{'loss': 1.809, 'grad_norm': 15.502657890319824, 'learning_rate': 2.564102564102564e-07, 'epoch': 0.0} +{'loss': 1.9262, 'grad_norm': 16.625526428222656, 'learning_rate': 3.846153846153847e-07, 'epoch': 0.0} +{'loss': 1.8607, 'grad_norm': 15.486868858337402, 'learning_rate': 5.128205128205128e-07, 'epoch': 0.0} +{'loss': 1.832, 'grad_norm': 14.205751419067383, 'learning_rate': 6.41025641025641e-07, 'epoch': 0.0} +{'loss': 1.7619, 'grad_norm': 16.743728637695312, 'learning_rate': 7.692307692307694e-07, 'epoch': 0.0} +{'loss': 1.8003, 'grad_norm': 15.840396881103516, 'learning_rate': 8.974358974358975e-07, 'epoch': 0.0} +{'loss': 1.8521, 'grad_norm': 17.014772415161133, 'learning_rate': 1.0256410256410257e-06, 'epoch': 0.0} +{'loss': 1.1337, 'grad_norm': 3.425260066986084, 'learning_rate': 1.153846153846154e-06, 'epoch': 0.0} +{'loss': 1.7902, 'grad_norm': 12.898903846740723, 'learning_rate': 1.282051282051282e-06, 'epoch': 0.0} +{'loss': 1.8239, 'grad_norm': 12.346709251403809, 'learning_rate': 1.4102564102564104e-06, 'epoch': 0.0} +{'loss': 1.7284, 'grad_norm': 11.96186637878418, 'learning_rate': 1.5384615384615387e-06, 'epoch': 0.0} +{'loss': 1.7008, 'grad_norm': 10.177848815917969, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.0} +{'loss': 1.7296, 'grad_norm': 10.066478729248047, 'learning_rate': 1.794871794871795e-06, 'epoch': 0.0} +{'loss': 1.0516, 'grad_norm': 3.2294039726257324, 'learning_rate': 1.9230769230769234e-06, 'epoch': 0.0} +{'loss': 1.7109, 'grad_norm': 9.396673202514648, 'learning_rate': 2.0512820512820513e-06, 'epoch': 0.0} +{'loss': 1.6944, 'grad_norm': 8.930295944213867, 'learning_rate': 2.1794871794871797e-06, 'epoch': 0.0} +{'loss': 1.5955, 'grad_norm': 7.112697601318359, 'learning_rate': 2.307692307692308e-06, 'epoch': 0.0} +{'loss': 1.5548, 'grad_norm': 6.16896915435791, 'learning_rate': 2.435897435897436e-06, 'epoch': 0.0} +{'loss': 1.5453, 'grad_norm': 5.964946746826172, 'learning_rate': 2.564102564102564e-06, 'epoch': 0.0} +{'loss': 1.4643, 'grad_norm': 5.10248327255249, 'learning_rate': 2.6923076923076923e-06, 'epoch': 0.0} +{'loss': 1.4925, 'grad_norm': 4.836123466491699, 'learning_rate': 2.8205128205128207e-06, 'epoch': 0.0} +{'loss': 1.4011, 'grad_norm': 4.450653076171875, 'learning_rate': 2.948717948717949e-06, 'epoch': 0.0} +{'loss': 1.4694, 'grad_norm': 4.279849529266357, 'learning_rate': 3.0769230769230774e-06, 'epoch': 0.0} +{'loss': 1.4068, 'grad_norm': 3.859330415725708, 'learning_rate': 3.205128205128206e-06, 'epoch': 0.0} +{'loss': 1.3321, 'grad_norm': 3.5212297439575195, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.01} +{'loss': 1.3573, 
'grad_norm': 3.2917630672454834, 'learning_rate': 3.4615384615384617e-06, 'epoch': 0.01} +{'loss': 1.1472, 'grad_norm': 3.4271128177642822, 'learning_rate': 3.58974358974359e-06, 'epoch': 0.01} +{'loss': 1.4178, 'grad_norm': 3.9574453830718994, 'learning_rate': 3.7179487179487184e-06, 'epoch': 0.01} +{'loss': 1.2248, 'grad_norm': 2.950049877166748, 'learning_rate': 3.846153846153847e-06, 'epoch': 0.01} +{'loss': 1.3861, 'grad_norm': 3.720195770263672, 'learning_rate': 3.974358974358974e-06, 'epoch': 0.01} +{'loss': 1.3506, 'grad_norm': 3.2444863319396973, 'learning_rate': 4.102564102564103e-06, 'epoch': 0.01} +{'loss': 1.4098, 'grad_norm': 3.5759549140930176, 'learning_rate': 4.230769230769231e-06, 'epoch': 0.01} +{'loss': 1.258, 'grad_norm': 3.14204478263855, 'learning_rate': 4.358974358974359e-06, 'epoch': 0.01} +{'loss': 1.0367, 'grad_norm': 2.818664789199829, 'learning_rate': 4.487179487179488e-06, 'epoch': 0.01} +{'loss': 1.3263, 'grad_norm': 3.267183542251587, 'learning_rate': 4.615384615384616e-06, 'epoch': 0.01} +{'loss': 1.2931, 'grad_norm': 2.8578402996063232, 'learning_rate': 4.743589743589744e-06, 'epoch': 0.01} +{'loss': 1.271, 'grad_norm': 2.805088520050049, 'learning_rate': 4.871794871794872e-06, 'epoch': 0.01} +{'loss': 1.3552, 'grad_norm': 3.186495065689087, 'learning_rate': 5e-06, 'epoch': 0.01} +{'loss': 1.3347, 'grad_norm': 2.701211929321289, 'learning_rate': 5.128205128205128e-06, 'epoch': 0.01} +{'loss': 1.294, 'grad_norm': 2.6992037296295166, 'learning_rate': 5.256410256410257e-06, 'epoch': 0.01} +{'loss': 1.2933, 'grad_norm': 3.0867204666137695, 'learning_rate': 5.384615384615385e-06, 'epoch': 0.01} +{'loss': 1.2909, 'grad_norm': 2.5067825317382812, 'learning_rate': 5.512820512820514e-06, 'epoch': 0.01} +{'loss': 1.2735, 'grad_norm': 2.5910398960113525, 'learning_rate': 5.641025641025641e-06, 'epoch': 0.01} +{'loss': 1.3035, 'grad_norm': 2.755016803741455, 'learning_rate': 5.769230769230769e-06, 'epoch': 0.01} +{'loss': 1.1097, 'grad_norm': 2.2789146900177, 'learning_rate': 5.897435897435898e-06, 'epoch': 0.01} +{'loss': 1.0117, 'grad_norm': 2.0922346115112305, 'learning_rate': 6.025641025641026e-06, 'epoch': 0.01} +{'loss': 1.2662, 'grad_norm': 2.4330496788024902, 'learning_rate': 6.153846153846155e-06, 'epoch': 0.01} +{'loss': 1.3366, 'grad_norm': 2.6654067039489746, 'learning_rate': 6.282051282051282e-06, 'epoch': 0.01} +{'loss': 1.2292, 'grad_norm': 2.1606128215789795, 'learning_rate': 6.410256410256412e-06, 'epoch': 0.01} +{'loss': 1.1865, 'grad_norm': 2.382218360900879, 'learning_rate': 6.538461538461539e-06, 'epoch': 0.01} +{'loss': 1.2384, 'grad_norm': 2.0865025520324707, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.01} +{'loss': 1.2919, 'grad_norm': 2.2342097759246826, 'learning_rate': 6.794871794871796e-06, 'epoch': 0.01} +{'loss': 1.2766, 'grad_norm': 2.0874555110931396, 'learning_rate': 6.923076923076923e-06, 'epoch': 0.01} +{'loss': 1.2323, 'grad_norm': 2.268005847930908, 'learning_rate': 7.051282051282053e-06, 'epoch': 0.01} +{'loss': 1.1302, 'grad_norm': 2.5772290229797363, 'learning_rate': 7.17948717948718e-06, 'epoch': 0.01} +{'loss': 1.2162, 'grad_norm': 2.07559871673584, 'learning_rate': 7.307692307692308e-06, 'epoch': 0.01} +{'loss': 1.1496, 'grad_norm': 2.216207265853882, 'learning_rate': 7.435897435897437e-06, 'epoch': 0.01} +{'loss': 1.2705, 'grad_norm': 2.183626890182495, 'learning_rate': 7.564102564102564e-06, 'epoch': 0.01} +{'loss': 1.22, 'grad_norm': 2.187493085861206, 'learning_rate': 7.692307692307694e-06, 'epoch': 0.01} +{'loss': 
1.2344, 'grad_norm': 2.0497353076934814, 'learning_rate': 7.820512820512822e-06, 'epoch': 0.01} +{'loss': 1.1398, 'grad_norm': 2.324303388595581, 'learning_rate': 7.948717948717949e-06, 'epoch': 0.01} +{'loss': 1.1237, 'grad_norm': 2.2123873233795166, 'learning_rate': 8.076923076923077e-06, 'epoch': 0.01} +{'loss': 1.1925, 'grad_norm': 2.1354572772979736, 'learning_rate': 8.205128205128205e-06, 'epoch': 0.01} +{'loss': 1.2531, 'grad_norm': 2.054776430130005, 'learning_rate': 8.333333333333334e-06, 'epoch': 0.01} +WARNING: tokenization mismatch: 0 vs. 670. (ignored) +number of rounds: 1 +rounds: ["A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: You are a top consultant specialising in advising tax firms and have the following skills and qualities.\n\nKNOWLEDGE: Consultant in a top consulting firm specialising in tax consulting, senior strategist, digital expert.\n\nCHARACTERISTICS: Customer-focused, empathetic, solution-oriented, personable, persuasive, creative.\n\nTASK: Use your knowledge to develop 10 measures to encourage clients to submit their documents digitally instead of by post. The measures should be formulated in such a way that the client recognises the advantages of digital filing. The law firm is digitised and can accept digital documents.\n\nOUTPUT: A list of the 10 actions, each with the name of the action as a heading and 2 short additional bullet points.\n\nOUTPUT LANGUAGE: German ASSISTANT: 1. Persönlicher Kontakt und Beratung:\n\t* Persönliche Beratung und Unterstützung bei der Umstellung auf digitales Einreichen\n\t* Betonung der Vorteile, wie Zeit- und Kosteneinsparungen\n2. Schulung und Unterstützung:\n\t* Schulungen für die digitale Einreichung von Dokumenten\n\t* Anleitung zur Nutzung von Tools und Software\n3. Anreize schaffen:\n\t* Ermäßigung bei der Beauftragung digitaler Dienstleistungen\n\t* Sonderangebote für Kunden, die ihre Dokumente digital einreichen\n4. Informationen bereitstellen:\n\t* Veröffentlichung von Leitfäden und Handbüchern für die digitale Einreichung von Dokumenten\n\t* FAQ-Liste für häufig gestellte Fragen zum digitalen Einreichen\n5. Klare Kommunikation:\n\t* Verständliche Kommunikation der Vorteile der digitalen Einreichung\n\t* Bereitstellung von Informationen über die Sicherheit der digitalen Übermittlung\n6. Einfache Zugänglichkeit:\n\t* Vereinfachung des digitalen Einreichungsprozesses durch intuitive Online-Plattformen\n\t* Bereitstellung von Unterstützung für den Fall technischer Schwierigkeiten\n7. Zeitersparnis betonen:\n\t* Betonung der Vorteile von Zeitersparnis durch digitales Einreichen\n\t* Erinnerungen an Fristen und den Nutzen der pünktlichen Einreichung\n8. Umweltbewusstsein:\n\t* Betonung des Umweltvorteils durch digitales Einreichen von Dokumenten\n\t* Verweis auf die Möglichkeit, Papierverbrauch zu reduzieren\n9. Kosteneffizienz:\n\t* Betonung der Vorteile von Kosteneinsparungen durch digitales Einreichen\n\t* Verweis auf die Möglichkeit, Papier- und Portokosten zu sparen\n10. Feedback einholen:\n* Umfrage zur Bewertung des digitalen Einreichungsprozesses\n* Analyse der Ergebnisse zur Verbesserung der digitalen Dienstleistungen"] +prompt: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. 
USER: ASSISTANT: You are a top consultant specialising in advising tax firms and have the following skills and qualities. + +KNOWLEDGE: Consultant in a top consulting firm specialising in tax consulting, senior strategist, digital expert. + +CHARACTERISTICS: Customer-focused, empathetic, solution-oriented, personable, persuasive, creative. + +TASK: Use your knowledge to develop 10 measures to encourage clients to submit their documents digitally instead of by post. The measures should be formulated in such a way that the client recognises the advantages of digital filing. The law firm is digitised and can accept digital documents. + +OUTPUT: A list of the 10 actions, each with the name of the action as a heading and 2 short additional bullet points. + +OUTPUT LANGUAGE: German ASSISTANT: 1. Persönlicher Kontakt und Beratung: + * Persönliche Beratung und Unterstützung bei der Umstellung auf digitales Einreichen + * Betonung der Vorteile, wie Zeit- und Kosteneinsparungen +2. Schulung und Unterstützung: + * Schulungen für die digitale Einreichung von Dokumenten + * Anleitung zur Nutzung von Tools und Software +3. Anreize schaffen: + * Ermäßigung bei der Beauftragung digitaler Dienstleistungen + * Sonderangebote für Kunden, die ihre Dokumente digital einreichen +4. Informationen bereitstellen: + * Veröffentlichung von Leitfäden und Handbüchern für die digitale Einreichung von Dokumenten + * FAQ-Liste für häufig gestellte Fragen zum digitalen Einreichen +5. Klare Kommunikation: + * Verständliche Kommunikation der Vorteile der digitalen Einreichung + * Bereitstellung von Informationen über die Sicherheit der digitalen Übermittlung +6. Einfache Zugänglichkeit: + * Vereinfachung des digitalen Einreichungsprozesses durch intuitive Online-Plattformen + * Bereitstellung von Unterstützung für den Fall technischer Schwierigkeiten +7. Zeitersparnis betonen: + * Betonung der Vorteile von Zeitersparnis durch digitales Einreichen + * Erinnerungen an Fristen und den Nutzen der pünktlichen Einreichung +8. Umweltbewusstsein: + * Betonung des Umweltvorteils durch digitales Einreichen von Dokumenten + * Verweis auf die Möglichkeit, Papierverbrauch zu reduzieren +9. Kosteneffizienz: + * Betonung der Vorteile von Kosteneinsparungen durch digitales Einreichen + * Verweis auf die Möglichkeit, Papier- und Portokosten zu sparen +10. 
Feedback einholen: +* Umfrage zur Bewertung des digitalen Einreichungsprozesses +* Analyse der Ergebnisse zur Verbesserung der digitalen Dienstleistungen<|im_end|> +tensor([-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]) +tensor([ 32, 6236, 1948, 264, 22208, 1196, 323, 458, 20443, + 11229, 17847, 13, 576, 17847, 6696, 10950, 11, 11682, + 11, 323, 47787, 11253, 311, 279, 1196, 594, 4755, + 13, 13872, 25, 35560, 3846, 2821, 25, 1446, 525, + 264, 1909, 30660, 3281, 3791, 304, 62679, 3742, 18809, + 323, 614, 279, 2701, 7361, 323, 28500, 382, 17155, + 3307, 13639, 10777, 25, 55446, 304, 264, 1909, 30731, + 7474, 3281, 3791, 304, 3742, 30731, 11, 9990, 64135, + 11, 7377, 6203, 382, 15237, 37397, 71637, 25, 12277, + 51273, 11, 35581, 5298, 11, 6291, 35085, 11, 1697, + 480, 11, 65243, 11, 11521, 382, 65813, 25, 5443, + 697, 6540, 311, 2225, 220, 16, 15, 10953, 311, + 14907, 8239, 311, 9318, 862, 9293, 67778, 4518, 315, + 553, 1736, 13, 576, 10953, 1265, 387, 59294, 304, + 1741, 264, 1616, 429, 279, 2943, 5829, 4909, 279, + 22146, 315, 7377, 25480, 13, 576, 2329, 7474, 374, + 15723, 4056, 323, 646, 4193, 7377, 9293, 382, 30301, + 25, 362, 1140, 315, 279, 220, 16, 15, 6168, + 11, 1817, 448, 279, 829, 315, 279, 1917, 438, + 264, 14496, 323, 220, 17, 2805, 5107, 17432, 3501, + 382, 30301, 34800, 25, 5938, 35560, 3846, 2821, 25, + 220, 16, 13, 20465, 23348, 50095, 65608, 2030, 8907, + 266, 2185, 510, 197, 9, 20465, 23348, 20005, 8907, + 266, 2185, 2030, 141786, 13279, 2694, 23334, 54377, 7219, + 15723, 3831, 18069, 265, 28324, 198, 197, 9, 13017, + 263, 2185, 2694, 647, 16568, 457, 11, 13368, 28831, + 12, 2030, 730, 535, 1952, 1330, 1732, 11595, 198, + 17, 13, 52990, 2185, 2030, 141786, 510, 197, 9, + 52990, 11595, 7180, 2746, 15723, 1574, 18069, 40334, 2185, + 6538, 66850, 1123, 268, 198, 197, 9, 1527, 74462, + 17312, 84605, 6538, 13852, 2030, 4377, 198, 18, 13, + 1527, 265, 551, 5699, 71378, 510, 197, 9, 468, + 8719, 68947, 42603, 13279, 2694, 2823, 50528, 52015, 2185, + 7377, 261, 60404, 267, 273, 380, 11595, 198, 197, + 9, 11840, 1107, 844, 92918, 7180, 61291, 11, 2746, + 34749, 66850, 1123, 68, 7377, 4368, 265, 28324, 198, + 19, 13, 48842, 26933, 275, 58158, 510, 197, 9, + 6250, 132149, 2185, 6538, 1967, 275, 131660, 5183, 2030, + 8536, 129578, 331, 932, 7180, 2746, 15723, 1574, 18069, + 40334, 2185, 6538, 66850, 1123, 268, 198, 197, 9, + 30972, 12, 852, 68, 7180, 137943, 12743, 613, 665, + 70777, 16034, 7377, 268, 18069, 265, 28324, 198, 20, + 13, 27941, 546, 87727, 70449, 367, 510, 197, 9, + 6250, 47635, 20005, 87727, 70449, 367, 2694, 647, 16568, + 457, 2694, 7377, 268, 18069, 40334, 2185, 198, 197, + 9, 43157, 275, 54377, 6538, 48842, 13785, 2746, 74303, + 22385, 2694, 7377, 268, 49371, 23692, 37949, 198, 21, + 13, 468, 13573, 1777, 76716, 25423, 88185, 510, 197, + 9, 68832, 13573, 610, 2185, 939, 7377, 268, 18069, + 40334, 28122, 776, 89, 64870, 19767, 41679, 8105, 12, + 2120, 1587, 627, 268, 198, 197, 9, 43157, 275, + 54377, 6538, 141786, 7180, 3371, 14785, 2799, 32708, 
29505, + 1268, 343, 51707, 198, 22, 13, 28831, 388, 1732, + 25151, 1274, 41928, 510, 197, 9, 13017, 263, 2185, + 2694, 647, 16568, 457, 6538, 28831, 388, 1732, 25151, + 19767, 15723, 3831, 18069, 265, 28324, 198, 197, 9, + 9740, 4382, 11595, 458, 434, 2819, 268, 2030, 3371, + 18372, 5679, 2694, 281, 16074, 5840, 21673, 18069, 40334, + 2185, 198, 23, 13, 23334, 86, 3818, 81936, 355, + 267, 48435, 510, 197, 9, 13017, 263, 2185, 939, + 23334, 86, 3818, 85, 16568, 8669, 19767, 15723, 3831, + 18069, 265, 28324, 6538, 66850, 1123, 268, 198, 197, + 9, 6250, 73869, 7219, 2746, 90817, 11, 31643, 1268, + 423, 14383, 1387, 6395, 3595, 89, 16030, 198, 24, + 13, 730, 535, 1952, 542, 449, 3591, 89, 510, + 197, 9, 13017, 263, 2185, 2694, 647, 16568, 457, + 6538, 730, 535, 1952, 1330, 1732, 11595, 19767, 15723, + 3831, 18069, 265, 28324, 198, 197, 9, 6250, 73869, + 7219, 2746, 90817, 11, 31643, 1268, 12, 2030, 5776, + 562, 14012, 6395, 978, 9151, 198, 16, 15, 13, + 36857, 4368, 8452, 268, 510, 9, 23334, 83301, 17312, + 85651, 2185, 939, 7377, 268, 18069, 40334, 28122, 776, + 89, 64870, 198, 9, 37427, 325, 2694, 78796, 15398, + 17312, 59214, 36370, 2185, 2694, 7377, 268, 60404, 267, + 273, 380, 11595, 151645]) +{'loss': 1.1556, 'grad_norm': 2.3509910106658936, 'learning_rate': 8.461538461538462e-06, 'epoch': 0.01} +{'loss': 1.242, 'grad_norm': 2.21283221244812, 'learning_rate': 8.58974358974359e-06, 'epoch': 0.01} +{'loss': 1.2512, 'grad_norm': 2.0342981815338135, 'learning_rate': 8.717948717948719e-06, 'epoch': 0.01} +{'loss': 1.1677, 'grad_norm': 2.1091084480285645, 'learning_rate': 8.846153846153847e-06, 'epoch': 0.01} +{'loss': 1.2226, 'grad_norm': 2.142625570297241, 'learning_rate': 8.974358974358976e-06, 'epoch': 0.01} +{'loss': 1.115, 'grad_norm': 1.9051457643508911, 'learning_rate': 9.102564102564104e-06, 'epoch': 0.01} +{'loss': 1.1727, 'grad_norm': 2.168208360671997, 'learning_rate': 9.230769230769232e-06, 'epoch': 0.01} +{'loss': 1.1535, 'grad_norm': 2.2827489376068115, 'learning_rate': 9.358974358974359e-06, 'epoch': 0.01} +{'loss': 1.2497, 'grad_norm': 1.9166820049285889, 'learning_rate': 9.487179487179487e-06, 'epoch': 0.01} +{'loss': 1.1311, 'grad_norm': 2.0673937797546387, 'learning_rate': 9.615384615384616e-06, 'epoch': 0.01} +{'loss': 1.1514, 'grad_norm': 2.4237053394317627, 'learning_rate': 9.743589743589744e-06, 'epoch': 0.01} +{'loss': 1.2051, 'grad_norm': 1.9658292531967163, 'learning_rate': 9.871794871794872e-06, 'epoch': 0.01} +{'loss': 1.1259, 'grad_norm': 2.1604244709014893, 'learning_rate': 1e-05, 'epoch': 0.02} +{'loss': 1.2145, 'grad_norm': 1.995413899421692, 'learning_rate': 1.012820512820513e-05, 'epoch': 0.02} +{'loss': 1.1759, 'grad_norm': 2.4904634952545166, 'learning_rate': 1.0256410256410256e-05, 'epoch': 0.02} +{'loss': 1.243, 'grad_norm': 2.060880184173584, 'learning_rate': 1.0384615384615386e-05, 'epoch': 0.02} +{'loss': 1.1531, 'grad_norm': 2.1147937774658203, 'learning_rate': 1.0512820512820514e-05, 'epoch': 0.02} +{'loss': 0.9506, 'grad_norm': 1.6522949934005737, 'learning_rate': 1.0641025641025643e-05, 'epoch': 0.02} +{'loss': 1.1701, 'grad_norm': 2.2048704624176025, 'learning_rate': 1.076923076923077e-05, 'epoch': 0.02} +{'loss': 1.2011, 'grad_norm': 1.9545150995254517, 'learning_rate': 1.0897435897435898e-05, 'epoch': 0.02} +{'loss': 1.1103, 'grad_norm': 2.0530219078063965, 'learning_rate': 1.1025641025641028e-05, 'epoch': 0.02} +{'loss': 1.0949, 'grad_norm': 1.9556024074554443, 'learning_rate': 1.1153846153846154e-05, 'epoch': 0.02} +{'loss': 1.2281, 
'grad_norm': 2.006627082824707, 'learning_rate': 1.1282051282051283e-05, 'epoch': 0.02} +{'loss': 1.1694, 'grad_norm': 2.2797443866729736, 'learning_rate': 1.1410256410256411e-05, 'epoch': 0.02} +{'loss': 1.1155, 'grad_norm': 1.9811910390853882, 'learning_rate': 1.1538461538461538e-05, 'epoch': 0.02} +{'loss': 1.1213, 'grad_norm': 2.027745485305786, 'learning_rate': 1.1666666666666668e-05, 'epoch': 0.02} +{'loss': 1.1145, 'grad_norm': 1.895713210105896, 'learning_rate': 1.1794871794871796e-05, 'epoch': 0.02} +{'loss': 1.1143, 'grad_norm': 2.2578670978546143, 'learning_rate': 1.1923076923076925e-05, 'epoch': 0.02} +{'loss': 0.9945, 'grad_norm': 1.4406471252441406, 'learning_rate': 1.2051282051282051e-05, 'epoch': 0.02} +{'loss': 1.1937, 'grad_norm': 2.3857421875, 'learning_rate': 1.217948717948718e-05, 'epoch': 0.02} +{'loss': 0.9247, 'grad_norm': 1.3613249063491821, 'learning_rate': 1.230769230769231e-05, 'epoch': 0.02} +{'loss': 1.1109, 'grad_norm': 2.0658631324768066, 'learning_rate': 1.2435897435897436e-05, 'epoch': 0.02} +{'loss': 1.1868, 'grad_norm': 2.048159122467041, 'learning_rate': 1.2564102564102565e-05, 'epoch': 0.02} +{'loss': 1.2179, 'grad_norm': 1.9883164167404175, 'learning_rate': 1.2692307692307693e-05, 'epoch': 0.02} +{'loss': 1.1249, 'grad_norm': 1.887096881866455, 'learning_rate': 1.2820512820512823e-05, 'epoch': 0.02} +{'loss': 1.1314, 'grad_norm': 1.8913002014160156, 'learning_rate': 1.294871794871795e-05, 'epoch': 0.02} +{'loss': 1.1349, 'grad_norm': 1.9547669887542725, 'learning_rate': 1.3076923076923078e-05, 'epoch': 0.02} +{'loss': 1.1626, 'grad_norm': 1.970658540725708, 'learning_rate': 1.3205128205128207e-05, 'epoch': 0.02} +{'loss': 1.1424, 'grad_norm': 2.0676755905151367, 'learning_rate': 1.3333333333333333e-05, 'epoch': 0.02} +{'loss': 1.005, 'grad_norm': 1.3007159233093262, 'learning_rate': 1.3461538461538463e-05, 'epoch': 0.02} +{'loss': 1.2203, 'grad_norm': 2.0430386066436768, 'learning_rate': 1.3589743589743592e-05, 'epoch': 0.02} +{'loss': 1.1334, 'grad_norm': 2.1048197746276855, 'learning_rate': 1.3717948717948718e-05, 'epoch': 0.02} +{'loss': 1.1844, 'grad_norm': 2.023083448410034, 'learning_rate': 1.3846153846153847e-05, 'epoch': 0.02} +{'loss': 1.1929, 'grad_norm': 2.017812490463257, 'learning_rate': 1.3974358974358975e-05, 'epoch': 0.02} +{'loss': 1.146, 'grad_norm': 1.9722820520401, 'learning_rate': 1.4102564102564105e-05, 'epoch': 0.02} +{'loss': 1.0048, 'grad_norm': 2.0664279460906982, 'learning_rate': 1.4230769230769232e-05, 'epoch': 0.02} +{'loss': 0.9728, 'grad_norm': 1.2306312322616577, 'learning_rate': 1.435897435897436e-05, 'epoch': 0.02} +{'loss': 1.1453, 'grad_norm': 2.1424176692962646, 'learning_rate': 1.4487179487179489e-05, 'epoch': 0.02} +{'loss': 1.2156, 'grad_norm': 2.203076124191284, 'learning_rate': 1.4615384615384615e-05, 'epoch': 0.02} +{'loss': 1.1461, 'grad_norm': 2.018467903137207, 'learning_rate': 1.4743589743589745e-05, 'epoch': 0.02} +{'loss': 1.1281, 'grad_norm': 1.85000741481781, 'learning_rate': 1.4871794871794874e-05, 'epoch': 0.02} +{'loss': 1.1198, 'grad_norm': 2.1398367881774902, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.02} +{'loss': 1.114, 'grad_norm': 2.32318115234375, 'learning_rate': 1.5128205128205129e-05, 'epoch': 0.02} +{'loss': 1.1538, 'grad_norm': 2.074957847595215, 'learning_rate': 1.5256410256410257e-05, 'epoch': 0.02} +{'loss': 1.1095, 'grad_norm': 2.0280978679656982, 'learning_rate': 1.5384615384615387e-05, 'epoch': 0.02} +{'loss': 1.0825, 'grad_norm': 2.0076022148132324, 'learning_rate': 
1.5512820512820516e-05, 'epoch': 0.02} +{'loss': 1.2233, 'grad_norm': 2.159745454788208, 'learning_rate': 1.5641025641025644e-05, 'epoch': 0.02} +{'loss': 0.9763, 'grad_norm': 1.1482303142547607, 'learning_rate': 1.576923076923077e-05, 'epoch': 0.02} +{'loss': 1.1917, 'grad_norm': 2.2342629432678223, 'learning_rate': 1.5897435897435897e-05, 'epoch': 0.02} +{'loss': 1.1897, 'grad_norm': 2.173105478286743, 'learning_rate': 1.602564102564103e-05, 'epoch': 0.02} +{'loss': 1.2244, 'grad_norm': 1.921324372291565, 'learning_rate': 1.6153846153846154e-05, 'epoch': 0.02} +{'loss': 0.9404, 'grad_norm': 1.0997803211212158, 'learning_rate': 1.6282051282051282e-05, 'epoch': 0.02} +{'loss': 0.9621, 'grad_norm': 1.1733825206756592, 'learning_rate': 1.641025641025641e-05, 'epoch': 0.02} +{'loss': 1.1321, 'grad_norm': 2.0403177738189697, 'learning_rate': 1.653846153846154e-05, 'epoch': 0.02} +{'loss': 1.1288, 'grad_norm': 2.2034287452697754, 'learning_rate': 1.6666666666666667e-05, 'epoch': 0.03} +{'loss': 1.1416, 'grad_norm': 1.9954860210418701, 'learning_rate': 1.6794871794871796e-05, 'epoch': 0.03} +{'loss': 1.1004, 'grad_norm': 2.0325286388397217, 'learning_rate': 1.6923076923076924e-05, 'epoch': 0.03} +{'loss': 1.136, 'grad_norm': 2.074540853500366, 'learning_rate': 1.7051282051282053e-05, 'epoch': 0.03} +{'loss': 1.1463, 'grad_norm': 2.2531938552856445, 'learning_rate': 1.717948717948718e-05, 'epoch': 0.03} +{'loss': 1.2, 'grad_norm': 2.1192843914031982, 'learning_rate': 1.730769230769231e-05, 'epoch': 0.03} +{'loss': 1.1567, 'grad_norm': 1.9313995838165283, 'learning_rate': 1.7435897435897438e-05, 'epoch': 0.03} +{'loss': 1.1183, 'grad_norm': 2.152517080307007, 'learning_rate': 1.7564102564102566e-05, 'epoch': 0.03} +{'loss': 1.1914, 'grad_norm': 2.020488977432251, 'learning_rate': 1.7692307692307694e-05, 'epoch': 0.03} +{'loss': 1.1844, 'grad_norm': 2.1419143676757812, 'learning_rate': 1.7820512820512823e-05, 'epoch': 0.03} +{'loss': 1.1305, 'grad_norm': 2.026850938796997, 'learning_rate': 1.794871794871795e-05, 'epoch': 0.03} +{'loss': 1.1148, 'grad_norm': 1.9881304502487183, 'learning_rate': 1.807692307692308e-05, 'epoch': 0.03} +{'loss': 1.094, 'grad_norm': 1.998692274093628, 'learning_rate': 1.8205128205128208e-05, 'epoch': 0.03} +{'loss': 1.1346, 'grad_norm': 2.50514817237854, 'learning_rate': 1.8333333333333333e-05, 'epoch': 0.03} +{'loss': 1.1131, 'grad_norm': 2.39231538772583, 'learning_rate': 1.8461538461538465e-05, 'epoch': 0.03} +{'loss': 1.1793, 'grad_norm': 1.901489019393921, 'learning_rate': 1.8589743589743593e-05, 'epoch': 0.03} +{'loss': 1.1929, 'grad_norm': 2.197298049926758, 'learning_rate': 1.8717948717948718e-05, 'epoch': 0.03} +{'loss': 1.1342, 'grad_norm': 2.0326902866363525, 'learning_rate': 1.8846153846153846e-05, 'epoch': 0.03} +{'loss': 1.0769, 'grad_norm': 2.216961622238159, 'learning_rate': 1.8974358974358975e-05, 'epoch': 0.03} +{'loss': 0.925, 'grad_norm': 1.140304684638977, 'learning_rate': 1.9102564102564106e-05, 'epoch': 0.03} +{'loss': 1.1478, 'grad_norm': 2.226663827896118, 'learning_rate': 1.923076923076923e-05, 'epoch': 0.03} +{'loss': 1.1396, 'grad_norm': 2.1292662620544434, 'learning_rate': 1.935897435897436e-05, 'epoch': 0.03} +{'loss': 1.1398, 'grad_norm': 2.017021656036377, 'learning_rate': 1.9487179487179488e-05, 'epoch': 0.03} +{'loss': 1.1515, 'grad_norm': 2.2312324047088623, 'learning_rate': 1.9615384615384617e-05, 'epoch': 0.03} +{'loss': 1.1344, 'grad_norm': 1.9998685121536255, 'learning_rate': 1.9743589743589745e-05, 'epoch': 0.03} +{'loss': 0.9517, 
'grad_norm': 1.1204949617385864, 'learning_rate': 1.9871794871794873e-05, 'epoch': 0.03} +{'loss': 0.8972, 'grad_norm': 1.111180305480957, 'learning_rate': 2e-05, 'epoch': 0.03} +{'loss': 1.1266, 'grad_norm': 2.190965414047241, 'learning_rate': 1.9999998057286928e-05, 'epoch': 0.03} +{'loss': 1.1165, 'grad_norm': 2.132208824157715, 'learning_rate': 1.9999992229148463e-05, 'epoch': 0.03} +{'loss': 1.0923, 'grad_norm': 2.4220693111419678, 'learning_rate': 1.999998251558687e-05, 'epoch': 0.03} +{'loss': 1.1371, 'grad_norm': 2.010136127471924, 'learning_rate': 1.999996891660592e-05, 'epoch': 0.03} +{'loss': 1.0209, 'grad_norm': 2.1428442001342773, 'learning_rate': 1.9999951432210905e-05, 'epoch': 0.03} +{'loss': 1.08, 'grad_norm': 2.006835460662842, 'learning_rate': 1.9999930062408612e-05, 'epoch': 0.03} +{'loss': 1.072, 'grad_norm': 2.074542760848999, 'learning_rate': 1.9999904807207348e-05, 'epoch': 0.03} +{'loss': 1.0841, 'grad_norm': 1.9647022485733032, 'learning_rate': 1.9999875666616918e-05, 'epoch': 0.03} +{'loss': 1.1265, 'grad_norm': 2.094486713409424, 'learning_rate': 1.9999842640648656e-05, 'epoch': 0.03} +{'loss': 1.1326, 'grad_norm': 2.219172477722168, 'learning_rate': 1.9999805729315383e-05, 'epoch': 0.03} +{'loss': 1.0738, 'grad_norm': 1.8253027200698853, 'learning_rate': 1.999976493263145e-05, 'epoch': 0.03} +{'loss': 1.0201, 'grad_norm': 2.1188716888427734, 'learning_rate': 1.99997202506127e-05, 'epoch': 0.03} +{'loss': 1.1132, 'grad_norm': 1.8973318338394165, 'learning_rate': 1.99996716832765e-05, 'epoch': 0.03} +{'loss': 1.0555, 'grad_norm': 2.0979061126708984, 'learning_rate': 1.9999619230641714e-05, 'epoch': 0.03} +{'loss': 1.0812, 'grad_norm': 2.499190330505371, 'learning_rate': 1.999956289272873e-05, 'epoch': 0.03} +{'loss': 1.0962, 'grad_norm': 2.0691018104553223, 'learning_rate': 1.9999502669559432e-05, 'epoch': 0.03} +{'loss': 1.1312, 'grad_norm': 2.364868640899658, 'learning_rate': 1.999943856115722e-05, 'epoch': 0.03} +{'loss': 1.1411, 'grad_norm': 1.8841214179992676, 'learning_rate': 1.9999370567547008e-05, 'epoch': 0.03} +{'loss': 1.1903, 'grad_norm': 2.040099859237671, 'learning_rate': 1.999929868875521e-05, 'epoch': 0.03} +{'loss': 1.0833, 'grad_norm': 2.3591747283935547, 'learning_rate': 1.999922292480975e-05, 'epoch': 0.03} +{'loss': 1.1074, 'grad_norm': 1.877581000328064, 'learning_rate': 1.999914327574007e-05, 'epoch': 0.03} +{'loss': 1.1314, 'grad_norm': 1.9595013856887817, 'learning_rate': 1.999905974157712e-05, 'epoch': 0.03} +{'loss': 1.1579, 'grad_norm': 2.207402229309082, 'learning_rate': 1.9998972322353353e-05, 'epoch': 0.03} +{'loss': 0.9843, 'grad_norm': 2.2019686698913574, 'learning_rate': 1.9998881018102735e-05, 'epoch': 0.03} +{'loss': 1.1583, 'grad_norm': 2.210948944091797, 'learning_rate': 1.9998785828860744e-05, 'epoch': 0.03} +{'loss': 1.031, 'grad_norm': 1.94996976852417, 'learning_rate': 1.999868675466436e-05, 'epoch': 0.04} +{'loss': 1.1142, 'grad_norm': 2.045884370803833, 'learning_rate': 1.9998583795552086e-05, 'epoch': 0.04} +{'loss': 1.1559, 'grad_norm': 2.313939094543457, 'learning_rate': 1.9998476951563914e-05, 'epoch': 0.04} +{'loss': 0.9635, 'grad_norm': 1.1143927574157715, 'learning_rate': 1.999836622274137e-05, 'epoch': 0.04} +{'loss': 1.2159, 'grad_norm': 1.903009295463562, 'learning_rate': 1.9998251609127465e-05, 'epoch': 0.04} +{'loss': 1.0963, 'grad_norm': 1.9762166738510132, 'learning_rate': 1.9998133110766744e-05, 'epoch': 0.04} +{'loss': 1.1517, 'grad_norm': 1.9005874395370483, 'learning_rate': 1.9998010727705237e-05, 
'epoch': 0.04} +{'loss': 1.1811, 'grad_norm': 2.1902811527252197, 'learning_rate': 1.9997884459990502e-05, 'epoch': 0.04} +{'loss': 1.0531, 'grad_norm': 2.223172903060913, 'learning_rate': 1.9997754307671597e-05, 'epoch': 0.04} +{'loss': 1.0264, 'grad_norm': 1.78392493724823, 'learning_rate': 1.999762027079909e-05, 'epoch': 0.04} +{'loss': 1.1008, 'grad_norm': 2.302363157272339, 'learning_rate': 1.999748234942507e-05, 'epoch': 0.04} +{'loss': 1.0804, 'grad_norm': 2.0841126441955566, 'learning_rate': 1.999734054360311e-05, 'epoch': 0.04} +{'loss': 1.1621, 'grad_norm': 2.021899938583374, 'learning_rate': 1.9997194853388316e-05, 'epoch': 0.04} +{'loss': 1.1416, 'grad_norm': 1.9690407514572144, 'learning_rate': 1.9997045278837298e-05, 'epoch': 0.04} +{'loss': 1.1493, 'grad_norm': 1.9700400829315186, 'learning_rate': 1.9996891820008165e-05, 'epoch': 0.04} +{'loss': 1.0752, 'grad_norm': 2.160266637802124, 'learning_rate': 1.9996734476960547e-05, 'epoch': 0.04} +{'loss': 1.0964, 'grad_norm': 2.356799840927124, 'learning_rate': 1.9996573249755573e-05, 'epoch': 0.04} +{'loss': 1.1226, 'grad_norm': 2.0327095985412598, 'learning_rate': 1.9996408138455896e-05, 'epoch': 0.04} +{'loss': 1.0983, 'grad_norm': 3.0659170150756836, 'learning_rate': 1.999623914312566e-05, 'epoch': 0.04} +{'loss': 1.0875, 'grad_norm': 2.049588441848755, 'learning_rate': 1.9996066263830533e-05, 'epoch': 0.04} +{'loss': 1.1423, 'grad_norm': 2.2685041427612305, 'learning_rate': 1.999588950063768e-05, 'epoch': 0.04} +{'loss': 1.1237, 'grad_norm': 2.1024460792541504, 'learning_rate': 1.9995708853615783e-05, 'epoch': 0.04} +{'loss': 1.1537, 'grad_norm': 1.8129985332489014, 'learning_rate': 1.9995524322835035e-05, 'epoch': 0.04} +{'loss': 1.111, 'grad_norm': 2.0320887565612793, 'learning_rate': 1.9995335908367132e-05, 'epoch': 0.04} +{'loss': 1.0067, 'grad_norm': 1.8262051343917847, 'learning_rate': 1.9995143610285275e-05, 'epoch': 0.04} +{'loss': 1.1224, 'grad_norm': 2.10689115524292, 'learning_rate': 1.9994947428664194e-05, 'epoch': 0.04} +{'loss': 1.1291, 'grad_norm': 2.1416141986846924, 'learning_rate': 1.99947473635801e-05, 'epoch': 0.04} +{'loss': 1.0204, 'grad_norm': 2.124711275100708, 'learning_rate': 1.9994543415110733e-05, 'epoch': 0.04} +{'loss': 1.1063, 'grad_norm': 2.1166152954101562, 'learning_rate': 1.9994335583335336e-05, 'epoch': 0.04} +{'loss': 1.061, 'grad_norm': 1.729125738143921, 'learning_rate': 1.9994123868334655e-05, 'epoch': 0.04} +{'loss': 1.1302, 'grad_norm': 1.9856681823730469, 'learning_rate': 1.999390827019096e-05, 'epoch': 0.04} +{'loss': 1.1118, 'grad_norm': 2.037698268890381, 'learning_rate': 1.999368878898801e-05, 'epoch': 0.04} +{'loss': 1.2069, 'grad_norm': 2.082167148590088, 'learning_rate': 1.999346542481109e-05, 'epoch': 0.04} +{'loss': 1.0668, 'grad_norm': 2.0513463020324707, 'learning_rate': 1.999323817774698e-05, 'epoch': 0.04} +{'loss': 1.0668, 'grad_norm': 1.8028476238250732, 'learning_rate': 1.9993007047883988e-05, 'epoch': 0.04} +{'loss': 1.0504, 'grad_norm': 2.1010804176330566, 'learning_rate': 1.9992772035311902e-05, 'epoch': 0.04} +{'loss': 1.0689, 'grad_norm': 1.9072201251983643, 'learning_rate': 1.9992533140122044e-05, 'epoch': 0.04} +{'loss': 1.074, 'grad_norm': 2.158442974090576, 'learning_rate': 1.9992290362407232e-05, 'epoch': 0.04} +{'loss': 1.1394, 'grad_norm': 1.8606246709823608, 'learning_rate': 1.9992043702261795e-05, 'epoch': 0.04} +{'loss': 1.0943, 'grad_norm': 2.1072592735290527, 'learning_rate': 1.999179315978157e-05, 'epoch': 0.04} +{'loss': 1.0793, 'grad_norm': 
2.0352225303649902, 'learning_rate': 1.9991538735063906e-05, 'epoch': 0.04} +{'loss': 1.1638, 'grad_norm': 2.1362555027008057, 'learning_rate': 1.999128042820766e-05, 'epoch': 0.04} +{'loss': 1.1725, 'grad_norm': 2.0524795055389404, 'learning_rate': 1.999101823931319e-05, 'epoch': 0.04} +{'loss': 1.0912, 'grad_norm': 1.973961591720581, 'learning_rate': 1.999075216848237e-05, 'epoch': 0.04} +{'loss': 1.2127, 'grad_norm': 2.0757157802581787, 'learning_rate': 1.999048221581858e-05, 'epoch': 0.04} +{'loss': 1.0485, 'grad_norm': 2.2070868015289307, 'learning_rate': 1.9990208381426708e-05, 'epoch': 0.04} +{'loss': 1.0384, 'grad_norm': 2.063763380050659, 'learning_rate': 1.9989930665413148e-05, 'epoch': 0.04} +{'loss': 1.1655, 'grad_norm': 2.2438125610351562, 'learning_rate': 1.9989649067885808e-05, 'epoch': 0.04} +{'loss': 1.0873, 'grad_norm': 2.054109811782837, 'learning_rate': 1.99893635889541e-05, 'epoch': 0.04} +{'loss': 1.0926, 'grad_norm': 2.274214029312134, 'learning_rate': 1.9989074228728942e-05, 'epoch': 0.04} +{'loss': 1.1451, 'grad_norm': 2.40297532081604, 'learning_rate': 1.998878098732277e-05, 'epoch': 0.04} +{'loss': 1.1457, 'grad_norm': 1.9875500202178955, 'learning_rate': 1.998848386484951e-05, 'epoch': 0.04} +{'loss': 1.1048, 'grad_norm': 1.7313205003738403, 'learning_rate': 1.9988182861424613e-05, 'epoch': 0.05} +{'loss': 1.1879, 'grad_norm': 1.8364102840423584, 'learning_rate': 1.998787797716503e-05, 'epoch': 0.05} +{'loss': 1.0332, 'grad_norm': 2.029669761657715, 'learning_rate': 1.9987569212189224e-05, 'epoch': 0.05} +{'loss': 1.1231, 'grad_norm': 2.001997470855713, 'learning_rate': 1.9987256566617165e-05, 'epoch': 0.05} +{'loss': 1.1591, 'grad_norm': 2.0194694995880127, 'learning_rate': 1.9986940040570323e-05, 'epoch': 0.05} +{'loss': 1.0383, 'grad_norm': 2.448617935180664, 'learning_rate': 1.9986619634171682e-05, 'epoch': 0.05} +{'loss': 1.0934, 'grad_norm': 1.8933806419372559, 'learning_rate': 1.9986295347545738e-05, 'epoch': 0.05} +{'loss': 0.9531, 'grad_norm': 1.1100465059280396, 'learning_rate': 1.9985967180818493e-05, 'epoch': 0.05} +{'loss': 1.0524, 'grad_norm': 2.1387996673583984, 'learning_rate': 1.9985635134117445e-05, 'epoch': 0.05} +{'loss': 1.0888, 'grad_norm': 2.0161595344543457, 'learning_rate': 1.9985299207571614e-05, 'epoch': 0.05} +{'loss': 1.1353, 'grad_norm': 1.9262018203735352, 'learning_rate': 1.998495940131152e-05, 'epoch': 0.05} +{'loss': 1.0808, 'grad_norm': 2.0804708003997803, 'learning_rate': 1.998461571546919e-05, 'epoch': 0.05} +{'loss': 1.1381, 'grad_norm': 1.9080052375793457, 'learning_rate': 1.998426815017817e-05, 'epoch': 0.05} +{'loss': 1.1836, 'grad_norm': 1.9068005084991455, 'learning_rate': 1.998391670557349e-05, 'epoch': 0.05} +{'loss': 1.0609, 'grad_norm': 1.915165662765503, 'learning_rate': 1.998356138179171e-05, 'epoch': 0.05} +{'loss': 1.1005, 'grad_norm': 2.2248706817626953, 'learning_rate': 1.998320217897089e-05, 'epoch': 0.05} +{'loss': 1.0812, 'grad_norm': 1.8674229383468628, 'learning_rate': 1.9982839097250588e-05, 'epoch': 0.05} +{'loss': 0.9959, 'grad_norm': 1.7297526597976685, 'learning_rate': 1.998247213677188e-05, 'epoch': 0.05} +{'loss': 0.9548, 'grad_norm': 1.0943597555160522, 'learning_rate': 1.998210129767735e-05, 'epoch': 0.05} +{'loss': 1.0855, 'grad_norm': 2.0170698165893555, 'learning_rate': 1.9981726580111087e-05, 'epoch': 0.05} +{'loss': 1.0936, 'grad_norm': 2.2094550132751465, 'learning_rate': 1.998134798421867e-05, 'epoch': 0.05} +{'loss': 1.0249, 'grad_norm': 1.9037797451019287, 'learning_rate': 
1.9980965510147214e-05, 'epoch': 0.05} +{'loss': 1.1117, 'grad_norm': 1.8956867456436157, 'learning_rate': 1.9980579158045322e-05, 'epoch': 0.05} +{'loss': 1.0112, 'grad_norm': 1.036456823348999, 'learning_rate': 1.9980188928063103e-05, 'epoch': 0.05} +WARNING: tokenization mismatch: 0 vs. 1211. (ignored) +number of rounds: 2 +rounds: ["A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: Du bist eine auf Kundenfeedback spezialisierte Agentur mit den folgenden Kenntnissen und Eigenschaften.\n\nKENNTNISSE: Kundenfeedback, wissenschaftliche Datenerhebung, Fragebogendesign, Benutzerfreundlichkeit, Kundenbindung, Service-Qualitätsevaluation.\n\nEIGENSCHAFTEN: Präzision, Effizienz, Empathie, Anpassungsfähigkeit, Datenorientierung, Kundenfokus.\n\nAUFGABE: Nutze dein Wissen, um ein Kundenfeedback-Formular zu entwickeln, das die Service-Qualität einer Steuerberatung evaluiert. Das Formular soll auf wissenschaftlich fundierten Datenniveaus basieren und dabei darauf achten, dass es nur so lang ist, dass es tatsächlich vom Kunden ausgefüllt wird. Konzentriere dich auf die Gestaltung eines benutzerfreundlichen und effizienten Fragebogens, der aussagekräftige Informationen über die Steuerberatung liefert.\n\nOUTPUT: Ein strukturiertes Kundenfeedback-Formular, das auf wissenschaftlich fundierten Datenniveaus basiert und eine angemessene Länge hat, um von Kunden tatsächlich ausgefüllt zu werden, und speziell auf die Evaluation der Service-Qualität einer Steuerberatung abzielt. ASSISTANT: Kundenfeedback-Formular: Steuerberatung Service-Qualität\n\nTeil 1: Allgemeine Informationen\n(Bitte füllen Sie die folgenden Informationen aus.)\n\n1. Name (Optional):\n2. Alter:\n☐ Unter 25\n☐ 25-34\n☐ 35-44\n☐ 45-54\n☐ 55-64\n☐ 65 oder älter\n3. Geschlecht:\n☐ Männlich\n☐ Weiblich\n☐ Andere\n☐ Lieber nicht angeben\n\nTeil 2: Service-Qualität\n(Bitte bewerten Sie die folgenden Aspekte der Steuerberatung auf einer Skala von 1 bis 5, wobei 1 = sehr unzufrieden und 5 = sehr zufrieden.)\n\n1. Fachliche Kompetenz des Steuerberaters:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n2. Freundlichkeit und Hilfsbereitschaft des Personals:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n3. Reaktionsgeschwindigkeit auf Anfragen und Probleme:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n4. Verständlichkeit der erbrachten Informationen und Beratung:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n5. Qualität der bereitgestellten Dienstleistungen und Lösungen:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n6. Preis-Leistungs-Verhältnis:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n\nTeil 3: Kundenbindung und Zufriedenheit\n(Bitte beantworten Sie die folgenden Fragen.)\n\n1. Wie wahrscheinlich ist es, dass Sie die Steuerberatung an Freunde, Familie oder Kollegen weiterempfehlen? (Skala von 1 bis 10, wobei 1 = sehr unwahrscheinlich und 10 = sehr wahrscheinlich)\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n☐ 6\n☐ 7\n☐ 8\n☐ 9\n☐ 10\n2. Wie zufrieden sind Sie insgesamt mit der Steuerberatung? (Skala von 1 bis 5, wobei 1 = sehr unzufrieden und 5 = sehr zufrieden)\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n\nTeil 4: Offene Fragen\n(Bitte teilen Sie uns Ihre Gedanken und Erfahrungen mit.)\n\n1. Was hat Ihnen am meisten an der Steuerberatung gefallen? (Optional)\n2. Was könnte die Steuerberatung verbessern, um Ihre Zufriedenheit zu erhöhen? (Optional)\n\nVielen Dank für Ihre Zeit und Ihr", 'USER: Continue. ASSISTANT: wertvolles Feedback! Ihre Meinung hilft uns dabei, den Service und die Qualität unserer Steuerberatung kontinuierlich zu verbessern. 
Wenn Sie Fragen oder Anmerkungen haben oder zusätzliche Informationen benötigen, zögern Sie bitte nicht, uns zu kontaktieren.\n\nTeil 5: Kontaktaufnahme (Optional)\n(Bitte geben Sie Ihre Kontaktinformationen an, falls Sie möchten, dass wir Ihnen bezüglich Ihres Feedbacks antworten oder weitere Informationen anbieten.)\n\n1. Name:\n2. E-Mail-Adresse:\n3. Telefonnummer:\n\nHinweis zum Datenschutz: Ihre persönlichen Daten werden vertraulich behandelt und nur für interne Zwecke verwendet. Sie werden nicht an Dritte weitergegeben.\n\nBitte klicken Sie auf "Absenden", um das Formular abzuschließen. Nochmals vielen Dank für Ihre Rückmeldung und Ihre Unterstützung bei der Verbesserung unserer Steuerberatung!'] +prompt: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: Du bist eine auf Kundenfeedback spezialisierte Agentur mit den folgenden Kenntnissen und Eigenschaften. + +KENNTNISSE: Kundenfeedback, wissenschaftliche Datenerhebung, Fragebogendesign, Benutzerfreundlichkeit, Kundenbindung, Service-Qualitätsevaluation. + +EIGENSCHAFTEN: Präzision, Effizienz, Empathie, Anpassungsfähigkeit, Datenorientierung, Kundenfokus. + +AUFGABE: Nutze dein Wissen, um ein Kundenfeedback-Formular zu entwickeln, das die Service-Qualität einer Steuerberatung evaluiert. Das Formular soll auf wissenschaftlich fundierten Datenniveaus basieren und dabei darauf achten, dass es nur so lang ist, dass es tatsächlich vom Kunden ausgefüllt wird. Konzentriere dich auf die Gestaltung eines benutzerfreundlichen und effizienten Fragebogens, der aussagekräftige Informationen über die Steuerberatung liefert. + +OUTPUT: Ein strukturiertes Kundenfeedback-Formular, das auf wissenschaftlich fundierten Datenniveaus basiert und eine angemessene Länge hat, um von Kunden tatsächlich ausgefüllt zu werden, und speziell auf die Evaluation der Service-Qualität einer Steuerberatung abzielt. ASSISTANT: Kundenfeedback-Formular: Steuerberatung Service-Qualität + +Teil 1: Allgemeine Informationen +(Bitte füllen Sie die folgenden Informationen aus.) + +1. Name (Optional): +2. Alter: +☐ Unter 25 +☐ 25-34 +☐ 35-44 +☐ 45-54 +☐ 55-64 +☐ 65 oder älter +3. Geschlecht: +☐ Männlich +☐ Weiblich +☐ Andere +☐ Lieber nicht angeben + +Teil 2: Service-Qualität +(Bitte bewerten Sie die folgenden Aspekte der Steuerberatung auf einer Skala von 1 bis 5, wobei 1 = sehr unzufrieden und 5 = sehr zufrieden.) + +1. Fachliche Kompetenz des Steuerberaters: +☐ 1 +☐ 2 +☐ 3 +☐ 4 +☐ 5 +2. Freundlichkeit und Hilfsbereitschaft des Personals: +☐ 1 +☐ 2 +☐ 3 +☐ 4 +☐ 5 +3. Reaktionsgeschwindigkeit auf Anfragen und Probleme: +☐ 1 +☐ 2 +☐ 3 +☐ 4 +☐ 5 +4. Verständlichkeit der erbrachten Informationen und Beratung: +☐ 1 +☐ 2 +☐ 3 +☐ 4 +☐ 5 +5. Qualität der bereitgestellten Dienstleistungen und Lösungen: +☐ 1 +☐ 2 +☐ 3 +☐ 4 +☐ 5 +6. Preis-Leistungs-Verhältnis: +☐ 1 +☐ 2 +☐ 3 +☐ 4 +☐ 5 + +Teil 3: Kundenbindung und Zufriedenheit +(Bitte beantworten Sie die folgenden Fragen.) + +1. Wie wahrscheinlich ist es, dass Sie die Steuerberatung an Freunde, Familie oder Kollegen weiterempfehlen? (Skala von 1 bis 10, wobei 1 = sehr unwahrscheinlich und 10 = sehr wahrscheinlich) +☐ 1 +☐ 2 +☐ 3 +☐ 4 +☐ 5 +☐ 6 +☐ 7 +☐ 8 +☐ 9 +☐ 10 +2. Wie zufrieden sind Sie insgesamt mit der Steuerberatung? (Skala von 1 bis 5, wobei 1 = sehr unzufrieden und 5 = sehr zufrieden) +☐ 1 +☐ 2 +☐ 3 +☐ 4 +☐ 5 + +Teil 4: Offene Fragen +(Bitte teilen Sie uns Ihre Gedanken und Erfahrungen mit.) + +1. 
Was hat Ihnen am meisten an der Steuerberatung gefallen? (Optional) +2. Was könnte die Steuerberatung verbessern, um Ihre Zufriedenheit zu erhöhen? (Optional) + +Vielen Dank für Ihre Zeit und Ihr<|im_end|>USER: Continue. ASSISTANT: wertvolles Feedback! Ihre Meinung hilft uns dabei, den Service und die Qualität unserer Steuerberatung kontinuierlich zu verbessern. Wenn Sie Fragen oder Anmerkungen haben oder zusätzliche Informationen benötigen, zögern Sie bitte nicht, uns zu kontaktieren. + +Teil 5: Kontaktaufnahme (Optional) +(Bitte geben Sie Ihre Kontaktinformationen an, falls Sie möchten, dass wir Ihnen bezüglich Ihres Feedbacks antworten oder weitere Informationen anbieten.) + +1. Name: +2. E-Mail-Adresse: +3. Telefonnummer: + +Hinweis zum Datenschutz: Ihre persönlichen Daten werden vertraulich behandelt und nur für interne Zwecke verwendet. Sie werden nicht an Dritte weitergegeben. + +Bitte klicken Sie auf "Absenden", um das Formular abzuschließen. Nochmals vielen Dank für Ihre Rückmeldung und Ihre Unterstützung bei der Verbesserung unserer Steuerberatung!<|im_end|> +tensor([-100, -100, -100, ..., -100, -100, -100]) +tensor([ 32, 6236, 1948, ..., 2185, 0, 151645]) +{'loss': 1.1047, 'grad_norm': 1.9397242069244385, 'learning_rate': 1.9979794820352187e-05, 'epoch': 0.05} +{'loss': 1.0315, 'grad_norm': 2.1860578060150146, 'learning_rate': 1.9979396835065696e-05, 'epoch': 0.05} +{'loss': 1.1123, 'grad_norm': 1.8199071884155273, 'learning_rate': 1.9978994972358265e-05, 'epoch': 0.05} +{'loss': 1.1737, 'grad_norm': 2.0878307819366455, 'learning_rate': 1.9978589232386036e-05, 'epoch': 0.05} +{'loss': 1.1193, 'grad_norm': 2.149130344390869, 'learning_rate': 1.9978179615306658e-05, 'epoch': 0.05} +{'loss': 1.1536, 'grad_norm': 1.7451056241989136, 'learning_rate': 1.997776612127928e-05, 'epoch': 0.05} +{'loss': 1.106, 'grad_norm': 2.0781421661376953, 'learning_rate': 1.997734875046456e-05, 'epoch': 0.05} +{'loss': 1.0439, 'grad_norm': 1.9924591779708862, 'learning_rate': 1.9976927503024677e-05, 'epoch': 0.05} +{'loss': 1.0725, 'grad_norm': 1.8670432567596436, 'learning_rate': 1.997650237912329e-05, 'epoch': 0.05} +{'loss': 1.0405, 'grad_norm': 1.8259646892547607, 'learning_rate': 1.9976073378925586e-05, 'epoch': 0.05} +{'loss': 1.0982, 'grad_norm': 1.9916901588439941, 'learning_rate': 1.9975640502598243e-05, 'epoch': 0.05} +{'loss': 1.1468, 'grad_norm': 1.8680857419967651, 'learning_rate': 1.997520375030946e-05, 'epoch': 0.05} +{'loss': 1.134, 'grad_norm': 2.3491151332855225, 'learning_rate': 1.9974763122228927e-05, 'epoch': 0.05} +{'loss': 1.1176, 'grad_norm': 2.0595386028289795, 'learning_rate': 1.997431861852785e-05, 'epoch': 0.05} +{'loss': 1.1129, 'grad_norm': 2.0007033348083496, 'learning_rate': 1.9973870239378938e-05, 'epoch': 0.05} +{'loss': 1.1284, 'grad_norm': 1.9455167055130005, 'learning_rate': 1.9973417984956404e-05, 'epoch': 0.05} +{'loss': 1.1369, 'grad_norm': 1.8305734395980835, 'learning_rate': 1.997296185543597e-05, 'epoch': 0.05} +{'loss': 0.9114, 'grad_norm': 1.0071614980697632, 'learning_rate': 1.9972501850994857e-05, 'epoch': 0.05} +{'loss': 1.1351, 'grad_norm': 1.9528825283050537, 'learning_rate': 1.9972037971811802e-05, 'epoch': 0.05} +{'loss': 0.9553, 'grad_norm': 1.0603333711624146, 'learning_rate': 1.997157021806704e-05, 'epoch': 0.05} +{'loss': 1.0726, 'grad_norm': 2.001891851425171, 'learning_rate': 1.9971098589942314e-05, 'epoch': 0.05} +{'loss': 1.1241, 'grad_norm': 1.7532188892364502, 'learning_rate': 1.9970623087620867e-05, 'epoch': 0.05} +{'loss': 1.068, 
'grad_norm': 1.9638525247573853, 'learning_rate': 1.997014371128746e-05, 'epoch': 0.05} +{'loss': 1.0596, 'grad_norm': 1.8798953294754028, 'learning_rate': 1.996966046112834e-05, 'epoch': 0.05} +{'loss': 1.0751, 'grad_norm': 1.8201245069503784, 'learning_rate': 1.9969173337331283e-05, 'epoch': 0.05} +{'loss': 1.107, 'grad_norm': 1.845236897468567, 'learning_rate': 1.9968682340085546e-05, 'epoch': 0.05} +{'loss': 1.0952, 'grad_norm': 2.2483530044555664, 'learning_rate': 1.996818746958191e-05, 'epoch': 0.05} +{'loss': 1.066, 'grad_norm': 2.0356900691986084, 'learning_rate': 1.996768872601265e-05, 'epoch': 0.05} +{'loss': 1.1225, 'grad_norm': 1.838862657546997, 'learning_rate': 1.996718610957155e-05, 'epoch': 0.06} +{'loss': 1.066, 'grad_norm': 1.8150132894515991, 'learning_rate': 1.9966679620453903e-05, 'epoch': 0.06} +{'loss': 1.1009, 'grad_norm': 1.8981868028640747, 'learning_rate': 1.9966169258856488e-05, 'epoch': 0.06} +{'loss': 1.099, 'grad_norm': 1.955868124961853, 'learning_rate': 1.9965655024977617e-05, 'epoch': 0.06} +{'loss': 1.0287, 'grad_norm': 1.846192717552185, 'learning_rate': 1.996513691901708e-05, 'epoch': 0.06} +{'loss': 0.9961, 'grad_norm': 1.7783055305480957, 'learning_rate': 1.9964614941176194e-05, 'epoch': 0.06} +{'loss': 1.0306, 'grad_norm': 1.9272091388702393, 'learning_rate': 1.996408909165776e-05, 'epoch': 0.06} +{'loss': 1.0241, 'grad_norm': 1.9080709218978882, 'learning_rate': 1.99635593706661e-05, 'epoch': 0.06} +{'loss': 1.0523, 'grad_norm': 1.8557677268981934, 'learning_rate': 1.996302577840703e-05, 'epoch': 0.06} +{'loss': 0.9096, 'grad_norm': 1.0520893335342407, 'learning_rate': 1.9962488315087872e-05, 'epoch': 0.06} +{'loss': 1.058, 'grad_norm': 1.9271793365478516, 'learning_rate': 1.9961946980917457e-05, 'epoch': 0.06} +{'loss': 0.9968, 'grad_norm': 2.0063111782073975, 'learning_rate': 1.9961401776106113e-05, 'epoch': 0.06} +{'loss': 1.1263, 'grad_norm': 1.998648762702942, 'learning_rate': 1.996085270086568e-05, 'epoch': 0.06} +{'loss': 1.0912, 'grad_norm': 1.8948326110839844, 'learning_rate': 1.996029975540949e-05, 'epoch': 0.06} +{'loss': 1.186, 'grad_norm': 2.044820547103882, 'learning_rate': 1.9959742939952393e-05, 'epoch': 0.06} +{'loss': 1.0976, 'grad_norm': 2.0060698986053467, 'learning_rate': 1.995918225471073e-05, 'epoch': 0.06} +{'loss': 1.1375, 'grad_norm': 1.9796364307403564, 'learning_rate': 1.9958617699902354e-05, 'epoch': 0.06} +{'loss': 1.0859, 'grad_norm': 2.122551679611206, 'learning_rate': 1.9958049275746617e-05, 'epoch': 0.06} +{'loss': 1.1212, 'grad_norm': 1.786564826965332, 'learning_rate': 1.9957476982464382e-05, 'epoch': 0.06} +{'loss': 0.9609, 'grad_norm': 1.9185203313827515, 'learning_rate': 1.9956900820278e-05, 'epoch': 0.06} +{'loss': 1.1476, 'grad_norm': 2.0406312942504883, 'learning_rate': 1.9956320789411338e-05, 'epoch': 0.06} +{'loss': 1.06, 'grad_norm': 1.8319671154022217, 'learning_rate': 1.9955736890089768e-05, 'epoch': 0.06} +{'loss': 1.048, 'grad_norm': 1.9506793022155762, 'learning_rate': 1.995514912254015e-05, 'epoch': 0.06} +{'loss': 1.0628, 'grad_norm': 1.7802228927612305, 'learning_rate': 1.995455748699087e-05, 'epoch': 0.06} +{'loss': 1.1178, 'grad_norm': 1.8466390371322632, 'learning_rate': 1.9953961983671792e-05, 'epoch': 0.06} +{'loss': 1.0751, 'grad_norm': 1.8039296865463257, 'learning_rate': 1.9953362612814294e-05, 'epoch': 0.06} +{'loss': 1.0708, 'grad_norm': 2.070028066635132, 'learning_rate': 1.9952759374651266e-05, 'epoch': 0.06} +{'loss': 0.9121, 'grad_norm': 1.0980297327041626, 'learning_rate': 
1.9952152269417088e-05, 'epoch': 0.06} +{'loss': 1.1029, 'grad_norm': 1.9291366338729858, 'learning_rate': 1.9951541297347642e-05, 'epoch': 0.06} +{'loss': 1.081, 'grad_norm': 1.8742951154708862, 'learning_rate': 1.995092645868032e-05, 'epoch': 0.06} +{'loss': 1.1246, 'grad_norm': 1.9387081861495972, 'learning_rate': 1.9950307753654016e-05, 'epoch': 0.06} +{'loss': 1.0838, 'grad_norm': 2.033475160598755, 'learning_rate': 1.9949685182509117e-05, 'epoch': 0.06} +{'loss': 1.044, 'grad_norm': 2.0009639263153076, 'learning_rate': 1.9949058745487524e-05, 'epoch': 0.06} +{'loss': 1.1293, 'grad_norm': 2.098273515701294, 'learning_rate': 1.9948428442832633e-05, 'epoch': 0.06} +{'loss': 1.0483, 'grad_norm': 1.8275796175003052, 'learning_rate': 1.994779427478934e-05, 'epoch': 0.06} +{'loss': 1.0825, 'grad_norm': 1.8539766073226929, 'learning_rate': 1.994715624160405e-05, 'epoch': 0.06} +{'loss': 1.0593, 'grad_norm': 2.0822293758392334, 'learning_rate': 1.994651434352467e-05, 'epoch': 0.06} +{'loss': 1.0776, 'grad_norm': 2.0292410850524902, 'learning_rate': 1.9945868580800593e-05, 'epoch': 0.06} +{'loss': 1.0782, 'grad_norm': 1.727033019065857, 'learning_rate': 1.9945218953682736e-05, 'epoch': 0.06} +{'loss': 1.0875, 'grad_norm': 1.9884116649627686, 'learning_rate': 1.9944565462423502e-05, 'epoch': 0.06} +{'loss': 0.9993, 'grad_norm': 1.898795247077942, 'learning_rate': 1.99439081072768e-05, 'epoch': 0.06} +{'loss': 1.0776, 'grad_norm': 1.8701682090759277, 'learning_rate': 1.994324688849804e-05, 'epoch': 0.06} +{'loss': 1.1456, 'grad_norm': 1.8718277215957642, 'learning_rate': 1.994258180634414e-05, 'epoch': 0.06} +{'loss': 1.0685, 'grad_norm': 1.8626608848571777, 'learning_rate': 1.9941912861073507e-05, 'epoch': 0.06} +{'loss': 1.1204, 'grad_norm': 2.0976054668426514, 'learning_rate': 1.9941240052946055e-05, 'epoch': 0.06} +{'loss': 1.1056, 'grad_norm': 1.9197001457214355, 'learning_rate': 1.9940563382223196e-05, 'epoch': 0.06} +{'loss': 1.1318, 'grad_norm': 1.9192863702774048, 'learning_rate': 1.9939882849167853e-05, 'epoch': 0.06} +{'loss': 1.0828, 'grad_norm': 1.756412148475647, 'learning_rate': 1.993919845404444e-05, 'epoch': 0.06} +{'loss': 1.0581, 'grad_norm': 1.804733395576477, 'learning_rate': 1.9938510197118868e-05, 'epoch': 0.06} +{'loss': 1.1149, 'grad_norm': 1.925678014755249, 'learning_rate': 1.9937818078658556e-05, 'epoch': 0.06} +{'loss': 1.1522, 'grad_norm': 1.9162620306015015, 'learning_rate': 1.9937122098932428e-05, 'epoch': 0.06} +{'loss': 1.0731, 'grad_norm': 1.7437552213668823, 'learning_rate': 1.9936422258210894e-05, 'epoch': 0.06} +{'loss': 1.0235, 'grad_norm': 1.802807092666626, 'learning_rate': 1.9935718556765878e-05, 'epoch': 0.07} +{'loss': 1.0161, 'grad_norm': 1.8687171936035156, 'learning_rate': 1.993501099487079e-05, 'epoch': 0.07} +{'loss': 0.9383, 'grad_norm': 1.0427062511444092, 'learning_rate': 1.9934299572800556e-05, 'epoch': 0.07} +{'loss': 1.0689, 'grad_norm': 2.008777618408203, 'learning_rate': 1.9933584290831593e-05, 'epoch': 0.07} +{'loss': 1.071, 'grad_norm': 1.9812185764312744, 'learning_rate': 1.9932865149241813e-05, 'epoch': 0.07} +{'loss': 1.1289, 'grad_norm': 1.9773062467575073, 'learning_rate': 1.9932142148310638e-05, 'epoch': 0.07} +{'loss': 1.091, 'grad_norm': 1.7774780988693237, 'learning_rate': 1.9931415288318985e-05, 'epoch': 0.07} +{'loss': 1.0723, 'grad_norm': 2.137747287750244, 'learning_rate': 1.9930684569549265e-05, 'epoch': 0.07} +{'loss': 1.0538, 'grad_norm': 2.0008246898651123, 'learning_rate': 1.9929949992285397e-05, 'epoch': 0.07} 
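The "WARNING: tokenization mismatch: 0 vs. 1211. (ignored)" message earlier in this run, followed by the dumped rounds, the assembled prompt, and a label tensor that is entirely -100, means that sample was dropped from the loss rather than trained on. A minimal sketch of that kind of guard is given below, assuming LLaVA-style supervised-finetuning preprocessing; IGNORE_INDEX, the function name, and the per-round bookkeeping are illustrative assumptions, not this project's actual code.

import torch

IGNORE_INDEX = -100  # label value the loss function skips

def build_labels(input_ids: torch.Tensor, round_lengths: list[int], total_len: int) -> torch.Tensor:
    # Hypothetical guard behind the "tokenization mismatch" warning: labels start
    # as a copy of the inputs, each USER/ASSISTANT round is walked to mask the
    # user side, and if the accumulated length disagrees with the tokenized
    # target length the whole sample is ignored.
    labels = input_ids.clone()
    cur_len = 0
    for rl in round_lengths:
        # (masking of the user part of this round would happen here)
        cur_len += rl
    if cur_len != total_len:
        labels[:] = IGNORE_INDEX
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}. (ignored)")
    return labels

The warning is informational: a sample masked this way contributes no gradient, while the rest of the batch in that step trains normally.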
+{'loss': 1.0339, 'grad_norm': 1.9377113580703735, 'learning_rate': 1.9929211556812794e-05, 'epoch': 0.07} +{'loss': 1.1363, 'grad_norm': 1.8344753980636597, 'learning_rate': 1.9928469263418376e-05, 'epoch': 0.07} +{'loss': 1.0178, 'grad_norm': 2.0816099643707275, 'learning_rate': 1.9927723112390547e-05, 'epoch': 0.07} +{'loss': 0.8425, 'grad_norm': 0.8840500712394714, 'learning_rate': 1.992697310401922e-05, 'epoch': 0.07} +{'loss': 1.0392, 'grad_norm': 2.039031982421875, 'learning_rate': 1.992621923859581e-05, 'epoch': 0.07} +{'loss': 1.0477, 'grad_norm': 2.1741487979888916, 'learning_rate': 1.9925461516413224e-05, 'epoch': 0.07} +{'loss': 1.0269, 'grad_norm': 1.8604975938796997, 'learning_rate': 1.9924699937765863e-05, 'epoch': 0.07} +{'loss': 1.0948, 'grad_norm': 2.037583112716675, 'learning_rate': 1.9923934502949645e-05, 'epoch': 0.07} +{'loss': 1.038, 'grad_norm': 1.9853684902191162, 'learning_rate': 1.992316521226196e-05, 'epoch': 0.07} +{'loss': 0.9963, 'grad_norm': 1.8607470989227295, 'learning_rate': 1.9922392066001724e-05, 'epoch': 0.07} +{'loss': 1.1849, 'grad_norm': 1.8960143327713013, 'learning_rate': 1.9921615064469328e-05, 'epoch': 0.07} +{'loss': 1.0686, 'grad_norm': 1.7322030067443848, 'learning_rate': 1.992083420796667e-05, 'epoch': 0.07} +{'loss': 1.0509, 'grad_norm': 2.041977882385254, 'learning_rate': 1.9920049496797153e-05, 'epoch': 0.07} +{'loss': 1.1118, 'grad_norm': 1.9696863889694214, 'learning_rate': 1.9919260931265666e-05, 'epoch': 0.07} +{'loss': 1.1194, 'grad_norm': 2.049968719482422, 'learning_rate': 1.99184685116786e-05, 'epoch': 0.07} +{'loss': 1.0606, 'grad_norm': 1.967901587486267, 'learning_rate': 1.991767223834384e-05, 'epoch': 0.07} +{'loss': 1.0418, 'grad_norm': 1.858583688735962, 'learning_rate': 1.9916872111570782e-05, 'epoch': 0.07} +{'loss': 0.9747, 'grad_norm': 1.0986965894699097, 'learning_rate': 1.9916068131670302e-05, 'epoch': 0.07} +{'loss': 1.0825, 'grad_norm': 2.4967737197875977, 'learning_rate': 1.9915260298954786e-05, 'epoch': 0.07} +{'loss': 1.1308, 'grad_norm': 1.919541835784912, 'learning_rate': 1.9914448613738107e-05, 'epoch': 0.07} +{'loss': 0.9576, 'grad_norm': 0.9795111417770386, 'learning_rate': 1.9913633076335638e-05, 'epoch': 0.07} +{'loss': 1.0511, 'grad_norm': 2.0834782123565674, 'learning_rate': 1.9912813687064255e-05, 'epoch': 0.07} +{'loss': 1.003, 'grad_norm': 1.8799113035202026, 'learning_rate': 1.991199044624232e-05, 'epoch': 0.07} +{'loss': 1.0391, 'grad_norm': 1.8467507362365723, 'learning_rate': 1.9911163354189702e-05, 'epoch': 0.07} +{'loss': 1.071, 'grad_norm': 2.0189335346221924, 'learning_rate': 1.991033241122776e-05, 'epoch': 0.07} +{'loss': 1.1198, 'grad_norm': 2.0685229301452637, 'learning_rate': 1.990949761767935e-05, 'epoch': 0.07} +{'loss': 1.0945, 'grad_norm': 1.9276286363601685, 'learning_rate': 1.9908658973868823e-05, 'epoch': 0.07} +{'loss': 1.1694, 'grad_norm': 2.1261022090911865, 'learning_rate': 1.9907816480122032e-05, 'epoch': 0.07} +{'loss': 1.0982, 'grad_norm': 2.046309471130371, 'learning_rate': 1.9906970136766323e-05, 'epoch': 0.07} +{'loss': 1.1492, 'grad_norm': 2.1238198280334473, 'learning_rate': 1.9906119944130527e-05, 'epoch': 0.07} +{'loss': 1.0794, 'grad_norm': 1.8909717798233032, 'learning_rate': 1.990526590254499e-05, 'epoch': 0.07} +{'loss': 1.022, 'grad_norm': 2.1433300971984863, 'learning_rate': 1.9904408012341542e-05, 'epoch': 0.07} +{'loss': 1.0335, 'grad_norm': 1.944788932800293, 'learning_rate': 1.9903546273853503e-05, 'epoch': 0.07} +{'loss': 1.1729, 'grad_norm': 
1.7993412017822266, 'learning_rate': 1.9902680687415704e-05, 'epoch': 0.07} +{'loss': 0.9873, 'grad_norm': 2.169924736022949, 'learning_rate': 1.9901811253364458e-05, 'epoch': 0.07} +{'loss': 1.1523, 'grad_norm': 2.0918314456939697, 'learning_rate': 1.9900937972037577e-05, 'epoch': 0.07} +{'loss': 1.1308, 'grad_norm': 1.8481202125549316, 'learning_rate': 1.9900060843774364e-05, 'epoch': 0.07} +{'loss': 1.1532, 'grad_norm': 1.8293710947036743, 'learning_rate': 1.989917986891563e-05, 'epoch': 0.07} +{'loss': 1.0799, 'grad_norm': 2.140244483947754, 'learning_rate': 1.9898295047803665e-05, 'epoch': 0.07} +{'loss': 1.1405, 'grad_norm': 2.045227527618408, 'learning_rate': 1.9897406380782262e-05, 'epoch': 0.07} +{'loss': 1.0717, 'grad_norm': 2.0215260982513428, 'learning_rate': 1.9896513868196706e-05, 'epoch': 0.07} +{'loss': 1.0307, 'grad_norm': 2.1683692932128906, 'learning_rate': 1.9895617510393773e-05, 'epoch': 0.07} +{'loss': 1.0851, 'grad_norm': 1.9758880138397217, 'learning_rate': 1.9894717307721734e-05, 'epoch': 0.07} +{'loss': 1.1128, 'grad_norm': 1.7286189794540405, 'learning_rate': 1.9893813260530368e-05, 'epoch': 0.08} +{'loss': 0.9664, 'grad_norm': 1.9432743787765503, 'learning_rate': 1.989290536917093e-05, 'epoch': 0.08} +{'loss': 1.0528, 'grad_norm': 2.002781629562378, 'learning_rate': 1.9891993633996164e-05, 'epoch': 0.08} +{'loss': 1.02, 'grad_norm': 1.921952486038208, 'learning_rate': 1.9891078055360332e-05, 'epoch': 0.08} +{'loss': 0.9773, 'grad_norm': 1.068670392036438, 'learning_rate': 1.989015863361917e-05, 'epoch': 0.08} +{'loss': 1.1125, 'grad_norm': 1.9413923025131226, 'learning_rate': 1.9889235369129913e-05, 'epoch': 0.08} +{'loss': 1.176, 'grad_norm': 1.9656031131744385, 'learning_rate': 1.9888308262251286e-05, 'epoch': 0.08} +{'loss': 1.069, 'grad_norm': 2.0729422569274902, 'learning_rate': 1.9887377313343514e-05, 'epoch': 0.08} +{'loss': 1.0769, 'grad_norm': 2.3098745346069336, 'learning_rate': 1.988644252276831e-05, 'epoch': 0.08} +{'loss': 1.0606, 'grad_norm': 1.8306607007980347, 'learning_rate': 1.9885503890888874e-05, 'epoch': 0.08} +{'loss': 1.1003, 'grad_norm': 2.4004573822021484, 'learning_rate': 1.988456141806991e-05, 'epoch': 0.08} +{'loss': 1.044, 'grad_norm': 2.3030784130096436, 'learning_rate': 1.988361510467761e-05, 'epoch': 0.08} +{'loss': 0.8923, 'grad_norm': 0.9968491792678833, 'learning_rate': 1.9882664951079652e-05, 'epoch': 0.08} +{'loss': 1.0133, 'grad_norm': 2.032879590988159, 'learning_rate': 1.988171095764521e-05, 'epoch': 0.08} +{'loss': 1.0867, 'grad_norm': 1.8498350381851196, 'learning_rate': 1.9880753124744964e-05, 'epoch': 0.08} +{'loss': 1.0984, 'grad_norm': 2.009408950805664, 'learning_rate': 1.9879791452751055e-05, 'epoch': 0.08} +{'loss': 1.0417, 'grad_norm': 2.1746535301208496, 'learning_rate': 1.9878825942037147e-05, 'epoch': 0.08} +{'loss': 0.904, 'grad_norm': 0.9810822606086731, 'learning_rate': 1.987785659297838e-05, 'epoch': 0.08} +{'loss': 1.0549, 'grad_norm': 2.107671022415161, 'learning_rate': 1.9876883405951378e-05, 'epoch': 0.08} +{'loss': 1.0272, 'grad_norm': 2.4178225994110107, 'learning_rate': 1.9875906381334277e-05, 'epoch': 0.08} +{'loss': 0.9923, 'grad_norm': 2.0550055503845215, 'learning_rate': 1.9874925519506686e-05, 'epoch': 0.08} +{'loss': 1.0564, 'grad_norm': 2.1893374919891357, 'learning_rate': 1.9873940820849714e-05, 'epoch': 0.08} +{'loss': 1.139, 'grad_norm': 2.048220157623291, 'learning_rate': 1.9872952285745958e-05, 'epoch': 0.08} +{'loss': 1.0641, 'grad_norm': 1.8988646268844604, 'learning_rate': 
1.987195991457951e-05, 'epoch': 0.08} +{'loss': 1.059, 'grad_norm': 1.9113562107086182, 'learning_rate': 1.987096370773594e-05, 'epoch': 0.08} +{'loss': 1.0276, 'grad_norm': 2.0818557739257812, 'learning_rate': 1.9869963665602322e-05, 'epoch': 0.08} +{'loss': 1.126, 'grad_norm': 1.956807017326355, 'learning_rate': 1.9868959788567213e-05, 'epoch': 0.08} +{'loss': 1.1835, 'grad_norm': 1.8188530206680298, 'learning_rate': 1.9867952077020667e-05, 'epoch': 0.08} +{'loss': 1.0995, 'grad_norm': 1.8978670835494995, 'learning_rate': 1.9866940531354214e-05, 'epoch': 0.08} +{'loss': 1.0056, 'grad_norm': 1.8703874349594116, 'learning_rate': 1.9865925151960893e-05, 'epoch': 0.08} +{'loss': 1.1252, 'grad_norm': 2.131010055541992, 'learning_rate': 1.9864905939235215e-05, 'epoch': 0.08} +{'loss': 1.0677, 'grad_norm': 1.9145139455795288, 'learning_rate': 1.9863882893573188e-05, 'epoch': 0.08} +{'loss': 1.081, 'grad_norm': 2.0749764442443848, 'learning_rate': 1.9862856015372315e-05, 'epoch': 0.08} +{'loss': 0.9055, 'grad_norm': 0.9918889403343201, 'learning_rate': 1.9861825305031576e-05, 'epoch': 0.08} +{'loss': 1.0934, 'grad_norm': 2.1616575717926025, 'learning_rate': 1.9860790762951447e-05, 'epoch': 0.08} +{'loss': 1.1145, 'grad_norm': 1.9510889053344727, 'learning_rate': 1.985975238953389e-05, 'epoch': 0.08} +{'loss': 1.0763, 'grad_norm': 1.8759390115737915, 'learning_rate': 1.985871018518236e-05, 'epoch': 0.08} +{'loss': 1.0472, 'grad_norm': 1.9842033386230469, 'learning_rate': 1.9857664150301798e-05, 'epoch': 0.08} +{'loss': 1.1517, 'grad_norm': 1.724048137664795, 'learning_rate': 1.985661428529863e-05, 'epoch': 0.08} +{'loss': 0.8785, 'grad_norm': 1.0283504724502563, 'learning_rate': 1.985556059058078e-05, 'epoch': 0.08} +{'loss': 1.0595, 'grad_norm': 2.0816354751586914, 'learning_rate': 1.9854503066557646e-05, 'epoch': 0.08} +{'loss': 0.9407, 'grad_norm': 1.015547752380371, 'learning_rate': 1.9853441713640123e-05, 'epoch': 0.08} +{'loss': 0.9853, 'grad_norm': 1.887131929397583, 'learning_rate': 1.9852376532240594e-05, 'epoch': 0.08} +{'loss': 1.0211, 'grad_norm': 1.8701651096343994, 'learning_rate': 1.9851307522772926e-05, 'epoch': 0.08} +{'loss': 1.0719, 'grad_norm': 1.9214274883270264, 'learning_rate': 1.9850234685652478e-05, 'epoch': 0.08} +{'loss': 1.0983, 'grad_norm': 1.7181062698364258, 'learning_rate': 1.9849158021296083e-05, 'epoch': 0.08} +{'loss': 0.9326, 'grad_norm': 2.1615700721740723, 'learning_rate': 1.9848077530122083e-05, 'epoch': 0.08} +{'loss': 1.1539, 'grad_norm': 1.783625602722168, 'learning_rate': 1.9846993212550286e-05, 'epoch': 0.08} +{'loss': 1.0632, 'grad_norm': 1.9634573459625244, 'learning_rate': 1.9845905069002004e-05, 'epoch': 0.08} +{'loss': 1.0204, 'grad_norm': 2.064742088317871, 'learning_rate': 1.984481309990002e-05, 'epoch': 0.08} +{'loss': 1.0388, 'grad_norm': 1.9431153535842896, 'learning_rate': 1.984371730566861e-05, 'epoch': 0.08} +{'loss': 1.0669, 'grad_norm': 1.9191973209381104, 'learning_rate': 1.9842617686733546e-05, 'epoch': 0.08} +{'loss': 0.9307, 'grad_norm': 1.0412219762802124, 'learning_rate': 1.9841514243522065e-05, 'epoch': 0.09} +{'loss': 0.9571, 'grad_norm': 0.9976733326911926, 'learning_rate': 1.984040697646291e-05, 'epoch': 0.09} +{'loss': 1.0569, 'grad_norm': 1.971793293952942, 'learning_rate': 1.98392958859863e-05, 'epoch': 0.09} +{'loss': 1.0777, 'grad_norm': 1.9992115497589111, 'learning_rate': 1.9838180972523936e-05, 'epoch': 0.09} +{'loss': 1.1332, 'grad_norm': 1.8540982007980347, 'learning_rate': 1.9837062236509013e-05, 'epoch': 0.09} 
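Each {'loss': ..., 'grad_norm': ..., 'learning_rate': ..., 'epoch': ...} record above is one optimizer step of console output. To inspect the curves rather than read them inline, a small helper along the following lines can pull the records back out of this file; the res.log name comes from this diff, while the regex and the use of ast.literal_eval are assumptions made for illustration, not tooling that ships with the run.

import ast
import re

RECORD = re.compile(r"\{'loss':[^}]*\}")  # one per-step metrics dict per match

def read_metrics(path: str = "res.log") -> list[dict]:
    # Collect every {'loss': ...} dict in the log, in order of appearance.
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    return [ast.literal_eval(m.group(0)) for m in RECORD.finditer(text)]

if __name__ == "__main__":
    rows = read_metrics()
    for row in rows[::100]:  # coarse view of the trend
        print(row["epoch"], row["loss"], row["learning_rate"])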
+{'loss': 1.0257, 'grad_norm': 1.8273231983184814, 'learning_rate': 1.9835939678376213e-05, 'epoch': 0.09} +{'loss': 1.074, 'grad_norm': 1.8232215642929077, 'learning_rate': 1.9834813298561686e-05, 'epoch': 0.09} +{'loss': 1.0785, 'grad_norm': 2.028073310852051, 'learning_rate': 1.9833683097503087e-05, 'epoch': 0.09} +{'loss': 1.1016, 'grad_norm': 2.0158872604370117, 'learning_rate': 1.983254907563955e-05, 'epoch': 0.09} +{'loss': 1.1075, 'grad_norm': 1.934831976890564, 'learning_rate': 1.983141123341168e-05, 'epoch': 0.09} +{'loss': 1.0142, 'grad_norm': 1.8577687740325928, 'learning_rate': 1.9830269571261585e-05, 'epoch': 0.09} +{'loss': 1.0305, 'grad_norm': 1.7445321083068848, 'learning_rate': 1.982912408963285e-05, 'epoch': 0.09} +{'loss': 1.0868, 'grad_norm': 1.948188304901123, 'learning_rate': 1.9827974788970536e-05, 'epoch': 0.09} +{'loss': 1.0884, 'grad_norm': 2.034566879272461, 'learning_rate': 1.9826821669721205e-05, 'epoch': 0.09} +{'loss': 1.0592, 'grad_norm': 2.072718381881714, 'learning_rate': 1.9825664732332886e-05, 'epoch': 0.09} +{'loss': 1.0747, 'grad_norm': 1.9664205312728882, 'learning_rate': 1.98245039772551e-05, 'epoch': 0.09} +{'loss': 1.0716, 'grad_norm': 2.13956356048584, 'learning_rate': 1.982333940493885e-05, 'epoch': 0.09} +{'loss': 0.9272, 'grad_norm': 1.06049382686615, 'learning_rate': 1.9822171015836625e-05, 'epoch': 0.09} +{'loss': 1.066, 'grad_norm': 2.323936939239502, 'learning_rate': 1.982099881040239e-05, 'epoch': 0.09} +{'loss': 1.0847, 'grad_norm': 1.823792576789856, 'learning_rate': 1.9819822789091597e-05, 'epoch': 0.09} +{'loss': 1.1211, 'grad_norm': 1.8866627216339111, 'learning_rate': 1.9818642952361188e-05, 'epoch': 0.09} +{'loss': 1.0453, 'grad_norm': 1.824110507965088, 'learning_rate': 1.9817459300669567e-05, 'epoch': 0.09} +{'loss': 1.1479, 'grad_norm': 2.1441428661346436, 'learning_rate': 1.9816271834476642e-05, 'epoch': 0.09} +{'loss': 1.023, 'grad_norm': 1.911728024482727, 'learning_rate': 1.981508055424379e-05, 'epoch': 0.09} +{'loss': 1.107, 'grad_norm': 2.1533942222595215, 'learning_rate': 1.981388546043388e-05, 'epoch': 0.09} +{'loss': 1.0745, 'grad_norm': 1.9569392204284668, 'learning_rate': 1.9812686553511246e-05, 'epoch': 0.09} +{'loss': 0.9213, 'grad_norm': 1.01942777633667, 'learning_rate': 1.9811483833941726e-05, 'epoch': 0.09} +{'loss': 1.1049, 'grad_norm': 2.0068280696868896, 'learning_rate': 1.9810277302192624e-05, 'epoch': 0.09} +{'loss': 0.9636, 'grad_norm': 2.019318103790283, 'learning_rate': 1.9809066958732727e-05, 'epoch': 0.09} +{'loss': 0.9739, 'grad_norm': 1.8868746757507324, 'learning_rate': 1.9807852804032306e-05, 'epoch': 0.09} +{'loss': 1.0486, 'grad_norm': 1.979896068572998, 'learning_rate': 1.9806634838563113e-05, 'epoch': 0.09} +{'loss': 1.011, 'grad_norm': 2.0071215629577637, 'learning_rate': 1.9805413062798376e-05, 'epoch': 0.09} +{'loss': 1.0585, 'grad_norm': 1.9919341802597046, 'learning_rate': 1.980418747721281e-05, 'epoch': 0.09} +{'loss': 0.9632, 'grad_norm': 2.2145938873291016, 'learning_rate': 1.980295808228261e-05, 'epoch': 0.09} +{'loss': 1.0504, 'grad_norm': 1.92658531665802, 'learning_rate': 1.9801724878485438e-05, 'epoch': 0.09} +{'loss': 0.993, 'grad_norm': 1.8957911729812622, 'learning_rate': 1.980048786630046e-05, 'epoch': 0.09} +{'loss': 1.0914, 'grad_norm': 1.8102710247039795, 'learning_rate': 1.9799247046208297e-05, 'epoch': 0.09} +{'loss': 1.0703, 'grad_norm': 1.8429460525512695, 'learning_rate': 1.9798002418691067e-05, 'epoch': 0.09} +{'loss': 1.1114, 'grad_norm': 1.7665605545043945, 
'learning_rate': 1.9796753984232357e-05, 'epoch': 0.09} +{'loss': 1.0789, 'grad_norm': 1.9230808019638062, 'learning_rate': 1.979550174331724e-05, 'epoch': 0.09} +{'loss': 1.0645, 'grad_norm': 1.7684491872787476, 'learning_rate': 1.9794245696432262e-05, 'epoch': 0.09} +{'loss': 1.0845, 'grad_norm': 1.9817627668380737, 'learning_rate': 1.979298584406545e-05, 'epoch': 0.09} +{'loss': 1.0144, 'grad_norm': 2.1252405643463135, 'learning_rate': 1.9791722186706318e-05, 'epoch': 0.09} +{'loss': 1.0366, 'grad_norm': 2.1359128952026367, 'learning_rate': 1.979045472484584e-05, 'epoch': 0.09} +{'loss': 1.0818, 'grad_norm': 1.8751832246780396, 'learning_rate': 1.9789183458976485e-05, 'epoch': 0.09} +{'loss': 1.0357, 'grad_norm': 2.0680389404296875, 'learning_rate': 1.9787908389592197e-05, 'epoch': 0.09} +{'loss': 1.0464, 'grad_norm': 1.8467575311660767, 'learning_rate': 1.9786629517188384e-05, 'epoch': 0.09} +{'loss': 1.0413, 'grad_norm': 1.9802943468093872, 'learning_rate': 1.9785346842261957e-05, 'epoch': 0.09} +{'loss': 1.078, 'grad_norm': 1.9998193979263306, 'learning_rate': 1.9784060365311276e-05, 'epoch': 0.09} +{'loss': 1.1261, 'grad_norm': 2.2239794731140137, 'learning_rate': 1.97827700868362e-05, 'epoch': 0.09} +{'loss': 1.062, 'grad_norm': 1.8362261056900024, 'learning_rate': 1.9781476007338058e-05, 'epoch': 0.09} +{'loss': 1.053, 'grad_norm': 1.9770945310592651, 'learning_rate': 1.978017812731965e-05, 'epoch': 0.09} +{'loss': 1.0547, 'grad_norm': 1.838757038116455, 'learning_rate': 1.9778876447285264e-05, 'epoch': 0.1} +{'loss': 1.0621, 'grad_norm': 1.8650875091552734, 'learning_rate': 1.9777570967740648e-05, 'epoch': 0.1} +{'loss': 1.018, 'grad_norm': 1.843515396118164, 'learning_rate': 1.977626168919305e-05, 'epoch': 0.1} +{'loss': 1.1262, 'grad_norm': 1.9621353149414062, 'learning_rate': 1.977494861215117e-05, 'epoch': 0.1} +{'loss': 1.1076, 'grad_norm': 1.9379910230636597, 'learning_rate': 1.9773631737125192e-05, 'epoch': 0.1} +{'loss': 1.0357, 'grad_norm': 2.0728769302368164, 'learning_rate': 1.977231106462679e-05, 'epoch': 0.1} +{'loss': 1.1233, 'grad_norm': 2.0274953842163086, 'learning_rate': 1.9770986595169096e-05, 'epoch': 0.1} +{'loss': 1.0848, 'grad_norm': 2.0266332626342773, 'learning_rate': 1.9769658329266718e-05, 'epoch': 0.1} +{'loss': 1.0968, 'grad_norm': 1.8384894132614136, 'learning_rate': 1.976832626743575e-05, 'epoch': 0.1} +{'loss': 1.0704, 'grad_norm': 1.831685185432434, 'learning_rate': 1.9766990410193755e-05, 'epoch': 0.1} +{'loss': 1.1019, 'grad_norm': 1.9811996221542358, 'learning_rate': 1.9765650758059766e-05, 'epoch': 0.1} +{'loss': 1.0505, 'grad_norm': 1.9394782781600952, 'learning_rate': 1.97643073115543e-05, 'epoch': 0.1} +{'loss': 0.9832, 'grad_norm': 1.9711352586746216, 'learning_rate': 1.9762960071199334e-05, 'epoch': 0.1} +{'loss': 1.087, 'grad_norm': 1.8081459999084473, 'learning_rate': 1.976160903751834e-05, 'epoch': 0.1} +{'loss': 1.0364, 'grad_norm': 1.7603284120559692, 'learning_rate': 1.9760254211036245e-05, 'epoch': 0.1} +{'loss': 1.0721, 'grad_norm': 1.915087103843689, 'learning_rate': 1.9758895592279458e-05, 'epoch': 0.1} +{'loss': 1.1291, 'grad_norm': 1.7813336849212646, 'learning_rate': 1.9757533181775865e-05, 'epoch': 0.1} +{'loss': 1.0705, 'grad_norm': 1.7126929759979248, 'learning_rate': 1.9756166980054812e-05, 'epoch': 0.1} +{'loss': 1.0951, 'grad_norm': 1.811356782913208, 'learning_rate': 1.9754796987647134e-05, 'epoch': 0.1} +{'loss': 0.9113, 'grad_norm': 1.018143653869629, 'learning_rate': 1.9753423205085127e-05, 'epoch': 0.1} +{'loss': 
1.06, 'grad_norm': 1.9739432334899902, 'learning_rate': 1.9752045632902567e-05, 'epoch': 0.1} +{'loss': 1.1066, 'grad_norm': 1.6870028972625732, 'learning_rate': 1.97506642716347e-05, 'epoch': 0.1} +{'loss': 1.0121, 'grad_norm': 1.9246145486831665, 'learning_rate': 1.9749279121818235e-05, 'epoch': 0.1} +{'loss': 1.1529, 'grad_norm': 1.8966962099075317, 'learning_rate': 1.9747890183991376e-05, 'epoch': 0.1} +{'loss': 1.0772, 'grad_norm': 2.0046677589416504, 'learning_rate': 1.9746497458693776e-05, 'epoch': 0.1} +{'loss': 0.9979, 'grad_norm': 1.9526045322418213, 'learning_rate': 1.9745100946466563e-05, 'epoch': 0.1} +{'loss': 0.9828, 'grad_norm': 1.9230717420578003, 'learning_rate': 1.9743700647852356e-05, 'epoch': 0.1} +{'loss': 1.0033, 'grad_norm': 1.8227673768997192, 'learning_rate': 1.9742296563395218e-05, 'epoch': 0.1} +{'loss': 1.078, 'grad_norm': 2.004889965057373, 'learning_rate': 1.9740888693640698e-05, 'epoch': 0.1} +{'loss': 0.923, 'grad_norm': 1.7988289594650269, 'learning_rate': 1.973947703913582e-05, 'epoch': 0.1} +{'loss': 1.0365, 'grad_norm': 1.7453898191452026, 'learning_rate': 1.9738061600429062e-05, 'epoch': 0.1} +{'loss': 1.0579, 'grad_norm': 2.1042673587799072, 'learning_rate': 1.9736642378070393e-05, 'epoch': 0.1} +{'loss': 1.0106, 'grad_norm': 1.8706365823745728, 'learning_rate': 1.9735219372611232e-05, 'epoch': 0.1} +{'loss': 0.9984, 'grad_norm': 1.6723026037216187, 'learning_rate': 1.9733792584604487e-05, 'epoch': 0.1} +{'loss': 1.1141, 'grad_norm': 1.8463131189346313, 'learning_rate': 1.9732362014604515e-05, 'epoch': 0.1} +{'loss': 1.0125, 'grad_norm': 1.9170361757278442, 'learning_rate': 1.973092766316716e-05, 'epoch': 0.1} +{'loss': 0.9906, 'grad_norm': 2.0003836154937744, 'learning_rate': 1.972948953084973e-05, 'epoch': 0.1} +{'loss': 1.1314, 'grad_norm': 1.8767521381378174, 'learning_rate': 1.9728047618210995e-05, 'epoch': 0.1} +{'loss': 1.1017, 'grad_norm': 1.8070870637893677, 'learning_rate': 1.9726601925811204e-05, 'epoch': 0.1} +{'loss': 0.9788, 'grad_norm': 1.5967661142349243, 'learning_rate': 1.9725152454212068e-05, 'epoch': 0.1} +{'loss': 1.1194, 'grad_norm': 1.8374614715576172, 'learning_rate': 1.9723699203976768e-05, 'epoch': 0.1} +{'loss': 1.0921, 'grad_norm': 1.682363748550415, 'learning_rate': 1.9722242175669955e-05, 'epoch': 0.1} +{'loss': 1.0255, 'grad_norm': 1.8782944679260254, 'learning_rate': 1.9720781369857747e-05, 'epoch': 0.1} +{'loss': 1.1391, 'grad_norm': 1.7316479682922363, 'learning_rate': 1.9719316787107728e-05, 'epoch': 0.1} +{'loss': 1.0727, 'grad_norm': 1.9749709367752075, 'learning_rate': 1.9717848427988954e-05, 'epoch': 0.1} +{'loss': 0.9908, 'grad_norm': 1.855950117111206, 'learning_rate': 1.971637629307194e-05, 'epoch': 0.1} +{'loss': 0.9219, 'grad_norm': 1.8045189380645752, 'learning_rate': 1.9714900382928674e-05, 'epoch': 0.1} +{'loss': 1.0857, 'grad_norm': 1.9687528610229492, 'learning_rate': 1.9713420698132614e-05, 'epoch': 0.1} +{'loss': 0.9757, 'grad_norm': 0.974544107913971, 'learning_rate': 1.971193723925868e-05, 'epoch': 0.1} +{'loss': 1.0488, 'grad_norm': 1.768945336341858, 'learning_rate': 1.9710450006883258e-05, 'epoch': 0.1} +{'loss': 0.8928, 'grad_norm': 0.9058737754821777, 'learning_rate': 1.9708959001584197e-05, 'epoch': 0.1} +{'loss': 1.122, 'grad_norm': 2.22385311126709, 'learning_rate': 1.970746422394082e-05, 'epoch': 0.1} +{'loss': 0.9028, 'grad_norm': 0.9252240061759949, 'learning_rate': 1.970596567453391e-05, 'epoch': 0.11} +{'loss': 1.0755, 'grad_norm': 1.9228028059005737, 'learning_rate': 
1.9704463353945724e-05, 'epoch': 0.11} +{'loss': 1.0918, 'grad_norm': 2.069215774536133, 'learning_rate': 1.9702957262759964e-05, 'epoch': 0.11} +{'loss': 1.047, 'grad_norm': 2.1567542552948, 'learning_rate': 1.9701447401561826e-05, 'epoch': 0.11} +{'loss': 1.1015, 'grad_norm': 1.9342552423477173, 'learning_rate': 1.9699933770937948e-05, 'epoch': 0.11} +{'loss': 0.9997, 'grad_norm': 1.9535177946090698, 'learning_rate': 1.9698416371476434e-05, 'epoch': 0.11} +{'loss': 1.0987, 'grad_norm': 2.7614104747772217, 'learning_rate': 1.969689520376687e-05, 'epoch': 0.11} +{'loss': 1.1007, 'grad_norm': 2.055088996887207, 'learning_rate': 1.9695370268400288e-05, 'epoch': 0.11} +{'loss': 1.0789, 'grad_norm': 1.912510633468628, 'learning_rate': 1.969384156596919e-05, 'epoch': 0.11} +{'loss': 1.0581, 'grad_norm': 1.903411626815796, 'learning_rate': 1.9692309097067546e-05, 'epoch': 0.11} +{'loss': 1.0157, 'grad_norm': 1.8434637784957886, 'learning_rate': 1.969077286229078e-05, 'epoch': 0.11} +{'loss': 1.1297, 'grad_norm': 2.0096569061279297, 'learning_rate': 1.968923286223579e-05, 'epoch': 0.11} +{'loss': 1.1228, 'grad_norm': 2.015148878097534, 'learning_rate': 1.9687689097500927e-05, 'epoch': 0.11} +{'loss': 1.0319, 'grad_norm': 2.0655734539031982, 'learning_rate': 1.9686141568686015e-05, 'epoch': 0.11} +{'loss': 1.0363, 'grad_norm': 2.00614857673645, 'learning_rate': 1.968459027639233e-05, 'epoch': 0.11} +{'loss': 1.117, 'grad_norm': 1.9746959209442139, 'learning_rate': 1.9683035221222617e-05, 'epoch': 0.11} +{'loss': 1.0235, 'grad_norm': 1.7862404584884644, 'learning_rate': 1.968147640378108e-05, 'epoch': 0.11} +{'loss': 1.0343, 'grad_norm': 2.0600969791412354, 'learning_rate': 1.9679913824673388e-05, 'epoch': 0.11} +{'loss': 1.0694, 'grad_norm': 1.8007007837295532, 'learning_rate': 1.9678347484506667e-05, 'epoch': 0.11} +{'loss': 1.0535, 'grad_norm': 1.9060628414154053, 'learning_rate': 1.9676777383889506e-05, 'epoch': 0.11} +{'loss': 1.1166, 'grad_norm': 2.004653215408325, 'learning_rate': 1.9675203523431964e-05, 'epoch': 0.11} +{'loss': 1.0181, 'grad_norm': 1.0712146759033203, 'learning_rate': 1.9673625903745546e-05, 'epoch': 0.11} +{'loss': 0.9983, 'grad_norm': 2.315927505493164, 'learning_rate': 1.9672044525443222e-05, 'epoch': 0.11} +{'loss': 0.997, 'grad_norm': 1.891409158706665, 'learning_rate': 1.9670459389139433e-05, 'epoch': 0.11} +{'loss': 1.16, 'grad_norm': 2.5097477436065674, 'learning_rate': 1.9668870495450064e-05, 'epoch': 0.11} +{'loss': 1.0349, 'grad_norm': 1.763094425201416, 'learning_rate': 1.9667277844992476e-05, 'epoch': 0.11} +{'loss': 0.9978, 'grad_norm': 1.909881591796875, 'learning_rate': 1.9665681438385475e-05, 'epoch': 0.11} +{'loss': 1.0984, 'grad_norm': 1.8126075267791748, 'learning_rate': 1.9664081276249335e-05, 'epoch': 0.11} +{'loss': 1.0099, 'grad_norm': 2.0055270195007324, 'learning_rate': 1.9662477359205786e-05, 'epoch': 0.11} +{'loss': 1.014, 'grad_norm': 2.153329372406006, 'learning_rate': 1.966086968787802e-05, 'epoch': 0.11} +{'loss': 1.0863, 'grad_norm': 1.7271853685379028, 'learning_rate': 1.9659258262890683e-05, 'epoch': 0.11} +{'loss': 1.1253, 'grad_norm': 1.9132412672042847, 'learning_rate': 1.965764308486989e-05, 'epoch': 0.11} +{'loss': 1.13, 'grad_norm': 1.6975780725479126, 'learning_rate': 1.9656024154443195e-05, 'epoch': 0.11} +{'loss': 1.0631, 'grad_norm': 1.930691123008728, 'learning_rate': 1.965440147223963e-05, 'epoch': 0.11} +{'loss': 1.0652, 'grad_norm': 2.0817456245422363, 'learning_rate': 1.9652775038889676e-05, 'epoch': 0.11} +{'loss': 1.103, 
'grad_norm': 1.8197139501571655, 'learning_rate': 1.9651144855025265e-05, 'epoch': 0.11} +{'loss': 1.0925, 'grad_norm': 1.7528233528137207, 'learning_rate': 1.9649510921279803e-05, 'epoch': 0.11} +{'loss': 1.0693, 'grad_norm': 1.9419900178909302, 'learning_rate': 1.964787323828813e-05, 'epoch': 0.11} +{'loss': 1.1371, 'grad_norm': 1.7717524766921997, 'learning_rate': 1.9646231806686566e-05, 'epoch': 0.11} +{'loss': 1.0661, 'grad_norm': 1.75080406665802, 'learning_rate': 1.9644586627112877e-05, 'epoch': 0.11} +{'loss': 1.1077, 'grad_norm': 1.903153896331787, 'learning_rate': 1.964293770020628e-05, 'epoch': 0.11} +{'loss': 0.9577, 'grad_norm': 1.043135166168213, 'learning_rate': 1.9641285026607453e-05, 'epoch': 0.11} +{'loss': 0.9664, 'grad_norm': 1.8222750425338745, 'learning_rate': 1.9639628606958535e-05, 'epoch': 0.11} +{'loss': 0.9989, 'grad_norm': 2.1395421028137207, 'learning_rate': 1.9637968441903112e-05, 'epoch': 0.11} +{'loss': 1.0442, 'grad_norm': 1.6749343872070312, 'learning_rate': 1.963630453208623e-05, 'epoch': 0.11} +{'loss': 1.1011, 'grad_norm': 1.8441596031188965, 'learning_rate': 1.9634636878154393e-05, 'epoch': 0.11} +{'loss': 1.0822, 'grad_norm': 1.7566505670547485, 'learning_rate': 1.963296548075555e-05, 'epoch': 0.11} +{'loss': 0.989, 'grad_norm': 1.7567025423049927, 'learning_rate': 1.963129034053911e-05, 'epoch': 0.11} +{'loss': 1.0389, 'grad_norm': 1.8125393390655518, 'learning_rate': 1.9629611458155936e-05, 'epoch': 0.11} +{'loss': 1.0655, 'grad_norm': 1.9036537408828735, 'learning_rate': 1.962792883425835e-05, 'epoch': 0.11} +{'loss': 1.0282, 'grad_norm': 2.057142734527588, 'learning_rate': 1.962624246950012e-05, 'epoch': 0.11} +{'loss': 1.0344, 'grad_norm': 1.6065772771835327, 'learning_rate': 1.9624552364536472e-05, 'epoch': 0.11} +{'loss': 0.9306, 'grad_norm': 0.9914276003837585, 'learning_rate': 1.9622858520024087e-05, 'epoch': 0.12} +{'loss': 1.009, 'grad_norm': 1.8407113552093506, 'learning_rate': 1.9621160936621086e-05, 'epoch': 0.12} +{'loss': 1.0379, 'grad_norm': 1.9886919260025024, 'learning_rate': 1.961945961498706e-05, 'epoch': 0.12} +{'loss': 1.0639, 'grad_norm': 1.884535312652588, 'learning_rate': 1.9617754555783045e-05, 'epoch': 0.12} +{'loss': 1.0109, 'grad_norm': 1.7920173406600952, 'learning_rate': 1.9616045759671524e-05, 'epoch': 0.12} +{'loss': 1.0452, 'grad_norm': 1.819301962852478, 'learning_rate': 1.9614333227316443e-05, 'epoch': 0.12} +{'loss': 1.1223, 'grad_norm': 1.9905906915664673, 'learning_rate': 1.961261695938319e-05, 'epoch': 0.12} +{'loss': 1.0391, 'grad_norm': 1.6836289167404175, 'learning_rate': 1.961089695653861e-05, 'epoch': 0.12} +{'loss': 1.0347, 'grad_norm': 1.6615184545516968, 'learning_rate': 1.9609173219450998e-05, 'epoch': 0.12} +{'loss': 1.1475, 'grad_norm': 1.8804123401641846, 'learning_rate': 1.9607445748790096e-05, 'epoch': 0.12} +{'loss': 0.9761, 'grad_norm': 0.9769880771636963, 'learning_rate': 1.9605714545227105e-05, 'epoch': 0.12} +{'loss': 0.9815, 'grad_norm': 1.9053921699523926, 'learning_rate': 1.9603979609434666e-05, 'epoch': 0.12} +{'loss': 0.9917, 'grad_norm': 1.7690304517745972, 'learning_rate': 1.960224094208688e-05, 'epoch': 0.12} +{'loss': 0.9546, 'grad_norm': 1.8213820457458496, 'learning_rate': 1.960049854385929e-05, 'epoch': 0.12} +{'loss': 1.0954, 'grad_norm': 1.846914529800415, 'learning_rate': 1.9598752415428893e-05, 'epoch': 0.12} +{'loss': 1.036, 'grad_norm': 1.7468945980072021, 'learning_rate': 1.9597002557474136e-05, 'epoch': 0.12} +{'loss': 1.0865, 'grad_norm': 1.8613026142120361, 
'learning_rate': 1.959524897067491e-05, 'epoch': 0.12} +{'loss': 0.9994, 'grad_norm': 1.9264206886291504, 'learning_rate': 1.959349165571256e-05, 'epoch': 0.12} +{'loss': 1.0364, 'grad_norm': 1.689648985862732, 'learning_rate': 1.9591730613269878e-05, 'epoch': 0.12} +{'loss': 1.1069, 'grad_norm': 2.080777645111084, 'learning_rate': 1.9589965844031108e-05, 'epoch': 0.12} +{'loss': 1.0624, 'grad_norm': 1.9339449405670166, 'learning_rate': 1.958819734868193e-05, 'epoch': 0.12} +{'loss': 0.9883, 'grad_norm': 1.9516017436981201, 'learning_rate': 1.9586425127909488e-05, 'epoch': 0.12} +{'loss': 1.0618, 'grad_norm': 1.8677557706832886, 'learning_rate': 1.9584649182402358e-05, 'epoch': 0.12} +{'loss': 1.0377, 'grad_norm': 1.9306186437606812, 'learning_rate': 1.9582869512850576e-05, 'epoch': 0.12} +{'loss': 1.0273, 'grad_norm': 1.8844149112701416, 'learning_rate': 1.9581086119945615e-05, 'epoch': 0.12} +{'loss': 1.0812, 'grad_norm': 1.7717770338058472, 'learning_rate': 1.9579299004380402e-05, 'epoch': 0.12} +{'loss': 0.993, 'grad_norm': 1.7238742113113403, 'learning_rate': 1.9577508166849308e-05, 'epoch': 0.12} +{'loss': 1.1625, 'grad_norm': 1.7734516859054565, 'learning_rate': 1.9575713608048146e-05, 'epoch': 0.12} +{'loss': 1.1511, 'grad_norm': 1.9877711534500122, 'learning_rate': 1.957391532867418e-05, 'epoch': 0.12} +{'loss': 1.0304, 'grad_norm': 1.9654442071914673, 'learning_rate': 1.9572113329426123e-05, 'epoch': 0.12} +{'loss': 1.0758, 'grad_norm': 1.7001477479934692, 'learning_rate': 1.9570307611004124e-05, 'epoch': 0.12} +{'loss': 1.0066, 'grad_norm': 1.8202993869781494, 'learning_rate': 1.9568498174109783e-05, 'epoch': 0.12} +{'loss': 1.1291, 'grad_norm': 1.792492389678955, 'learning_rate': 1.956668501944614e-05, 'epoch': 0.12} +{'loss': 1.0128, 'grad_norm': 2.0775163173675537, 'learning_rate': 1.956486814771769e-05, 'epoch': 0.12} +{'loss': 1.0225, 'grad_norm': 1.7778408527374268, 'learning_rate': 1.9563047559630356e-05, 'epoch': 0.12} +{'loss': 1.0558, 'grad_norm': 1.847807765007019, 'learning_rate': 1.956122325589152e-05, 'epoch': 0.12} +{'loss': 1.0394, 'grad_norm': 1.7681615352630615, 'learning_rate': 1.9559395237210002e-05, 'epoch': 0.12} +{'loss': 1.1396, 'grad_norm': 1.804386854171753, 'learning_rate': 1.9557563504296064e-05, 'epoch': 0.12} +{'loss': 1.0802, 'grad_norm': 1.7099944353103638, 'learning_rate': 1.955572805786141e-05, 'epoch': 0.12} +{'loss': 1.0913, 'grad_norm': 1.7447009086608887, 'learning_rate': 1.955388889861919e-05, 'epoch': 0.12} +{'loss': 0.9541, 'grad_norm': 0.9787389636039734, 'learning_rate': 1.9552046027284e-05, 'epoch': 0.12} +{'loss': 0.9432, 'grad_norm': 1.8490904569625854, 'learning_rate': 1.9550199444571868e-05, 'epoch': 0.12} +{'loss': 1.0078, 'grad_norm': 2.0911710262298584, 'learning_rate': 1.9548349151200275e-05, 'epoch': 0.12} +{'loss': 1.053, 'grad_norm': 1.8651716709136963, 'learning_rate': 1.9546495147888134e-05, 'epoch': 0.12} +{'loss': 1.0763, 'grad_norm': 2.1007957458496094, 'learning_rate': 1.954463743535581e-05, 'epoch': 0.12} +{'loss': 1.0845, 'grad_norm': 1.841106653213501, 'learning_rate': 1.95427760143251e-05, 'epoch': 0.12} +{'loss': 1.0461, 'grad_norm': 1.9699455499649048, 'learning_rate': 1.9540910885519243e-05, 'epoch': 0.12} +{'loss': 1.0162, 'grad_norm': 1.774623155593872, 'learning_rate': 1.9539042049662926e-05, 'epoch': 0.12} +{'loss': 1.0493, 'grad_norm': 1.8765196800231934, 'learning_rate': 1.953716950748227e-05, 'epoch': 0.12} +{'loss': 1.1348, 'grad_norm': 1.9721708297729492, 'learning_rate': 1.953529325970484e-05, 'epoch': 
0.12} +{'loss': 1.0174, 'grad_norm': 1.871772050857544, 'learning_rate': 1.953341330705963e-05, 'epoch': 0.12} +{'loss': 1.0637, 'grad_norm': 1.9706321954727173, 'learning_rate': 1.9531529650277087e-05, 'epoch': 0.12} +{'loss': 1.1391, 'grad_norm': 2.1656877994537354, 'learning_rate': 1.952964229008909e-05, 'epoch': 0.13} +{'loss': 1.1237, 'grad_norm': 1.806382656097412, 'learning_rate': 1.9527751227228964e-05, 'epoch': 0.13} +{'loss': 1.0158, 'grad_norm': 1.9285237789154053, 'learning_rate': 1.9525856462431463e-05, 'epoch': 0.13} +{'loss': 1.0418, 'grad_norm': 1.8296648263931274, 'learning_rate': 1.9523957996432785e-05, 'epoch': 0.13} +{'loss': 1.1238, 'grad_norm': 1.9126784801483154, 'learning_rate': 1.9522055829970564e-05, 'epoch': 0.13} +{'loss': 1.023, 'grad_norm': 1.90383780002594, 'learning_rate': 1.9520149963783873e-05, 'epoch': 0.13} +{'loss': 1.0669, 'grad_norm': 1.9747416973114014, 'learning_rate': 1.9518240398613226e-05, 'epoch': 0.13} +{'loss': 0.9861, 'grad_norm': 1.873603105545044, 'learning_rate': 1.9516327135200565e-05, 'epoch': 0.13} +{'loss': 1.1182, 'grad_norm': 1.7834066152572632, 'learning_rate': 1.951441017428928e-05, 'epoch': 0.13} +{'loss': 0.8713, 'grad_norm': 2.0852115154266357, 'learning_rate': 1.951248951662418e-05, 'epoch': 0.13} +{'loss': 1.0676, 'grad_norm': 1.9413955211639404, 'learning_rate': 1.9510565162951538e-05, 'epoch': 0.13} +{'loss': 1.082, 'grad_norm': 1.8750298023223877, 'learning_rate': 1.9508637114019037e-05, 'epoch': 0.13} +{'loss': 1.0576, 'grad_norm': 1.7423945665359497, 'learning_rate': 1.950670537057581e-05, 'epoch': 0.13} +{'loss': 1.1287, 'grad_norm': 1.818292260169983, 'learning_rate': 1.9504769933372424e-05, 'epoch': 0.13} +{'loss': 0.9947, 'grad_norm': 1.8188986778259277, 'learning_rate': 1.9502830803160873e-05, 'epoch': 0.13} +{'loss': 1.0222, 'grad_norm': 1.9943432807922363, 'learning_rate': 1.9500887980694597e-05, 'epoch': 0.13} +{'loss': 1.0457, 'grad_norm': 1.9611740112304688, 'learning_rate': 1.9498941466728462e-05, 'epoch': 0.13} +{'loss': 0.9739, 'grad_norm': 1.8990379571914673, 'learning_rate': 1.949699126201877e-05, 'epoch': 0.13} +{'loss': 0.9853, 'grad_norm': 0.9880357980728149, 'learning_rate': 1.9495037367323264e-05, 'epoch': 0.13} +{'loss': 1.0149, 'grad_norm': 1.750780463218689, 'learning_rate': 1.9493079783401116e-05, 'epoch': 0.13} +{'loss': 0.976, 'grad_norm': 1.894282341003418, 'learning_rate': 1.949111851101292e-05, 'epoch': 0.13} +{'loss': 1.0545, 'grad_norm': 1.6835464239120483, 'learning_rate': 1.9489153550920726e-05, 'epoch': 0.13} +{'loss': 1.0143, 'grad_norm': 1.9264110326766968, 'learning_rate': 1.9487184903888e-05, 'epoch': 0.13} +{'loss': 1.0893, 'grad_norm': 1.9531846046447754, 'learning_rate': 1.9485212570679643e-05, 'epoch': 0.13} +{'loss': 0.9767, 'grad_norm': 1.9108635187149048, 'learning_rate': 1.9483236552061996e-05, 'epoch': 0.13} +{'loss': 1.0638, 'grad_norm': 1.835564374923706, 'learning_rate': 1.948125684880282e-05, 'epoch': 0.13} +{'loss': 1.0006, 'grad_norm': 1.7745845317840576, 'learning_rate': 1.947927346167132e-05, 'epoch': 0.13} +{'loss': 1.0371, 'grad_norm': 1.862092137336731, 'learning_rate': 1.9477286391438122e-05, 'epoch': 0.13} +{'loss': 1.0369, 'grad_norm': 2.0028343200683594, 'learning_rate': 1.947529563887529e-05, 'epoch': 0.13} +{'loss': 1.0018, 'grad_norm': 1.8679611682891846, 'learning_rate': 1.9473301204756315e-05, 'epoch': 0.13} +{'loss': 1.0419, 'grad_norm': 1.8468900918960571, 'learning_rate': 1.947130308985612e-05, 'epoch': 0.13} +{'loss': 1.0513, 'grad_norm': 
1.845431923866272, 'learning_rate': 1.946930129495106e-05, 'epoch': 0.13} +{'loss': 0.9758, 'grad_norm': 1.8284912109375, 'learning_rate': 1.9467295820818913e-05, 'epoch': 0.13} +{'loss': 1.1309, 'grad_norm': 2.0402565002441406, 'learning_rate': 1.9465286668238895e-05, 'epoch': 0.13} +{'loss': 1.095, 'grad_norm': 1.9071311950683594, 'learning_rate': 1.9463273837991643e-05, 'epoch': 0.13} +{'loss': 1.0461, 'grad_norm': 1.7187628746032715, 'learning_rate': 1.9461257330859233e-05, 'epoch': 0.13} +{'loss': 1.1084, 'grad_norm': 1.9771403074264526, 'learning_rate': 1.945923714762516e-05, 'epoch': 0.13} +{'loss': 0.9969, 'grad_norm': 1.8100978136062622, 'learning_rate': 1.9457213289074353e-05, 'epoch': 0.13} +{'loss': 0.9531, 'grad_norm': 0.999468207359314, 'learning_rate': 1.945518575599317e-05, 'epoch': 0.13} +{'loss': 0.9845, 'grad_norm': 1.7867693901062012, 'learning_rate': 1.945315454916939e-05, 'epoch': 0.13} +{'loss': 1.0697, 'grad_norm': 2.030405044555664, 'learning_rate': 1.945111966939222e-05, 'epoch': 0.13} +{'loss': 1.0837, 'grad_norm': 1.9287108182907104, 'learning_rate': 1.9449081117452304e-05, 'epoch': 0.13} +{'loss': 1.079, 'grad_norm': 1.7917439937591553, 'learning_rate': 1.9447038894141707e-05, 'epoch': 0.13} +{'loss': 1.0983, 'grad_norm': 1.6798985004425049, 'learning_rate': 1.9444993000253912e-05, 'epoch': 0.13} +{'loss': 1.085, 'grad_norm': 1.597659707069397, 'learning_rate': 1.9442943436583845e-05, 'epoch': 0.13} +{'loss': 1.0683, 'grad_norm': 1.7162896394729614, 'learning_rate': 1.9440890203927846e-05, 'epoch': 0.13} +{'loss': 1.0327, 'grad_norm': 1.8342747688293457, 'learning_rate': 1.9438833303083677e-05, 'epoch': 0.13} +{'loss': 1.1206, 'grad_norm': 1.6689069271087646, 'learning_rate': 1.943677273485054e-05, 'epoch': 0.13} +{'loss': 1.0248, 'grad_norm': 1.8049540519714355, 'learning_rate': 1.943470850002905e-05, 'epoch': 0.13} +{'loss': 0.9161, 'grad_norm': 1.0162373781204224, 'learning_rate': 1.943264059942125e-05, 'epoch': 0.13} +{'loss': 1.1096, 'grad_norm': 1.958119511604309, 'learning_rate': 1.9430569033830606e-05, 'epoch': 0.13} +{'loss': 0.8849, 'grad_norm': 0.9479613900184631, 'learning_rate': 1.9428493804062013e-05, 'epoch': 0.13} +{'loss': 1.0403, 'grad_norm': 1.9936504364013672, 'learning_rate': 1.9426414910921785e-05, 'epoch': 0.14} +{'loss': 1.0638, 'grad_norm': 1.726816177368164, 'learning_rate': 1.942433235521766e-05, 'epoch': 0.14} +{'loss': 1.1033, 'grad_norm': 1.807843804359436, 'learning_rate': 1.94222461377588e-05, 'epoch': 0.14} +{'loss': 0.9907, 'grad_norm': 1.8652249574661255, 'learning_rate': 1.942015625935579e-05, 'epoch': 0.14} +{'loss': 1.0503, 'grad_norm': 1.7422491312026978, 'learning_rate': 1.9418062720820636e-05, 'epoch': 0.14} +{'loss': 1.0422, 'grad_norm': 1.7758936882019043, 'learning_rate': 1.9415965522966767e-05, 'epoch': 0.14} +{'loss': 1.0043, 'grad_norm': 2.0055336952209473, 'learning_rate': 1.9413864666609036e-05, 'epoch': 0.14} +{'loss': 0.9133, 'grad_norm': 0.9495923519134521, 'learning_rate': 1.941176015256371e-05, 'epoch': 0.14} +{'loss': 0.9774, 'grad_norm': 1.839900016784668, 'learning_rate': 1.9409651981648486e-05, 'epoch': 0.14} +{'loss': 1.1266, 'grad_norm': 2.0253968238830566, 'learning_rate': 1.9407540154682473e-05, 'epoch': 0.14} +{'loss': 0.9988, 'grad_norm': 1.7584038972854614, 'learning_rate': 1.9405424672486218e-05, 'epoch': 0.14} +{'loss': 1.0785, 'grad_norm': 1.696969747543335, 'learning_rate': 1.9403305535881664e-05, 'epoch': 0.14} +{'loss': 1.0387, 'grad_norm': 1.9572157859802246, 'learning_rate': 
1.940118274569219e-05, 'epoch': 0.14} +{'loss': 0.9452, 'grad_norm': 2.1646480560302734, 'learning_rate': 1.9399056302742598e-05, 'epoch': 0.14} +{'loss': 1.0117, 'grad_norm': 2.0360054969787598, 'learning_rate': 1.9396926207859085e-05, 'epoch': 0.14} +{'loss': 1.0578, 'grad_norm': 1.7679258584976196, 'learning_rate': 1.9394792461869296e-05, 'epoch': 0.14} +{'loss': 1.0105, 'grad_norm': 1.8553813695907593, 'learning_rate': 1.939265506560228e-05, 'epoch': 0.14} +{'loss': 1.0122, 'grad_norm': 1.863491177558899, 'learning_rate': 1.9390514019888508e-05, 'epoch': 0.14} +{'loss': 1.0633, 'grad_norm': 2.064150094985962, 'learning_rate': 1.9388369325559862e-05, 'epoch': 0.14} +{'loss': 1.133, 'grad_norm': 2.0232086181640625, 'learning_rate': 1.9386220983449652e-05, 'epoch': 0.14} +{'loss': 1.0152, 'grad_norm': 1.7371606826782227, 'learning_rate': 1.93840689943926e-05, 'epoch': 0.14} +{'loss': 1.0253, 'grad_norm': 1.7722785472869873, 'learning_rate': 1.9381913359224844e-05, 'epoch': 0.14} +{'loss': 1.0857, 'grad_norm': 1.88107168674469, 'learning_rate': 1.937975407878394e-05, 'epoch': 0.14} +{'loss': 1.0026, 'grad_norm': 1.9407353401184082, 'learning_rate': 1.937759115390886e-05, 'epoch': 0.14} +{'loss': 0.995, 'grad_norm': 1.8366038799285889, 'learning_rate': 1.9375424585439994e-05, 'epoch': 0.14} +{'loss': 1.0137, 'grad_norm': 1.9896798133850098, 'learning_rate': 1.9373254374219146e-05, 'epoch': 0.14} +{'loss': 0.9944, 'grad_norm': 2.165299892425537, 'learning_rate': 1.9371080521089536e-05, 'epoch': 0.14} +{'loss': 1.0369, 'grad_norm': 1.9276577234268188, 'learning_rate': 1.9368903026895795e-05, 'epoch': 0.14} +{'loss': 1.0448, 'grad_norm': 2.167565107345581, 'learning_rate': 1.9366721892483976e-05, 'epoch': 0.14} +{'loss': 1.0133, 'grad_norm': 1.7721621990203857, 'learning_rate': 1.9364537118701542e-05, 'epoch': 0.14} +{'loss': 0.9684, 'grad_norm': 2.0589303970336914, 'learning_rate': 1.9362348706397374e-05, 'epoch': 0.14} +{'loss': 1.1383, 'grad_norm': 1.9515771865844727, 'learning_rate': 1.9360156656421756e-05, 'epoch': 0.14} +{'loss': 1.0224, 'grad_norm': 1.5488319396972656, 'learning_rate': 1.9357960969626398e-05, 'epoch': 0.14} +{'loss': 1.0746, 'grad_norm': 1.8274539709091187, 'learning_rate': 1.9355761646864417e-05, 'epoch': 0.14} +{'loss': 1.0307, 'grad_norm': 1.7872482538223267, 'learning_rate': 1.935355868899034e-05, 'epoch': 0.14} +{'loss': 0.9522, 'grad_norm': 1.8236021995544434, 'learning_rate': 1.935135209686012e-05, 'epoch': 0.14} +{'loss': 1.1675, 'grad_norm': 1.8238246440887451, 'learning_rate': 1.93491418713311e-05, 'epoch': 0.14} +{'loss': 1.0319, 'grad_norm': 1.0203803777694702, 'learning_rate': 1.9346928013262052e-05, 'epoch': 0.14} +{'loss': 0.9993, 'grad_norm': 1.9282431602478027, 'learning_rate': 1.9344710523513157e-05, 'epoch': 0.14} +{'loss': 1.0621, 'grad_norm': 1.8842706680297852, 'learning_rate': 1.9342489402945997e-05, 'epoch': 0.14} +{'loss': 0.9429, 'grad_norm': 1.772351861000061, 'learning_rate': 1.9340264652423584e-05, 'epoch': 0.14} +{'loss': 1.1064, 'grad_norm': 1.8866921663284302, 'learning_rate': 1.9338036272810314e-05, 'epoch': 0.14} +{'loss': 0.9967, 'grad_norm': 1.9042494297027588, 'learning_rate': 1.9335804264972018e-05, 'epoch': 0.14} +{'loss': 1.0165, 'grad_norm': 1.964693546295166, 'learning_rate': 1.933356862977592e-05, 'epoch': 0.14} +{'loss': 1.0561, 'grad_norm': 1.8669610023498535, 'learning_rate': 1.9331329368090664e-05, 'epoch': 0.14} +{'loss': 1.1563, 'grad_norm': 1.8360422849655151, 'learning_rate': 1.93290864807863e-05, 'epoch': 0.14} 
+{'loss': 1.0895, 'grad_norm': 1.8937022686004639, 'learning_rate': 1.9326839968734278e-05, 'epoch': 0.14} +{'loss': 1.0141, 'grad_norm': 1.7559276819229126, 'learning_rate': 1.9324589832807467e-05, 'epoch': 0.14} +{'loss': 1.0842, 'grad_norm': 1.801269769668579, 'learning_rate': 1.9322336073880143e-05, 'epoch': 0.14} +{'loss': 1.0211, 'grad_norm': 1.855718970298767, 'learning_rate': 1.932007869282799e-05, 'epoch': 0.14} +{'loss': 1.1162, 'grad_norm': 1.8507449626922607, 'learning_rate': 1.9317817690528086e-05, 'epoch': 0.14} +{'loss': 0.9554, 'grad_norm': 1.0060285329818726, 'learning_rate': 1.9315553067858933e-05, 'epoch': 0.14} +{'loss': 1.163, 'grad_norm': 1.9339544773101807, 'learning_rate': 1.931328482570044e-05, 'epoch': 0.15} +{'loss': 0.9483, 'grad_norm': 1.7005418539047241, 'learning_rate': 1.93110129649339e-05, 'epoch': 0.15} +{'loss': 1.0619, 'grad_norm': 1.756054401397705, 'learning_rate': 1.9308737486442045e-05, 'epoch': 0.15} +{'loss': 1.1276, 'grad_norm': 1.9170414209365845, 'learning_rate': 1.9306458391108984e-05, 'epoch': 0.15} +{'loss': 0.9985, 'grad_norm': 1.679626226425171, 'learning_rate': 1.9304175679820247e-05, 'epoch': 0.15} +{'loss': 1.0142, 'grad_norm': 1.7036267518997192, 'learning_rate': 1.9301889353462765e-05, 'epoch': 0.15} +{'loss': 1.0215, 'grad_norm': 1.7625948190689087, 'learning_rate': 1.929959941292487e-05, 'epoch': 0.15} +{'loss': 0.9214, 'grad_norm': 0.9800795912742615, 'learning_rate': 1.9297305859096305e-05, 'epoch': 0.15} +{'loss': 1.0697, 'grad_norm': 1.9728052616119385, 'learning_rate': 1.929500869286821e-05, 'epoch': 0.15} +{'loss': 1.1, 'grad_norm': 1.859421968460083, 'learning_rate': 1.9292707915133134e-05, 'epoch': 0.15} +{'loss': 0.9823, 'grad_norm': 1.9221793413162231, 'learning_rate': 1.9290403526785025e-05, 'epoch': 0.15} +{'loss': 1.0665, 'grad_norm': 1.963499903678894, 'learning_rate': 1.9288095528719245e-05, 'epoch': 0.15} +{'loss': 1.1006, 'grad_norm': 2.0811595916748047, 'learning_rate': 1.9285783921832537e-05, 'epoch': 0.15} +{'loss': 1.0655, 'grad_norm': 1.94386887550354, 'learning_rate': 1.9283468707023068e-05, 'epoch': 0.15} +{'loss': 0.9012, 'grad_norm': 0.9830089807510376, 'learning_rate': 1.928114988519039e-05, 'epoch': 0.15} +{'loss': 0.9684, 'grad_norm': 2.0522351264953613, 'learning_rate': 1.9278827457235473e-05, 'epoch': 0.15} +{'loss': 1.0186, 'grad_norm': 1.7430166006088257, 'learning_rate': 1.927650142406067e-05, 'epoch': 0.15} +{'loss': 1.0093, 'grad_norm': 1.981881856918335, 'learning_rate': 1.927417178656975e-05, 'epoch': 0.15} +{'loss': 1.0438, 'grad_norm': 1.9995741844177246, 'learning_rate': 1.9271838545667876e-05, 'epoch': 0.15} +{'loss': 0.9754, 'grad_norm': 1.7669248580932617, 'learning_rate': 1.926950170226161e-05, 'epoch': 0.15} +{'loss': 1.1056, 'grad_norm': 1.790802240371704, 'learning_rate': 1.9267161257258916e-05, 'epoch': 0.15} +{'loss': 1.0025, 'grad_norm': 2.016782522201538, 'learning_rate': 1.9264817211569154e-05, 'epoch': 0.15} +{'loss': 0.9733, 'grad_norm': 1.8168582916259766, 'learning_rate': 1.926246956610309e-05, 'epoch': 0.15} +{'loss': 1.0003, 'grad_norm': 1.8395148515701294, 'learning_rate': 1.926011832177288e-05, 'epoch': 0.15} +{'loss': 1.0005, 'grad_norm': 1.775355577468872, 'learning_rate': 1.9257763479492086e-05, 'epoch': 0.15} +{'loss': 0.9266, 'grad_norm': 2.0468311309814453, 'learning_rate': 1.9255405040175666e-05, 'epoch': 0.15} +{'loss': 1.0404, 'grad_norm': 1.788172960281372, 'learning_rate': 1.9253043004739967e-05, 'epoch': 0.15} +{'loss': 1.0117, 'grad_norm': 1.8667372465133667, 
'learning_rate': 1.9250677374102752e-05, 'epoch': 0.15} +{'loss': 1.1426, 'grad_norm': 1.8807982206344604, 'learning_rate': 1.9248308149183155e-05, 'epoch': 0.15} +{'loss': 1.0419, 'grad_norm': 1.8556149005889893, 'learning_rate': 1.9245935330901733e-05, 'epoch': 0.15} +{'loss': 1.0587, 'grad_norm': 1.7991968393325806, 'learning_rate': 1.9243558920180417e-05, 'epoch': 0.15} +{'loss': 0.9274, 'grad_norm': 1.0295864343643188, 'learning_rate': 1.9241178917942552e-05, 'epoch': 0.15} +{'loss': 0.9866, 'grad_norm': 1.7402684688568115, 'learning_rate': 1.9238795325112867e-05, 'epoch': 0.15} +{'loss': 1.0244, 'grad_norm': 1.9365077018737793, 'learning_rate': 1.923640814261749e-05, 'epoch': 0.15} +{'loss': 1.1417, 'grad_norm': 1.855178713798523, 'learning_rate': 1.9234017371383946e-05, 'epoch': 0.15} +{'loss': 1.0281, 'grad_norm': 1.8731263875961304, 'learning_rate': 1.9231623012341145e-05, 'epoch': 0.15} +{'loss': 1.0159, 'grad_norm': 1.8208426237106323, 'learning_rate': 1.92292250664194e-05, 'epoch': 0.15} +{'loss': 1.0611, 'grad_norm': 1.7611134052276611, 'learning_rate': 1.9226823534550418e-05, 'epoch': 0.15} +{'loss': 1.0512, 'grad_norm': 1.7592854499816895, 'learning_rate': 1.9224418417667295e-05, 'epoch': 0.15} +{'loss': 1.0744, 'grad_norm': 1.9874253273010254, 'learning_rate': 1.922200971670452e-05, 'epoch': 0.15} +{'loss': 1.0596, 'grad_norm': 1.8959243297576904, 'learning_rate': 1.9219597432597975e-05, 'epoch': 0.15} +{'loss': 1.0273, 'grad_norm': 1.75994873046875, 'learning_rate': 1.9217181566284942e-05, 'epoch': 0.15} +{'loss': 1.023, 'grad_norm': 1.964096188545227, 'learning_rate': 1.921476211870408e-05, 'epoch': 0.15} +{'loss': 1.093, 'grad_norm': 1.9010738134384155, 'learning_rate': 1.921233909079545e-05, 'epoch': 0.15} +{'loss': 1.0219, 'grad_norm': 1.8385283946990967, 'learning_rate': 1.92099124835005e-05, 'epoch': 0.15} +{'loss': 1.1217, 'grad_norm': 1.7191005945205688, 'learning_rate': 1.9207482297762076e-05, 'epoch': 0.15} +{'loss': 1.0655, 'grad_norm': 1.8148480653762817, 'learning_rate': 1.9205048534524405e-05, 'epoch': 0.15} +{'loss': 1.0402, 'grad_norm': 2.1281018257141113, 'learning_rate': 1.9202611194733107e-05, 'epoch': 0.15} +{'loss': 1.0629, 'grad_norm': 1.8952546119689941, 'learning_rate': 1.9200170279335194e-05, 'epoch': 0.15} +{'loss': 1.0021, 'grad_norm': 1.9646159410476685, 'learning_rate': 1.9197725789279065e-05, 'epoch': 0.15} +{'loss': 1.0693, 'grad_norm': 1.8048688173294067, 'learning_rate': 1.919527772551451e-05, 'epoch': 0.15} +{'loss': 0.9513, 'grad_norm': 1.8859809637069702, 'learning_rate': 1.91928260889927e-05, 'epoch': 0.15} +{'loss': 1.103, 'grad_norm': 1.968471646308899, 'learning_rate': 1.9190370880666206e-05, 'epoch': 0.16} +{'loss': 1.0569, 'grad_norm': 1.9892020225524902, 'learning_rate': 1.9187912101488986e-05, 'epoch': 0.16} +{'loss': 1.137, 'grad_norm': 1.8863886594772339, 'learning_rate': 1.9185449752416367e-05, 'epoch': 0.16} +{'loss': 1.0581, 'grad_norm': 1.9442975521087646, 'learning_rate': 1.9182983834405088e-05, 'epoch': 0.16} +{'loss': 0.9794, 'grad_norm': 1.7951229810714722, 'learning_rate': 1.918051434841326e-05, 'epoch': 0.16} +{'loss': 0.9596, 'grad_norm': 1.824581503868103, 'learning_rate': 1.9178041295400383e-05, 'epoch': 0.16} +{'loss': 0.9584, 'grad_norm': 1.814587116241455, 'learning_rate': 1.917556467632734e-05, 'epoch': 0.16} +{'loss': 0.9154, 'grad_norm': 1.9821945428848267, 'learning_rate': 1.9173084492156406e-05, 'epoch': 0.16} +{'loss': 1.1803, 'grad_norm': 1.9843652248382568, 'learning_rate': 1.917060074385124e-05, 
'epoch': 0.16} +{'loss': 1.0117, 'grad_norm': 2.114295244216919, 'learning_rate': 1.9168113432376885e-05, 'epoch': 0.16} +{'loss': 0.9599, 'grad_norm': 1.9193452596664429, 'learning_rate': 1.9165622558699763e-05, 'epoch': 0.16} +{'loss': 1.1139, 'grad_norm': 2.21612286567688, 'learning_rate': 1.9163128123787688e-05, 'epoch': 0.16} +{'loss': 0.98, 'grad_norm': 1.6431894302368164, 'learning_rate': 1.916063012860985e-05, 'epoch': 0.16} +{'loss': 0.9225, 'grad_norm': 1.0864559412002563, 'learning_rate': 1.9158128574136833e-05, 'epoch': 0.16} +{'loss': 1.0454, 'grad_norm': 2.1066713333129883, 'learning_rate': 1.9155623461340594e-05, 'epoch': 0.16} +{'loss': 1.0646, 'grad_norm': 1.9968464374542236, 'learning_rate': 1.9153114791194475e-05, 'epoch': 0.16} +{'loss': 1.0267, 'grad_norm': 1.7027695178985596, 'learning_rate': 1.91506025646732e-05, 'epoch': 0.16} +{'loss': 0.995, 'grad_norm': 1.8297191858291626, 'learning_rate': 1.914808678275288e-05, 'epoch': 0.16} +{'loss': 0.9352, 'grad_norm': 1.872512936592102, 'learning_rate': 1.9145567446411003e-05, 'epoch': 0.16} +{'loss': 1.021, 'grad_norm': 1.9383331537246704, 'learning_rate': 1.9143044556626437e-05, 'epoch': 0.16} +{'loss': 1.0426, 'grad_norm': 1.8862186670303345, 'learning_rate': 1.9140518114379433e-05, 'epoch': 0.16} +{'loss': 1.0066, 'grad_norm': 1.9327762126922607, 'learning_rate': 1.913798812065162e-05, 'epoch': 0.16} +{'loss': 1.0, 'grad_norm': 1.7620855569839478, 'learning_rate': 1.913545457642601e-05, 'epoch': 0.16} +{'loss': 1.0615, 'grad_norm': 1.817885398864746, 'learning_rate': 1.913291748268699e-05, 'epoch': 0.16} +{'loss': 1.0813, 'grad_norm': 1.7099207639694214, 'learning_rate': 1.9130376840420337e-05, 'epoch': 0.16} +{'loss': 1.0048, 'grad_norm': 1.6124449968338013, 'learning_rate': 1.912783265061319e-05, 'epoch': 0.16} +{'loss': 1.0251, 'grad_norm': 1.8943891525268555, 'learning_rate': 1.912528491425408e-05, 'epoch': 0.16} +{'loss': 1.1226, 'grad_norm': 2.0991952419281006, 'learning_rate': 1.9122733632332907e-05, 'epoch': 0.16} +{'loss': 1.0328, 'grad_norm': 1.8139768838882446, 'learning_rate': 1.9120178805840956e-05, 'epoch': 0.16} +{'loss': 0.9977, 'grad_norm': 1.8370256423950195, 'learning_rate': 1.911762043577089e-05, 'epoch': 0.16} +{'loss': 0.9854, 'grad_norm': 1.8669499158859253, 'learning_rate': 1.9115058523116734e-05, 'epoch': 0.16} +{'loss': 1.0551, 'grad_norm': 1.6820703744888306, 'learning_rate': 1.9112493068873907e-05, 'epoch': 0.16} +{'loss': 1.103, 'grad_norm': 1.6766277551651, 'learning_rate': 1.9109924074039198e-05, 'epoch': 0.16} +{'loss': 1.0707, 'grad_norm': 1.718582034111023, 'learning_rate': 1.910735153961077e-05, 'epoch': 0.16} +{'loss': 1.0311, 'grad_norm': 1.974519968032837, 'learning_rate': 1.9104775466588162e-05, 'epoch': 0.16} +{'loss': 0.9859, 'grad_norm': 1.6999092102050781, 'learning_rate': 1.9102195855972287e-05, 'epoch': 0.16} +{'loss': 1.1022, 'grad_norm': 1.9100570678710938, 'learning_rate': 1.9099612708765432e-05, 'epoch': 0.16} +{'loss': 1.0511, 'grad_norm': 1.8535345792770386, 'learning_rate': 1.9097026025971268e-05, 'epoch': 0.16} +{'loss': 1.0539, 'grad_norm': 1.8734996318817139, 'learning_rate': 1.9094435808594823e-05, 'epoch': 0.16} +{'loss': 1.0534, 'grad_norm': 1.7165933847427368, 'learning_rate': 1.909184205764251e-05, 'epoch': 0.16} +{'loss': 0.9644, 'grad_norm': 1.7317732572555542, 'learning_rate': 1.908924477412211e-05, 'epoch': 0.16} +{'loss': 1.0719, 'grad_norm': 1.723581314086914, 'learning_rate': 1.908664395904278e-05, 'epoch': 0.16} +{'loss': 1.0579, 'grad_norm': 
1.9269397258758545, 'learning_rate': 1.9084039613415052e-05, 'epoch': 0.16} +{'loss': 1.108, 'grad_norm': 1.907837152481079, 'learning_rate': 1.9081431738250815e-05, 'epoch': 0.16} +{'loss': 1.0208, 'grad_norm': 1.8354381322860718, 'learning_rate': 1.9078820334563346e-05, 'epoch': 0.16} +{'loss': 1.0629, 'grad_norm': 1.839447021484375, 'learning_rate': 1.9076205403367287e-05, 'epoch': 0.16} +{'loss': 1.0667, 'grad_norm': 2.001133918762207, 'learning_rate': 1.907358694567865e-05, 'epoch': 0.16} +{'loss': 1.0761, 'grad_norm': 1.8356646299362183, 'learning_rate': 1.9070964962514816e-05, 'epoch': 0.16} +{'loss': 1.0059, 'grad_norm': 1.8363560438156128, 'learning_rate': 1.9068339454894536e-05, 'epoch': 0.16} +{'loss': 1.0461, 'grad_norm': 1.9907485246658325, 'learning_rate': 1.9065710423837936e-05, 'epoch': 0.16} +{'loss': 0.9761, 'grad_norm': 1.6925932168960571, 'learning_rate': 1.9063077870366504e-05, 'epoch': 0.16} +{'loss': 1.1256, 'grad_norm': 1.841995120048523, 'learning_rate': 1.9060441795503095e-05, 'epoch': 0.16} +{'loss': 1.0439, 'grad_norm': 1.8130391836166382, 'learning_rate': 1.9057802200271943e-05, 'epoch': 0.17} +{'loss': 0.972, 'grad_norm': 1.8468430042266846, 'learning_rate': 1.9055159085698638e-05, 'epoch': 0.17} +{'loss': 1.0338, 'grad_norm': 1.9957631826400757, 'learning_rate': 1.905251245281015e-05, 'epoch': 0.17} +{'loss': 1.0852, 'grad_norm': 1.8274272680282593, 'learning_rate': 1.90498623026348e-05, 'epoch': 0.17} +{'loss': 0.9864, 'grad_norm': 1.87775719165802, 'learning_rate': 1.904720863620229e-05, 'epoch': 0.17} +{'loss': 0.9975, 'grad_norm': 1.8939504623413086, 'learning_rate': 1.9044551454543683e-05, 'epoch': 0.17} +{'loss': 1.036, 'grad_norm': 1.9896374940872192, 'learning_rate': 1.9041890758691403e-05, 'epoch': 0.17} +{'loss': 1.033, 'grad_norm': 1.9141910076141357, 'learning_rate': 1.9039226549679243e-05, 'epoch': 0.17} +{'loss': 1.0824, 'grad_norm': 1.8497891426086426, 'learning_rate': 1.903655882854237e-05, 'epoch': 0.17} +{'loss': 1.1061, 'grad_norm': 2.010185956954956, 'learning_rate': 1.90338875963173e-05, 'epoch': 0.17} +{'loss': 1.0955, 'grad_norm': 1.9594199657440186, 'learning_rate': 1.903121285404192e-05, 'epoch': 0.17} +{'loss': 1.0985, 'grad_norm': 1.7790595293045044, 'learning_rate': 1.9028534602755485e-05, 'epoch': 0.17} +{'loss': 0.9343, 'grad_norm': 2.3293259143829346, 'learning_rate': 1.902585284349861e-05, 'epoch': 0.17} +{'loss': 0.9271, 'grad_norm': 1.0938304662704468, 'learning_rate': 1.9023167577313267e-05, 'epoch': 0.17} +{'loss': 0.9875, 'grad_norm': 1.8699318170547485, 'learning_rate': 1.9020478805242803e-05, 'epoch': 0.17} +{'loss': 0.9839, 'grad_norm': 1.808601975440979, 'learning_rate': 1.901778652833192e-05, 'epoch': 0.17} +{'loss': 0.9988, 'grad_norm': 1.60258150100708, 'learning_rate': 1.901509074762668e-05, 'epoch': 0.17} +{'loss': 1.0161, 'grad_norm': 2.071719169616699, 'learning_rate': 1.901239146417451e-05, 'epoch': 0.17} +{'loss': 1.0098, 'grad_norm': 1.7650020122528076, 'learning_rate': 1.900968867902419e-05, 'epoch': 0.17} +{'loss': 1.0466, 'grad_norm': 1.8329802751541138, 'learning_rate': 1.9006982393225878e-05, 'epoch': 0.17} +{'loss': 1.0673, 'grad_norm': 1.88319993019104, 'learning_rate': 1.9004272607831077e-05, 'epoch': 0.17} +{'loss': 1.0881, 'grad_norm': 2.0712344646453857, 'learning_rate': 1.9001559323892652e-05, 'epoch': 0.17} +{'loss': 1.0519, 'grad_norm': 1.769935131072998, 'learning_rate': 1.8998842542464833e-05, 'epoch': 0.17} +{'loss': 1.031, 'grad_norm': 1.8526180982589722, 'learning_rate': 
1.8996122264603202e-05, 'epoch': 0.17} +{'loss': 1.1069, 'grad_norm': 1.8072012662887573, 'learning_rate': 1.8993398491364702e-05, 'epoch': 0.17} +{'loss': 1.0453, 'grad_norm': 2.0149877071380615, 'learning_rate': 1.899067122380764e-05, 'epoch': 0.17} +{'loss': 1.0435, 'grad_norm': 1.8691116571426392, 'learning_rate': 1.8987940462991673e-05, 'epoch': 0.17} +{'loss': 0.9968, 'grad_norm': 2.2917675971984863, 'learning_rate': 1.8985206209977813e-05, 'epoch': 0.17} +{'loss': 1.0338, 'grad_norm': 1.609276533126831, 'learning_rate': 1.898246846582844e-05, 'epoch': 0.17} +{'loss': 1.0497, 'grad_norm': 1.957764744758606, 'learning_rate': 1.897972723160729e-05, 'epoch': 0.17} +{'loss': 1.0472, 'grad_norm': 1.8542054891586304, 'learning_rate': 1.8976982508379436e-05, 'epoch': 0.17} +{'loss': 1.0095, 'grad_norm': 2.0098819732666016, 'learning_rate': 1.8974234297211327e-05, 'epoch': 0.17} +{'loss': 1.0575, 'grad_norm': 1.6726094484329224, 'learning_rate': 1.897148259917076e-05, 'epoch': 0.17} +{'loss': 0.9313, 'grad_norm': 1.19387948513031, 'learning_rate': 1.8968727415326885e-05, 'epoch': 0.17} +{'loss': 1.0161, 'grad_norm': 2.0022661685943604, 'learning_rate': 1.896596874675021e-05, 'epoch': 0.17} +{'loss': 1.0429, 'grad_norm': 1.772822618484497, 'learning_rate': 1.8963206594512593e-05, 'epoch': 0.17} +{'loss': 1.0505, 'grad_norm': 1.7540333271026611, 'learning_rate': 1.8960440959687254e-05, 'epoch': 0.17} +{'loss': 1.1187, 'grad_norm': 1.7388813495635986, 'learning_rate': 1.8957671843348754e-05, 'epoch': 0.17} +{'loss': 1.0796, 'grad_norm': 1.9682852029800415, 'learning_rate': 1.895489924657301e-05, 'epoch': 0.17} +{'loss': 0.9628, 'grad_norm': 1.7046477794647217, 'learning_rate': 1.8952123170437304e-05, 'epoch': 0.17} +{'loss': 1.0176, 'grad_norm': 1.7901122570037842, 'learning_rate': 1.894934361602025e-05, 'epoch': 0.17} +{'loss': 1.0652, 'grad_norm': 1.7585915327072144, 'learning_rate': 1.8946560584401833e-05, 'epoch': 0.17} +{'loss': 1.0247, 'grad_norm': 1.7957626581192017, 'learning_rate': 1.8943774076663372e-05, 'epoch': 0.17} +{'loss': 0.9572, 'grad_norm': 1.6159627437591553, 'learning_rate': 1.894098409388754e-05, 'epoch': 0.17} +{'loss': 1.0195, 'grad_norm': 1.7876954078674316, 'learning_rate': 1.8938190637158377e-05, 'epoch': 0.17} +{'loss': 1.0292, 'grad_norm': 1.9027154445648193, 'learning_rate': 1.893539370756125e-05, 'epoch': 0.17} +{'loss': 1.144, 'grad_norm': 1.6486036777496338, 'learning_rate': 1.893259330618289e-05, 'epoch': 0.17} +{'loss': 0.9814, 'grad_norm': 2.005584716796875, 'learning_rate': 1.892978943411137e-05, 'epoch': 0.17} +{'loss': 1.0134, 'grad_norm': 1.9639264345169067, 'learning_rate': 1.8926982092436117e-05, 'epoch': 0.17} +{'loss': 1.0381, 'grad_norm': 1.887213110923767, 'learning_rate': 1.89241712822479e-05, 'epoch': 0.17} +{'loss': 1.0286, 'grad_norm': 1.9802154302597046, 'learning_rate': 1.8921357004638837e-05, 'epoch': 0.17} +{'loss': 1.0876, 'grad_norm': 1.821059226989746, 'learning_rate': 1.8918539260702394e-05, 'epoch': 0.17} +{'loss': 0.9766, 'grad_norm': 1.7930537462234497, 'learning_rate': 1.891571805153339e-05, 'epoch': 0.18} +{'loss': 0.9653, 'grad_norm': 1.7542009353637695, 'learning_rate': 1.8912893378227984e-05, 'epoch': 0.18} +{'loss': 1.0256, 'grad_norm': 1.6874165534973145, 'learning_rate': 1.891006524188368e-05, 'epoch': 0.18} +{'loss': 1.0658, 'grad_norm': 1.8666621446609497, 'learning_rate': 1.890723364359933e-05, 'epoch': 0.18} +{'loss': 1.0411, 'grad_norm': 1.7374211549758911, 'learning_rate': 1.890439858447513e-05, 'epoch': 0.18} +{'loss': 
0.9538, 'grad_norm': 1.645519495010376, 'learning_rate': 1.890156006561262e-05, 'epoch': 0.18} +{'loss': 1.0689, 'grad_norm': 1.8398423194885254, 'learning_rate': 1.8898718088114688e-05, 'epoch': 0.18} +{'loss': 1.0464, 'grad_norm': 1.8389241695404053, 'learning_rate': 1.8895872653085563e-05, 'epoch': 0.18} +{'loss': 1.0614, 'grad_norm': 2.086231231689453, 'learning_rate': 1.8893023761630816e-05, 'epoch': 0.18} +{'loss': 1.0646, 'grad_norm': 1.8300734758377075, 'learning_rate': 1.8890171414857366e-05, 'epoch': 0.18} +{'loss': 1.1739, 'grad_norm': 1.8184055089950562, 'learning_rate': 1.888731561387347e-05, 'epoch': 0.18} +{'loss': 1.0628, 'grad_norm': 1.7660341262817383, 'learning_rate': 1.8884456359788725e-05, 'epoch': 0.18} +{'loss': 0.9451, 'grad_norm': 1.0114213228225708, 'learning_rate': 1.8881593653714075e-05, 'epoch': 0.18} +{'loss': 1.0482, 'grad_norm': 1.8874154090881348, 'learning_rate': 1.8878727496761808e-05, 'epoch': 0.18} +{'loss': 1.0374, 'grad_norm': 1.9241492748260498, 'learning_rate': 1.8875857890045544e-05, 'epoch': 0.18} +{'loss': 1.0077, 'grad_norm': 1.6878619194030762, 'learning_rate': 1.8872984834680242e-05, 'epoch': 0.18} +{'loss': 1.0752, 'grad_norm': 1.9827361106872559, 'learning_rate': 1.887010833178222e-05, 'epoch': 0.18} +{'loss': 1.0553, 'grad_norm': 1.8027729988098145, 'learning_rate': 1.886722838246911e-05, 'epoch': 0.18} +{'loss': 0.9973, 'grad_norm': 1.835041880607605, 'learning_rate': 1.8864344987859898e-05, 'epoch': 0.18} +{'loss': 1.0732, 'grad_norm': 1.9430558681488037, 'learning_rate': 1.886145814907491e-05, 'epoch': 0.18} +{'loss': 1.0697, 'grad_norm': 1.938186764717102, 'learning_rate': 1.88585678672358e-05, 'epoch': 0.18} +{'loss': 1.009, 'grad_norm': 1.7199450731277466, 'learning_rate': 1.8855674143465567e-05, 'epoch': 0.18} +{'loss': 1.0555, 'grad_norm': 1.7696146965026855, 'learning_rate': 1.885277697888855e-05, 'epoch': 0.18} +{'loss': 0.9418, 'grad_norm': 0.9714207649230957, 'learning_rate': 1.884987637463042e-05, 'epoch': 0.18} +{'loss': 1.0621, 'grad_norm': 1.8125615119934082, 'learning_rate': 1.8846972331818182e-05, 'epoch': 0.18} +{'loss': 1.0307, 'grad_norm': 1.8445812463760376, 'learning_rate': 1.8844064851580182e-05, 'epoch': 0.18} +{'loss': 0.9727, 'grad_norm': 1.851665735244751, 'learning_rate': 1.8841153935046098e-05, 'epoch': 0.18} +{'loss': 1.1111, 'grad_norm': 1.925013542175293, 'learning_rate': 1.8838239583346948e-05, 'epoch': 0.18} +{'loss': 0.9995, 'grad_norm': 1.74274742603302, 'learning_rate': 1.8835321797615083e-05, 'epoch': 0.18} +{'loss': 1.0525, 'grad_norm': 1.7983896732330322, 'learning_rate': 1.8832400578984183e-05, 'epoch': 0.18} +{'loss': 0.9644, 'grad_norm': 1.9717377424240112, 'learning_rate': 1.8829475928589272e-05, 'epoch': 0.18} +{'loss': 1.0454, 'grad_norm': 1.8048958778381348, 'learning_rate': 1.8826547847566692e-05, 'epoch': 0.18} +{'loss': 1.0467, 'grad_norm': 1.7836039066314697, 'learning_rate': 1.8823616337054136e-05, 'epoch': 0.18} +{'loss': 1.1058, 'grad_norm': 1.9517724514007568, 'learning_rate': 1.8820681398190618e-05, 'epoch': 0.18} +{'loss': 0.9662, 'grad_norm': 1.874281644821167, 'learning_rate': 1.8817743032116483e-05, 'epoch': 0.18} +{'loss': 0.9874, 'grad_norm': 1.9344921112060547, 'learning_rate': 1.8814801239973413e-05, 'epoch': 0.18} +{'loss': 1.0832, 'grad_norm': 1.6398276090621948, 'learning_rate': 1.8811856022904423e-05, 'epoch': 0.18} +{'loss': 0.9568, 'grad_norm': 1.0052623748779297, 'learning_rate': 1.880890738205386e-05, 'epoch': 0.18} +{'loss': 1.0276, 'grad_norm': 1.9227954149246216, 
'learning_rate': 1.880595531856738e-05, 'epoch': 0.18} +{'loss': 0.998, 'grad_norm': 1.767619252204895, 'learning_rate': 1.8802999833592e-05, 'epoch': 0.18} +{'loss': 0.993, 'grad_norm': 0.9438467025756836, 'learning_rate': 1.880004092827605e-05, 'epoch': 0.18} +{'loss': 1.0433, 'grad_norm': 1.8609060049057007, 'learning_rate': 1.8797078603769184e-05, 'epoch': 0.18} +{'loss': 1.0787, 'grad_norm': 1.75949227809906, 'learning_rate': 1.87941128612224e-05, 'epoch': 0.18} +{'loss': 1.0621, 'grad_norm': 1.7850619554519653, 'learning_rate': 1.8791143701788004e-05, 'epoch': 0.18} +{'loss': 1.1148, 'grad_norm': 1.5783555507659912, 'learning_rate': 1.8788171126619653e-05, 'epoch': 0.18} +{'loss': 1.0273, 'grad_norm': 1.8430248498916626, 'learning_rate': 1.8785195136872315e-05, 'epoch': 0.18} +{'loss': 0.9435, 'grad_norm': 1.79518723487854, 'learning_rate': 1.8782215733702286e-05, 'epoch': 0.18} +{'loss': 0.9921, 'grad_norm': 1.7462091445922852, 'learning_rate': 1.8779232918267194e-05, 'epoch': 0.18} +{'loss': 0.9917, 'grad_norm': 1.8421075344085693, 'learning_rate': 1.8776246691725988e-05, 'epoch': 0.18} +{'loss': 1.0652, 'grad_norm': 1.698609709739685, 'learning_rate': 1.8773257055238946e-05, 'epoch': 0.18} +{'loss': 1.0075, 'grad_norm': 1.9753444194793701, 'learning_rate': 1.8770264009967667e-05, 'epoch': 0.18} +{'loss': 0.989, 'grad_norm': 1.8233006000518799, 'learning_rate': 1.876726755707508e-05, 'epoch': 0.18} +{'loss': 1.0366, 'grad_norm': 1.813793420791626, 'learning_rate': 1.8764267697725432e-05, 'epoch': 0.19} +{'loss': 1.0749, 'grad_norm': 1.7772347927093506, 'learning_rate': 1.8761264433084294e-05, 'epoch': 0.19} +{'loss': 1.0448, 'grad_norm': 1.8173842430114746, 'learning_rate': 1.8758257764318566e-05, 'epoch': 0.19} +{'loss': 0.9166, 'grad_norm': 1.8422237634658813, 'learning_rate': 1.8755247692596467e-05, 'epoch': 0.19} +{'loss': 1.0763, 'grad_norm': 1.8423718214035034, 'learning_rate': 1.8752234219087538e-05, 'epoch': 0.19} +{'loss': 0.9247, 'grad_norm': 2.0342793464660645, 'learning_rate': 1.874921734496264e-05, 'epoch': 0.19} +{'loss': 1.036, 'grad_norm': 1.8517146110534668, 'learning_rate': 1.874619707139396e-05, 'epoch': 0.19} +{'loss': 0.9724, 'grad_norm': 1.822220802307129, 'learning_rate': 1.8743173399555e-05, 'epoch': 0.19} +{'loss': 1.0395, 'grad_norm': 1.901777982711792, 'learning_rate': 1.8740146330620584e-05, 'epoch': 0.19} +{'loss': 1.04, 'grad_norm': 1.7870514392852783, 'learning_rate': 1.8737115865766865e-05, 'epoch': 0.19} +{'loss': 1.024, 'grad_norm': 1.8566724061965942, 'learning_rate': 1.87340820061713e-05, 'epoch': 0.19} +{'loss': 1.0918, 'grad_norm': 1.833913803100586, 'learning_rate': 1.8731044753012674e-05, 'epoch': 0.19} +{'loss': 0.9741, 'grad_norm': 1.9312456846237183, 'learning_rate': 1.872800410747109e-05, 'epoch': 0.19} +{'loss': 1.1241, 'grad_norm': 1.8630257844924927, 'learning_rate': 1.8724960070727974e-05, 'epoch': 0.19} +{'loss': 0.9371, 'grad_norm': 0.979623556137085, 'learning_rate': 1.8721912643966055e-05, 'epoch': 0.19} +{'loss': 0.9364, 'grad_norm': 1.0421589612960815, 'learning_rate': 1.871886182836939e-05, 'epoch': 0.19} +{'loss': 1.0129, 'grad_norm': 1.923294186592102, 'learning_rate': 1.871580762512336e-05, 'epoch': 0.19} +{'loss': 1.0578, 'grad_norm': 1.8897676467895508, 'learning_rate': 1.871275003541464e-05, 'epoch': 0.19} +{'loss': 1.0375, 'grad_norm': 1.9939303398132324, 'learning_rate': 1.8709689060431242e-05, 'epoch': 0.19} +{'loss': 0.9813, 'grad_norm': 1.75008225440979, 'learning_rate': 1.8706624701362485e-05, 'epoch': 0.19} 
+{'loss': 1.0379, 'grad_norm': 1.8540035486221313, 'learning_rate': 1.8703556959398998e-05, 'epoch': 0.19} +{'loss': 0.9546, 'grad_norm': 1.5888439416885376, 'learning_rate': 1.8700485835732735e-05, 'epoch': 0.19} +{'loss': 1.1183, 'grad_norm': 1.9681564569473267, 'learning_rate': 1.8697411331556958e-05, 'epoch': 0.19} +{'loss': 0.9255, 'grad_norm': 1.9214798212051392, 'learning_rate': 1.8694333448066238e-05, 'epoch': 0.19} +{'loss': 0.9977, 'grad_norm': 1.9251784086227417, 'learning_rate': 1.8691252186456465e-05, 'epoch': 0.19} +{'loss': 0.9714, 'grad_norm': 1.7150720357894897, 'learning_rate': 1.8688167547924846e-05, 'epoch': 0.19} +{'loss': 0.9933, 'grad_norm': 1.9250214099884033, 'learning_rate': 1.868507953366989e-05, 'epoch': 0.19} +{'loss': 0.9743, 'grad_norm': 1.750757098197937, 'learning_rate': 1.8681988144891425e-05, 'epoch': 0.19} +{'loss': 1.0849, 'grad_norm': 1.8341478109359741, 'learning_rate': 1.8678893382790583e-05, 'epoch': 0.19} +{'loss': 1.0745, 'grad_norm': 2.0515639781951904, 'learning_rate': 1.8675795248569816e-05, 'epoch': 0.19} +{'loss': 0.8873, 'grad_norm': 1.0423712730407715, 'learning_rate': 1.8672693743432875e-05, 'epoch': 0.19} +{'loss': 1.0347, 'grad_norm': 1.8037941455841064, 'learning_rate': 1.8669588868584832e-05, 'epoch': 0.19} +{'loss': 1.1049, 'grad_norm': 2.019803285598755, 'learning_rate': 1.866648062523206e-05, 'epoch': 0.19} +{'loss': 1.0171, 'grad_norm': 2.1451826095581055, 'learning_rate': 1.866336901458225e-05, 'epoch': 0.19} +{'loss': 1.03, 'grad_norm': 1.9621726274490356, 'learning_rate': 1.866025403784439e-05, 'epoch': 0.19} +{'loss': 0.981, 'grad_norm': 1.7517309188842773, 'learning_rate': 1.8657135696228777e-05, 'epoch': 0.19} +{'loss': 1.0487, 'grad_norm': 1.7853367328643799, 'learning_rate': 1.865401399094703e-05, 'epoch': 0.19} +{'loss': 0.9861, 'grad_norm': 2.146730422973633, 'learning_rate': 1.8650888923212057e-05, 'epoch': 0.19} +{'loss': 0.9189, 'grad_norm': 0.9793761968612671, 'learning_rate': 1.8647760494238082e-05, 'epoch': 0.19} +{'loss': 1.0055, 'grad_norm': 1.868817687034607, 'learning_rate': 1.8644628705240636e-05, 'epoch': 0.19} +{'loss': 1.0008, 'grad_norm': 2.1687662601470947, 'learning_rate': 1.864149355743655e-05, 'epoch': 0.19} +{'loss': 1.1333, 'grad_norm': 1.9123890399932861, 'learning_rate': 1.863835505204396e-05, 'epoch': 0.19} +{'loss': 1.0287, 'grad_norm': 1.8504670858383179, 'learning_rate': 1.8635213190282312e-05, 'epoch': 0.19} +{'loss': 1.009, 'grad_norm': 2.1333136558532715, 'learning_rate': 1.863206797337235e-05, 'epoch': 0.19} +{'loss': 0.9733, 'grad_norm': 1.7836076021194458, 'learning_rate': 1.862891940253613e-05, 'epoch': 0.19} +{'loss': 0.9108, 'grad_norm': 0.9302992820739746, 'learning_rate': 1.8625767478997003e-05, 'epoch': 0.19} +{'loss': 1.0476, 'grad_norm': 1.951202154159546, 'learning_rate': 1.8622612203979628e-05, 'epoch': 0.19} +{'loss': 0.9919, 'grad_norm': 2.02200984954834, 'learning_rate': 1.8619453578709957e-05, 'epoch': 0.19} +{'loss': 1.0626, 'grad_norm': 1.8609297275543213, 'learning_rate': 1.861629160441526e-05, 'epoch': 0.19} +{'loss': 1.0691, 'grad_norm': 1.8526087999343872, 'learning_rate': 1.8613126282324092e-05, 'epoch': 0.19} +{'loss': 1.021, 'grad_norm': 1.845171332359314, 'learning_rate': 1.8609957613666316e-05, 'epoch': 0.19} +{'loss': 1.0055, 'grad_norm': 1.7719639539718628, 'learning_rate': 1.8606785599673098e-05, 'epoch': 0.19} +{'loss': 1.0531, 'grad_norm': 1.7800822257995605, 'learning_rate': 1.86036102415769e-05, 'epoch': 0.2} +{'loss': 1.0525, 'grad_norm': 
2.0064759254455566, 'learning_rate': 1.860043154061148e-05, 'epoch': 0.2} +{'loss': 1.0499, 'grad_norm': 1.925990104675293, 'learning_rate': 1.8597249498011906e-05, 'epoch': 0.2} +{'loss': 1.0514, 'grad_norm': 1.6583619117736816, 'learning_rate': 1.8594064115014528e-05, 'epoch': 0.2} +{'loss': 1.0058, 'grad_norm': 2.0886902809143066, 'learning_rate': 1.8590875392857008e-05, 'epoch': 0.2} +{'loss': 1.0443, 'grad_norm': 1.8365750312805176, 'learning_rate': 1.8587683332778305e-05, 'epoch': 0.2} +{'loss': 1.0608, 'grad_norm': 1.7067655324935913, 'learning_rate': 1.8584487936018663e-05, 'epoch': 0.2} +{'loss': 1.0696, 'grad_norm': 3.241781711578369, 'learning_rate': 1.858128920381963e-05, 'epoch': 0.2} +{'loss': 0.9824, 'grad_norm': 2.1947684288024902, 'learning_rate': 1.8578087137424057e-05, 'epoch': 0.2} +{'loss': 1.0991, 'grad_norm': 1.9641575813293457, 'learning_rate': 1.8574881738076076e-05, 'epoch': 0.2} +{'loss': 1.0245, 'grad_norm': 1.8190679550170898, 'learning_rate': 1.8571673007021124e-05, 'epoch': 0.2} +{'loss': 1.0534, 'grad_norm': 1.632847785949707, 'learning_rate': 1.856846094550593e-05, 'epoch': 0.2} +{'loss': 0.9974, 'grad_norm': 2.068267583847046, 'learning_rate': 1.8565245554778516e-05, 'epoch': 0.2} +{'loss': 1.0699, 'grad_norm': 1.9693275690078735, 'learning_rate': 1.8562026836088198e-05, 'epoch': 0.2} +{'loss': 1.0121, 'grad_norm': 2.0901858806610107, 'learning_rate': 1.855880479068559e-05, 'epoch': 0.2} +{'loss': 1.0849, 'grad_norm': 1.9260458946228027, 'learning_rate': 1.855557941982258e-05, 'epoch': 0.2} +{'loss': 1.0075, 'grad_norm': 1.7626837491989136, 'learning_rate': 1.8552350724752383e-05, 'epoch': 0.2} +{'loss': 1.1387, 'grad_norm': 1.8949304819107056, 'learning_rate': 1.854911870672947e-05, 'epoch': 0.2} +{'loss': 1.0301, 'grad_norm': 1.7402981519699097, 'learning_rate': 1.854588336700962e-05, 'epoch': 0.2} +{'loss': 1.0361, 'grad_norm': 1.6345536708831787, 'learning_rate': 1.85426447068499e-05, 'epoch': 0.2} +{'loss': 0.9988, 'grad_norm': 1.7676799297332764, 'learning_rate': 1.853940272750867e-05, 'epoch': 0.2} +{'loss': 1.0244, 'grad_norm': 1.7999532222747803, 'learning_rate': 1.8536157430245577e-05, 'epoch': 0.2} +{'loss': 1.0727, 'grad_norm': 1.8965489864349365, 'learning_rate': 1.8532908816321557e-05, 'epoch': 0.2} +{'loss': 1.0685, 'grad_norm': 1.9054021835327148, 'learning_rate': 1.8529656886998838e-05, 'epoch': 0.2} +{'loss': 1.1067, 'grad_norm': 1.817887783050537, 'learning_rate': 1.8526401643540924e-05, 'epoch': 0.2} +{'loss': 0.975, 'grad_norm': 1.9422781467437744, 'learning_rate': 1.8523143087212623e-05, 'epoch': 0.2} +{'loss': 1.0833, 'grad_norm': 1.9313528537750244, 'learning_rate': 1.851988121928002e-05, 'epoch': 0.2} +{'loss': 1.0025, 'grad_norm': 1.6501028537750244, 'learning_rate': 1.8516616041010495e-05, 'epoch': 0.2} +{'loss': 1.0794, 'grad_norm': 1.608765959739685, 'learning_rate': 1.85133475536727e-05, 'epoch': 0.2} +{'loss': 0.9518, 'grad_norm': 1.7184313535690308, 'learning_rate': 1.851007575853659e-05, 'epoch': 0.2} +{'loss': 0.8879, 'grad_norm': 0.9287310838699341, 'learning_rate': 1.8506800656873397e-05, 'epoch': 0.2} +{'loss': 1.0023, 'grad_norm': 1.8343358039855957, 'learning_rate': 1.850352224995563e-05, 'epoch': 0.2} +{'loss': 1.0496, 'grad_norm': 1.8882182836532593, 'learning_rate': 1.8500240539057093e-05, 'epoch': 0.2} +{'loss': 0.9506, 'grad_norm': 1.8249081373214722, 'learning_rate': 1.8496955525452877e-05, 'epoch': 0.2} +{'loss': 1.0689, 'grad_norm': 1.760257601737976, 'learning_rate': 1.8493667210419337e-05, 'epoch': 0.2} 
+{'loss': 0.9769, 'grad_norm': 2.1494243144989014, 'learning_rate': 1.8490375595234135e-05, 'epoch': 0.2} +{'loss': 0.9155, 'grad_norm': 0.9508609771728516, 'learning_rate': 1.8487080681176195e-05, 'epoch': 0.2} +{'loss': 1.0206, 'grad_norm': 1.8663274049758911, 'learning_rate': 1.848378246952574e-05, 'epoch': 0.2} +{'loss': 1.0712, 'grad_norm': 1.6670690774917603, 'learning_rate': 1.848048096156426e-05, 'epoch': 0.2} +{'loss': 1.023, 'grad_norm': 1.9200055599212646, 'learning_rate': 1.8477176158574533e-05, 'epoch': 0.2} +{'loss': 0.9584, 'grad_norm': 0.9380425810813904, 'learning_rate': 1.8473868061840617e-05, 'epoch': 0.2} +{'loss': 0.994, 'grad_norm': 1.8937983512878418, 'learning_rate': 1.847055667264785e-05, 'epoch': 0.2} +{'loss': 0.9207, 'grad_norm': 0.9107078909873962, 'learning_rate': 1.8467241992282842e-05, 'epoch': 0.2} +{'loss': 0.9767, 'grad_norm': 1.8114209175109863, 'learning_rate': 1.8463924022033492e-05, 'epoch': 0.2} +{'loss': 1.0149, 'grad_norm': 1.8540204763412476, 'learning_rate': 1.8460602763188974e-05, 'epoch': 0.2} +{'loss': 1.0119, 'grad_norm': 1.8666528463363647, 'learning_rate': 1.8457278217039735e-05, 'epoch': 0.2} +{'loss': 1.0667, 'grad_norm': 1.8038444519042969, 'learning_rate': 1.8453950384877504e-05, 'epoch': 0.2} +{'loss': 1.0885, 'grad_norm': 1.9152212142944336, 'learning_rate': 1.8450619267995283e-05, 'epoch': 0.2} +{'loss': 1.0792, 'grad_norm': 1.750767707824707, 'learning_rate': 1.844728486768736e-05, 'epoch': 0.2} +{'loss': 0.9974, 'grad_norm': 1.5561922788619995, 'learning_rate': 1.8443947185249287e-05, 'epoch': 0.2} +{'loss': 0.9861, 'grad_norm': 1.6789472103118896, 'learning_rate': 1.8440606221977893e-05, 'epoch': 0.2} +{'loss': 0.9904, 'grad_norm': 1.8870909214019775, 'learning_rate': 1.8437261979171294e-05, 'epoch': 0.2} +{'loss': 0.9707, 'grad_norm': 1.9678127765655518, 'learning_rate': 1.843391445812886e-05, 'epoch': 0.21} +{'loss': 1.0365, 'grad_norm': 1.8673638105392456, 'learning_rate': 1.843056366015125e-05, 'epoch': 0.21} +{'loss': 1.115, 'grad_norm': 1.6730300188064575, 'learning_rate': 1.8427209586540392e-05, 'epoch': 0.21} +{'loss': 1.1257, 'grad_norm': 1.9282186031341553, 'learning_rate': 1.8423852238599488e-05, 'epoch': 0.21} +{'loss': 1.0134, 'grad_norm': 1.8388830423355103, 'learning_rate': 1.842049161763301e-05, 'epoch': 0.21} +{'loss': 1.0023, 'grad_norm': 1.644269347190857, 'learning_rate': 1.84171277249467e-05, 'epoch': 0.21} +{'loss': 1.0929, 'grad_norm': 1.7801090478897095, 'learning_rate': 1.841376056184758e-05, 'epoch': 0.21} +{'loss': 0.9899, 'grad_norm': 1.9444172382354736, 'learning_rate': 1.8410390129643927e-05, 'epoch': 0.21} +{'loss': 0.9575, 'grad_norm': 1.7458233833312988, 'learning_rate': 1.8407016429645305e-05, 'epoch': 0.21} +{'loss': 1.1052, 'grad_norm': 2.0225753784179688, 'learning_rate': 1.8403639463162536e-05, 'epoch': 0.21} +{'loss': 1.0638, 'grad_norm': 1.6574417352676392, 'learning_rate': 1.8400259231507716e-05, 'epoch': 0.21} +{'loss': 1.0221, 'grad_norm': 1.6813980340957642, 'learning_rate': 1.839687573599421e-05, 'epoch': 0.21} +{'loss': 1.0771, 'grad_norm': 1.9149795770645142, 'learning_rate': 1.8393488977936654e-05, 'epoch': 0.21} +{'loss': 1.0251, 'grad_norm': 1.6382049322128296, 'learning_rate': 1.839009895865094e-05, 'epoch': 0.21} +{'loss': 1.0837, 'grad_norm': 1.9716051816940308, 'learning_rate': 1.8386705679454243e-05, 'epoch': 0.21} +{'loss': 1.1278, 'grad_norm': 2.0258374214172363, 'learning_rate': 1.8383309141664992e-05, 'epoch': 0.21} +{'loss': 1.063, 'grad_norm': 1.6770477294921875, 
'learning_rate': 1.837990934660288e-05, 'epoch': 0.21} +{'loss': 1.0724, 'grad_norm': 1.7865912914276123, 'learning_rate': 1.8376506295588887e-05, 'epoch': 0.21} +{'loss': 0.9722, 'grad_norm': 2.0237650871276855, 'learning_rate': 1.8373099989945236e-05, 'epoch': 0.21} +{'loss': 1.026, 'grad_norm': 1.9553661346435547, 'learning_rate': 1.836969043099542e-05, 'epoch': 0.21} +{'loss': 1.0627, 'grad_norm': 1.7831957340240479, 'learning_rate': 1.83662776200642e-05, 'epoch': 0.21} +{'loss': 1.0774, 'grad_norm': 1.9168123006820679, 'learning_rate': 1.8362861558477597e-05, 'epoch': 0.21} +{'loss': 1.0233, 'grad_norm': 1.8233869075775146, 'learning_rate': 1.8359442247562896e-05, 'epoch': 0.21} +{'loss': 0.8869, 'grad_norm': 2.010859727859497, 'learning_rate': 1.835601968864865e-05, 'epoch': 0.21} +{'loss': 1.0672, 'grad_norm': 1.8921481370925903, 'learning_rate': 1.8352593883064664e-05, 'epoch': 0.21} +{'loss': 1.0628, 'grad_norm': 1.8805242776870728, 'learning_rate': 1.8349164832142015e-05, 'epoch': 0.21} +{'loss': 1.0324, 'grad_norm': 1.7704228162765503, 'learning_rate': 1.834573253721303e-05, 'epoch': 0.21} +{'loss': 1.0192, 'grad_norm': 1.6177185773849487, 'learning_rate': 1.83422969996113e-05, 'epoch': 0.21} +{'loss': 1.0338, 'grad_norm': 1.7440119981765747, 'learning_rate': 1.8338858220671683e-05, 'epoch': 0.21} +{'loss': 0.9533, 'grad_norm': 1.7780840396881104, 'learning_rate': 1.833541620173029e-05, 'epoch': 0.21} +{'loss': 1.1008, 'grad_norm': 1.9631036520004272, 'learning_rate': 1.833197094412449e-05, 'epoch': 0.21} +{'loss': 1.0777, 'grad_norm': 1.7581381797790527, 'learning_rate': 1.8328522449192915e-05, 'epoch': 0.21} +{'loss': 1.0441, 'grad_norm': 1.6984455585479736, 'learning_rate': 1.8325070718275452e-05, 'epoch': 0.21} +{'loss': 0.9438, 'grad_norm': 1.8124308586120605, 'learning_rate': 1.8321615752713242e-05, 'epoch': 0.21} +{'loss': 0.9867, 'grad_norm': 1.8689213991165161, 'learning_rate': 1.8318157553848694e-05, 'epoch': 0.21} +{'loss': 1.0766, 'grad_norm': 1.8502029180526733, 'learning_rate': 1.8314696123025456e-05, 'epoch': 0.21} +{'loss': 1.0017, 'grad_norm': 1.9622750282287598, 'learning_rate': 1.8311231461588446e-05, 'epoch': 0.21} +{'loss': 0.9984, 'grad_norm': 1.793318748474121, 'learning_rate': 1.830776357088383e-05, 'epoch': 0.21} +{'loss': 1.0222, 'grad_norm': 1.8471704721450806, 'learning_rate': 1.8304292452259037e-05, 'epoch': 0.21} +{'loss': 1.0161, 'grad_norm': 1.885473608970642, 'learning_rate': 1.8300818107062744e-05, 'epoch': 0.21} +{'loss': 1.0857, 'grad_norm': 1.7097712755203247, 'learning_rate': 1.8297340536644877e-05, 'epoch': 0.21} +{'loss': 1.0589, 'grad_norm': 1.632116436958313, 'learning_rate': 1.8293859742356625e-05, 'epoch': 0.21} +{'loss': 0.9824, 'grad_norm': 1.7865877151489258, 'learning_rate': 1.8290375725550417e-05, 'epoch': 0.21} +{'loss': 0.961, 'grad_norm': 1.8652921915054321, 'learning_rate': 1.8286888487579954e-05, 'epoch': 0.21} +{'loss': 1.0006, 'grad_norm': 1.826355218887329, 'learning_rate': 1.8283398029800167e-05, 'epoch': 0.21} +{'loss': 0.9484, 'grad_norm': 1.707168698310852, 'learning_rate': 1.827990435356725e-05, 'epoch': 0.21} +{'loss': 0.9068, 'grad_norm': 1.7456754446029663, 'learning_rate': 1.827640746023865e-05, 'epoch': 0.21} +{'loss': 1.0244, 'grad_norm': 1.827331304550171, 'learning_rate': 1.827290735117305e-05, 'epoch': 0.21} +{'loss': 1.0135, 'grad_norm': 1.875224232673645, 'learning_rate': 1.82694040277304e-05, 'epoch': 0.21} +{'loss': 1.0461, 'grad_norm': 1.7828434705734253, 'learning_rate': 1.8265897491271885e-05, 'epoch': 
0.21} +{'loss': 0.8909, 'grad_norm': 1.8433310985565186, 'learning_rate': 1.826238774315995e-05, 'epoch': 0.21} +{'loss': 1.014, 'grad_norm': 1.804031252861023, 'learning_rate': 1.8258874784758276e-05, 'epoch': 0.21} +{'loss': 1.0486, 'grad_norm': 1.7992842197418213, 'learning_rate': 1.82553586174318e-05, 'epoch': 0.22} +{'loss': 1.0638, 'grad_norm': 1.848079800605774, 'learning_rate': 1.8251839242546696e-05, 'epoch': 0.22} +{'loss': 0.9732, 'grad_norm': 1.8478999137878418, 'learning_rate': 1.82483166614704e-05, 'epoch': 0.22} +{'loss': 1.0612, 'grad_norm': 1.8616169691085815, 'learning_rate': 1.8244790875571582e-05, 'epoch': 0.22} +{'loss': 1.0503, 'grad_norm': 1.9023991823196411, 'learning_rate': 1.8241261886220155e-05, 'epoch': 0.22} +{'loss': 0.9433, 'grad_norm': 1.1413387060165405, 'learning_rate': 1.8237729694787295e-05, 'epoch': 0.22} +{'loss': 1.027, 'grad_norm': 1.787695288658142, 'learning_rate': 1.8234194302645393e-05, 'epoch': 0.22} +{'loss': 0.9202, 'grad_norm': 1.963045358657837, 'learning_rate': 1.8230655711168112e-05, 'epoch': 0.22} +{'loss': 1.0207, 'grad_norm': 1.711930274963379, 'learning_rate': 1.8227113921730336e-05, 'epoch': 0.22} +{'loss': 0.9479, 'grad_norm': 0.9423871636390686, 'learning_rate': 1.822356893570821e-05, 'epoch': 0.22} +{'loss': 0.9918, 'grad_norm': 1.8183504343032837, 'learning_rate': 1.8220020754479104e-05, 'epoch': 0.22} +{'loss': 1.1492, 'grad_norm': 1.918871521949768, 'learning_rate': 1.821646937942164e-05, 'epoch': 0.22} +{'loss': 0.898, 'grad_norm': 2.0717124938964844, 'learning_rate': 1.821291481191568e-05, 'epoch': 0.22} +{'loss': 1.0465, 'grad_norm': 1.7555091381072998, 'learning_rate': 1.8209357053342325e-05, 'epoch': 0.22} +{'loss': 1.0205, 'grad_norm': 1.7912899255752563, 'learning_rate': 1.8205796105083917e-05, 'epoch': 0.22} +{'loss': 0.9695, 'grad_norm': 1.930643081665039, 'learning_rate': 1.8202231968524033e-05, 'epoch': 0.22} +{'loss': 0.9631, 'grad_norm': 1.831152081489563, 'learning_rate': 1.8198664645047496e-05, 'epoch': 0.22} +{'loss': 1.0294, 'grad_norm': 1.735793113708496, 'learning_rate': 1.8195094136040357e-05, 'epoch': 0.22} +{'loss': 0.9889, 'grad_norm': 1.81747567653656, 'learning_rate': 1.819152044288992e-05, 'epoch': 0.22} +{'loss': 1.0412, 'grad_norm': 1.8590980768203735, 'learning_rate': 1.818794356698471e-05, 'epoch': 0.22} +{'loss': 0.9631, 'grad_norm': 2.285672903060913, 'learning_rate': 1.8184363509714494e-05, 'epoch': 0.22} +{'loss': 1.0693, 'grad_norm': 2.0010228157043457, 'learning_rate': 1.818078027247028e-05, 'epoch': 0.22} +{'loss': 0.9676, 'grad_norm': 1.8023267984390259, 'learning_rate': 1.8177193856644315e-05, 'epoch': 0.22} +{'loss': 0.9698, 'grad_norm': 1.6251740455627441, 'learning_rate': 1.8173604263630066e-05, 'epoch': 0.22} +{'loss': 0.9922, 'grad_norm': 1.848659634590149, 'learning_rate': 1.8170011494822242e-05, 'epoch': 0.22} +{'loss': 0.9706, 'grad_norm': 1.7886439561843872, 'learning_rate': 1.816641555161679e-05, 'epoch': 0.22} +{'loss': 1.1177, 'grad_norm': 1.7670153379440308, 'learning_rate': 1.8162816435410892e-05, 'epoch': 0.22} +{'loss': 0.9645, 'grad_norm': 1.720010757446289, 'learning_rate': 1.8159214147602947e-05, 'epoch': 0.22} +{'loss': 1.0013, 'grad_norm': 1.9693390130996704, 'learning_rate': 1.8155608689592604e-05, 'epoch': 0.22} +{'loss': 0.9904, 'grad_norm': 1.756457805633545, 'learning_rate': 1.8152000062780735e-05, 'epoch': 0.22} +{'loss': 0.9492, 'grad_norm': 1.805774211883545, 'learning_rate': 1.8148388268569453e-05, 'epoch': 0.22} +{'loss': 1.0284, 'grad_norm': 
1.7399941682815552, 'learning_rate': 1.814477330836208e-05, 'epoch': 0.22} +{'loss': 1.0929, 'grad_norm': 1.838537335395813, 'learning_rate': 1.8141155183563195e-05, 'epoch': 0.22} +{'loss': 1.063, 'grad_norm': 1.9893226623535156, 'learning_rate': 1.8137533895578585e-05, 'epoch': 0.22} +{'loss': 1.0447, 'grad_norm': 1.577759027481079, 'learning_rate': 1.8133909445815277e-05, 'epoch': 0.22} +{'loss': 1.0892, 'grad_norm': 1.8358311653137207, 'learning_rate': 1.813028183568153e-05, 'epoch': 0.22} +{'loss': 0.9327, 'grad_norm': 1.0686964988708496, 'learning_rate': 1.8126651066586818e-05, 'epoch': 0.22} +{'loss': 0.9343, 'grad_norm': 1.7629766464233398, 'learning_rate': 1.8123017139941853e-05, 'epoch': 0.22} +{'loss': 1.1111, 'grad_norm': 1.798951506614685, 'learning_rate': 1.811938005715857e-05, 'epoch': 0.22} +{'loss': 1.0597, 'grad_norm': 1.740086317062378, 'learning_rate': 1.8115739819650124e-05, 'epoch': 0.22} +{'loss': 1.0726, 'grad_norm': 1.7950950860977173, 'learning_rate': 1.8112096428830912e-05, 'epoch': 0.22} +{'loss': 1.1226, 'grad_norm': 1.825238585472107, 'learning_rate': 1.8108449886116543e-05, 'epoch': 0.22} +{'loss': 1.021, 'grad_norm': 1.8829210996627808, 'learning_rate': 1.8104800192923856e-05, 'epoch': 0.22} +{'loss': 0.953, 'grad_norm': 1.971191167831421, 'learning_rate': 1.8101147350670905e-05, 'epoch': 0.22} +{'loss': 1.0679, 'grad_norm': 1.8453309535980225, 'learning_rate': 1.8097491360776987e-05, 'epoch': 0.22} +{'loss': 0.9755, 'grad_norm': 1.9674023389816284, 'learning_rate': 1.8093832224662598e-05, 'epoch': 0.22} +{'loss': 0.9797, 'grad_norm': 1.70065438747406, 'learning_rate': 1.8090169943749477e-05, 'epoch': 0.22} +{'loss': 1.0253, 'grad_norm': 1.9551289081573486, 'learning_rate': 1.808650451946057e-05, 'epoch': 0.22} +{'loss': 1.022, 'grad_norm': 1.5888679027557373, 'learning_rate': 1.8082835953220055e-05, 'epoch': 0.22} +{'loss': 1.0853, 'grad_norm': 1.6807249784469604, 'learning_rate': 1.8079164246453322e-05, 'epoch': 0.22} +{'loss': 1.0652, 'grad_norm': 1.8890981674194336, 'learning_rate': 1.8075489400586993e-05, 'epoch': 0.22} +{'loss': 1.0159, 'grad_norm': 1.772186279296875, 'learning_rate': 1.807181141704889e-05, 'epoch': 0.22} +{'loss': 0.9559, 'grad_norm': 0.9067140817642212, 'learning_rate': 1.806813029726808e-05, 'epoch': 0.23} +{'loss': 0.9861, 'grad_norm': 1.6426454782485962, 'learning_rate': 1.806444604267483e-05, 'epoch': 0.23} +{'loss': 1.0108, 'grad_norm': 1.745294451713562, 'learning_rate': 1.8060758654700622e-05, 'epoch': 0.23} +{'loss': 1.0605, 'grad_norm': 1.8601793050765991, 'learning_rate': 1.8057068134778174e-05, 'epoch': 0.23} +{'loss': 1.0164, 'grad_norm': 1.8416272401809692, 'learning_rate': 1.8053374484341403e-05, 'epoch': 0.23} +{'loss': 1.0863, 'grad_norm': 1.7557302713394165, 'learning_rate': 1.8049677704825455e-05, 'epoch': 0.23} +{'loss': 0.9973, 'grad_norm': 1.7593114376068115, 'learning_rate': 1.8045977797666685e-05, 'epoch': 0.23} +{'loss': 0.8869, 'grad_norm': 1.832168698310852, 'learning_rate': 1.8042274764302664e-05, 'epoch': 0.23} +{'loss': 1.0508, 'grad_norm': 1.7560023069381714, 'learning_rate': 1.8038568606172172e-05, 'epoch': 0.23} +{'loss': 1.0583, 'grad_norm': 1.8156278133392334, 'learning_rate': 1.803485932471522e-05, 'epoch': 0.23} +{'loss': 1.0035, 'grad_norm': 1.6816129684448242, 'learning_rate': 1.803114692137302e-05, 'epoch': 0.23} +{'loss': 1.0421, 'grad_norm': 1.7617651224136353, 'learning_rate': 1.8027431397587993e-05, 'epoch': 0.23} +{'loss': 1.038, 'grad_norm': 1.6813771724700928, 'learning_rate': 
1.8023712754803783e-05, 'epoch': 0.23} +{'loss': 0.9613, 'grad_norm': 1.7724182605743408, 'learning_rate': 1.8019990994465235e-05, 'epoch': 0.23} +{'loss': 1.0767, 'grad_norm': 1.9362376928329468, 'learning_rate': 1.801626611801842e-05, 'epoch': 0.23} +{'loss': 1.0002, 'grad_norm': 2.093571901321411, 'learning_rate': 1.801253812691061e-05, 'epoch': 0.23} +{'loss': 0.9694, 'grad_norm': 1.7024599313735962, 'learning_rate': 1.8008807022590283e-05, 'epoch': 0.23} +{'loss': 1.0255, 'grad_norm': 1.7977104187011719, 'learning_rate': 1.8005072806507132e-05, 'epoch': 0.23} +{'loss': 1.0358, 'grad_norm': 2.2798855304718018, 'learning_rate': 1.8001335480112067e-05, 'epoch': 0.23} +{'loss': 1.0198, 'grad_norm': 1.7836755514144897, 'learning_rate': 1.799759504485719e-05, 'epoch': 0.23} +{'loss': 0.9821, 'grad_norm': 2.0673530101776123, 'learning_rate': 1.7993851502195822e-05, 'epoch': 0.23} +{'loss': 1.1074, 'grad_norm': 1.8233660459518433, 'learning_rate': 1.7990104853582494e-05, 'epoch': 0.23} +{'loss': 0.9712, 'grad_norm': 1.7256102561950684, 'learning_rate': 1.798635510047293e-05, 'epoch': 0.23} +{'loss': 1.0184, 'grad_norm': 1.5374864339828491, 'learning_rate': 1.7982602244324074e-05, 'epoch': 0.23} +{'loss': 1.0586, 'grad_norm': 1.7800910472869873, 'learning_rate': 1.7978846286594073e-05, 'epoch': 0.23} +{'loss': 1.0483, 'grad_norm': 1.6155908107757568, 'learning_rate': 1.7975087228742272e-05, 'epoch': 0.23} +{'loss': 1.0085, 'grad_norm': 1.812739610671997, 'learning_rate': 1.7971325072229227e-05, 'epoch': 0.23} +{'loss': 0.9515, 'grad_norm': 1.69160795211792, 'learning_rate': 1.7967559818516697e-05, 'epoch': 0.23} +{'loss': 0.9843, 'grad_norm': 1.69354248046875, 'learning_rate': 1.7963791469067637e-05, 'epoch': 0.23} +{'loss': 1.037, 'grad_norm': 1.8023440837860107, 'learning_rate': 1.796002002534622e-05, 'epoch': 0.23} +{'loss': 1.0643, 'grad_norm': 1.8462785482406616, 'learning_rate': 1.795624548881781e-05, 'epoch': 0.23} +{'loss': 0.9829, 'grad_norm': 1.996802806854248, 'learning_rate': 1.7952467860948975e-05, 'epoch': 0.23} +{'loss': 1.0581, 'grad_norm': 1.8798552751541138, 'learning_rate': 1.7948687143207485e-05, 'epoch': 0.23} +{'loss': 0.9505, 'grad_norm': 1.8175809383392334, 'learning_rate': 1.7944903337062304e-05, 'epoch': 0.23} +{'loss': 0.9593, 'grad_norm': 1.9423161745071411, 'learning_rate': 1.7941116443983612e-05, 'epoch': 0.23} +{'loss': 1.0067, 'grad_norm': 1.7487837076187134, 'learning_rate': 1.793732646544277e-05, 'epoch': 0.23} +{'loss': 1.0329, 'grad_norm': 1.8825668096542358, 'learning_rate': 1.7933533402912354e-05, 'epoch': 0.23} +{'loss': 0.9492, 'grad_norm': 1.6054389476776123, 'learning_rate': 1.7929737257866123e-05, 'epoch': 0.23} +{'loss': 1.0566, 'grad_norm': 1.8480180501937866, 'learning_rate': 1.7925938031779044e-05, 'epoch': 0.23} +{'loss': 1.0747, 'grad_norm': 1.8325356245040894, 'learning_rate': 1.7922135726127276e-05, 'epoch': 0.23} +{'loss': 1.0237, 'grad_norm': 1.872121810913086, 'learning_rate': 1.7918330342388183e-05, 'epoch': 0.23} +{'loss': 1.0303, 'grad_norm': 1.8002170324325562, 'learning_rate': 1.791452188204031e-05, 'epoch': 0.23} +{'loss': 1.0768, 'grad_norm': 1.7081952095031738, 'learning_rate': 1.7910710346563417e-05, 'epoch': 0.23} +{'loss': 1.0116, 'grad_norm': 1.76996648311615, 'learning_rate': 1.7906895737438437e-05, 'epoch': 0.23} +{'loss': 1.0709, 'grad_norm': 1.8718223571777344, 'learning_rate': 1.7903078056147515e-05, 'epoch': 0.23} +{'loss': 0.9604, 'grad_norm': 1.5310128927230835, 'learning_rate': 1.7899257304173975e-05, 'epoch': 0.23} 
+{'loss': 0.9809, 'grad_norm': 1.7675104141235352, 'learning_rate': 1.7895433483002356e-05, 'epoch': 0.23} +{'loss': 1.0158, 'grad_norm': 1.7944415807724, 'learning_rate': 1.7891606594118362e-05, 'epoch': 0.23} +{'loss': 1.1133, 'grad_norm': 2.0859057903289795, 'learning_rate': 1.7887776639008912e-05, 'epoch': 0.23} +{'loss': 1.0571, 'grad_norm': 1.7554936408996582, 'learning_rate': 1.7883943619162102e-05, 'epoch': 0.23} +{'loss': 0.9529, 'grad_norm': 1.7676604986190796, 'learning_rate': 1.788010753606722e-05, 'epoch': 0.23} +{'loss': 0.9976, 'grad_norm': 1.5903501510620117, 'learning_rate': 1.7876268391214756e-05, 'epoch': 0.23} +{'loss': 1.0401, 'grad_norm': 1.6037919521331787, 'learning_rate': 1.7872426186096374e-05, 'epoch': 0.24} +{'loss': 0.8902, 'grad_norm': 0.9709982872009277, 'learning_rate': 1.786858092220494e-05, 'epoch': 0.24} +{'loss': 0.9767, 'grad_norm': 1.7277920246124268, 'learning_rate': 1.78647326010345e-05, 'epoch': 0.24} +{'loss': 0.9068, 'grad_norm': 0.9047741293907166, 'learning_rate': 1.786088122408029e-05, 'epoch': 0.24} +{'loss': 1.1505, 'grad_norm': 1.9055099487304688, 'learning_rate': 1.785702679283874e-05, 'epoch': 0.24} +{'loss': 0.9357, 'grad_norm': 0.9288726449012756, 'learning_rate': 1.785316930880745e-05, 'epoch': 0.24} +{'loss': 0.969, 'grad_norm': 1.7098684310913086, 'learning_rate': 1.7849308773485226e-05, 'epoch': 0.24} +{'loss': 0.9651, 'grad_norm': 1.840373158454895, 'learning_rate': 1.7845445188372047e-05, 'epoch': 0.24} +{'loss': 0.9595, 'grad_norm': 1.9371370077133179, 'learning_rate': 1.784157855496908e-05, 'epoch': 0.24} +{'loss': 0.8936, 'grad_norm': 1.808653473854065, 'learning_rate': 1.7837708874778683e-05, 'epoch': 0.24} +{'loss': 0.8878, 'grad_norm': 1.819419026374817, 'learning_rate': 1.783383614930438e-05, 'epoch': 0.24} +{'loss': 1.0108, 'grad_norm': 1.772814154624939, 'learning_rate': 1.78299603800509e-05, 'epoch': 0.24} +{'loss': 0.9906, 'grad_norm': 1.695981740951538, 'learning_rate': 1.782608156852414e-05, 'epoch': 0.24} +{'loss': 0.9955, 'grad_norm': 1.7509560585021973, 'learning_rate': 1.7822199716231186e-05, 'epoch': 0.24} +{'loss': 0.9274, 'grad_norm': 1.8696647882461548, 'learning_rate': 1.78183148246803e-05, 'epoch': 0.24} +{'loss': 0.9698, 'grad_norm': 1.6070022583007812, 'learning_rate': 1.781442689538093e-05, 'epoch': 0.24} +{'loss': 1.107, 'grad_norm': 1.897267460823059, 'learning_rate': 1.7810535929843703e-05, 'epoch': 0.24} +{'loss': 1.0508, 'grad_norm': 1.9193824529647827, 'learning_rate': 1.780664192958042e-05, 'epoch': 0.24} +{'loss': 1.0556, 'grad_norm': 1.8030322790145874, 'learning_rate': 1.780274489610407e-05, 'epoch': 0.24} +{'loss': 1.0318, 'grad_norm': 1.7325868606567383, 'learning_rate': 1.7798844830928818e-05, 'epoch': 0.24} +{'loss': 1.0345, 'grad_norm': 1.8228734731674194, 'learning_rate': 1.7794941735570005e-05, 'epoch': 0.24} +{'loss': 1.0488, 'grad_norm': 1.743476152420044, 'learning_rate': 1.7791035611544146e-05, 'epoch': 0.24} +{'loss': 1.0806, 'grad_norm': 1.841093897819519, 'learning_rate': 1.778712646036894e-05, 'epoch': 0.24} +{'loss': 0.9101, 'grad_norm': 1.0069082975387573, 'learning_rate': 1.778321428356326e-05, 'epoch': 0.24} +{'loss': 1.0576, 'grad_norm': 1.8351349830627441, 'learning_rate': 1.777929908264715e-05, 'epoch': 0.24} +{'loss': 0.9264, 'grad_norm': 0.9957099556922913, 'learning_rate': 1.7775380859141837e-05, 'epoch': 0.24} +{'loss': 1.0947, 'grad_norm': 1.7579529285430908, 'learning_rate': 1.777145961456971e-05, 'epoch': 0.24} +{'loss': 1.0202, 'grad_norm': 1.727107286453247, 
'learning_rate': 1.7767535350454347e-05, 'epoch': 0.24} +{'loss': 1.1389, 'grad_norm': 1.709868311882019, 'learning_rate': 1.7763608068320487e-05, 'epoch': 0.24} +{'loss': 0.9958, 'grad_norm': 1.717427372932434, 'learning_rate': 1.775967776969405e-05, 'epoch': 0.24} +{'loss': 0.9765, 'grad_norm': 1.6196602582931519, 'learning_rate': 1.7755744456102123e-05, 'epoch': 0.24} +{'loss': 1.0123, 'grad_norm': 1.794206142425537, 'learning_rate': 1.7751808129072968e-05, 'epoch': 0.24} +{'loss': 1.0263, 'grad_norm': 1.7889125347137451, 'learning_rate': 1.7747868790136012e-05, 'epoch': 0.24} +{'loss': 1.0379, 'grad_norm': 1.6667126417160034, 'learning_rate': 1.7743926440821857e-05, 'epoch': 0.24} +{'loss': 1.0555, 'grad_norm': 1.7728615999221802, 'learning_rate': 1.7739981082662275e-05, 'epoch': 0.24} +{'loss': 1.0027, 'grad_norm': 1.777949333190918, 'learning_rate': 1.7736032717190208e-05, 'epoch': 0.24} +{'loss': 0.978, 'grad_norm': 1.6838446855545044, 'learning_rate': 1.773208134593976e-05, 'epoch': 0.24} +{'loss': 1.0812, 'grad_norm': 1.857752799987793, 'learning_rate': 1.772812697044621e-05, 'epoch': 0.24} +{'loss': 0.9992, 'grad_norm': 1.7607934474945068, 'learning_rate': 1.7724169592245996e-05, 'epoch': 0.24} +{'loss': 1.0517, 'grad_norm': 1.7443361282348633, 'learning_rate': 1.772020921287674e-05, 'epoch': 0.24} +{'loss': 0.8986, 'grad_norm': 1.8854649066925049, 'learning_rate': 1.7716245833877202e-05, 'epoch': 0.24} +{'loss': 1.0414, 'grad_norm': 1.7788060903549194, 'learning_rate': 1.7712279456787334e-05, 'epoch': 0.24} +{'loss': 1.0034, 'grad_norm': 1.8839924335479736, 'learning_rate': 1.7708310083148242e-05, 'epoch': 0.24} +{'loss': 0.9332, 'grad_norm': 1.1061986684799194, 'learning_rate': 1.7704337714502195e-05, 'epoch': 0.24} +{'loss': 1.0976, 'grad_norm': 1.904776692390442, 'learning_rate': 1.7700362352392632e-05, 'epoch': 0.24} +{'loss': 1.036, 'grad_norm': 1.9352667331695557, 'learning_rate': 1.769638399836414e-05, 'epoch': 0.24} +{'loss': 0.9766, 'grad_norm': 1.694658875465393, 'learning_rate': 1.769240265396249e-05, 'epoch': 0.24} +{'loss': 1.0676, 'grad_norm': 1.8434773683547974, 'learning_rate': 1.7688418320734596e-05, 'epoch': 0.24} +{'loss': 1.0183, 'grad_norm': 1.8939404487609863, 'learning_rate': 1.7684431000228548e-05, 'epoch': 0.24} +{'loss': 0.9748, 'grad_norm': 1.8293036222457886, 'learning_rate': 1.7680440693993586e-05, 'epoch': 0.24} +{'loss': 0.9581, 'grad_norm': 1.837016224861145, 'learning_rate': 1.7676447403580114e-05, 'epoch': 0.24} +{'loss': 1.0667, 'grad_norm': 1.7812628746032715, 'learning_rate': 1.7672451130539697e-05, 'epoch': 0.24} +{'loss': 1.1035, 'grad_norm': 1.7685917615890503, 'learning_rate': 1.7668451876425056e-05, 'epoch': 0.25} +{'loss': 1.0731, 'grad_norm': 1.7301274538040161, 'learning_rate': 1.766444964279007e-05, 'epoch': 0.25} +{'loss': 1.0233, 'grad_norm': 1.724979043006897, 'learning_rate': 1.766044443118978e-05, 'epoch': 0.25} +{'loss': 1.0054, 'grad_norm': 1.7777516841888428, 'learning_rate': 1.7656436243180385e-05, 'epoch': 0.25} +{'loss': 1.0269, 'grad_norm': 1.7832542657852173, 'learning_rate': 1.765242508031923e-05, 'epoch': 0.25} +{'loss': 0.9269, 'grad_norm': 1.6509227752685547, 'learning_rate': 1.7648410944164823e-05, 'epoch': 0.25} +{'loss': 1.0168, 'grad_norm': 1.7656588554382324, 'learning_rate': 1.7644393836276832e-05, 'epoch': 0.25} +{'loss': 1.006, 'grad_norm': 1.8130340576171875, 'learning_rate': 1.7640373758216075e-05, 'epoch': 0.25} +{'loss': 1.0527, 'grad_norm': 1.8497333526611328, 'learning_rate': 1.7636350711544518e-05, 
'epoch': 0.25} +{'loss': 0.9754, 'grad_norm': 1.7750145196914673, 'learning_rate': 1.7632324697825288e-05, 'epoch': 0.25} +{'loss': 1.0049, 'grad_norm': 1.7450518608093262, 'learning_rate': 1.7628295718622666e-05, 'epoch': 0.25} +{'loss': 1.0718, 'grad_norm': 1.7295411825180054, 'learning_rate': 1.762426377550208e-05, 'epoch': 0.25} +{'loss': 0.905, 'grad_norm': 0.9832800626754761, 'learning_rate': 1.762022887003011e-05, 'epoch': 0.25} +{'loss': 0.9969, 'grad_norm': 1.9119161367416382, 'learning_rate': 1.761619100377449e-05, 'epoch': 0.25} +{'loss': 1.0315, 'grad_norm': 1.6869584321975708, 'learning_rate': 1.7612150178304102e-05, 'epoch': 0.25} +{'loss': 0.9896, 'grad_norm': 1.627511978149414, 'learning_rate': 1.7608106395188983e-05, 'epoch': 0.25} +{'loss': 1.0248, 'grad_norm': 1.7231568098068237, 'learning_rate': 1.7604059656000313e-05, 'epoch': 0.25} +{'loss': 1.0182, 'grad_norm': 1.6172611713409424, 'learning_rate': 1.7600009962310417e-05, 'epoch': 0.25} +{'loss': 0.9912, 'grad_norm': 2.021790027618408, 'learning_rate': 1.7595957315692782e-05, 'epoch': 0.25} +{'loss': 1.078, 'grad_norm': 1.859525203704834, 'learning_rate': 1.759190171772203e-05, 'epoch': 0.25} +{'loss': 0.9745, 'grad_norm': 1.68208909034729, 'learning_rate': 1.7587843169973934e-05, 'epoch': 0.25} +{'loss': 1.0394, 'grad_norm': 1.7649648189544678, 'learning_rate': 1.7583781674025415e-05, 'epoch': 0.25} +{'loss': 1.0309, 'grad_norm': 1.6666499376296997, 'learning_rate': 1.757971723145453e-05, 'epoch': 0.25} +{'loss': 0.9921, 'grad_norm': 1.6025043725967407, 'learning_rate': 1.75756498438405e-05, 'epoch': 0.25} +{'loss': 1.06, 'grad_norm': 1.816798210144043, 'learning_rate': 1.7571579512763665e-05, 'epoch': 0.25} +{'loss': 1.0393, 'grad_norm': 1.614940881729126, 'learning_rate': 1.756750623980553e-05, 'epoch': 0.25} +{'loss': 0.9786, 'grad_norm': 1.6103315353393555, 'learning_rate': 1.7563430026548737e-05, 'epoch': 0.25} +{'loss': 1.0101, 'grad_norm': 1.058126449584961, 'learning_rate': 1.7559350874577066e-05, 'epoch': 0.25} +{'loss': 1.0158, 'grad_norm': 1.711808204650879, 'learning_rate': 1.7555268785475438e-05, 'epoch': 0.25} +{'loss': 1.0493, 'grad_norm': 1.819156527519226, 'learning_rate': 1.755118376082992e-05, 'epoch': 0.25} +{'loss': 0.9808, 'grad_norm': 1.6954679489135742, 'learning_rate': 1.7547095802227723e-05, 'epoch': 0.25} +{'loss': 1.0386, 'grad_norm': 1.9226264953613281, 'learning_rate': 1.7543004911257186e-05, 'epoch': 0.25} +{'loss': 1.0898, 'grad_norm': 1.7474966049194336, 'learning_rate': 1.75389110895078e-05, 'epoch': 0.25} +{'loss': 1.0258, 'grad_norm': 1.8988890647888184, 'learning_rate': 1.7534814338570187e-05, 'epoch': 0.25} +{'loss': 1.1225, 'grad_norm': 1.687604308128357, 'learning_rate': 1.7530714660036112e-05, 'epoch': 0.25} +{'loss': 0.8735, 'grad_norm': 0.8564708828926086, 'learning_rate': 1.7526612055498468e-05, 'epoch': 0.25} +{'loss': 0.9413, 'grad_norm': 1.9589653015136719, 'learning_rate': 1.7522506526551303e-05, 'epoch': 0.25} +{'loss': 1.0326, 'grad_norm': 1.763735055923462, 'learning_rate': 1.7518398074789776e-05, 'epoch': 0.25} +{'loss': 1.0755, 'grad_norm': 1.8903660774230957, 'learning_rate': 1.7514286701810203e-05, 'epoch': 0.25} +{'loss': 1.0698, 'grad_norm': 1.7349900007247925, 'learning_rate': 1.751017240921003e-05, 'epoch': 0.25} +{'loss': 0.9767, 'grad_norm': 1.749704122543335, 'learning_rate': 1.7506055198587835e-05, 'epoch': 0.25} +{'loss': 0.8928, 'grad_norm': 1.7060540914535522, 'learning_rate': 1.7501935071543325e-05, 'epoch': 0.25} +{'loss': 1.0696, 'grad_norm': 
1.8056551218032837, 'learning_rate': 1.7497812029677344e-05, 'epoch': 0.25} +{'loss': 0.9184, 'grad_norm': 1.7558966875076294, 'learning_rate': 1.7493686074591873e-05, 'epoch': 0.25} +{'loss': 1.0018, 'grad_norm': 1.8495056629180908, 'learning_rate': 1.7489557207890025e-05, 'epoch': 0.25} +{'loss': 1.0815, 'grad_norm': 1.800793170928955, 'learning_rate': 1.7485425431176032e-05, 'epoch': 0.25} +{'loss': 1.0077, 'grad_norm': 1.952811598777771, 'learning_rate': 1.748129074605527e-05, 'epoch': 0.25} +{'loss': 0.9185, 'grad_norm': 1.7768923044204712, 'learning_rate': 1.7477153154134244e-05, 'epoch': 0.25} +{'loss': 0.9175, 'grad_norm': 1.576505184173584, 'learning_rate': 1.7473012657020577e-05, 'epoch': 0.25} +{'loss': 0.9495, 'grad_norm': 1.6869131326675415, 'learning_rate': 1.7468869256323036e-05, 'epoch': 0.25} +{'loss': 1.0346, 'grad_norm': 1.8397648334503174, 'learning_rate': 1.7464722953651504e-05, 'epoch': 0.25} +{'loss': 1.0139, 'grad_norm': 1.7348251342773438, 'learning_rate': 1.7460573750616996e-05, 'epoch': 0.26} +{'loss': 1.0418, 'grad_norm': 1.7938367128372192, 'learning_rate': 1.7456421648831658e-05, 'epoch': 0.26} +{'loss': 1.0398, 'grad_norm': 1.7588385343551636, 'learning_rate': 1.7452266649908756e-05, 'epoch': 0.26} +{'loss': 0.9784, 'grad_norm': 1.6426222324371338, 'learning_rate': 1.7448108755462684e-05, 'epoch': 0.26} +{'loss': 0.9632, 'grad_norm': 1.8306379318237305, 'learning_rate': 1.744394796710896e-05, 'epoch': 0.26} +{'loss': 1.0239, 'grad_norm': 1.7624549865722656, 'learning_rate': 1.7439784286464234e-05, 'epoch': 0.26} +{'loss': 1.0682, 'grad_norm': 1.6674563884735107, 'learning_rate': 1.743561771514626e-05, 'epoch': 0.26} +{'loss': 1.0796, 'grad_norm': 1.8462331295013428, 'learning_rate': 1.7431448254773943e-05, 'epoch': 0.26} +{'loss': 1.0622, 'grad_norm': 1.751166582107544, 'learning_rate': 1.7427275906967288e-05, 'epoch': 0.26} +{'loss': 1.0154, 'grad_norm': 1.7317910194396973, 'learning_rate': 1.7423100673347435e-05, 'epoch': 0.26} +{'loss': 0.9863, 'grad_norm': 1.7881197929382324, 'learning_rate': 1.741892255553664e-05, 'epoch': 0.26} +{'loss': 1.0101, 'grad_norm': 1.92635178565979, 'learning_rate': 1.741474155515827e-05, 'epoch': 0.26} +{'loss': 0.9825, 'grad_norm': 1.8678805828094482, 'learning_rate': 1.741055767383683e-05, 'epoch': 0.26} +{'loss': 0.9571, 'grad_norm': 1.5679118633270264, 'learning_rate': 1.7406370913197938e-05, 'epoch': 0.26} +{'loss': 0.9932, 'grad_norm': 1.802016258239746, 'learning_rate': 1.7402181274868323e-05, 'epoch': 0.26} +{'loss': 0.9404, 'grad_norm': 1.0259318351745605, 'learning_rate': 1.7397988760475842e-05, 'epoch': 0.26} +{'loss': 1.0702, 'grad_norm': 1.7486754655838013, 'learning_rate': 1.739379337164946e-05, 'epoch': 0.26} +{'loss': 1.081, 'grad_norm': 1.8874517679214478, 'learning_rate': 1.738959511001928e-05, 'epoch': 0.26} +{'loss': 0.9043, 'grad_norm': 1.7347866296768188, 'learning_rate': 1.7385393977216486e-05, 'epoch': 0.26} +{'loss': 1.0167, 'grad_norm': 1.917789101600647, 'learning_rate': 1.738118997487341e-05, 'epoch': 0.26} +{'loss': 0.9877, 'grad_norm': 1.7709996700286865, 'learning_rate': 1.7376983104623477e-05, 'epoch': 0.26} +{'loss': 0.9062, 'grad_norm': 1.7030057907104492, 'learning_rate': 1.737277336810124e-05, 'epoch': 0.26} +{'loss': 0.9742, 'grad_norm': 1.7927793264389038, 'learning_rate': 1.7368560766942362e-05, 'epoch': 0.26} +{'loss': 1.0647, 'grad_norm': 1.974885106086731, 'learning_rate': 1.736434530278362e-05, 'epoch': 0.26} +{'loss': 0.9913, 'grad_norm': 1.8438800573349, 'learning_rate': 
1.73601269772629e-05, 'epoch': 0.26} +{'loss': 1.0457, 'grad_norm': 1.8494850397109985, 'learning_rate': 1.7355905792019195e-05, 'epoch': 0.26} +{'loss': 1.0228, 'grad_norm': 1.992661714553833, 'learning_rate': 1.7351681748692622e-05, 'epoch': 0.26} +{'loss': 1.0338, 'grad_norm': 1.733710527420044, 'learning_rate': 1.73474548489244e-05, 'epoch': 0.26} +{'loss': 0.9681, 'grad_norm': 1.829520344734192, 'learning_rate': 1.7343225094356857e-05, 'epoch': 0.26} +{'loss': 1.0252, 'grad_norm': 1.976354956626892, 'learning_rate': 1.7338992486633438e-05, 'epoch': 0.26} +{'loss': 0.9862, 'grad_norm': 1.8615800142288208, 'learning_rate': 1.733475702739869e-05, 'epoch': 0.26} +{'loss': 0.9432, 'grad_norm': 1.6784212589263916, 'learning_rate': 1.7330518718298263e-05, 'epoch': 0.26} +{'loss': 0.9529, 'grad_norm': 1.9273643493652344, 'learning_rate': 1.732627756097893e-05, 'epoch': 0.26} +{'loss': 0.9107, 'grad_norm': 1.0276539325714111, 'learning_rate': 1.732203355708856e-05, 'epoch': 0.26} +{'loss': 1.0597, 'grad_norm': 1.8310490846633911, 'learning_rate': 1.7317786708276124e-05, 'epoch': 0.26} +{'loss': 0.9781, 'grad_norm': 1.8171107769012451, 'learning_rate': 1.7313537016191706e-05, 'epoch': 0.26} +{'loss': 0.9609, 'grad_norm': 1.8415398597717285, 'learning_rate': 1.7309284482486494e-05, 'epoch': 0.26} +{'loss': 1.0246, 'grad_norm': 1.808424472808838, 'learning_rate': 1.7305029108812777e-05, 'epoch': 0.26} +{'loss': 1.0021, 'grad_norm': 2.0431182384490967, 'learning_rate': 1.730077089682395e-05, 'epoch': 0.26} +{'loss': 1.0615, 'grad_norm': 1.7229528427124023, 'learning_rate': 1.729650984817451e-05, 'epoch': 0.26} +{'loss': 1.0028, 'grad_norm': 1.954764723777771, 'learning_rate': 1.7292245964520056e-05, 'epoch': 0.26} +{'loss': 0.9881, 'grad_norm': 1.7568436861038208, 'learning_rate': 1.7287979247517285e-05, 'epoch': 0.26} +{'loss': 0.8825, 'grad_norm': 1.8195327520370483, 'learning_rate': 1.7283709698824004e-05, 'epoch': 0.26} +{'loss': 0.9414, 'grad_norm': 1.807549238204956, 'learning_rate': 1.727943732009911e-05, 'epoch': 0.26} +{'loss': 1.026, 'grad_norm': 1.665419101715088, 'learning_rate': 1.7275162113002607e-05, 'epoch': 0.26} +{'loss': 1.0202, 'grad_norm': 1.8149460554122925, 'learning_rate': 1.7270884079195596e-05, 'epoch': 0.26} +{'loss': 0.9394, 'grad_norm': 1.875658631324768, 'learning_rate': 1.7266603220340273e-05, 'epoch': 0.26} +{'loss': 1.082, 'grad_norm': 1.882163405418396, 'learning_rate': 1.726231953809993e-05, 'epoch': 0.26} +{'loss': 0.9827, 'grad_norm': 1.8240703344345093, 'learning_rate': 1.7258033034138972e-05, 'epoch': 0.26} +{'loss': 0.8961, 'grad_norm': 1.9846265316009521, 'learning_rate': 1.7253743710122877e-05, 'epoch': 0.26} +{'loss': 0.9887, 'grad_norm': 1.7519563436508179, 'learning_rate': 1.7249451567718236e-05, 'epoch': 0.26} +{'loss': 1.0265, 'grad_norm': 1.8336985111236572, 'learning_rate': 1.7245156608592727e-05, 'epoch': 0.26} +{'loss': 1.0037, 'grad_norm': 1.8857085704803467, 'learning_rate': 1.7240858834415126e-05, 'epoch': 0.27} +{'loss': 0.9228, 'grad_norm': 1.7547630071640015, 'learning_rate': 1.7236558246855303e-05, 'epoch': 0.27} +{'loss': 1.0608, 'grad_norm': 1.7152235507965088, 'learning_rate': 1.7232254847584212e-05, 'epoch': 0.27} +{'loss': 1.0041, 'grad_norm': 1.7370901107788086, 'learning_rate': 1.7227948638273918e-05, 'epoch': 0.27} +{'loss': 0.9868, 'grad_norm': 1.6386165618896484, 'learning_rate': 1.7223639620597556e-05, 'epoch': 0.27} +{'loss': 0.96, 'grad_norm': 1.0060551166534424, 'learning_rate': 1.721932779622937e-05, 'epoch': 0.27} +{'loss': 
1.1159, 'grad_norm': 1.8396605253219604, 'learning_rate': 1.721501316684469e-05, 'epoch': 0.27} +{'loss': 0.9782, 'grad_norm': 1.8105412721633911, 'learning_rate': 1.7210695734119926e-05, 'epoch': 0.27} +{'loss': 1.1463, 'grad_norm': 1.8018509149551392, 'learning_rate': 1.7206375499732586e-05, 'epoch': 0.27} +{'loss': 0.9705, 'grad_norm': 1.8077846765518188, 'learning_rate': 1.7202052465361268e-05, 'epoch': 0.27} +{'loss': 1.0391, 'grad_norm': 1.6987974643707275, 'learning_rate': 1.7197726632685653e-05, 'epoch': 0.27} +{'loss': 0.9726, 'grad_norm': 1.6711596250534058, 'learning_rate': 1.7193398003386514e-05, 'epoch': 0.27} +{'loss': 1.0221, 'grad_norm': 1.7686970233917236, 'learning_rate': 1.7189066579145704e-05, 'epoch': 0.27} +{'loss': 0.9996, 'grad_norm': 1.8154048919677734, 'learning_rate': 1.7184732361646168e-05, 'epoch': 0.27} +{'loss': 1.0399, 'grad_norm': 1.779394507408142, 'learning_rate': 1.718039535257194e-05, 'epoch': 0.27} +{'loss': 1.0439, 'grad_norm': 1.8855408430099487, 'learning_rate': 1.717605555360812e-05, 'epoch': 0.27} +{'loss': 0.9833, 'grad_norm': 1.8445767164230347, 'learning_rate': 1.7171712966440916e-05, 'epoch': 0.27} +{'loss': 1.01, 'grad_norm': 1.8350696563720703, 'learning_rate': 1.71673675927576e-05, 'epoch': 0.27} +{'loss': 1.0609, 'grad_norm': 2.0257089138031006, 'learning_rate': 1.7163019434246545e-05, 'epoch': 0.27} +{'loss': 1.0513, 'grad_norm': 1.6797620058059692, 'learning_rate': 1.7158668492597186e-05, 'epoch': 0.27} +{'loss': 1.0546, 'grad_norm': 1.6715037822723389, 'learning_rate': 1.7154314769500052e-05, 'epoch': 0.27} +{'loss': 1.0242, 'grad_norm': 1.692131519317627, 'learning_rate': 1.7149958266646756e-05, 'epoch': 0.27} +{'loss': 0.9728, 'grad_norm': 1.886602520942688, 'learning_rate': 1.7145598985729976e-05, 'epoch': 0.27} +{'loss': 1.0239, 'grad_norm': 1.510108232498169, 'learning_rate': 1.7141236928443482e-05, 'epoch': 0.27} +{'loss': 1.0234, 'grad_norm': 1.9030672311782837, 'learning_rate': 1.7136872096482123e-05, 'epoch': 0.27} +{'loss': 0.9716, 'grad_norm': 1.6661264896392822, 'learning_rate': 1.713250449154182e-05, 'epoch': 0.27} +{'loss': 1.0058, 'grad_norm': 1.672459363937378, 'learning_rate': 1.7128134115319567e-05, 'epoch': 0.27} +{'loss': 1.0422, 'grad_norm': 1.665493369102478, 'learning_rate': 1.712376096951345e-05, 'epoch': 0.27} +{'loss': 1.0554, 'grad_norm': 2.1093878746032715, 'learning_rate': 1.711938505582262e-05, 'epoch': 0.27} +{'loss': 1.0035, 'grad_norm': 1.7243448495864868, 'learning_rate': 1.7115006375947304e-05, 'epoch': 0.27} +{'loss': 0.9071, 'grad_norm': 1.8660968542099, 'learning_rate': 1.7110624931588805e-05, 'epoch': 0.27} +{'loss': 1.0344, 'grad_norm': 1.950353741645813, 'learning_rate': 1.7106240724449507e-05, 'epoch': 0.27} +{'loss': 1.0574, 'grad_norm': 1.7867465019226074, 'learning_rate': 1.7101853756232856e-05, 'epoch': 0.27} +{'loss': 0.8997, 'grad_norm': 1.833276629447937, 'learning_rate': 1.7097464028643376e-05, 'epoch': 0.27} +{'loss': 1.0942, 'grad_norm': 1.7349493503570557, 'learning_rate': 1.7093071543386667e-05, 'epoch': 0.27} +{'loss': 1.0613, 'grad_norm': 1.99166738986969, 'learning_rate': 1.7088676302169394e-05, 'epoch': 0.27} +{'loss': 0.9649, 'grad_norm': 1.8484245538711548, 'learning_rate': 1.7084278306699295e-05, 'epoch': 0.27} +{'loss': 1.0351, 'grad_norm': 1.764248013496399, 'learning_rate': 1.7079877558685178e-05, 'epoch': 0.27} +{'loss': 1.0147, 'grad_norm': 1.7142534255981445, 'learning_rate': 1.7075474059836923e-05, 'epoch': 0.27} +{'loss': 0.9572, 'grad_norm': 1.048027753829956, 
'learning_rate': 1.7071067811865477e-05, 'epoch': 0.27} +{'loss': 0.9821, 'grad_norm': 1.7482398748397827, 'learning_rate': 1.7066658816482856e-05, 'epoch': 0.27} +{'loss': 1.0296, 'grad_norm': 2.0123450756073, 'learning_rate': 1.7062247075402137e-05, 'epoch': 0.27} +{'loss': 0.965, 'grad_norm': 1.903601884841919, 'learning_rate': 1.705783259033748e-05, 'epoch': 0.27} +{'loss': 1.066, 'grad_norm': 1.7349658012390137, 'learning_rate': 1.705341536300409e-05, 'epoch': 0.27} +{'loss': 0.9331, 'grad_norm': 0.969205915927887, 'learning_rate': 1.7048995395118253e-05, 'epoch': 0.27} +{'loss': 1.0358, 'grad_norm': 1.8710532188415527, 'learning_rate': 1.7044572688397316e-05, 'epoch': 0.27} +{'loss': 1.0545, 'grad_norm': 1.883230447769165, 'learning_rate': 1.7040147244559688e-05, 'epoch': 0.27} +{'loss': 0.9576, 'grad_norm': 1.9921497106552124, 'learning_rate': 1.7035719065324837e-05, 'epoch': 0.27} +{'loss': 1.0137, 'grad_norm': 1.786797285079956, 'learning_rate': 1.7031288152413304e-05, 'epoch': 0.27} +{'loss': 0.9465, 'grad_norm': 1.7009328603744507, 'learning_rate': 1.7026854507546694e-05, 'epoch': 0.27} +{'loss': 1.0471, 'grad_norm': 1.8068492412567139, 'learning_rate': 1.7022418132447656e-05, 'epoch': 0.27} +{'loss': 0.9715, 'grad_norm': 1.7573258876800537, 'learning_rate': 1.7017979028839918e-05, 'epoch': 0.27} +{'loss': 0.983, 'grad_norm': 1.7368972301483154, 'learning_rate': 1.7013537198448253e-05, 'epoch': 0.28} +{'loss': 0.935, 'grad_norm': 1.7669157981872559, 'learning_rate': 1.700909264299851e-05, 'epoch': 0.28} +{'loss': 1.0562, 'grad_norm': 1.910591959953308, 'learning_rate': 1.7004645364217584e-05, 'epoch': 0.28} +{'loss': 0.9297, 'grad_norm': 0.9342668056488037, 'learning_rate': 1.7000195363833434e-05, 'epoch': 0.28} +{'loss': 1.0257, 'grad_norm': 1.724603533744812, 'learning_rate': 1.6995742643575076e-05, 'epoch': 0.28} +{'loss': 0.9676, 'grad_norm': 1.8259358406066895, 'learning_rate': 1.6991287205172575e-05, 'epoch': 0.28} +{'loss': 1.0805, 'grad_norm': 1.8452657461166382, 'learning_rate': 1.6986829050357068e-05, 'epoch': 0.28} +{'loss': 1.118, 'grad_norm': 1.8732038736343384, 'learning_rate': 1.698236818086073e-05, 'epoch': 0.28} +{'loss': 1.0446, 'grad_norm': 1.706579327583313, 'learning_rate': 1.6977904598416803e-05, 'epoch': 0.28} +{'loss': 0.929, 'grad_norm': 1.7886749505996704, 'learning_rate': 1.697343830475958e-05, 'epoch': 0.28} +{'loss': 0.872, 'grad_norm': 0.9286351799964905, 'learning_rate': 1.6968969301624402e-05, 'epoch': 0.28} +{'loss': 0.995, 'grad_norm': 1.6689391136169434, 'learning_rate': 1.696449759074767e-05, 'epoch': 0.28} +{'loss': 1.0275, 'grad_norm': 1.6442140340805054, 'learning_rate': 1.6960023173866834e-05, 'epoch': 0.28} +{'loss': 0.8755, 'grad_norm': 0.9267822504043579, 'learning_rate': 1.6955546052720398e-05, 'epoch': 0.28} +{'loss': 0.9753, 'grad_norm': 1.802245020866394, 'learning_rate': 1.695106622904791e-05, 'epoch': 0.28} +{'loss': 1.0423, 'grad_norm': 1.7879210710525513, 'learning_rate': 1.6946583704589973e-05, 'epoch': 0.28} +{'loss': 0.93, 'grad_norm': 1.692579984664917, 'learning_rate': 1.6942098481088245e-05, 'epoch': 0.28} +{'loss': 0.9838, 'grad_norm': 1.565054178237915, 'learning_rate': 1.693761056028542e-05, 'epoch': 0.28} +{'loss': 0.994, 'grad_norm': 1.7526954412460327, 'learning_rate': 1.6933119943925243e-05, 'epoch': 0.28} +{'loss': 1.0117, 'grad_norm': 1.9390277862548828, 'learning_rate': 1.692862663375252e-05, 'epoch': 0.28} +{'loss': 0.9235, 'grad_norm': 1.6273677349090576, 'learning_rate': 1.692413063151309e-05, 'epoch': 0.28} 
+{'loss': 0.9116, 'grad_norm': 1.0046995878219604, 'learning_rate': 1.6919631938953835e-05, 'epoch': 0.28} +{'loss': 0.9815, 'grad_norm': 1.7830642461776733, 'learning_rate': 1.6915130557822698e-05, 'epoch': 0.28} +{'loss': 0.9248, 'grad_norm': 1.6447077989578247, 'learning_rate': 1.691062648986865e-05, 'epoch': 0.28} +{'loss': 0.8766, 'grad_norm': 0.9249933362007141, 'learning_rate': 1.690611973684172e-05, 'epoch': 0.28} +{'loss': 0.9857, 'grad_norm': 1.8931480646133423, 'learning_rate': 1.6901610300492963e-05, 'epoch': 0.28} +{'loss': 0.9355, 'grad_norm': 1.5578685998916626, 'learning_rate': 1.6897098182574502e-05, 'epoch': 0.28} +{'loss': 1.067, 'grad_norm': 1.6910210847854614, 'learning_rate': 1.689258338483947e-05, 'epoch': 0.28} +{'loss': 1.0014, 'grad_norm': 1.639620065689087, 'learning_rate': 1.6888065909042074e-05, 'epoch': 0.28} +{'loss': 0.9396, 'grad_norm': 1.7743351459503174, 'learning_rate': 1.688354575693754e-05, 'epoch': 0.28} +{'loss': 0.9773, 'grad_norm': 1.8035134077072144, 'learning_rate': 1.687902293028214e-05, 'epoch': 0.28} +{'loss': 0.993, 'grad_norm': 1.723802924156189, 'learning_rate': 1.6874497430833182e-05, 'epoch': 0.28} +{'loss': 1.0971, 'grad_norm': 1.8689061403274536, 'learning_rate': 1.686996926034902e-05, 'epoch': 0.28} +{'loss': 0.9967, 'grad_norm': 1.9459902048110962, 'learning_rate': 1.6865438420589036e-05, 'epoch': 0.28} +{'loss': 1.0684, 'grad_norm': 1.9046951532363892, 'learning_rate': 1.6860904913313656e-05, 'epoch': 0.28} +{'loss': 0.9766, 'grad_norm': 1.7166478633880615, 'learning_rate': 1.6856368740284342e-05, 'epoch': 0.28} +{'loss': 0.9946, 'grad_norm': 1.767199993133545, 'learning_rate': 1.685182990326359e-05, 'epoch': 0.28} +{'loss': 1.0311, 'grad_norm': 1.681044101715088, 'learning_rate': 1.6847288404014937e-05, 'epoch': 0.28} +{'loss': 1.0712, 'grad_norm': 1.7135965824127197, 'learning_rate': 1.684274424430294e-05, 'epoch': 0.28} +{'loss': 1.0661, 'grad_norm': 1.7834993600845337, 'learning_rate': 1.68381974258932e-05, 'epoch': 0.28} +{'loss': 0.9496, 'grad_norm': 1.8556705713272095, 'learning_rate': 1.6833647950552357e-05, 'epoch': 0.28} +{'loss': 0.9948, 'grad_norm': 2.6503992080688477, 'learning_rate': 1.682909582004807e-05, 'epoch': 0.28} +{'loss': 0.988, 'grad_norm': 1.7716891765594482, 'learning_rate': 1.682454103614904e-05, 'epoch': 0.28} +{'loss': 0.9067, 'grad_norm': 1.7190643548965454, 'learning_rate': 1.6819983600624986e-05, 'epoch': 0.28} +{'loss': 0.9504, 'grad_norm': 1.8176895380020142, 'learning_rate': 1.6815423515246674e-05, 'epoch': 0.28} +{'loss': 1.029, 'grad_norm': 1.6780424118041992, 'learning_rate': 1.681086078178589e-05, 'epoch': 0.28} +{'loss': 0.8811, 'grad_norm': 1.7558003664016724, 'learning_rate': 1.680629540201545e-05, 'epoch': 0.28} +{'loss': 0.9793, 'grad_norm': 1.7794471979141235, 'learning_rate': 1.6801727377709195e-05, 'epoch': 0.28} +{'loss': 0.9426, 'grad_norm': 1.855647087097168, 'learning_rate': 1.6797156710642003e-05, 'epoch': 0.28} +{'loss': 0.886, 'grad_norm': 1.7893325090408325, 'learning_rate': 1.679258340258977e-05, 'epoch': 0.28} +{'loss': 1.0244, 'grad_norm': 1.7456836700439453, 'learning_rate': 1.678800745532942e-05, 'epoch': 0.28} +{'loss': 1.014, 'grad_norm': 1.8668369054794312, 'learning_rate': 1.6783428870638904e-05, 'epoch': 0.28} +{'loss': 1.022, 'grad_norm': 1.6180843114852905, 'learning_rate': 1.67788476502972e-05, 'epoch': 0.29} +{'loss': 1.0654, 'grad_norm': 1.9680200815200806, 'learning_rate': 1.6774263796084304e-05, 'epoch': 0.29} +{'loss': 0.8917, 'grad_norm': 0.9683606028556824, 
'learning_rate': 1.676967730978124e-05, 'epoch': 0.29} +{'loss': 0.9863, 'grad_norm': 1.856134295463562, 'learning_rate': 1.6765088193170055e-05, 'epoch': 0.29} +{'loss': 1.0727, 'grad_norm': 1.7471874952316284, 'learning_rate': 1.676049644803381e-05, 'epoch': 0.29} +{'loss': 0.9402, 'grad_norm': 1.6205285787582397, 'learning_rate': 1.6755902076156606e-05, 'epoch': 0.29} +{'loss': 0.9591, 'grad_norm': 1.7780214548110962, 'learning_rate': 1.6751305079323538e-05, 'epoch': 0.29} +{'loss': 0.9756, 'grad_norm': 1.5687700510025024, 'learning_rate': 1.6746705459320746e-05, 'epoch': 0.29} +{'loss': 0.9644, 'grad_norm': 1.8139265775680542, 'learning_rate': 1.6742103217935372e-05, 'epoch': 0.29} +{'loss': 0.9729, 'grad_norm': 1.7928632497787476, 'learning_rate': 1.6737498356955586e-05, 'epoch': 0.29} +{'loss': 0.9307, 'grad_norm': 2.1075594425201416, 'learning_rate': 1.6732890878170573e-05, 'epoch': 0.29} +{'loss': 1.0713, 'grad_norm': 1.886930227279663, 'learning_rate': 1.672828078337053e-05, 'epoch': 0.29} +{'loss': 1.0121, 'grad_norm': 1.808154821395874, 'learning_rate': 1.672366807434668e-05, 'epoch': 0.29} +{'loss': 1.0397, 'grad_norm': 1.666575312614441, 'learning_rate': 1.6719052752891262e-05, 'epoch': 0.29} +{'loss': 0.9979, 'grad_norm': 1.7579233646392822, 'learning_rate': 1.6714434820797515e-05, 'epoch': 0.29} +{'loss': 1.0059, 'grad_norm': 1.8627372980117798, 'learning_rate': 1.67098142798597e-05, 'epoch': 0.29} +WARNING: tokenization mismatch: 0 vs. 674. (ignored) +number of rounds: 1 +rounds: ['A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user\'s questions. USER: ASSISTANT: You are a forward-thinking HR consultancy with the following knowledge and traits.\n\nKNOWLEDGE: HR management, employee satisfaction, talent retention, team building, workplace culture.\n\nTRAITS: Innovation, empathy, solution orientation, creativity, understanding, teamwork.\n\nTASK: Use your knowledge to create an internal strategy document. We are an HR consultancy looking to develop 10 creative ways to improve employee satisfaction. Focus on ideas that are innovative and engaging, and cater to the needs of HR managers and company leaders.\n\nOUTPUT: A table in German containing 10 creative strategies to improve employee satisfaction, including a brief description for each. ASSISTANT: | Nr. | Strategie | Beschreibung |\n| --- | --- | --- |\n| 1. | "Wertschätzungswand" | Eine Wand im Büro, auf der Mitarbeiter anonym ihre Kollegen für ihre Leistungen und Beiträge loben und anerkennen können. |\n| 2. | Flexible Arbeitsmodelle | Einführung von flexiblen Arbeitszeiten und Homeoffice-Optionen, um Work-Life-Balance zu verbessern und individuelle Bedürfnisse zu berücksichtigen. |\n| 3. | Mitarbeiter-Workshops für persönliche Entwicklung | Organisieren von internen Workshops und Schulungen, um persönliche und berufliche Fähigkeiten der Mitarbeiter zu fördern. |\n| 4. | Team-Building-Aktivitäten außerhalb des Arbeitsplatzes | Regelmäßige Veranstaltungen wie Escape Rooms, Sportevents oder Kochkurse, um den Zusammenhalt und die Kommunikation im Team zu stärken. |\n| 5. | Gesundheits- und Wellness-Programme | Einführung von Fitness- und Entspannungskursen wie Yoga, Meditation oder Massagen, um das Wohlbefinden der Mitarbeiter zu fördern. |\n| 6. | Feedback-Kultur und transparente Kommunikation | Etablierung einer offenen Feedback-Kultur und transparenter Kommunikation, um die Mitarbeiter in Entscheidungsprozesse einzubinden. |\n| 7. 
| Mitarbeiter-Mentoring-Programm | Einführung eines Mentoring-Programms, bei dem erfahrene Mitarbeiter ihr Wissen und ihre Erfahrungen mit jüngeren Kollegen teilen. |\n| 8. | "Innovationstage" | Einrichtung von speziellen Tagen, an denen Mitarbeiter an eigenen Projekten arbeiten oder innovative Ideen für das Unternehmen entwickeln können. |\n| 9. | Individuelle Karriereentwicklungspläne | Erstellung von personalisierten Karriereplänen in Zusammenarbeit mit den Mitarbeitern, um ihre berufliche Entwicklung zu unterstützen. |\n| 10. | Gemeinschaftliche soziale Verantwortung (CSR) Initiativen | Organisation von Freiwilligenprojekten und CSR-Aktivitäten, um den Mitarbeitern die Möglichkeit zu geben, sozialen Einfluss zu nehmen. |'] +prompt: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: ASSISTANT: You are a forward-thinking HR consultancy with the following knowledge and traits. + +KNOWLEDGE: HR management, employee satisfaction, talent retention, team building, workplace culture. + +TRAITS: Innovation, empathy, solution orientation, creativity, understanding, teamwork. + +TASK: Use your knowledge to create an internal strategy document. We are an HR consultancy looking to develop 10 creative ways to improve employee satisfaction. Focus on ideas that are innovative and engaging, and cater to the needs of HR managers and company leaders. + +OUTPUT: A table in German containing 10 creative strategies to improve employee satisfaction, including a brief description for each. ASSISTANT: | Nr. | Strategie | Beschreibung | +| --- | --- | --- | +| 1. | "Wertschätzungswand" | Eine Wand im Büro, auf der Mitarbeiter anonym ihre Kollegen für ihre Leistungen und Beiträge loben und anerkennen können. | +| 2. | Flexible Arbeitsmodelle | Einführung von flexiblen Arbeitszeiten und Homeoffice-Optionen, um Work-Life-Balance zu verbessern und individuelle Bedürfnisse zu berücksichtigen. | +| 3. | Mitarbeiter-Workshops für persönliche Entwicklung | Organisieren von internen Workshops und Schulungen, um persönliche und berufliche Fähigkeiten der Mitarbeiter zu fördern. | +| 4. | Team-Building-Aktivitäten außerhalb des Arbeitsplatzes | Regelmäßige Veranstaltungen wie Escape Rooms, Sportevents oder Kochkurse, um den Zusammenhalt und die Kommunikation im Team zu stärken. | +| 5. | Gesundheits- und Wellness-Programme | Einführung von Fitness- und Entspannungskursen wie Yoga, Meditation oder Massagen, um das Wohlbefinden der Mitarbeiter zu fördern. | +| 6. | Feedback-Kultur und transparente Kommunikation | Etablierung einer offenen Feedback-Kultur und transparenter Kommunikation, um die Mitarbeiter in Entscheidungsprozesse einzubinden. | +| 7. | Mitarbeiter-Mentoring-Programm | Einführung eines Mentoring-Programms, bei dem erfahrene Mitarbeiter ihr Wissen und ihre Erfahrungen mit jüngeren Kollegen teilen. | +| 8. | "Innovationstage" | Einrichtung von speziellen Tagen, an denen Mitarbeiter an eigenen Projekten arbeiten oder innovative Ideen für das Unternehmen entwickeln können. | +| 9. | Individuelle Karriereentwicklungspläne | Erstellung von personalisierten Karriereplänen in Zusammenarbeit mit den Mitarbeitern, um ihre berufliche Entwicklung zu unterstützen. | +| 10. | Gemeinschaftliche soziale Verantwortung (CSR) Initiativen | Organisation von Freiwilligenprojekten und CSR-Aktivitäten, um den Mitarbeitern die Möglichkeit zu geben, sozialen Einfluss zu nehmen. 
|<|im_end|> +tensor([-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, + -100, -100]) +tensor([ 32, 6236, 1948, 264, 22208, 1196, 323, 458, 20443, + 11229, 17847, 13, 576, 17847, 6696, 10950, 11, 11682, + 11, 323, 47787, 11253, 311, 279, 1196, 594, 4755, + 13, 13872, 25, 35560, 3846, 2821, 25, 1446, 525, + 264, 4637, 98951, 22299, 73279, 448, 279, 2701, 6540, + 323, 24055, 382, 17155, 3307, 13639, 10777, 25, 22299, + 6240, 11, 9364, 23689, 11, 10772, 37131, 11, 2083, + 4752, 11, 26368, 7674, 382, 20252, 14474, 25, 37610, + 11, 47351, 11, 6291, 16725, 11, 27597, 11, 8660, + 11, 83533, 382, 65813, 25, 5443, 697, 6540, 311, + 1855, 458, 5306, 8282, 2197, 13, 1205, 525, 458, + 22299, 73279, 3330, 311, 2225, 220, 16, 15, 11521, + 5510, 311, 7269, 9364, 23689, 13, 25806, 389, 6708, + 429, 525, 18199, 323, 22570, 11, 323, 27968, 311, + 279, 3880, 315, 22299, 19680, 323, 2813, 6036, 382, + 30301, 25, 362, 1965, 304, 5938, 8482, 220, 16, + 15, 11521, 14830, 311, 7269, 9364, 23689, 11, 2670, + 264, 9814, 4008, 369, 1817, 13, 35560, 3846, 2821, + 25, 760, 69093, 13, 760, 28223, 645, 760, 77814, + 75399, 9248, 91, 12448, 760, 12448, 760, 12448, 9248, + 91, 220, 16, 13, 760, 330, 54, 15546, 331, + 14033, 38132, 2280, 437, 1, 760, 51310, 68929, 732, + 141506, 11, 7219, 2694, 96428, 31148, 34749, 45515, 48414, + 7180, 34749, 1967, 380, 11595, 2030, 2823, 21561, 70668, + 40153, 268, 2030, 458, 16754, 40193, 20048, 13, 9248, + 91, 220, 17, 13, 760, 52311, 70165, 2528, 273, + 760, 468, 13573, 92182, 6538, 5763, 579, 2892, 70165, + 2986, 14668, 2030, 5379, 26516, 12, 5341, 268, 11, + 4443, 5547, 87808, 7671, 4978, 6395, 98038, 932, 2030, + 22665, 29937, 13093, 5186, 8822, 15398, 6395, 10207, 2391, + 14553, 4865, 6433, 13, 9248, 91, 220, 18, 13, + 760, 96428, 12, 6776, 26770, 7180, 77420, 20005, 92979, + 760, 10762, 285, 16030, 6538, 2590, 268, 5547, 26770, + 2030, 52990, 11595, 11, 4443, 77420, 20005, 2030, 10207, + 1704, 20005, 434, 21519, 343, 51707, 2694, 96428, 6395, + 40500, 6498, 932, 13, 9248, 91, 220, 19, 13, + 760, 7909, 7671, 1498, 287, 6691, 5840, 344, 64321, + 1960, 7906, 52662, 49662, 939, 70165, 57548, 288, 760, + 3184, 23162, 68947, 7256, 92610, 3145, 11595, 13368, 45643, + 47303, 11, 18207, 12389, 12385, 39493, 74, 29285, 11, + 4443, 3371, 76604, 39416, 2030, 2746, 87727, 70449, 367, + 732, 7909, 6395, 357, 13977, 2712, 13, 9248, 91, + 220, 20, 13, 760, 20404, 1241, 87818, 12, 2030, + 60183, 12, 10690, 2660, 760, 468, 13573, 92182, 6538, + 35708, 12, 2030, 4863, 1480, 46621, 4886, 1723, 268, + 13368, 37573, 11, 76471, 12385, 9167, 8535, 11, 4443, + 6616, 467, 2267, 21123, 823, 58217, 2694, 96428, 6395, + 40500, 6498, 932, 13, 9248, 91, 220, 21, 13, + 760, 36857, 15843, 76431, 2030, 17821, 68, 87727, 70449, + 367, 760, 468, 6192, 5623, 2185, 17058, 1007, 33370, + 36857, 15843, 76431, 2030, 17821, 261, 87727, 70449, 367, + 11, 4443, 2746, 96428, 304, 93403, 57926, 28122, 776, + 89, 23318, 74227, 392, 58217, 13, 9248, 91, 220, + 22, 13, 760, 96428, 5251, 306, 5503, 
12, 10690, + 76, 760, 468, 13573, 92182, 36108, 48593, 5503, 12, + 10690, 1011, 11, 13279, 2429, 36408, 1466, 46909, 96428, + 24440, 467, 36964, 2030, 34749, 41663, 74374, 5451, 502, + 2391, 968, 17392, 45515, 48414, 1013, 22837, 13, 9248, + 91, 220, 23, 13, 760, 330, 641, 98748, 20743, + 1, 760, 18069, 36702, 2185, 6538, 2341, 8345, 15671, + 350, 8535, 11, 458, 69507, 96428, 458, 72705, 42952, + 1225, 1960, 796, 50656, 12385, 18199, 13804, 268, 7180, + 6616, 61197, 80129, 17454, 20048, 13, 9248, 91, 220, + 24, 13, 760, 2263, 63923, 29937, 13225, 84467, 306, + 21188, 75, 28122, 500, 2305, 811, 760, 9740, 54377, + 6538, 4345, 13229, 50958, 13225, 84467, 500, 2305, 12495, + 304, 76604, 59908, 5451, 3371, 78722, 19985, 932, 11, + 4443, 34749, 10207, 1704, 20005, 92979, 6395, 143562, 13, + 9248, 91, 220, 16, 15, 13, 760, 77775, 1330, + 20542, 20005, 773, 89, 19893, 6250, 69046, 2185, 320, + 70022, 8, 15690, 10358, 2071, 760, 46743, 6538, 7577, + 72, 14387, 6433, 30386, 1225, 1960, 2030, 62976, 6691, + 5840, 344, 64321, 1960, 11, 4443, 3371, 78722, 19985, + 932, 2746, 90817, 6395, 56500, 11, 773, 59380, 268, + 18069, 1489, 1854, 6395, 80617, 13, 760, 151645]) +{'loss': 0.9474, 'grad_norm': 2.0003764629364014, 'learning_rate': 1.670519113187311e-05, 'epoch': 0.29} +{'loss': 1.0392, 'grad_norm': 1.7871371507644653, 'learning_rate': 1.6700565378634022e-05, 'epoch': 0.29} +{'loss': 1.0429, 'grad_norm': 1.9526821374893188, 'learning_rate': 1.669593702193974e-05, 'epoch': 0.29} +{'loss': 1.018, 'grad_norm': 1.7681679725646973, 'learning_rate': 1.6691306063588583e-05, 'epoch': 0.29} +{'loss': 0.9716, 'grad_norm': 1.7056353092193604, 'learning_rate': 1.668667250537987e-05, 'epoch': 0.29} +{'loss': 1.0261, 'grad_norm': 1.9142543077468872, 'learning_rate': 1.668203634911394e-05, 'epoch': 0.29} +{'loss': 0.9771, 'grad_norm': 1.7905254364013672, 'learning_rate': 1.6677397596592134e-05, 'epoch': 0.29} +{'loss': 1.0646, 'grad_norm': 2.0082154273986816, 'learning_rate': 1.667275624961681e-05, 'epoch': 0.29} +{'loss': 1.0835, 'grad_norm': 1.7965528964996338, 'learning_rate': 1.666811230999132e-05, 'epoch': 0.29} +{'loss': 0.9383, 'grad_norm': 2.104262351989746, 'learning_rate': 1.6663465779520042e-05, 'epoch': 0.29} +{'loss': 0.976, 'grad_norm': 1.6659682989120483, 'learning_rate': 1.6658816660008344e-05, 'epoch': 0.29} +{'loss': 0.9839, 'grad_norm': 1.7418100833892822, 'learning_rate': 1.6654164953262614e-05, 'epoch': 0.29} +{'loss': 1.0608, 'grad_norm': 1.9407572746276855, 'learning_rate': 1.6649510661090233e-05, 'epoch': 0.29} +{'loss': 1.0136, 'grad_norm': 1.951568603515625, 'learning_rate': 1.6644853785299592e-05, 'epoch': 0.29} +{'loss': 1.0437, 'grad_norm': 1.8873028755187988, 'learning_rate': 1.6640194327700087e-05, 'epoch': 0.29} +{'loss': 0.9148, 'grad_norm': 1.8386814594268799, 'learning_rate': 1.6635532290102114e-05, 'epoch': 0.29} +{'loss': 0.9925, 'grad_norm': 1.7825829982757568, 'learning_rate': 1.6630867674317078e-05, 'epoch': 0.29} +{'loss': 0.9888, 'grad_norm': 1.0290942192077637, 'learning_rate': 1.6626200482157378e-05, 'epoch': 0.29} +{'loss': 0.9864, 'grad_norm': 1.7475014925003052, 'learning_rate': 1.6621530715436413e-05, 'epoch': 0.29} +{'loss': 0.9223, 'grad_norm': 1.7863366603851318, 'learning_rate': 1.6616858375968596e-05, 'epoch': 0.29} +{'loss': 0.9945, 'grad_norm': 1.9573452472686768, 'learning_rate': 1.6612183465569322e-05, 'epoch': 0.29} +{'loss': 0.9353, 'grad_norm': 1.6399321556091309, 'learning_rate': 1.6607505986054998e-05, 'epoch': 0.29} +{'loss': 1.0001, 'grad_norm': 
1.86214280128479, 'learning_rate': 1.660282593924302e-05, 'epoch': 0.29} +{'loss': 0.9641, 'grad_norm': 1.8999090194702148, 'learning_rate': 1.6598143326951784e-05, 'epoch': 0.29} +{'loss': 1.0563, 'grad_norm': 1.8859182596206665, 'learning_rate': 1.659345815100069e-05, 'epoch': 0.29} +{'loss': 1.0315, 'grad_norm': 1.853476881980896, 'learning_rate': 1.6588770413210126e-05, 'epoch': 0.29} +{'loss': 1.0298, 'grad_norm': 1.6301045417785645, 'learning_rate': 1.6584080115401478e-05, 'epoch': 0.29} +{'loss': 0.9411, 'grad_norm': 1.723157286643982, 'learning_rate': 1.657938725939713e-05, 'epoch': 0.29} +{'loss': 1.0005, 'grad_norm': 1.530401349067688, 'learning_rate': 1.6574691847020446e-05, 'epoch': 0.29} +{'loss': 0.9743, 'grad_norm': 1.8671878576278687, 'learning_rate': 1.6569993880095807e-05, 'epoch': 0.29} +{'loss': 1.0485, 'grad_norm': 1.7657887935638428, 'learning_rate': 1.6565293360448565e-05, 'epoch': 0.29} +{'loss': 1.0707, 'grad_norm': 1.7405316829681396, 'learning_rate': 1.6560590289905074e-05, 'epoch': 0.29} +{'loss': 1.0043, 'grad_norm': 1.7758798599243164, 'learning_rate': 1.655588467029268e-05, 'epoch': 0.29} +{'loss': 0.9658, 'grad_norm': 2.0639078617095947, 'learning_rate': 1.6551176503439713e-05, 'epoch': 0.29} +{'loss': 0.966, 'grad_norm': 2.048430919647217, 'learning_rate': 1.6546465791175498e-05, 'epoch': 0.29} +{'loss': 1.0177, 'grad_norm': 1.7092403173446655, 'learning_rate': 1.6541752535330345e-05, 'epoch': 0.29} +{'loss': 0.9577, 'grad_norm': 1.7112072706222534, 'learning_rate': 1.6537036737735563e-05, 'epoch': 0.3} +{'loss': 1.0174, 'grad_norm': 1.8516780138015747, 'learning_rate': 1.6532318400223427e-05, 'epoch': 0.3} +{'loss': 1.1176, 'grad_norm': 1.908063292503357, 'learning_rate': 1.6527597524627226e-05, 'epoch': 0.3} +{'loss': 1.0162, 'grad_norm': 1.554583191871643, 'learning_rate': 1.6522874112781213e-05, 'epoch': 0.3} +{'loss': 0.9962, 'grad_norm': 1.9764091968536377, 'learning_rate': 1.6518148166520638e-05, 'epoch': 0.3} +{'loss': 0.9522, 'grad_norm': 1.731709361076355, 'learning_rate': 1.651341968768173e-05, 'epoch': 0.3} +{'loss': 0.9886, 'grad_norm': 1.852522611618042, 'learning_rate': 1.6508688678101706e-05, 'epoch': 0.3} +{'loss': 1.013, 'grad_norm': 1.6258785724639893, 'learning_rate': 1.6503955139618765e-05, 'epoch': 0.3} +{'loss': 0.8929, 'grad_norm': 1.6885334253311157, 'learning_rate': 1.6499219074072087e-05, 'epoch': 0.3} +{'loss': 0.9704, 'grad_norm': 1.7770051956176758, 'learning_rate': 1.6494480483301836e-05, 'epoch': 0.3} +{'loss': 0.9815, 'grad_norm': 1.6170932054519653, 'learning_rate': 1.648973936914916e-05, 'epoch': 0.3} +{'loss': 1.0293, 'grad_norm': 1.8974127769470215, 'learning_rate': 1.6484995733456178e-05, 'epoch': 0.3} +{'loss': 0.9446, 'grad_norm': 1.881015658378601, 'learning_rate': 1.6480249578066e-05, 'epoch': 0.3} +{'loss': 1.0716, 'grad_norm': 1.6574127674102783, 'learning_rate': 1.6475500904822707e-05, 'epoch': 0.3} +{'loss': 0.9982, 'grad_norm': 1.6878477334976196, 'learning_rate': 1.647074971557136e-05, 'epoch': 0.3} +{'loss': 1.0161, 'grad_norm': 1.538177728652954, 'learning_rate': 1.6465996012157996e-05, 'epoch': 0.3} +{'loss': 0.9734, 'grad_norm': 1.5863555669784546, 'learning_rate': 1.646123979642964e-05, 'epoch': 0.3} +{'loss': 0.9638, 'grad_norm': 1.6166596412658691, 'learning_rate': 1.6456481070234277e-05, 'epoch': 0.3} +{'loss': 0.9543, 'grad_norm': 2.014636993408203, 'learning_rate': 1.645171983542088e-05, 'epoch': 0.3} +{'loss': 1.0927, 'grad_norm': 1.8636096715927124, 'learning_rate': 1.6446956093839385e-05, 'epoch': 
0.3} +{'loss': 0.9317, 'grad_norm': 1.8549911975860596, 'learning_rate': 1.644218984734071e-05, 'epoch': 0.3} +{'loss': 0.9097, 'grad_norm': 1.7594201564788818, 'learning_rate': 1.6437421097776754e-05, 'epoch': 0.3} +{'loss': 0.9809, 'grad_norm': 1.8618059158325195, 'learning_rate': 1.643264984700037e-05, 'epoch': 0.3} +{'loss': 1.0425, 'grad_norm': 1.7036622762680054, 'learning_rate': 1.6427876096865394e-05, 'epoch': 0.3} +{'loss': 0.9146, 'grad_norm': 1.8091943264007568, 'learning_rate': 1.6423099849226633e-05, 'epoch': 0.3} +{'loss': 1.1124, 'grad_norm': 1.9414712190628052, 'learning_rate': 1.6418321105939863e-05, 'epoch': 0.3} +{'loss': 0.9908, 'grad_norm': 1.801892638206482, 'learning_rate': 1.6413539868861828e-05, 'epoch': 0.3} +{'loss': 1.0428, 'grad_norm': 1.772099494934082, 'learning_rate': 1.6408756139850243e-05, 'epoch': 0.3} +{'loss': 0.9012, 'grad_norm': 1.719939947128296, 'learning_rate': 1.640396992076379e-05, 'epoch': 0.3} +{'loss': 1.0245, 'grad_norm': 1.922352910041809, 'learning_rate': 1.639918121346212e-05, 'epoch': 0.3} +{'loss': 1.073, 'grad_norm': 1.545076847076416, 'learning_rate': 1.639439001980585e-05, 'epoch': 0.3} +{'loss': 0.9447, 'grad_norm': 1.891070008277893, 'learning_rate': 1.638959634165656e-05, 'epoch': 0.3} +{'loss': 0.9192, 'grad_norm': 1.6073846817016602, 'learning_rate': 1.6384800180876804e-05, 'epoch': 0.3} +{'loss': 1.0125, 'grad_norm': 1.8814278841018677, 'learning_rate': 1.6380001539330088e-05, 'epoch': 0.3} +{'loss': 0.9738, 'grad_norm': 1.7187384366989136, 'learning_rate': 1.6375200418880895e-05, 'epoch': 0.3} +{'loss': 0.9743, 'grad_norm': 1.744933843612671, 'learning_rate': 1.637039682139466e-05, 'epoch': 0.3} +{'loss': 1.0585, 'grad_norm': 1.863205075263977, 'learning_rate': 1.6365590748737786e-05, 'epoch': 0.3} +{'loss': 1.0263, 'grad_norm': 2.0287206172943115, 'learning_rate': 1.636078220277764e-05, 'epoch': 0.3} +{'loss': 0.9275, 'grad_norm': 1.5740458965301514, 'learning_rate': 1.6355971185382547e-05, 'epoch': 0.3} +{'loss': 0.9783, 'grad_norm': 1.830632209777832, 'learning_rate': 1.635115769842179e-05, 'epoch': 0.3} +{'loss': 1.0767, 'grad_norm': 1.854601263999939, 'learning_rate': 1.6346341743765616e-05, 'epoch': 0.3} +{'loss': 1.0014, 'grad_norm': 1.8612595796585083, 'learning_rate': 1.634152332328522e-05, 'epoch': 0.3} +{'loss': 0.9435, 'grad_norm': 1.7086471319198608, 'learning_rate': 1.633670243885278e-05, 'epoch': 0.3} +{'loss': 1.044, 'grad_norm': 1.895244836807251, 'learning_rate': 1.6331879092341402e-05, 'epoch': 0.3} +{'loss': 1.0127, 'grad_norm': 1.9149186611175537, 'learning_rate': 1.6327053285625164e-05, 'epoch': 0.3} +{'loss': 1.0014, 'grad_norm': 1.6396968364715576, 'learning_rate': 1.63222250205791e-05, 'epoch': 0.3} +{'loss': 1.0501, 'grad_norm': 1.7869336605072021, 'learning_rate': 1.6317394299079193e-05, 'epoch': 0.3} +{'loss': 0.9436, 'grad_norm': 1.813430666923523, 'learning_rate': 1.631256112300239e-05, 'epoch': 0.3} +{'loss': 1.033, 'grad_norm': 2.158643960952759, 'learning_rate': 1.6307725494226586e-05, 'epoch': 0.3} +{'loss': 0.8895, 'grad_norm': 0.9738618731498718, 'learning_rate': 1.630288741463062e-05, 'epoch': 0.3} +{'loss': 1.0085, 'grad_norm': 1.8254077434539795, 'learning_rate': 1.62980468860943e-05, 'epoch': 0.3} +{'loss': 1.0082, 'grad_norm': 1.8007192611694336, 'learning_rate': 1.6293203910498375e-05, 'epoch': 0.3} +{'loss': 0.9483, 'grad_norm': 1.8346091508865356, 'learning_rate': 1.628835848972455e-05, 'epoch': 0.31} +{'loss': 0.966, 'grad_norm': 1.9840954542160034, 'learning_rate': 
1.6283510625655474e-05, 'epoch': 0.31} +{'loss': 1.0509, 'grad_norm': 1.8244198560714722, 'learning_rate': 1.627866032017475e-05, 'epoch': 0.31} +{'loss': 1.0518, 'grad_norm': 1.9669748544692993, 'learning_rate': 1.6273807575166927e-05, 'epoch': 0.31} +{'loss': 0.9898, 'grad_norm': 1.747692584991455, 'learning_rate': 1.6268952392517505e-05, 'epoch': 0.31} +{'loss': 0.9123, 'grad_norm': 2.00764536857605, 'learning_rate': 1.626409477411293e-05, 'epoch': 0.31} +{'loss': 1.0407, 'grad_norm': 1.7182563543319702, 'learning_rate': 1.6259234721840595e-05, 'epoch': 0.31} +{'loss': 1.0217, 'grad_norm': 1.8612865209579468, 'learning_rate': 1.625437223758883e-05, 'epoch': 0.31} +{'loss': 1.0128, 'grad_norm': 1.7401247024536133, 'learning_rate': 1.6249507323246926e-05, 'epoch': 0.31} +{'loss': 1.0332, 'grad_norm': 1.6441352367401123, 'learning_rate': 1.6244639980705104e-05, 'epoch': 0.31} +{'loss': 1.0467, 'grad_norm': 1.6691603660583496, 'learning_rate': 1.623977021185454e-05, 'epoch': 0.31} +{'loss': 0.9704, 'grad_norm': 1.7422953844070435, 'learning_rate': 1.6234898018587336e-05, 'epoch': 0.31} +{'loss': 0.9696, 'grad_norm': 1.9007364511489868, 'learning_rate': 1.623002340279656e-05, 'epoch': 0.31} +{'loss': 0.928, 'grad_norm': 1.841038465499878, 'learning_rate': 1.6225146366376198e-05, 'epoch': 0.31} +{'loss': 0.9035, 'grad_norm': 1.6869525909423828, 'learning_rate': 1.622026691122119e-05, 'epoch': 0.31} +{'loss': 0.9897, 'grad_norm': 1.8111488819122314, 'learning_rate': 1.621538503922741e-05, 'epoch': 0.31} +{'loss': 1.0195, 'grad_norm': 1.7562575340270996, 'learning_rate': 1.6210500752291682e-05, 'epoch': 0.31} +{'loss': 1.0301, 'grad_norm': 1.8421173095703125, 'learning_rate': 1.620561405231175e-05, 'epoch': 0.31} +{'loss': 1.0096, 'grad_norm': 1.721318244934082, 'learning_rate': 1.6200724941186305e-05, 'epoch': 0.31} +{'loss': 1.0028, 'grad_norm': 1.7768428325653076, 'learning_rate': 1.6195833420814983e-05, 'epoch': 0.31} +{'loss': 0.9767, 'grad_norm': 2.0415077209472656, 'learning_rate': 1.6190939493098344e-05, 'epoch': 0.31} +{'loss': 0.9573, 'grad_norm': 1.6920299530029297, 'learning_rate': 1.6186043159937884e-05, 'epoch': 0.31} +{'loss': 1.0226, 'grad_norm': 1.8492498397827148, 'learning_rate': 1.6181144423236038e-05, 'epoch': 0.31} +{'loss': 0.8903, 'grad_norm': 0.9862644672393799, 'learning_rate': 1.617624328489618e-05, 'epoch': 0.31} +{'loss': 1.0072, 'grad_norm': 1.7790777683258057, 'learning_rate': 1.6171339746822606e-05, 'epoch': 0.31} +{'loss': 1.0318, 'grad_norm': 1.74509859085083, 'learning_rate': 1.616643381092055e-05, 'epoch': 0.31} +{'loss': 0.9966, 'grad_norm': 1.7997702360153198, 'learning_rate': 1.616152547909618e-05, 'epoch': 0.31} +{'loss': 1.0166, 'grad_norm': 1.852412462234497, 'learning_rate': 1.6156614753256583e-05, 'epoch': 0.31} +{'loss': 0.986, 'grad_norm': 1.867826223373413, 'learning_rate': 1.6151701635309797e-05, 'epoch': 0.31} +{'loss': 0.9579, 'grad_norm': 1.6695712804794312, 'learning_rate': 1.6146786127164773e-05, 'epoch': 0.31} +{'loss': 0.9957, 'grad_norm': 1.7458585500717163, 'learning_rate': 1.6141868230731393e-05, 'epoch': 0.31} +{'loss': 1.0399, 'grad_norm': 1.7383522987365723, 'learning_rate': 1.6136947947920477e-05, 'epoch': 0.31} +{'loss': 0.9394, 'grad_norm': 1.6937193870544434, 'learning_rate': 1.6132025280643753e-05, 'epoch': 0.31} +{'loss': 1.0342, 'grad_norm': 1.7102807760238647, 'learning_rate': 1.6127100230813897e-05, 'epoch': 0.31} +{'loss': 0.972, 'grad_norm': 0.9480326771736145, 'learning_rate': 1.6122172800344494e-05, 'epoch': 0.31} 
+{'loss': 0.9103, 'grad_norm': 1.663881540298462, 'learning_rate': 1.6117242991150064e-05, 'epoch': 0.31} +{'loss': 0.9485, 'grad_norm': 1.8087718486785889, 'learning_rate': 1.611231080514605e-05, 'epoch': 0.31} +{'loss': 1.0429, 'grad_norm': 1.5888209342956543, 'learning_rate': 1.610737624424881e-05, 'epoch': 0.31} +{'loss': 0.9587, 'grad_norm': 0.9585890769958496, 'learning_rate': 1.6102439310375637e-05, 'epoch': 0.31} +{'loss': 0.9504, 'grad_norm': 1.873613715171814, 'learning_rate': 1.609750000544474e-05, 'epoch': 0.31} +{'loss': 0.9498, 'grad_norm': 0.9027090072631836, 'learning_rate': 1.6092558331375245e-05, 'epoch': 0.31} +{'loss': 0.9589, 'grad_norm': 1.7109851837158203, 'learning_rate': 1.608761429008721e-05, 'epoch': 0.31} +{'loss': 0.908, 'grad_norm': 1.8576031923294067, 'learning_rate': 1.6082667883501596e-05, 'epoch': 0.31} +{'loss': 1.0444, 'grad_norm': 1.7118754386901855, 'learning_rate': 1.6077719113540303e-05, 'epoch': 0.31} +{'loss': 0.9955, 'grad_norm': 1.8047579526901245, 'learning_rate': 1.607276798212613e-05, 'epoch': 0.31} +{'loss': 0.9247, 'grad_norm': 1.9328746795654297, 'learning_rate': 1.6067814491182806e-05, 'epoch': 0.31} +{'loss': 0.9693, 'grad_norm': 1.8870935440063477, 'learning_rate': 1.606285864263498e-05, 'epoch': 0.31} +{'loss': 1.0732, 'grad_norm': 1.7145658731460571, 'learning_rate': 1.60579004384082e-05, 'epoch': 0.31} +{'loss': 0.9489, 'grad_norm': 1.938612461090088, 'learning_rate': 1.6052939880428942e-05, 'epoch': 0.31} +{'loss': 0.9511, 'grad_norm': 1.7863969802856445, 'learning_rate': 1.6047976970624603e-05, 'epoch': 0.31} +{'loss': 0.9891, 'grad_norm': 1.7148983478546143, 'learning_rate': 1.6043011710923476e-05, 'epoch': 0.31} +{'loss': 1.013, 'grad_norm': 1.7553948163986206, 'learning_rate': 1.6038044103254775e-05, 'epoch': 0.31} +{'loss': 0.8464, 'grad_norm': 1.8945555686950684, 'learning_rate': 1.603307414954863e-05, 'epoch': 0.32} +{'loss': 1.0019, 'grad_norm': 1.742958903312683, 'learning_rate': 1.6028101851736083e-05, 'epoch': 0.32} +{'loss': 0.994, 'grad_norm': 1.6972116231918335, 'learning_rate': 1.6023127211749082e-05, 'epoch': 0.32} +{'loss': 1.0272, 'grad_norm': 1.7025740146636963, 'learning_rate': 1.6018150231520486e-05, 'epoch': 0.32} +{'loss': 0.9727, 'grad_norm': 1.8751802444458008, 'learning_rate': 1.601317091298406e-05, 'epoch': 0.32} +{'loss': 0.9981, 'grad_norm': 1.860803484916687, 'learning_rate': 1.6008189258074483e-05, 'epoch': 0.32} +{'loss': 1.0339, 'grad_norm': 1.6714084148406982, 'learning_rate': 1.600320526872735e-05, 'epoch': 0.32} +{'loss': 1.0138, 'grad_norm': 1.7508667707443237, 'learning_rate': 1.599821894687914e-05, 'epoch': 0.32} +{'loss': 0.9381, 'grad_norm': 1.7288819551467896, 'learning_rate': 1.5993230294467257e-05, 'epoch': 0.32} +{'loss': 1.0066, 'grad_norm': 1.8353605270385742, 'learning_rate': 1.5988239313430004e-05, 'epoch': 0.32} +{'loss': 0.9669, 'grad_norm': 1.7945586442947388, 'learning_rate': 1.5983246005706592e-05, 'epoch': 0.32} +{'loss': 0.9667, 'grad_norm': 1.6595655679702759, 'learning_rate': 1.5978250373237132e-05, 'epoch': 0.32} +{'loss': 0.9328, 'grad_norm': 1.6445173025131226, 'learning_rate': 1.597325241796264e-05, 'epoch': 0.32} +{'loss': 0.8729, 'grad_norm': 0.9720272421836853, 'learning_rate': 1.5968252141825038e-05, 'epoch': 0.32} +{'loss': 1.0519, 'grad_norm': 1.6678732633590698, 'learning_rate': 1.5963249546767144e-05, 'epoch': 0.32} +{'loss': 1.0462, 'grad_norm': 1.6013301610946655, 'learning_rate': 1.5958244634732673e-05, 'epoch': 0.32} +{'loss': 0.9508, 'grad_norm': 
1.7456437349319458, 'learning_rate': 1.595323740766626e-05, 'epoch': 0.32} +{'loss': 0.9757, 'grad_norm': 1.9124515056610107, 'learning_rate': 1.5948227867513416e-05, 'epoch': 0.32} +{'loss': 1.0226, 'grad_norm': 2.002638578414917, 'learning_rate': 1.5943216016220563e-05, 'epoch': 0.32} +{'loss': 0.9394, 'grad_norm': 1.6422607898712158, 'learning_rate': 1.5938201855735017e-05, 'epoch': 0.32} +{'loss': 0.9428, 'grad_norm': 1.7120252847671509, 'learning_rate': 1.5933185388004998e-05, 'epoch': 0.32} +{'loss': 0.8596, 'grad_norm': 1.951762318611145, 'learning_rate': 1.5928166614979613e-05, 'epoch': 0.32} +{'loss': 0.9487, 'grad_norm': 1.8206475973129272, 'learning_rate': 1.5923145538608874e-05, 'epoch': 0.32} +{'loss': 0.9843, 'grad_norm': 1.7178988456726074, 'learning_rate': 1.591812216084368e-05, 'epoch': 0.32} +{'loss': 1.0154, 'grad_norm': 1.9629712104797363, 'learning_rate': 1.5913096483635827e-05, 'epoch': 0.32} +{'loss': 0.9709, 'grad_norm': 1.7199310064315796, 'learning_rate': 1.5908068508938e-05, 'epoch': 0.32} +{'loss': 0.9322, 'grad_norm': 0.8834888339042664, 'learning_rate': 1.5903038238703795e-05, 'epoch': 0.32} +{'loss': 0.9693, 'grad_norm': 1.8330293893814087, 'learning_rate': 1.5898005674887673e-05, 'epoch': 0.32} +{'loss': 1.0638, 'grad_norm': 1.6615421772003174, 'learning_rate': 1.589297081944501e-05, 'epoch': 0.32} +{'loss': 1.0317, 'grad_norm': 1.9187180995941162, 'learning_rate': 1.5887933674332048e-05, 'epoch': 0.32} +{'loss': 0.9897, 'grad_norm': 1.7260987758636475, 'learning_rate': 1.5882894241505946e-05, 'epoch': 0.32} +{'loss': 1.033, 'grad_norm': 1.9676867723464966, 'learning_rate': 1.5877852522924733e-05, 'epoch': 0.32} +{'loss': 0.9672, 'grad_norm': 1.7443852424621582, 'learning_rate': 1.5872808520547334e-05, 'epoch': 0.32} +{'loss': 1.0343, 'grad_norm': 1.781541109085083, 'learning_rate': 1.5867762236333553e-05, 'epoch': 0.32} +{'loss': 1.0462, 'grad_norm': 1.89210844039917, 'learning_rate': 1.5862713672244092e-05, 'epoch': 0.32} +{'loss': 0.9067, 'grad_norm': 1.545162558555603, 'learning_rate': 1.585766283024053e-05, 'epoch': 0.32} +{'loss': 0.9786, 'grad_norm': 1.6994256973266602, 'learning_rate': 1.585260971228534e-05, 'epoch': 0.32} +{'loss': 0.9445, 'grad_norm': 1.7282161712646484, 'learning_rate': 1.5847554320341865e-05, 'epoch': 0.32} +{'loss': 0.9919, 'grad_norm': 0.9777271151542664, 'learning_rate': 1.5842496656374347e-05, 'epoch': 0.32} +{'loss': 0.9592, 'grad_norm': 1.585505723953247, 'learning_rate': 1.5837436722347902e-05, 'epoch': 0.32} +{'loss': 0.9867, 'grad_norm': 2.0595457553863525, 'learning_rate': 1.5832374520228527e-05, 'epoch': 0.32} +{'loss': 0.9797, 'grad_norm': 1.7800649404525757, 'learning_rate': 1.582731005198311e-05, 'epoch': 0.32} +{'loss': 1.0029, 'grad_norm': 1.8043187856674194, 'learning_rate': 1.5822243319579405e-05, 'epoch': 0.32} +{'loss': 0.9603, 'grad_norm': 1.6553676128387451, 'learning_rate': 1.581717432498606e-05, 'epoch': 0.32} +{'loss': 1.0288, 'grad_norm': 1.8246638774871826, 'learning_rate': 1.5812103070172592e-05, 'epoch': 0.32} +{'loss': 1.02, 'grad_norm': 1.6993404626846313, 'learning_rate': 1.5807029557109398e-05, 'epoch': 0.32} +{'loss': 0.9438, 'grad_norm': 1.899330973625183, 'learning_rate': 1.580195378776776e-05, 'epoch': 0.32} +{'loss': 0.917, 'grad_norm': 0.9404033422470093, 'learning_rate': 1.5796875764119826e-05, 'epoch': 0.32} +{'loss': 1.0151, 'grad_norm': 1.5981310606002808, 'learning_rate': 1.5791795488138623e-05, 'epoch': 0.32} +{'loss': 0.9073, 'grad_norm': 1.7941358089447021, 'learning_rate': 
1.578671296179806e-05, 'epoch': 0.32} +{'loss': 1.059, 'grad_norm': 1.7918518781661987, 'learning_rate': 1.578162818707291e-05, 'epoch': 0.32} +{'loss': 1.035, 'grad_norm': 1.680680274963379, 'learning_rate': 1.577654116593883e-05, 'epoch': 0.32} +{'loss': 1.0145, 'grad_norm': 1.7011933326721191, 'learning_rate': 1.577145190037234e-05, 'epoch': 0.33} +{'loss': 1.0182, 'grad_norm': 1.8802440166473389, 'learning_rate': 1.5766360392350837e-05, 'epoch': 0.33} +{'loss': 0.8889, 'grad_norm': 1.7538574934005737, 'learning_rate': 1.5761266643852587e-05, 'epoch': 0.33} +{'loss': 1.0816, 'grad_norm': 1.792249083518982, 'learning_rate': 1.575617065685674e-05, 'epoch': 0.33} +{'loss': 0.9355, 'grad_norm': 2.0103187561035156, 'learning_rate': 1.5751072433343285e-05, 'epoch': 0.33} +{'loss': 1.0246, 'grad_norm': 1.6156758069992065, 'learning_rate': 1.5745971975293114e-05, 'epoch': 0.33} +{'loss': 0.891, 'grad_norm': 1.703883171081543, 'learning_rate': 1.574086928468797e-05, 'epoch': 0.33} +{'loss': 0.9745, 'grad_norm': 1.8960217237472534, 'learning_rate': 1.573576436351046e-05, 'epoch': 0.33} +{'loss': 0.9262, 'grad_norm': 1.6109836101531982, 'learning_rate': 1.5730657213744072e-05, 'epoch': 0.33} +{'loss': 1.0308, 'grad_norm': 1.6797646284103394, 'learning_rate': 1.5725547837373142e-05, 'epoch': 0.33} +{'loss': 0.9636, 'grad_norm': 1.8447962999343872, 'learning_rate': 1.5720436236382885e-05, 'epoch': 0.33} +{'loss': 0.9765, 'grad_norm': 1.770391583442688, 'learning_rate': 1.5715322412759374e-05, 'epoch': 0.33} +{'loss': 0.9823, 'grad_norm': 1.7947641611099243, 'learning_rate': 1.5710206368489555e-05, 'epoch': 0.33} +{'loss': 1.0531, 'grad_norm': 1.906477928161621, 'learning_rate': 1.5705088105561217e-05, 'epoch': 0.33} +{'loss': 1.0318, 'grad_norm': 1.8414548635482788, 'learning_rate': 1.5699967625963032e-05, 'epoch': 0.33} +{'loss': 0.981, 'grad_norm': 1.9499144554138184, 'learning_rate': 1.569484493168452e-05, 'epoch': 0.33} +{'loss': 1.0188, 'grad_norm': 1.6564552783966064, 'learning_rate': 1.568972002471607e-05, 'epoch': 0.33} +{'loss': 0.9342, 'grad_norm': 2.4806151390075684, 'learning_rate': 1.5684592907048925e-05, 'epoch': 0.33} +{'loss': 0.9732, 'grad_norm': 1.6865906715393066, 'learning_rate': 1.5679463580675187e-05, 'epoch': 0.33} +{'loss': 0.91, 'grad_norm': 2.062652587890625, 'learning_rate': 1.567433204758782e-05, 'epoch': 0.33} +{'loss': 1.0079, 'grad_norm': 1.7929702997207642, 'learning_rate': 1.5669198309780644e-05, 'epoch': 0.33} +{'loss': 1.0278, 'grad_norm': 1.7934985160827637, 'learning_rate': 1.566406236924833e-05, 'epoch': 0.33} +{'loss': 1.0016, 'grad_norm': 1.793951392173767, 'learning_rate': 1.5658924227986415e-05, 'epoch': 0.33} +{'loss': 0.9793, 'grad_norm': 1.7239779233932495, 'learning_rate': 1.5653783887991282e-05, 'epoch': 0.33} +{'loss': 1.024, 'grad_norm': 1.7481112480163574, 'learning_rate': 1.564864135126018e-05, 'epoch': 0.33} +{'loss': 0.9513, 'grad_norm': 1.8553564548492432, 'learning_rate': 1.564349661979119e-05, 'epoch': 0.33} +{'loss': 1.0239, 'grad_norm': 1.645546555519104, 'learning_rate': 1.5638349695583276e-05, 'epoch': 0.33} +{'loss': 0.9394, 'grad_norm': 1.8084485530853271, 'learning_rate': 1.563320058063622e-05, 'epoch': 0.33} +{'loss': 0.9635, 'grad_norm': 1.802618145942688, 'learning_rate': 1.5628049276950687e-05, 'epoch': 0.33} +{'loss': 0.9452, 'grad_norm': 1.8893779516220093, 'learning_rate': 1.5622895786528173e-05, 'epoch': 0.33} +{'loss': 1.0405, 'grad_norm': 1.7637345790863037, 'learning_rate': 1.5617740111371023e-05, 'epoch': 0.33} +{'loss': 
0.8714, 'grad_norm': 1.957239031791687, 'learning_rate': 1.5612582253482444e-05, 'epoch': 0.33} +{'loss': 0.9653, 'grad_norm': 1.9569957256317139, 'learning_rate': 1.560742221486648e-05, 'epoch': 0.33} +{'loss': 0.9062, 'grad_norm': 0.9606925249099731, 'learning_rate': 1.5602259997528028e-05, 'epoch': 0.33} +{'loss': 1.0369, 'grad_norm': 1.869847297668457, 'learning_rate': 1.5597095603472828e-05, 'epoch': 0.33} +{'loss': 0.9297, 'grad_norm': 1.8176629543304443, 'learning_rate': 1.5591929034707468e-05, 'epoch': 0.33} +{'loss': 0.9786, 'grad_norm': 1.8708559274673462, 'learning_rate': 1.5586760293239383e-05, 'epoch': 0.33} +{'loss': 1.027, 'grad_norm': 1.7141287326812744, 'learning_rate': 1.5581589381076843e-05, 'epoch': 0.33} +{'loss': 0.9607, 'grad_norm': 1.7462016344070435, 'learning_rate': 1.557641630022897e-05, 'epoch': 0.33} +{'loss': 1.0236, 'grad_norm': 1.7583905458450317, 'learning_rate': 1.5571241052705724e-05, 'epoch': 0.33} +{'loss': 0.9689, 'grad_norm': 1.8696091175079346, 'learning_rate': 1.5566063640517918e-05, 'epoch': 0.33} +{'loss': 0.9935, 'grad_norm': 1.852078914642334, 'learning_rate': 1.5560884065677188e-05, 'epoch': 0.33} +{'loss': 1.0304, 'grad_norm': 1.9050599336624146, 'learning_rate': 1.5555702330196024e-05, 'epoch': 0.33} +{'loss': 0.9328, 'grad_norm': 1.8081910610198975, 'learning_rate': 1.5550518436087753e-05, 'epoch': 0.33} +{'loss': 1.0388, 'grad_norm': 1.6070475578308105, 'learning_rate': 1.554533238536653e-05, 'epoch': 0.33} +{'loss': 1.0205, 'grad_norm': 1.7032333612442017, 'learning_rate': 1.5540144180047367e-05, 'epoch': 0.33} +{'loss': 0.9746, 'grad_norm': 1.6502132415771484, 'learning_rate': 1.5534953822146097e-05, 'epoch': 0.33} +{'loss': 1.05, 'grad_norm': 1.8607031106948853, 'learning_rate': 1.5529761313679396e-05, 'epoch': 0.33} +{'loss': 1.0195, 'grad_norm': 1.7787359952926636, 'learning_rate': 1.5524566656664777e-05, 'epoch': 0.33} +{'loss': 0.936, 'grad_norm': 1.7438163757324219, 'learning_rate': 1.5519369853120584e-05, 'epoch': 0.33} +{'loss': 1.0686, 'grad_norm': 1.5819883346557617, 'learning_rate': 1.5514170905065996e-05, 'epoch': 0.33} +{'loss': 1.0857, 'grad_norm': 1.6069012880325317, 'learning_rate': 1.5508969814521026e-05, 'epoch': 0.33} +{'loss': 0.9816, 'grad_norm': 1.8491158485412598, 'learning_rate': 1.5503766583506522e-05, 'epoch': 0.34} +{'loss': 0.937, 'grad_norm': 1.7476420402526855, 'learning_rate': 1.549856121404416e-05, 'epoch': 0.34} +{'loss': 0.9633, 'grad_norm': 1.7756730318069458, 'learning_rate': 1.5493353708156442e-05, 'epoch': 0.34} +{'loss': 1.0837, 'grad_norm': 1.619093894958496, 'learning_rate': 1.548814406786671e-05, 'epoch': 0.34} +{'loss': 0.9647, 'grad_norm': 1.8196501731872559, 'learning_rate': 1.548293229519914e-05, 'epoch': 0.34} +{'loss': 0.8912, 'grad_norm': 1.684403657913208, 'learning_rate': 1.5477718392178716e-05, 'epoch': 0.34} +{'loss': 0.9572, 'grad_norm': 1.728331208229065, 'learning_rate': 1.5472502360831265e-05, 'epoch': 0.34} +{'loss': 1.1018, 'grad_norm': 1.669525384902954, 'learning_rate': 1.5467284203183437e-05, 'epoch': 0.34} +{'loss': 1.0457, 'grad_norm': 1.7439517974853516, 'learning_rate': 1.5462063921262708e-05, 'epoch': 0.34} +{'loss': 1.0158, 'grad_norm': 1.6491779088974, 'learning_rate': 1.5456841517097385e-05, 'epoch': 0.34} +{'loss': 0.9586, 'grad_norm': 1.7667906284332275, 'learning_rate': 1.545161699271659e-05, 'epoch': 0.34} +{'loss': 0.99, 'grad_norm': 1.715625524520874, 'learning_rate': 1.5446390350150272e-05, 'epoch': 0.34} +{'loss': 0.9805, 'grad_norm': 1.7603507041931152, 
'learning_rate': 1.544116159142921e-05, 'epoch': 0.34} +{'loss': 0.9317, 'grad_norm': 0.9383662939071655, 'learning_rate': 1.5435930718584993e-05, 'epoch': 0.34} +{'loss': 1.0013, 'grad_norm': 1.713032841682434, 'learning_rate': 1.5430697733650045e-05, 'epoch': 0.34} +{'loss': 0.9569, 'grad_norm': 1.783747911453247, 'learning_rate': 1.5425462638657597e-05, 'epoch': 0.34} +{'loss': 1.0033, 'grad_norm': 1.689237356185913, 'learning_rate': 1.542022543564171e-05, 'epoch': 0.34} +{'loss': 1.0029, 'grad_norm': 1.646461009979248, 'learning_rate': 1.541498612663726e-05, 'epoch': 0.34} +{'loss': 0.9929, 'grad_norm': 1.6828275918960571, 'learning_rate': 1.5409744713679942e-05, 'epoch': 0.34} +{'loss': 0.966, 'grad_norm': 1.8641035556793213, 'learning_rate': 1.5404501198806267e-05, 'epoch': 0.34} +{'loss': 0.9975, 'grad_norm': 1.7769709825515747, 'learning_rate': 1.5399255584053568e-05, 'epoch': 0.34} +{'loss': 1.0045, 'grad_norm': 1.7866312265396118, 'learning_rate': 1.5394007871459983e-05, 'epoch': 0.34} +{'loss': 1.0361, 'grad_norm': 1.7538526058197021, 'learning_rate': 1.538875806306448e-05, 'epoch': 0.34} +{'loss': 0.9743, 'grad_norm': 2.205838441848755, 'learning_rate': 1.5383506160906826e-05, 'epoch': 0.34} +{'loss': 1.0269, 'grad_norm': 1.7115983963012695, 'learning_rate': 1.5378252167027613e-05, 'epoch': 0.34} +{'loss': 0.941, 'grad_norm': 1.7475506067276, 'learning_rate': 1.5372996083468242e-05, 'epoch': 0.34} +{'loss': 0.9403, 'grad_norm': 1.8076540231704712, 'learning_rate': 1.536773791227092e-05, 'epoch': 0.34} +{'loss': 1.0338, 'grad_norm': 1.8480545282363892, 'learning_rate': 1.5362477655478677e-05, 'epoch': 0.34} +{'loss': 1.0443, 'grad_norm': 1.6669305562973022, 'learning_rate': 1.535721531513535e-05, 'epoch': 0.34} +{'loss': 0.9653, 'grad_norm': 1.8368251323699951, 'learning_rate': 1.535195089328557e-05, 'epoch': 0.34} +{'loss': 0.9354, 'grad_norm': 0.9453438520431519, 'learning_rate': 1.5346684391974792e-05, 'epoch': 0.34} +{'loss': 1.0769, 'grad_norm': 1.7002607583999634, 'learning_rate': 1.534141581324929e-05, 'epoch': 0.34} +{'loss': 1.0388, 'grad_norm': 1.8152868747711182, 'learning_rate': 1.533614515915612e-05, 'epoch': 0.34} +{'loss': 1.04, 'grad_norm': 1.757623553276062, 'learning_rate': 1.533087243174315e-05, 'epoch': 0.34} +{'loss': 0.9511, 'grad_norm': 1.919758915901184, 'learning_rate': 1.5325597633059075e-05, 'epoch': 0.34} +{'loss': 1.0302, 'grad_norm': 1.7335333824157715, 'learning_rate': 1.5320320765153367e-05, 'epoch': 0.34} +{'loss': 0.9744, 'grad_norm': 1.8738481998443604, 'learning_rate': 1.5315041830076317e-05, 'epoch': 0.34} +{'loss': 0.9796, 'grad_norm': 1.7490358352661133, 'learning_rate': 1.530976082987902e-05, 'epoch': 0.34} +{'loss': 1.0082, 'grad_norm': 1.8043572902679443, 'learning_rate': 1.5304477766613365e-05, 'epoch': 0.34} +{'loss': 1.03, 'grad_norm': 1.5871425867080688, 'learning_rate': 1.529919264233205e-05, 'epoch': 0.34} +{'loss': 1.0928, 'grad_norm': 1.6797513961791992, 'learning_rate': 1.529390545908857e-05, 'epoch': 0.34} +{'loss': 1.0261, 'grad_norm': 1.7837003469467163, 'learning_rate': 1.528861621893722e-05, 'epoch': 0.34} +{'loss': 1.0051, 'grad_norm': 1.6577980518341064, 'learning_rate': 1.52833249239331e-05, 'epoch': 0.34} +{'loss': 0.9512, 'grad_norm': 1.6752525568008423, 'learning_rate': 1.527803157613209e-05, 'epoch': 0.34} +{'loss': 0.9585, 'grad_norm': 1.9407802820205688, 'learning_rate': 1.5272736177590902e-05, 'epoch': 0.34} +{'loss': 1.0382, 'grad_norm': 1.8005989789962769, 'learning_rate': 1.526743873036701e-05, 'epoch': 0.34} 
+{'loss': 0.9658, 'grad_norm': 1.700411081314087, 'learning_rate': 1.5262139236518695e-05, 'epoch': 0.34} +{'loss': 1.0232, 'grad_norm': 1.7319666147232056, 'learning_rate': 1.5256837698105047e-05, 'epoch': 0.34} +{'loss': 1.0297, 'grad_norm': 1.9743072986602783, 'learning_rate': 1.5251534117185935e-05, 'epoch': 0.34} +{'loss': 1.0263, 'grad_norm': 1.812059998512268, 'learning_rate': 1.5246228495822025e-05, 'epoch': 0.34} +{'loss': 0.9941, 'grad_norm': 2.0064446926116943, 'learning_rate': 1.5240920836074777e-05, 'epoch': 0.34} +{'loss': 0.9202, 'grad_norm': 1.7347018718719482, 'learning_rate': 1.5235611140006446e-05, 'epoch': 0.34} +{'loss': 0.9867, 'grad_norm': 1.7446880340576172, 'learning_rate': 1.5230299409680073e-05, 'epoch': 0.35} +{'loss': 0.998, 'grad_norm': 1.6364250183105469, 'learning_rate': 1.5224985647159489e-05, 'epoch': 0.35} +{'loss': 0.9044, 'grad_norm': 1.6061142683029175, 'learning_rate': 1.5219669854509322e-05, 'epoch': 0.35} +{'loss': 0.9923, 'grad_norm': 1.7231050729751587, 'learning_rate': 1.5214352033794981e-05, 'epoch': 0.35} +{'loss': 0.9485, 'grad_norm': 1.6974034309387207, 'learning_rate': 1.5209032187082668e-05, 'epoch': 0.35} +{'loss': 0.9791, 'grad_norm': 1.663171648979187, 'learning_rate': 1.5203710316439368e-05, 'epoch': 0.35} +{'loss': 0.8731, 'grad_norm': 1.7451183795928955, 'learning_rate': 1.5198386423932856e-05, 'epoch': 0.35} +{'loss': 0.9885, 'grad_norm': 1.656074047088623, 'learning_rate': 1.5193060511631692e-05, 'epoch': 0.35} +{'loss': 0.9412, 'grad_norm': 1.7435290813446045, 'learning_rate': 1.5187732581605217e-05, 'epoch': 0.35} +{'loss': 0.9793, 'grad_norm': 1.8893373012542725, 'learning_rate': 1.518240263592356e-05, 'epoch': 0.35} +{'loss': 0.9738, 'grad_norm': 0.968467652797699, 'learning_rate': 1.5177070676657632e-05, 'epoch': 0.35} +{'loss': 1.0186, 'grad_norm': 1.661312460899353, 'learning_rate': 1.5171736705879127e-05, 'epoch': 0.35} +{'loss': 1.0757, 'grad_norm': 1.8427565097808838, 'learning_rate': 1.5166400725660522e-05, 'epoch': 0.35} +{'loss': 1.0142, 'grad_norm': 1.962753176689148, 'learning_rate': 1.5161062738075068e-05, 'epoch': 0.35} +{'loss': 1.0406, 'grad_norm': 1.8477873802185059, 'learning_rate': 1.5155722745196802e-05, 'epoch': 0.35} +{'loss': 1.0145, 'grad_norm': 1.8629461526870728, 'learning_rate': 1.5150380749100545e-05, 'epoch': 0.35} +{'loss': 1.0678, 'grad_norm': 1.8187203407287598, 'learning_rate': 1.514503675186188e-05, 'epoch': 0.35} +{'loss': 1.1055, 'grad_norm': 1.8062211275100708, 'learning_rate': 1.5139690755557186e-05, 'epoch': 0.35} +{'loss': 0.9458, 'grad_norm': 1.7697410583496094, 'learning_rate': 1.5134342762263606e-05, 'epoch': 0.35} +{'loss': 0.9142, 'grad_norm': 1.891059160232544, 'learning_rate': 1.5128992774059063e-05, 'epoch': 0.35} +{'loss': 1.0509, 'grad_norm': 1.8962926864624023, 'learning_rate': 1.5123640793022255e-05, 'epoch': 0.35} +{'loss': 1.0551, 'grad_norm': 1.7912273406982422, 'learning_rate': 1.5118286821232661e-05, 'epoch': 0.35} +{'loss': 1.01, 'grad_norm': 1.796258568763733, 'learning_rate': 1.511293086077052e-05, 'epoch': 0.35} +{'loss': 0.9144, 'grad_norm': 0.9572339653968811, 'learning_rate': 1.5107572913716859e-05, 'epoch': 0.35} +{'loss': 0.8672, 'grad_norm': 0.9153892993927002, 'learning_rate': 1.510221298215346e-05, 'epoch': 0.35} +{'loss': 1.0036, 'grad_norm': 1.8201547861099243, 'learning_rate': 1.5096851068162885e-05, 'epoch': 0.35} +{'loss': 1.0024, 'grad_norm': 1.7657115459442139, 'learning_rate': 1.5091487173828471e-05, 'epoch': 0.35} +{'loss': 0.9945, 'grad_norm': 
1.8045085668563843, 'learning_rate': 1.5086121301234318e-05, 'epoch': 0.35} +{'loss': 0.935, 'grad_norm': 1.8675174713134766, 'learning_rate': 1.5080753452465296e-05, 'epoch': 0.35} +{'loss': 0.9049, 'grad_norm': 1.0032260417938232, 'learning_rate': 1.5075383629607043e-05, 'epoch': 0.35} +{'loss': 0.9233, 'grad_norm': 1.978115439414978, 'learning_rate': 1.5070011834745963e-05, 'epoch': 0.35} +{'loss': 0.9496, 'grad_norm': 1.5761922597885132, 'learning_rate': 1.5064638069969228e-05, 'epoch': 0.35} +{'loss': 0.9874, 'grad_norm': 1.76466703414917, 'learning_rate': 1.5059262337364779e-05, 'epoch': 0.35} +{'loss': 1.0453, 'grad_norm': 1.7005233764648438, 'learning_rate': 1.505388463902131e-05, 'epoch': 0.35} +{'loss': 0.9828, 'grad_norm': 1.7527347803115845, 'learning_rate': 1.5048504977028289e-05, 'epoch': 0.35} +{'loss': 0.9622, 'grad_norm': 1.7358801364898682, 'learning_rate': 1.5043123353475944e-05, 'epoch': 0.35} +{'loss': 0.9236, 'grad_norm': 0.9531983137130737, 'learning_rate': 1.5037739770455263e-05, 'epoch': 0.35} +{'loss': 1.0767, 'grad_norm': 1.897436499595642, 'learning_rate': 1.5032354230058004e-05, 'epoch': 0.35} +{'loss': 0.9625, 'grad_norm': 1.7305550575256348, 'learning_rate': 1.502696673437667e-05, 'epoch': 0.35} +{'loss': 1.0259, 'grad_norm': 1.7434419393539429, 'learning_rate': 1.5021577285504538e-05, 'epoch': 0.35} +{'loss': 1.0457, 'grad_norm': 1.7427966594696045, 'learning_rate': 1.5016185885535639e-05, 'epoch': 0.35} +{'loss': 0.9799, 'grad_norm': 1.6742581129074097, 'learning_rate': 1.5010792536564762e-05, 'epoch': 0.35} +{'loss': 1.017, 'grad_norm': 1.6152372360229492, 'learning_rate': 1.5005397240687447e-05, 'epoch': 0.35} +{'loss': 0.9691, 'grad_norm': 1.6158370971679688, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.35} +{'loss': 1.1004, 'grad_norm': 1.7204499244689941, 'learning_rate': 1.4994600816599482e-05, 'epoch': 0.35} +{'loss': 0.9887, 'grad_norm': 1.7515671253204346, 'learning_rate': 1.4989199692583705e-05, 'epoch': 0.35} +{'loss': 1.0361, 'grad_norm': 1.8158537149429321, 'learning_rate': 1.498379663005123e-05, 'epoch': 0.35} +{'loss': 0.8766, 'grad_norm': 1.7149584293365479, 'learning_rate': 1.4978391631101383e-05, 'epoch': 0.35} +{'loss': 0.8509, 'grad_norm': 1.627145767211914, 'learning_rate': 1.4972984697834238e-05, 'epoch': 0.35} +{'loss': 0.991, 'grad_norm': 1.6957424879074097, 'learning_rate': 1.4967575832350614e-05, 'epoch': 0.35} +{'loss': 0.9685, 'grad_norm': 1.001355767250061, 'learning_rate': 1.4962165036752085e-05, 'epoch': 0.35} +{'loss': 0.9964, 'grad_norm': 1.664493441581726, 'learning_rate': 1.4956752313140978e-05, 'epoch': 0.35} +{'loss': 0.9445, 'grad_norm': 2.200862407684326, 'learning_rate': 1.4951337663620368e-05, 'epoch': 0.36} +{'loss': 1.0006, 'grad_norm': 1.8153678178787231, 'learning_rate': 1.4945921090294076e-05, 'epoch': 0.36} +{'loss': 1.1014, 'grad_norm': 1.8465709686279297, 'learning_rate': 1.494050259526667e-05, 'epoch': 0.36} +{'loss': 0.9557, 'grad_norm': 1.79034423828125, 'learning_rate': 1.493508218064347e-05, 'epoch': 0.36} +{'loss': 0.9405, 'grad_norm': 1.7255334854125977, 'learning_rate': 1.4929659848530537e-05, 'epoch': 0.36} +{'loss': 0.9204, 'grad_norm': 1.7785202264785767, 'learning_rate': 1.4924235601034673e-05, 'epoch': 0.36} +{'loss': 0.9728, 'grad_norm': 1.9693217277526855, 'learning_rate': 1.4918809440263435e-05, 'epoch': 0.36} +{'loss': 0.884, 'grad_norm': 1.6653003692626953, 'learning_rate': 1.4913381368325115e-05, 'epoch': 0.36} +{'loss': 1.0735, 'grad_norm': 1.7136613130569458, 'learning_rate': 
1.4907951387328752e-05, 'epoch': 0.36} +{'loss': 1.0058, 'grad_norm': 1.8986228704452515, 'learning_rate': 1.4902519499384122e-05, 'epoch': 0.36} +{'loss': 0.951, 'grad_norm': 1.693052887916565, 'learning_rate': 1.489708570660175e-05, 'epoch': 0.36} +{'loss': 1.0251, 'grad_norm': 1.7758793830871582, 'learning_rate': 1.4891650011092896e-05, 'epoch': 0.36} +{'loss': 0.9106, 'grad_norm': 0.9435529708862305, 'learning_rate': 1.4886212414969551e-05, 'epoch': 0.36} +{'loss': 0.9583, 'grad_norm': 1.9558954238891602, 'learning_rate': 1.488077292034446e-05, 'epoch': 0.36} +{'loss': 1.0021, 'grad_norm': 1.7710357904434204, 'learning_rate': 1.4875331529331097e-05, 'epoch': 0.36} +{'loss': 1.0239, 'grad_norm': 1.700529932975769, 'learning_rate': 1.4869888244043674e-05, 'epoch': 0.36} +{'loss': 0.9699, 'grad_norm': 1.79655921459198, 'learning_rate': 1.486444306659714e-05, 'epoch': 0.36} +{'loss': 1.0352, 'grad_norm': 1.8918763399124146, 'learning_rate': 1.4858995999107175e-05, 'epoch': 0.36} +{'loss': 0.9867, 'grad_norm': 1.7336639165878296, 'learning_rate': 1.4853547043690201e-05, 'epoch': 0.36} +{'loss': 0.9806, 'grad_norm': 1.88038170337677, 'learning_rate': 1.4848096202463373e-05, 'epoch': 0.36} +{'loss': 0.9848, 'grad_norm': 1.765183687210083, 'learning_rate': 1.4842643477544565e-05, 'epoch': 0.36} +{'loss': 0.9358, 'grad_norm': 1.7156071662902832, 'learning_rate': 1.4837188871052399e-05, 'epoch': 0.36} +{'loss': 1.0382, 'grad_norm': 1.782707929611206, 'learning_rate': 1.4831732385106221e-05, 'epoch': 0.36} +{'loss': 0.9793, 'grad_norm': 1.8035192489624023, 'learning_rate': 1.482627402182611e-05, 'epoch': 0.36} +{'loss': 1.0671, 'grad_norm': 1.644707202911377, 'learning_rate': 1.482081378333287e-05, 'epoch': 0.36} +{'loss': 0.9814, 'grad_norm': 1.8341261148452759, 'learning_rate': 1.4815351671748039e-05, 'epoch': 0.36} +{'loss': 0.9508, 'grad_norm': 1.8055627346038818, 'learning_rate': 1.4809887689193878e-05, 'epoch': 0.36} +{'loss': 0.9645, 'grad_norm': 1.7969026565551758, 'learning_rate': 1.4804421837793379e-05, 'epoch': 0.36} +{'loss': 1.008, 'grad_norm': 1.8895519971847534, 'learning_rate': 1.4798954119670254e-05, 'epoch': 0.36} +{'loss': 0.934, 'grad_norm': 1.850370168685913, 'learning_rate': 1.479348453694895e-05, 'epoch': 0.36} +{'loss': 0.9489, 'grad_norm': 1.658559799194336, 'learning_rate': 1.4788013091754629e-05, 'epoch': 0.36} +{'loss': 1.0039, 'grad_norm': 1.8425266742706299, 'learning_rate': 1.4782539786213184e-05, 'epoch': 0.36} +{'loss': 1.0094, 'grad_norm': 1.7742093801498413, 'learning_rate': 1.4777064622451222e-05, 'epoch': 0.36} +{'loss': 1.0538, 'grad_norm': 1.7343881130218506, 'learning_rate': 1.4771587602596085e-05, 'epoch': 0.36} +{'loss': 0.9969, 'grad_norm': 1.6593002080917358, 'learning_rate': 1.4766108728775822e-05, 'epoch': 0.36} +{'loss': 1.0339, 'grad_norm': 1.8009185791015625, 'learning_rate': 1.4760628003119213e-05, 'epoch': 0.36} +{'loss': 1.0146, 'grad_norm': 1.823994517326355, 'learning_rate': 1.4755145427755755e-05, 'epoch': 0.36} +{'loss': 0.9343, 'grad_norm': 1.5586533546447754, 'learning_rate': 1.4749661004815654e-05, 'epoch': 0.36} +{'loss': 0.9596, 'grad_norm': 0.9568275809288025, 'learning_rate': 1.474417473642985e-05, 'epoch': 0.36} +{'loss': 0.9041, 'grad_norm': 1.8383195400238037, 'learning_rate': 1.4738686624729987e-05, 'epoch': 0.36} +{'loss': 1.061, 'grad_norm': 1.7363063097000122, 'learning_rate': 1.4733196671848435e-05, 'epoch': 0.36} +{'loss': 0.968, 'grad_norm': 0.953652024269104, 'learning_rate': 1.4727704879918272e-05, 'epoch': 0.36} +{'loss': 
0.9449, 'grad_norm': 1.6911596059799194, 'learning_rate': 1.4722211251073293e-05, 'epoch': 0.36} +{'loss': 0.8973, 'grad_norm': 1.7536365985870361, 'learning_rate': 1.4716715787448007e-05, 'epoch': 0.36} +{'loss': 0.9418, 'grad_norm': 1.6344425678253174, 'learning_rate': 1.4711218491177636e-05, 'epoch': 0.36} +{'loss': 0.9604, 'grad_norm': 1.9258824586868286, 'learning_rate': 1.4705719364398117e-05, 'epoch': 0.36} +{'loss': 1.0507, 'grad_norm': 1.7928732633590698, 'learning_rate': 1.4700218409246087e-05, 'epoch': 0.36} +{'loss': 1.0997, 'grad_norm': 1.9834192991256714, 'learning_rate': 1.469471562785891e-05, 'epoch': 0.36} +{'loss': 1.0872, 'grad_norm': 1.7049096822738647, 'learning_rate': 1.4689211022374645e-05, 'epoch': 0.36} +{'loss': 0.9592, 'grad_norm': 1.9117679595947266, 'learning_rate': 1.4683704594932072e-05, 'epoch': 0.36} +{'loss': 0.9899, 'grad_norm': 1.756568431854248, 'learning_rate': 1.4678196347670665e-05, 'epoch': 0.36} +{'loss': 0.9564, 'grad_norm': 1.8233407735824585, 'learning_rate': 1.4672686282730622e-05, 'epoch': 0.36} +{'loss': 0.9611, 'grad_norm': 1.7915583848953247, 'learning_rate': 1.4667174402252831e-05, 'epoch': 0.37} +{'loss': 1.0075, 'grad_norm': 1.6582584381103516, 'learning_rate': 1.4661660708378893e-05, 'epoch': 0.37} +{'loss': 0.9696, 'grad_norm': 1.6997991800308228, 'learning_rate': 1.4656145203251116e-05, 'epoch': 0.37} +{'loss': 0.9643, 'grad_norm': 1.4926577806472778, 'learning_rate': 1.4650627889012507e-05, 'epoch': 0.37} +{'loss': 0.9739, 'grad_norm': 1.787379264831543, 'learning_rate': 1.4645108767806778e-05, 'epoch': 0.37} +{'loss': 0.9411, 'grad_norm': 1.5647180080413818, 'learning_rate': 1.4639587841778342e-05, 'epoch': 0.37} +{'loss': 0.9336, 'grad_norm': 1.569069504737854, 'learning_rate': 1.4634065113072316e-05, 'epoch': 0.37} +{'loss': 1.0497, 'grad_norm': 1.7829434871673584, 'learning_rate': 1.4628540583834511e-05, 'epoch': 0.37} +{'loss': 1.0149, 'grad_norm': 1.7288581132888794, 'learning_rate': 1.462301425621145e-05, 'epoch': 0.37} +{'loss': 0.9957, 'grad_norm': 1.8584659099578857, 'learning_rate': 1.4617486132350343e-05, 'epoch': 0.37} +{'loss': 0.9084, 'grad_norm': 1.8501023054122925, 'learning_rate': 1.4611956214399097e-05, 'epoch': 0.37} +{'loss': 1.078, 'grad_norm': 1.9918019771575928, 'learning_rate': 1.4606424504506325e-05, 'epoch': 0.37} +{'loss': 0.9808, 'grad_norm': 1.648996353149414, 'learning_rate': 1.4600891004821332e-05, 'epoch': 0.37} +{'loss': 0.9646, 'grad_norm': 1.7351601123809814, 'learning_rate': 1.4595355717494118e-05, 'epoch': 0.37} +{'loss': 1.0624, 'grad_norm': 2.054368495941162, 'learning_rate': 1.4589818644675378e-05, 'epoch': 0.37} +{'loss': 0.9769, 'grad_norm': 1.7488799095153809, 'learning_rate': 1.45842797885165e-05, 'epoch': 0.37} +{'loss': 0.9007, 'grad_norm': 1.8232616186141968, 'learning_rate': 1.4578739151169567e-05, 'epoch': 0.37} +{'loss': 0.9725, 'grad_norm': 1.8748761415481567, 'learning_rate': 1.4573196734787357e-05, 'epoch': 0.37} +{'loss': 0.905, 'grad_norm': 0.9844322204589844, 'learning_rate': 1.4567652541523325e-05, 'epoch': 0.37} +{'loss': 1.0371, 'grad_norm': 1.824452519416809, 'learning_rate': 1.4562106573531632e-05, 'epoch': 0.37} +{'loss': 0.9793, 'grad_norm': 2.2875170707702637, 'learning_rate': 1.4556558832967122e-05, 'epoch': 0.37} +{'loss': 0.9742, 'grad_norm': 1.6063363552093506, 'learning_rate': 1.455100932198533e-05, 'epoch': 0.37} +{'loss': 0.9639, 'grad_norm': 1.7440885305404663, 'learning_rate': 1.4545458042742477e-05, 'epoch': 0.37} +{'loss': 0.9666, 'grad_norm': 
2.064701557159424, 'learning_rate': 1.4539904997395468e-05, 'epoch': 0.37} +{'loss': 0.9194, 'grad_norm': 1.7673022747039795, 'learning_rate': 1.4534350188101905e-05, 'epoch': 0.37} +{'loss': 0.951, 'grad_norm': 1.6951756477355957, 'learning_rate': 1.4528793617020063e-05, 'epoch': 0.37} +{'loss': 0.9517, 'grad_norm': 1.771060824394226, 'learning_rate': 1.4523235286308907e-05, 'epoch': 0.37} +{'loss': 0.9978, 'grad_norm': 1.7901089191436768, 'learning_rate': 1.4517675198128086e-05, 'epoch': 0.37} +{'loss': 1.0247, 'grad_norm': 1.6266430616378784, 'learning_rate': 1.4512113354637931e-05, 'epoch': 0.37} +{'loss': 1.0497, 'grad_norm': 1.6474885940551758, 'learning_rate': 1.4506549757999456e-05, 'epoch': 0.37} +{'loss': 0.989, 'grad_norm': 1.8308584690093994, 'learning_rate': 1.4500984410374353e-05, 'epoch': 0.37} +{'loss': 1.0045, 'grad_norm': 1.8437163829803467, 'learning_rate': 1.4495417313924996e-05, 'epoch': 0.37} +{'loss': 0.994, 'grad_norm': 1.6356878280639648, 'learning_rate': 1.4489848470814447e-05, 'epoch': 0.37} +{'loss': 1.0028, 'grad_norm': 1.624777913093567, 'learning_rate': 1.4484277883206433e-05, 'epoch': 0.37} +{'loss': 0.9101, 'grad_norm': 1.6456894874572754, 'learning_rate': 1.4478705553265363e-05, 'epoch': 0.37} +{'loss': 0.924, 'grad_norm': 1.755164384841919, 'learning_rate': 1.4473131483156326e-05, 'epoch': 0.37} +{'loss': 0.9734, 'grad_norm': 1.7762017250061035, 'learning_rate': 1.4467555675045088e-05, 'epoch': 0.37} +{'loss': 0.9483, 'grad_norm': 1.772132158279419, 'learning_rate': 1.4461978131098089e-05, 'epoch': 0.37} +{'loss': 1.0793, 'grad_norm': 1.8451095819473267, 'learning_rate': 1.4456398853482438e-05, 'epoch': 0.37} +{'loss': 1.0455, 'grad_norm': 1.7906150817871094, 'learning_rate': 1.4450817844365924e-05, 'epoch': 0.37} +{'loss': 1.0102, 'grad_norm': 1.7000939846038818, 'learning_rate': 1.444523510591701e-05, 'epoch': 0.37} +{'loss': 0.9774, 'grad_norm': 1.7745611667633057, 'learning_rate': 1.4439650640304822e-05, 'epoch': 0.37} +{'loss': 1.0389, 'grad_norm': 1.8367114067077637, 'learning_rate': 1.4434064449699168e-05, 'epoch': 0.37} +{'loss': 0.9139, 'grad_norm': 1.8302029371261597, 'learning_rate': 1.4428476536270517e-05, 'epoch': 0.37} +{'loss': 0.9879, 'grad_norm': 1.825661540031433, 'learning_rate': 1.4422886902190014e-05, 'epoch': 0.37} +{'loss': 0.9629, 'grad_norm': 1.690522313117981, 'learning_rate': 1.4417295549629469e-05, 'epoch': 0.37} +{'loss': 1.0092, 'grad_norm': 1.7939332723617554, 'learning_rate': 1.4411702480761362e-05, 'epoch': 0.37} +{'loss': 1.0481, 'grad_norm': 1.655850887298584, 'learning_rate': 1.4406107697758838e-05, 'epoch': 0.37} +{'loss': 0.9911, 'grad_norm': 1.7256571054458618, 'learning_rate': 1.440051120279571e-05, 'epoch': 0.37} +{'loss': 1.0133, 'grad_norm': 1.6558855772018433, 'learning_rate': 1.4394912998046451e-05, 'epoch': 0.37} +{'loss': 1.0212, 'grad_norm': 1.5809326171875, 'learning_rate': 1.4389313085686205e-05, 'epoch': 0.37} +{'loss': 1.0678, 'grad_norm': 1.5560415983200073, 'learning_rate': 1.4383711467890776e-05, 'epoch': 0.37} +{'loss': 0.8871, 'grad_norm': 1.7232911586761475, 'learning_rate': 1.4378108146836632e-05, 'epoch': 0.38} +{'loss': 1.0044, 'grad_norm': 1.86379075050354, 'learning_rate': 1.4372503124700899e-05, 'epoch': 0.38} +{'loss': 0.9294, 'grad_norm': 1.6726834774017334, 'learning_rate': 1.436689640366137e-05, 'epoch': 0.38} +{'loss': 0.9914, 'grad_norm': 1.7556164264678955, 'learning_rate': 1.4361287985896495e-05, 'epoch': 0.38} +{'loss': 0.9562, 'grad_norm': 1.6470776796340942, 'learning_rate': 
1.4355677873585383e-05, 'epoch': 0.38} +{'loss': 0.9376, 'grad_norm': 1.621160626411438, 'learning_rate': 1.43500660689078e-05, 'epoch': 0.38} +{'loss': 1.0268, 'grad_norm': 1.7888695001602173, 'learning_rate': 1.4344452574044173e-05, 'epoch': 0.38} +{'loss': 0.9306, 'grad_norm': 1.7422411441802979, 'learning_rate': 1.4338837391175582e-05, 'epoch': 0.38} +{'loss': 1.0233, 'grad_norm': 1.7942121028900146, 'learning_rate': 1.4333220522483768e-05, 'epoch': 0.38} +{'loss': 0.9579, 'grad_norm': 1.6585218906402588, 'learning_rate': 1.4327601970151122e-05, 'epoch': 0.38} +{'loss': 1.0209, 'grad_norm': 1.7867778539657593, 'learning_rate': 1.432198173636069e-05, 'epoch': 0.38} +{'loss': 1.0178, 'grad_norm': 1.7259944677352905, 'learning_rate': 1.4316359823296174e-05, 'epoch': 0.38} +{'loss': 0.922, 'grad_norm': 4.83243465423584, 'learning_rate': 1.4310736233141926e-05, 'epoch': 0.38} +{'loss': 0.9977, 'grad_norm': 1.8803330659866333, 'learning_rate': 1.4305110968082953e-05, 'epoch': 0.38} +{'loss': 0.9645, 'grad_norm': 1.8122565746307373, 'learning_rate': 1.4299484030304906e-05, 'epoch': 0.38} +{'loss': 0.915, 'grad_norm': 1.755486011505127, 'learning_rate': 1.4293855421994094e-05, 'epoch': 0.38} +{'loss': 1.0457, 'grad_norm': 1.9173431396484375, 'learning_rate': 1.4288225145337467e-05, 'epoch': 0.38} +{'loss': 0.9553, 'grad_norm': 1.6177693605422974, 'learning_rate': 1.4282593202522627e-05, 'epoch': 0.38} +{'loss': 1.0855, 'grad_norm': 1.8555335998535156, 'learning_rate': 1.4276959595737832e-05, 'epoch': 0.38} +{'loss': 0.9304, 'grad_norm': 1.929799199104309, 'learning_rate': 1.4271324327171969e-05, 'epoch': 0.38} +{'loss': 1.0211, 'grad_norm': 1.8007659912109375, 'learning_rate': 1.4265687399014584e-05, 'epoch': 0.38} +{'loss': 1.0229, 'grad_norm': 1.7115871906280518, 'learning_rate': 1.4260048813455867e-05, 'epoch': 0.38} +{'loss': 1.0693, 'grad_norm': 1.7530657052993774, 'learning_rate': 1.4254408572686642e-05, 'epoch': 0.38} +{'loss': 0.9122, 'grad_norm': 1.8238295316696167, 'learning_rate': 1.4248766678898386e-05, 'epoch': 0.38} +{'loss': 0.9053, 'grad_norm': 1.6122338771820068, 'learning_rate': 1.4243123134283217e-05, 'epoch': 0.38} +{'loss': 0.9975, 'grad_norm': 1.673415184020996, 'learning_rate': 1.4237477941033888e-05, 'epoch': 0.38} +{'loss': 0.9626, 'grad_norm': 1.8511546850204468, 'learning_rate': 1.4231831101343802e-05, 'epoch': 0.38} +{'loss': 0.986, 'grad_norm': 1.4975576400756836, 'learning_rate': 1.4226182617406996e-05, 'epoch': 0.38} +{'loss': 0.9972, 'grad_norm': 1.554913878440857, 'learning_rate': 1.4220532491418144e-05, 'epoch': 0.38} +{'loss': 1.0087, 'grad_norm': 1.775629997253418, 'learning_rate': 1.4214880725572563e-05, 'epoch': 0.38} +{'loss': 0.8841, 'grad_norm': 1.651060938835144, 'learning_rate': 1.4209227322066206e-05, 'epoch': 0.38} +{'loss': 0.9728, 'grad_norm': 1.8014286756515503, 'learning_rate': 1.4203572283095657e-05, 'epoch': 0.38} +{'loss': 0.9638, 'grad_norm': 1.762376308441162, 'learning_rate': 1.4197915610858143e-05, 'epoch': 0.38} +{'loss': 1.0645, 'grad_norm': 1.7320339679718018, 'learning_rate': 1.419225730755152e-05, 'epoch': 0.38} +{'loss': 0.9905, 'grad_norm': 1.7174746990203857, 'learning_rate': 1.4186597375374283e-05, 'epoch': 0.38} +{'loss': 1.0071, 'grad_norm': 1.8385252952575684, 'learning_rate': 1.4180935816525554e-05, 'epoch': 0.38} +{'loss': 0.8468, 'grad_norm': 0.9697132110595703, 'learning_rate': 1.417527263320509e-05, 'epoch': 0.38} +{'loss': 1.0395, 'grad_norm': 2.0253593921661377, 'learning_rate': 1.4169607827613284e-05, 'epoch': 0.38} 
+{'loss': 1.0069, 'grad_norm': 2.054898500442505, 'learning_rate': 1.4163941401951147e-05, 'epoch': 0.38} +{'loss': 0.9927, 'grad_norm': 1.7338184118270874, 'learning_rate': 1.415827335842033e-05, 'epoch': 0.38} +{'loss': 0.9868, 'grad_norm': 1.6496775150299072, 'learning_rate': 1.4152603699223112e-05, 'epoch': 0.38} +{'loss': 0.9564, 'grad_norm': 1.7666347026824951, 'learning_rate': 1.4146932426562391e-05, 'epoch': 0.38} +{'loss': 1.0263, 'grad_norm': 1.791212797164917, 'learning_rate': 1.4141259542641706e-05, 'epoch': 0.38} +{'loss': 1.0899, 'grad_norm': 1.7299693822860718, 'learning_rate': 1.4135585049665207e-05, 'epoch': 0.38} +{'loss': 0.9862, 'grad_norm': 1.8792473077774048, 'learning_rate': 1.4129908949837684e-05, 'epoch': 0.38} +{'loss': 0.8934, 'grad_norm': 1.9791984558105469, 'learning_rate': 1.4124231245364534e-05, 'epoch': 0.38} +{'loss': 1.0477, 'grad_norm': 1.8039600849151611, 'learning_rate': 1.4118551938451797e-05, 'epoch': 0.38} +{'loss': 1.0477, 'grad_norm': 1.6801077127456665, 'learning_rate': 1.4112871031306118e-05, 'epoch': 0.38} +{'loss': 1.0081, 'grad_norm': 1.8121223449707031, 'learning_rate': 1.4107188526134774e-05, 'epoch': 0.38} +{'loss': 0.9686, 'grad_norm': 1.7899469137191772, 'learning_rate': 1.4101504425145662e-05, 'epoch': 0.38} +{'loss': 0.9467, 'grad_norm': 1.9349442720413208, 'learning_rate': 1.4095818730547297e-05, 'epoch': 0.38} +{'loss': 0.9958, 'grad_norm': 1.637374997138977, 'learning_rate': 1.4090131444548814e-05, 'epoch': 0.38} +{'loss': 0.9635, 'grad_norm': 1.6273151636123657, 'learning_rate': 1.4084442569359964e-05, 'epoch': 0.39} +{'loss': 1.0348, 'grad_norm': 1.8371778726577759, 'learning_rate': 1.4078752107191118e-05, 'epoch': 0.39} +{'loss': 0.9934, 'grad_norm': 1.560639500617981, 'learning_rate': 1.4073060060253265e-05, 'epoch': 0.39} +{'loss': 1.0171, 'grad_norm': 1.6575658321380615, 'learning_rate': 1.4067366430758004e-05, 'epoch': 0.39} +{'loss': 0.9354, 'grad_norm': 1.7119311094284058, 'learning_rate': 1.4061671220917557e-05, 'epoch': 0.39} +{'loss': 0.9984, 'grad_norm': 1.7783334255218506, 'learning_rate': 1.4055974432944753e-05, 'epoch': 0.39} +{'loss': 0.9115, 'grad_norm': 1.6485015153884888, 'learning_rate': 1.4050276069053036e-05, 'epoch': 0.39} +{'loss': 0.9653, 'grad_norm': 1.7605186700820923, 'learning_rate': 1.4044576131456466e-05, 'epoch': 0.39} +{'loss': 0.9846, 'grad_norm': 2.064528226852417, 'learning_rate': 1.4038874622369714e-05, 'epoch': 0.39} +{'loss': 0.9582, 'grad_norm': 1.8784908056259155, 'learning_rate': 1.4033171544008053e-05, 'epoch': 0.39} +{'loss': 1.0082, 'grad_norm': 1.869053602218628, 'learning_rate': 1.4027466898587375e-05, 'epoch': 0.39} +{'loss': 0.9752, 'grad_norm': 1.717316746711731, 'learning_rate': 1.4021760688324175e-05, 'epoch': 0.39} +{'loss': 1.019, 'grad_norm': 0.9960609674453735, 'learning_rate': 1.4016052915435565e-05, 'epoch': 0.39} +{'loss': 0.9974, 'grad_norm': 1.6758718490600586, 'learning_rate': 1.4010343582139252e-05, 'epoch': 0.39} +{'loss': 0.9908, 'grad_norm': 1.6149204969406128, 'learning_rate': 1.4004632690653559e-05, 'epoch': 0.39} +{'loss': 0.9454, 'grad_norm': 1.9212483167648315, 'learning_rate': 1.3998920243197408e-05, 'epoch': 0.39} +{'loss': 0.9856, 'grad_norm': 1.8701709508895874, 'learning_rate': 1.3993206241990333e-05, 'epoch': 0.39} +{'loss': 0.9247, 'grad_norm': 1.6525683403015137, 'learning_rate': 1.3987490689252463e-05, 'epoch': 0.39} +{'loss': 1.0949, 'grad_norm': 1.8931653499603271, 'learning_rate': 1.3981773587204535e-05, 'epoch': 0.39} +{'loss': 0.9748, 'grad_norm': 
2.060068130493164, 'learning_rate': 1.3976054938067885e-05, 'epoch': 0.39} +{'loss': 0.9233, 'grad_norm': 1.5122934579849243, 'learning_rate': 1.3970334744064451e-05, 'epoch': 0.39} +{'loss': 0.9702, 'grad_norm': 2.8335907459259033, 'learning_rate': 1.3964613007416777e-05, 'epoch': 0.39} +{'loss': 0.9971, 'grad_norm': 1.7102752923965454, 'learning_rate': 1.3958889730347997e-05, 'epoch': 0.39} +{'loss': 1.0199, 'grad_norm': 1.8470264673233032, 'learning_rate': 1.3953164915081852e-05, 'epoch': 0.39} +{'loss': 0.9761, 'grad_norm': 1.741223692893982, 'learning_rate': 1.3947438563842672e-05, 'epoch': 0.39} +{'loss': 0.9552, 'grad_norm': 1.8601967096328735, 'learning_rate': 1.3941710678855396e-05, 'epoch': 0.39} +{'loss': 0.9976, 'grad_norm': 1.8208478689193726, 'learning_rate': 1.3935981262345541e-05, 'epoch': 0.39} +{'loss': 0.984, 'grad_norm': 1.7925888299942017, 'learning_rate': 1.3930250316539237e-05, 'epoch': 0.39} +{'loss': 1.0463, 'grad_norm': 1.8152565956115723, 'learning_rate': 1.39245178436632e-05, 'epoch': 0.39} +{'loss': 0.9954, 'grad_norm': 1.7336716651916504, 'learning_rate': 1.3918783845944737e-05, 'epoch': 0.39} +{'loss': 0.9174, 'grad_norm': 2.2292046546936035, 'learning_rate': 1.391304832561175e-05, 'epoch': 0.39} +{'loss': 1.0216, 'grad_norm': 1.7288459539413452, 'learning_rate': 1.3907311284892737e-05, 'epoch': 0.39} +{'loss': 0.9253, 'grad_norm': 1.637322187423706, 'learning_rate': 1.3901572726016782e-05, 'epoch': 0.39} +{'loss': 0.9872, 'grad_norm': 0.9683698415756226, 'learning_rate': 1.389583265121356e-05, 'epoch': 0.39} +{'loss': 0.9163, 'grad_norm': 1.0408151149749756, 'learning_rate': 1.3890091062713332e-05, 'epoch': 0.39} +{'loss': 0.9821, 'grad_norm': 1.9911779165267944, 'learning_rate': 1.3884347962746949e-05, 'epoch': 0.39} +{'loss': 1.1045, 'grad_norm': 2.588649034500122, 'learning_rate': 1.3878603353545854e-05, 'epoch': 0.39} +{'loss': 1.0634, 'grad_norm': 2.025618314743042, 'learning_rate': 1.3872857237342067e-05, 'epoch': 0.39} +{'loss': 0.8942, 'grad_norm': 0.9749510884284973, 'learning_rate': 1.3867109616368208e-05, 'epoch': 0.39} +{'loss': 0.929, 'grad_norm': 1.807681918144226, 'learning_rate': 1.3861360492857464e-05, 'epoch': 0.39} +{'loss': 0.9373, 'grad_norm': 1.8939077854156494, 'learning_rate': 1.3855609869043618e-05, 'epoch': 0.39} +{'loss': 0.9032, 'grad_norm': 1.6454027891159058, 'learning_rate': 1.3849857747161036e-05, 'epoch': 0.39} +{'loss': 0.9946, 'grad_norm': 1.7353684902191162, 'learning_rate': 1.3844104129444655e-05, 'epoch': 0.39} +{'loss': 0.9684, 'grad_norm': 1.8138177394866943, 'learning_rate': 1.3838349018130007e-05, 'epoch': 0.39} +{'loss': 0.9971, 'grad_norm': 1.9001697301864624, 'learning_rate': 1.3832592415453193e-05, 'epoch': 0.39} +{'loss': 1.0403, 'grad_norm': 1.7635122537612915, 'learning_rate': 1.3826834323650899e-05, 'epoch': 0.39} +{'loss': 0.954, 'grad_norm': 1.9057657718658447, 'learning_rate': 1.3821074744960394e-05, 'epoch': 0.39} +{'loss': 0.9798, 'grad_norm': 1.4782705307006836, 'learning_rate': 1.3815313681619515e-05, 'epoch': 0.39} +{'loss': 0.943, 'grad_norm': 1.6358892917633057, 'learning_rate': 1.3809551135866683e-05, 'epoch': 0.39} +{'loss': 0.9325, 'grad_norm': 1.8052687644958496, 'learning_rate': 1.3803787109940891e-05, 'epoch': 0.39} +{'loss': 0.9819, 'grad_norm': 1.6639820337295532, 'learning_rate': 1.3798021606081713e-05, 'epoch': 0.39} +{'loss': 1.0134, 'grad_norm': 1.8043951988220215, 'learning_rate': 1.3792254626529286e-05, 'epoch': 0.39} +{'loss': 1.0323, 'grad_norm': 1.6742961406707764, 'learning_rate': 
1.3786486173524331e-05, 'epoch': 0.4} +{'loss': 0.9797, 'grad_norm': 1.7478245496749878, 'learning_rate': 1.378071624930814e-05, 'epoch': 0.4} +{'loss': 1.0842, 'grad_norm': 1.9097031354904175, 'learning_rate': 1.377494485612257e-05, 'epoch': 0.4} +{'loss': 1.0266, 'grad_norm': 1.8818575143814087, 'learning_rate': 1.3769171996210053e-05, 'epoch': 0.4} +{'loss': 0.9203, 'grad_norm': 1.6143834590911865, 'learning_rate': 1.3763397671813597e-05, 'epoch': 0.4} +{'loss': 0.9495, 'grad_norm': 1.798588514328003, 'learning_rate': 1.3757621885176768e-05, 'epoch': 0.4} +{'loss': 0.9649, 'grad_norm': 1.7057496309280396, 'learning_rate': 1.375184463854371e-05, 'epoch': 0.4} +{'loss': 1.0196, 'grad_norm': 1.6676299571990967, 'learning_rate': 1.3746065934159123e-05, 'epoch': 0.4} +{'loss': 0.9631, 'grad_norm': 1.8378809690475464, 'learning_rate': 1.3740285774268282e-05, 'epoch': 0.4} +{'loss': 0.9016, 'grad_norm': 1.7750494480133057, 'learning_rate': 1.3734504161117028e-05, 'epoch': 0.4} +{'loss': 0.8813, 'grad_norm': 1.8523062467575073, 'learning_rate': 1.3728721096951762e-05, 'epoch': 0.4} +{'loss': 0.9487, 'grad_norm': 1.8497612476348877, 'learning_rate': 1.3722936584019453e-05, 'epoch': 0.4} +{'loss': 0.9133, 'grad_norm': 1.9014673233032227, 'learning_rate': 1.3717150624567627e-05, 'epoch': 0.4} +{'loss': 1.0211, 'grad_norm': 1.6362453699111938, 'learning_rate': 1.371136322084438e-05, 'epoch': 0.4} +{'loss': 0.9428, 'grad_norm': 1.84226655960083, 'learning_rate': 1.3705574375098365e-05, 'epoch': 0.4} +{'loss': 0.9609, 'grad_norm': 1.7137995958328247, 'learning_rate': 1.3699784089578791e-05, 'epoch': 0.4} +{'loss': 0.9381, 'grad_norm': 1.7027982473373413, 'learning_rate': 1.3693992366535433e-05, 'epoch': 0.4} +{'loss': 1.0187, 'grad_norm': 1.8677493333816528, 'learning_rate': 1.3688199208218623e-05, 'epoch': 0.4} +{'loss': 0.9758, 'grad_norm': 1.9954644441604614, 'learning_rate': 1.3682404616879246e-05, 'epoch': 0.4} +{'loss': 0.9589, 'grad_norm': 1.8320896625518799, 'learning_rate': 1.3676608594768754e-05, 'epoch': 0.4} +{'loss': 1.0457, 'grad_norm': 1.7886749505996704, 'learning_rate': 1.3670811144139144e-05, 'epoch': 0.4} +{'loss': 0.951, 'grad_norm': 1.6116259098052979, 'learning_rate': 1.3665012267242974e-05, 'epoch': 0.4} +{'loss': 1.0526, 'grad_norm': 1.718163013458252, 'learning_rate': 1.3659211966333357e-05, 'epoch': 0.4} +{'loss': 1.0422, 'grad_norm': 1.8794950246810913, 'learning_rate': 1.3653410243663953e-05, 'epoch': 0.4} +{'loss': 0.9549, 'grad_norm': 1.728333592414856, 'learning_rate': 1.364760710148898e-05, 'epoch': 0.4} +{'loss': 0.9647, 'grad_norm': 1.887534499168396, 'learning_rate': 1.3641802542063208e-05, 'epoch': 0.4} +{'loss': 0.9717, 'grad_norm': 1.7697713375091553, 'learning_rate': 1.3635996567641952e-05, 'epoch': 0.4} +{'loss': 1.0381, 'grad_norm': 1.806296467781067, 'learning_rate': 1.3630189180481083e-05, 'epoch': 0.4} +{'loss': 0.9947, 'grad_norm': 1.862405776977539, 'learning_rate': 1.3624380382837017e-05, 'epoch': 0.4} +{'loss': 0.9294, 'grad_norm': 1.7284010648727417, 'learning_rate': 1.3618570176966723e-05, 'epoch': 0.4} +{'loss': 1.0141, 'grad_norm': 1.8863991498947144, 'learning_rate': 1.361275856512771e-05, 'epoch': 0.4} +{'loss': 0.9898, 'grad_norm': 1.7761805057525635, 'learning_rate': 1.3606945549578039e-05, 'epoch': 0.4} +{'loss': 0.9908, 'grad_norm': 1.9305976629257202, 'learning_rate': 1.360113113257631e-05, 'epoch': 0.4} +{'loss': 0.9411, 'grad_norm': 1.7332381010055542, 'learning_rate': 1.3595315316381676e-05, 'epoch': 0.4} +{'loss': 0.9499, 'grad_norm': 
1.663541555404663, 'learning_rate': 1.3589498103253828e-05, 'epoch': 0.4} +{'loss': 0.8928, 'grad_norm': 1.8119417428970337, 'learning_rate': 1.3583679495453e-05, 'epoch': 0.4} +{'loss': 1.0262, 'grad_norm': 1.7064785957336426, 'learning_rate': 1.3577859495239975e-05, 'epoch': 0.4} +{'loss': 1.0146, 'grad_norm': 1.7013280391693115, 'learning_rate': 1.3572038104876066e-05, 'epoch': 0.4} +{'loss': 0.926, 'grad_norm': 1.8446394205093384, 'learning_rate': 1.3566215326623131e-05, 'epoch': 0.4} +{'loss': 1.0443, 'grad_norm': 1.7226033210754395, 'learning_rate': 1.356039116274357e-05, 'epoch': 0.4} +{'loss': 0.9462, 'grad_norm': 1.790978193283081, 'learning_rate': 1.3554565615500313e-05, 'epoch': 0.4} +{'loss': 1.0413, 'grad_norm': 1.7985692024230957, 'learning_rate': 1.3548738687156839e-05, 'epoch': 0.4} +{'loss': 0.9012, 'grad_norm': 0.8923540115356445, 'learning_rate': 1.3542910379977158e-05, 'epoch': 0.4} +{'loss': 0.9692, 'grad_norm': 1.636090636253357, 'learning_rate': 1.3537080696225815e-05, 'epoch': 0.4} +{'loss': 1.0121, 'grad_norm': 1.6827718019485474, 'learning_rate': 1.353124963816789e-05, 'epoch': 0.4} +{'loss': 1.0009, 'grad_norm': 1.589532494544983, 'learning_rate': 1.3525417208068996e-05, 'epoch': 0.4} +{'loss': 0.987, 'grad_norm': 1.7537076473236084, 'learning_rate': 1.3519583408195283e-05, 'epoch': 0.4} +{'loss': 0.866, 'grad_norm': 1.8689415454864502, 'learning_rate': 1.3513748240813429e-05, 'epoch': 0.4} +{'loss': 0.8814, 'grad_norm': 1.8183119297027588, 'learning_rate': 1.3507911708190646e-05, 'epoch': 0.4} +{'loss': 1.0339, 'grad_norm': 1.6112996339797974, 'learning_rate': 1.3502073812594677e-05, 'epoch': 0.4} +{'loss': 1.0126, 'grad_norm': 1.8475773334503174, 'learning_rate': 1.349623455629379e-05, 'epoch': 0.4} +{'loss': 1.0127, 'grad_norm': 1.754859209060669, 'learning_rate': 1.3490393941556787e-05, 'epoch': 0.4} +{'loss': 1.0049, 'grad_norm': 2.0299603939056396, 'learning_rate': 1.3484551970652997e-05, 'epoch': 0.41} +{'loss': 0.9512, 'grad_norm': 1.6877714395523071, 'learning_rate': 1.3478708645852272e-05, 'epoch': 0.41} +{'loss': 0.9732, 'grad_norm': 2.039612293243408, 'learning_rate': 1.3472863969424998e-05, 'epoch': 0.41} +{'loss': 0.9356, 'grad_norm': 1.910059928894043, 'learning_rate': 1.3467017943642074e-05, 'epoch': 0.41} +{'loss': 1.0055, 'grad_norm': 1.7781462669372559, 'learning_rate': 1.346117057077493e-05, 'epoch': 0.41} +{'loss': 0.9338, 'grad_norm': 1.6408418416976929, 'learning_rate': 1.3455321853095527e-05, 'epoch': 0.41} +{'loss': 1.0378, 'grad_norm': 1.7563893795013428, 'learning_rate': 1.3449471792876333e-05, 'epoch': 0.41} +{'loss': 0.9528, 'grad_norm': 0.9516794085502625, 'learning_rate': 1.3443620392390352e-05, 'epoch': 0.41} +{'loss': 0.9256, 'grad_norm': 1.733107328414917, 'learning_rate': 1.3437767653911097e-05, 'epoch': 0.41} +{'loss': 1.0075, 'grad_norm': 1.6630865335464478, 'learning_rate': 1.3431913579712612e-05, 'epoch': 0.41} +{'loss': 0.9268, 'grad_norm': 1.8065743446350098, 'learning_rate': 1.3426058172069451e-05, 'epoch': 0.41} +{'loss': 1.0093, 'grad_norm': 1.8041304349899292, 'learning_rate': 1.342020143325669e-05, 'epoch': 0.41} +{'loss': 1.0045, 'grad_norm': 1.630800724029541, 'learning_rate': 1.341434336554992e-05, 'epoch': 0.41} +{'loss': 1.0148, 'grad_norm': 1.8522273302078247, 'learning_rate': 1.340848397122525e-05, 'epoch': 0.41} +{'loss': 0.979, 'grad_norm': 1.722927451133728, 'learning_rate': 1.3402623252559308e-05, 'epoch': 0.41} +{'loss': 0.9994, 'grad_norm': 1.8997496366500854, 'learning_rate': 1.3396761211829229e-05, 
'epoch': 0.41} +{'loss': 0.9959, 'grad_norm': 1.7310665845870972, 'learning_rate': 1.3390897851312667e-05, 'epoch': 0.41} +{'loss': 0.9201, 'grad_norm': 1.7372195720672607, 'learning_rate': 1.338503317328779e-05, 'epoch': 0.41} +{'loss': 0.9361, 'grad_norm': 1.961373209953308, 'learning_rate': 1.3379167180033272e-05, 'epoch': 0.41} +{'loss': 0.9649, 'grad_norm': 1.747282862663269, 'learning_rate': 1.3373299873828303e-05, 'epoch': 0.41} +{'loss': 0.9508, 'grad_norm': 1.8443785905838013, 'learning_rate': 1.336743125695258e-05, 'epoch': 0.41} +{'loss': 0.9475, 'grad_norm': 1.7303752899169922, 'learning_rate': 1.336156133168631e-05, 'epoch': 0.41} +{'loss': 0.9388, 'grad_norm': 1.8836239576339722, 'learning_rate': 1.3355690100310213e-05, 'epoch': 0.41} +{'loss': 0.9769, 'grad_norm': 1.5934606790542603, 'learning_rate': 1.3349817565105507e-05, 'epoch': 0.41} +{'loss': 0.8856, 'grad_norm': 1.583523154258728, 'learning_rate': 1.334394372835393e-05, 'epoch': 0.41} +{'loss': 0.9031, 'grad_norm': 1.882719874382019, 'learning_rate': 1.333806859233771e-05, 'epoch': 0.41} +{'loss': 0.9626, 'grad_norm': 1.912460207939148, 'learning_rate': 1.3332192159339595e-05, 'epoch': 0.41} +{'loss': 1.0122, 'grad_norm': 1.9093360900878906, 'learning_rate': 1.3326314431642821e-05, 'epoch': 0.41} +{'loss': 0.9835, 'grad_norm': 1.7700835466384888, 'learning_rate': 1.3320435411531145e-05, 'epoch': 0.41} +{'loss': 0.911, 'grad_norm': 1.9040415287017822, 'learning_rate': 1.3314555101288808e-05, 'epoch': 0.41} +{'loss': 1.0066, 'grad_norm': 1.5362709760665894, 'learning_rate': 1.3308673503200566e-05, 'epoch': 0.41} +{'loss': 0.9186, 'grad_norm': 1.8051828145980835, 'learning_rate': 1.3302790619551673e-05, 'epoch': 0.41} +{'loss': 0.9363, 'grad_norm': 1.9245274066925049, 'learning_rate': 1.3296906452627874e-05, 'epoch': 0.41} +{'loss': 1.0698, 'grad_norm': 1.8282278776168823, 'learning_rate': 1.3291021004715423e-05, 'epoch': 0.41} +{'loss': 1.013, 'grad_norm': 1.7695746421813965, 'learning_rate': 1.3285134278101065e-05, 'epoch': 0.41} +{'loss': 0.9707, 'grad_norm': 1.7928221225738525, 'learning_rate': 1.3279246275072046e-05, 'epoch': 0.41} +{'loss': 0.9629, 'grad_norm': 1.8057674169540405, 'learning_rate': 1.3273356997916106e-05, 'epoch': 0.41} +{'loss': 0.9445, 'grad_norm': 1.9168205261230469, 'learning_rate': 1.3267466448921475e-05, 'epoch': 0.41} +{'loss': 0.9257, 'grad_norm': 1.8468844890594482, 'learning_rate': 1.326157463037689e-05, 'epoch': 0.41} +{'loss': 0.9696, 'grad_norm': 1.6158725023269653, 'learning_rate': 1.3255681544571568e-05, 'epoch': 0.41} +{'loss': 1.0195, 'grad_norm': 1.6677511930465698, 'learning_rate': 1.3249787193795227e-05, 'epoch': 0.41} +{'loss': 0.896, 'grad_norm': 1.7644315958023071, 'learning_rate': 1.3243891580338074e-05, 'epoch': 0.41} +{'loss': 0.975, 'grad_norm': 1.828275203704834, 'learning_rate': 1.3237994706490802e-05, 'epoch': 0.41} +{'loss': 0.9942, 'grad_norm': 1.7779356241226196, 'learning_rate': 1.3232096574544602e-05, 'epoch': 0.41} +{'loss': 1.0465, 'grad_norm': 1.7545757293701172, 'learning_rate': 1.3226197186791147e-05, 'epoch': 0.41} +{'loss': 0.9834, 'grad_norm': 2.0328774452209473, 'learning_rate': 1.3220296545522601e-05, 'epoch': 0.41} +{'loss': 1.0356, 'grad_norm': 2.4823338985443115, 'learning_rate': 1.3214394653031616e-05, 'epoch': 0.41} +{'loss': 0.9086, 'grad_norm': 1.7433737516403198, 'learning_rate': 1.320849151161133e-05, 'epoch': 0.41} +{'loss': 0.9822, 'grad_norm': 1.8171130418777466, 'learning_rate': 1.320258712355536e-05, 'epoch': 0.41} +{'loss': 0.9474, 
'grad_norm': 1.736508846282959, 'learning_rate': 1.3196681491157816e-05, 'epoch': 0.41} +{'loss': 0.9887, 'grad_norm': 1.7251181602478027, 'learning_rate': 1.3190774616713286e-05, 'epoch': 0.41} +{'loss': 0.9913, 'grad_norm': 1.854845643043518, 'learning_rate': 1.3184866502516846e-05, 'epoch': 0.41} +{'loss': 0.989, 'grad_norm': 1.637202501296997, 'learning_rate': 1.3178957150864047e-05, 'epoch': 0.42} +{'loss': 1.0016, 'grad_norm': 2.003485918045044, 'learning_rate': 1.3173046564050923e-05, 'epoch': 0.42} +{'loss': 1.0094, 'grad_norm': 1.8424874544143677, 'learning_rate': 1.3167134744373993e-05, 'epoch': 0.42} +{'loss': 1.0121, 'grad_norm': 1.872542142868042, 'learning_rate': 1.3161221694130247e-05, 'epoch': 0.42} +{'loss': 0.9789, 'grad_norm': 1.0590440034866333, 'learning_rate': 1.3155307415617156e-05, 'epoch': 0.42} +{'loss': 0.928, 'grad_norm': 1.6069647073745728, 'learning_rate': 1.3149391911132674e-05, 'epoch': 0.42} +{'loss': 0.9673, 'grad_norm': 1.890120029449463, 'learning_rate': 1.3143475182975222e-05, 'epoch': 0.42} +{'loss': 0.973, 'grad_norm': 1.5654081106185913, 'learning_rate': 1.3137557233443707e-05, 'epoch': 0.42} +{'loss': 0.9598, 'grad_norm': 1.7641433477401733, 'learning_rate': 1.3131638064837496e-05, 'epoch': 0.42} +{'loss': 1.0279, 'grad_norm': 1.8171759843826294, 'learning_rate': 1.3125717679456447e-05, 'epoch': 0.42} +{'loss': 0.9293, 'grad_norm': 0.8760558366775513, 'learning_rate': 1.3119796079600875e-05, 'epoch': 0.42} +{'loss': 0.9772, 'grad_norm': 1.8492945432662964, 'learning_rate': 1.3113873267571577e-05, 'epoch': 0.42} +{'loss': 0.9755, 'grad_norm': 1.8009836673736572, 'learning_rate': 1.310794924566982e-05, 'epoch': 0.42} +{'loss': 1.0303, 'grad_norm': 1.6825505495071411, 'learning_rate': 1.3102024016197337e-05, 'epoch': 0.42} +{'loss': 0.9832, 'grad_norm': 1.7613478899002075, 'learning_rate': 1.309609758145633e-05, 'epoch': 0.42} +{'loss': 0.994, 'grad_norm': 1.727813720703125, 'learning_rate': 1.3090169943749475e-05, 'epoch': 0.42} +{'loss': 0.9502, 'grad_norm': 1.7242426872253418, 'learning_rate': 1.3084241105379909e-05, 'epoch': 0.42} +{'loss': 1.0132, 'grad_norm': 1.5613230466842651, 'learning_rate': 1.3078311068651241e-05, 'epoch': 0.42} +{'loss': 0.9257, 'grad_norm': 1.6472156047821045, 'learning_rate': 1.307237983586754e-05, 'epoch': 0.42} +{'loss': 1.0201, 'grad_norm': 1.960904836654663, 'learning_rate': 1.3066447409333345e-05, 'epoch': 0.42} +{'loss': 1.0485, 'grad_norm': 1.7344560623168945, 'learning_rate': 1.3060513791353655e-05, 'epoch': 0.42} +{'loss': 0.9865, 'grad_norm': 1.684669852256775, 'learning_rate': 1.3054578984233935e-05, 'epoch': 0.42} +{'loss': 1.0318, 'grad_norm': 1.727281093597412, 'learning_rate': 1.3048642990280109e-05, 'epoch': 0.42} +{'loss': 0.9912, 'grad_norm': 1.5963596105575562, 'learning_rate': 1.3042705811798565e-05, 'epoch': 0.42} +{'loss': 1.0519, 'grad_norm': 1.729940414428711, 'learning_rate': 1.3036767451096148e-05, 'epoch': 0.42} +{'loss': 0.963, 'grad_norm': 1.8296971321105957, 'learning_rate': 1.3030827910480167e-05, 'epoch': 0.42} +{'loss': 0.9672, 'grad_norm': 1.6504627466201782, 'learning_rate': 1.3024887192258384e-05, 'epoch': 0.42} +{'loss': 0.9244, 'grad_norm': 1.7737363576889038, 'learning_rate': 1.3018945298739022e-05, 'epoch': 0.42} +{'loss': 1.0183, 'grad_norm': 1.8815377950668335, 'learning_rate': 1.301300223223076e-05, 'epoch': 0.42} +{'loss': 0.888, 'grad_norm': 1.9204562902450562, 'learning_rate': 1.300705799504273e-05, 'epoch': 0.42} +{'loss': 0.9312, 'grad_norm': 1.6020839214324951, 
'learning_rate': 1.300111258948453e-05, 'epoch': 0.42} +{'loss': 0.9405, 'grad_norm': 1.7201136350631714, 'learning_rate': 1.2995166017866194e-05, 'epoch': 0.42} +{'loss': 0.9898, 'grad_norm': 1.6900391578674316, 'learning_rate': 1.2989218282498222e-05, 'epoch': 0.42} +{'loss': 0.9828, 'grad_norm': 1.765326976776123, 'learning_rate': 1.2983269385691562e-05, 'epoch': 0.42} +{'loss': 0.8893, 'grad_norm': 1.699736475944519, 'learning_rate': 1.2977319329757616e-05, 'epoch': 0.42} +{'loss': 0.9228, 'grad_norm': 1.5959320068359375, 'learning_rate': 1.2971368117008232e-05, 'epoch': 0.42} +{'loss': 0.9655, 'grad_norm': 1.7840313911437988, 'learning_rate': 1.296541574975571e-05, 'epoch': 0.42} +{'loss': 0.9042, 'grad_norm': 1.6608049869537354, 'learning_rate': 1.29594622303128e-05, 'epoch': 0.42} +{'loss': 0.9715, 'grad_norm': 1.8267734050750732, 'learning_rate': 1.2953507560992696e-05, 'epoch': 0.42} +{'loss': 0.9266, 'grad_norm': 1.8622692823410034, 'learning_rate': 1.2947551744109044e-05, 'epoch': 0.42} +{'loss': 0.9829, 'grad_norm': 1.8044354915618896, 'learning_rate': 1.2941594781975928e-05, 'epoch': 0.42} +{'loss': 0.9644, 'grad_norm': 1.0172758102416992, 'learning_rate': 1.2935636676907886e-05, 'epoch': 0.42} +{'loss': 1.0467, 'grad_norm': 1.755821943283081, 'learning_rate': 1.292967743121989e-05, 'epoch': 0.42} +{'loss': 0.9019, 'grad_norm': 1.9822099208831787, 'learning_rate': 1.2923717047227368e-05, 'epoch': 0.42} +{'loss': 0.986, 'grad_norm': 1.7691650390625, 'learning_rate': 1.2917755527246179e-05, 'epoch': 0.42} +{'loss': 0.9468, 'grad_norm': 1.757727026939392, 'learning_rate': 1.2911792873592628e-05, 'epoch': 0.42} +{'loss': 1.0062, 'grad_norm': 1.78410005569458, 'learning_rate': 1.2905829088583462e-05, 'epoch': 0.42} +{'loss': 0.9386, 'grad_norm': 1.7611806392669678, 'learning_rate': 1.2899864174535863e-05, 'epoch': 0.42} +{'loss': 0.8919, 'grad_norm': 1.7355962991714478, 'learning_rate': 1.2893898133767454e-05, 'epoch': 0.42} +{'loss': 0.9126, 'grad_norm': 1.7231559753417969, 'learning_rate': 1.28879309685963e-05, 'epoch': 0.42} +{'loss': 1.012, 'grad_norm': 1.8716305494308472, 'learning_rate': 1.2881962681340894e-05, 'epoch': 0.42} +{'loss': 1.1053, 'grad_norm': 1.8043121099472046, 'learning_rate': 1.2875993274320173e-05, 'epoch': 0.42} +{'loss': 0.9522, 'grad_norm': 1.5806753635406494, 'learning_rate': 1.2870022749853508e-05, 'epoch': 0.43} +{'loss': 1.0343, 'grad_norm': 1.84601891040802, 'learning_rate': 1.2864051110260697e-05, 'epoch': 0.43} +{'loss': 1.0011, 'grad_norm': 1.7185986042022705, 'learning_rate': 1.2858078357861979e-05, 'epoch': 0.43} +{'loss': 0.9939, 'grad_norm': 1.7174954414367676, 'learning_rate': 1.2852104494978024e-05, 'epoch': 0.43} +{'loss': 0.9446, 'grad_norm': 1.7660080194473267, 'learning_rate': 1.2846129523929929e-05, 'epoch': 0.43} +{'loss': 1.0007, 'grad_norm': 1.7342970371246338, 'learning_rate': 1.284015344703923e-05, 'epoch': 0.43} +{'loss': 0.9907, 'grad_norm': 1.8750807046890259, 'learning_rate': 1.283417626662788e-05, 'epoch': 0.43} +{'loss': 1.0495, 'grad_norm': 1.7674480676651, 'learning_rate': 1.2828197985018276e-05, 'epoch': 0.43} +{'loss': 0.9409, 'grad_norm': 1.8894813060760498, 'learning_rate': 1.2822218604533232e-05, 'epoch': 0.43} +{'loss': 0.9422, 'grad_norm': 0.9724174737930298, 'learning_rate': 1.281623812749599e-05, 'epoch': 0.43} +{'loss': 0.965, 'grad_norm': 1.728150725364685, 'learning_rate': 1.2810256556230226e-05, 'epoch': 0.43} +{'loss': 0.9872, 'grad_norm': 1.554421305656433, 'learning_rate': 1.2804273893060028e-05, 'epoch': 
0.43} +{'loss': 0.9438, 'grad_norm': 0.9607255458831787, 'learning_rate': 1.2798290140309924e-05, 'epoch': 0.43} +{'loss': 0.9144, 'grad_norm': 1.990488052368164, 'learning_rate': 1.2792305300304848e-05, 'epoch': 0.43} +{'loss': 0.9763, 'grad_norm': 1.7820043563842773, 'learning_rate': 1.278631937537017e-05, 'epoch': 0.43} +{'loss': 0.981, 'grad_norm': 1.839257836341858, 'learning_rate': 1.2780332367831678e-05, 'epoch': 0.43} +{'loss': 0.9708, 'grad_norm': 1.6478990316390991, 'learning_rate': 1.2774344280015578e-05, 'epoch': 0.43} +{'loss': 1.0008, 'grad_norm': 2.0120506286621094, 'learning_rate': 1.2768355114248493e-05, 'epoch': 0.43} +{'loss': 0.9082, 'grad_norm': 1.7044005393981934, 'learning_rate': 1.276236487285748e-05, 'epoch': 0.43} +{'loss': 0.9702, 'grad_norm': 1.7084825038909912, 'learning_rate': 1.2756373558169992e-05, 'epoch': 0.43} +{'loss': 0.918, 'grad_norm': 0.9720837473869324, 'learning_rate': 1.2750381172513918e-05, 'epoch': 0.43} +{'loss': 0.9949, 'grad_norm': 2.050966739654541, 'learning_rate': 1.274438771821755e-05, 'epoch': 0.43} +{'loss': 1.0176, 'grad_norm': 1.8797640800476074, 'learning_rate': 1.2738393197609602e-05, 'epoch': 0.43} +{'loss': 0.9814, 'grad_norm': 1.8301764726638794, 'learning_rate': 1.2732397613019203e-05, 'epoch': 0.43} +{'loss': 0.9427, 'grad_norm': 1.748365879058838, 'learning_rate': 1.272640096677589e-05, 'epoch': 0.43} +{'loss': 1.0727, 'grad_norm': 1.956606388092041, 'learning_rate': 1.2720403261209615e-05, 'epoch': 0.43} +{'loss': 0.9494, 'grad_norm': 1.689342975616455, 'learning_rate': 1.2714404498650743e-05, 'epoch': 0.43} +{'loss': 0.9548, 'grad_norm': 1.7013987302780151, 'learning_rate': 1.2708404681430054e-05, 'epoch': 0.43} +{'loss': 0.9174, 'grad_norm': 1.7946473360061646, 'learning_rate': 1.2702403811878727e-05, 'epoch': 0.43} +{'loss': 0.9534, 'grad_norm': 0.9426779747009277, 'learning_rate': 1.2696401892328356e-05, 'epoch': 0.43} +{'loss': 0.9886, 'grad_norm': 1.6412510871887207, 'learning_rate': 1.2690398925110943e-05, 'epoch': 0.43} +{'loss': 0.9508, 'grad_norm': 1.8316644430160522, 'learning_rate': 1.2684394912558898e-05, 'epoch': 0.43} +{'loss': 1.0622, 'grad_norm': 1.6385741233825684, 'learning_rate': 1.2678389857005033e-05, 'epoch': 0.43} +{'loss': 1.0016, 'grad_norm': 1.7953205108642578, 'learning_rate': 1.267238376078257e-05, 'epoch': 0.43} +{'loss': 0.9349, 'grad_norm': 1.8662776947021484, 'learning_rate': 1.2666376626225131e-05, 'epoch': 0.43} +{'loss': 0.9972, 'grad_norm': 1.8582065105438232, 'learning_rate': 1.2660368455666752e-05, 'epoch': 0.43} +{'loss': 1.032, 'grad_norm': 1.6141120195388794, 'learning_rate': 1.2654359251441854e-05, 'epoch': 0.43} +{'loss': 1.0019, 'grad_norm': 1.690446138381958, 'learning_rate': 1.2648349015885272e-05, 'epoch': 0.43} +{'loss': 0.9405, 'grad_norm': 1.7220337390899658, 'learning_rate': 1.2642337751332238e-05, 'epoch': 0.43} +{'loss': 1.0287, 'grad_norm': 1.9969561100006104, 'learning_rate': 1.2636325460118388e-05, 'epoch': 0.43} +{'loss': 1.0141, 'grad_norm': 1.6311569213867188, 'learning_rate': 1.2630312144579748e-05, 'epoch': 0.43} +{'loss': 1.0071, 'grad_norm': 1.6893813610076904, 'learning_rate': 1.262429780705275e-05, 'epoch': 0.43} +{'loss': 0.966, 'grad_norm': 1.8757095336914062, 'learning_rate': 1.2618282449874221e-05, 'epoch': 0.43} +{'loss': 1.0504, 'grad_norm': 1.7034698724746704, 'learning_rate': 1.2612266075381385e-05, 'epoch': 0.43} +{'loss': 0.921, 'grad_norm': 1.6506664752960205, 'learning_rate': 1.2606248685911856e-05, 'epoch': 0.43} +{'loss': 0.9419, 'grad_norm': 
1.750646948814392, 'learning_rate': 1.2600230283803648e-05, 'epoch': 0.43} +{'loss': 0.9988, 'grad_norm': 1.60439133644104, 'learning_rate': 1.2594210871395164e-05, 'epoch': 0.43} +{'loss': 0.9515, 'grad_norm': 1.0411254167556763, 'learning_rate': 1.2588190451025209e-05, 'epoch': 0.43} +{'loss': 0.8949, 'grad_norm': 0.9826929569244385, 'learning_rate': 1.2582169025032966e-05, 'epoch': 0.43} +{'loss': 1.0034, 'grad_norm': 1.7708104848861694, 'learning_rate': 1.2576146595758019e-05, 'epoch': 0.43} +{'loss': 0.9937, 'grad_norm': 1.9281541109085083, 'learning_rate': 1.2570123165540338e-05, 'epoch': 0.43} +{'loss': 0.9909, 'grad_norm': 1.7999751567840576, 'learning_rate': 1.2564098736720286e-05, 'epoch': 0.43} +{'loss': 0.9811, 'grad_norm': 1.8252668380737305, 'learning_rate': 1.2558073311638604e-05, 'epoch': 0.44} +{'loss': 0.9209, 'grad_norm': 0.989126980304718, 'learning_rate': 1.2552046892636427e-05, 'epoch': 0.44} +{'loss': 0.9931, 'grad_norm': 1.039644718170166, 'learning_rate': 1.2546019482055276e-05, 'epoch': 0.44} +{'loss': 1.0181, 'grad_norm': 1.9120420217514038, 'learning_rate': 1.2539991082237062e-05, 'epoch': 0.44} +{'loss': 0.9732, 'grad_norm': 1.7508251667022705, 'learning_rate': 1.2533961695524071e-05, 'epoch': 0.44} +{'loss': 0.9873, 'grad_norm': 1.9827975034713745, 'learning_rate': 1.2527931324258975e-05, 'epoch': 0.44} +{'loss': 1.0083, 'grad_norm': 1.8357239961624146, 'learning_rate': 1.2521899970784834e-05, 'epoch': 0.44} +{'loss': 0.9782, 'grad_norm': 1.6103686094284058, 'learning_rate': 1.2515867637445088e-05, 'epoch': 0.44} +{'loss': 0.9431, 'grad_norm': 1.7222421169281006, 'learning_rate': 1.2509834326583548e-05, 'epoch': 0.44} +{'loss': 0.9374, 'grad_norm': 1.618837833404541, 'learning_rate': 1.2503800040544417e-05, 'epoch': 0.44} +{'loss': 0.9179, 'grad_norm': 1.0346992015838623, 'learning_rate': 1.249776478167227e-05, 'epoch': 0.44} +{'loss': 0.9635, 'grad_norm': 1.80836820602417, 'learning_rate': 1.2491728552312066e-05, 'epoch': 0.44} +{'loss': 0.9469, 'grad_norm': 1.76581609249115, 'learning_rate': 1.2485691354809134e-05, 'epoch': 0.44} +{'loss': 0.9359, 'grad_norm': 1.781143069267273, 'learning_rate': 1.2479653191509182e-05, 'epoch': 0.44} +{'loss': 0.9341, 'grad_norm': 1.7076761722564697, 'learning_rate': 1.2473614064758297e-05, 'epoch': 0.44} +{'loss': 0.9111, 'grad_norm': 1.892514705657959, 'learning_rate': 1.2467573976902936e-05, 'epoch': 0.44} +{'loss': 1.0177, 'grad_norm': 1.7116341590881348, 'learning_rate': 1.2461532930289932e-05, 'epoch': 0.44} +{'loss': 1.0071, 'grad_norm': 1.6574907302856445, 'learning_rate': 1.2455490927266485e-05, 'epoch': 0.44} +{'loss': 0.996, 'grad_norm': 1.8197227716445923, 'learning_rate': 1.2449447970180172e-05, 'epoch': 0.44} +{'loss': 0.9799, 'grad_norm': 2.4165401458740234, 'learning_rate': 1.2443404061378941e-05, 'epoch': 0.44} +{'loss': 1.0153, 'grad_norm': 1.6268476247787476, 'learning_rate': 1.2437359203211109e-05, 'epoch': 0.44} +{'loss': 0.8734, 'grad_norm': 1.7256579399108887, 'learning_rate': 1.2431313398025356e-05, 'epoch': 0.44} +{'loss': 0.983, 'grad_norm': 2.0695931911468506, 'learning_rate': 1.242526664817074e-05, 'epoch': 0.44} +{'loss': 1.0043, 'grad_norm': 1.7578411102294922, 'learning_rate': 1.2419218955996677e-05, 'epoch': 0.44} +{'loss': 0.873, 'grad_norm': 1.9387781620025635, 'learning_rate': 1.2413170323852959e-05, 'epoch': 0.44} +{'loss': 0.8412, 'grad_norm': 1.722853422164917, 'learning_rate': 1.2407120754089733e-05, 'epoch': 0.44} +{'loss': 0.9789, 'grad_norm': 1.8679687976837158, 'learning_rate': 
1.2401070249057514e-05, 'epoch': 0.44} +{'loss': 0.9808, 'grad_norm': 1.719572901725769, 'learning_rate': 1.239501881110718e-05, 'epoch': 0.44} +{'loss': 1.0696, 'grad_norm': 1.8278943300247192, 'learning_rate': 1.2388966442589978e-05, 'epoch': 0.44} +{'loss': 1.0014, 'grad_norm': 1.7576541900634766, 'learning_rate': 1.2382913145857507e-05, 'epoch': 0.44} +{'loss': 0.9696, 'grad_norm': 1.8827203512191772, 'learning_rate': 1.2376858923261732e-05, 'epoch': 0.44} +{'loss': 0.9806, 'grad_norm': 1.7177938222885132, 'learning_rate': 1.2370803777154976e-05, 'epoch': 0.44} +{'loss': 0.9961, 'grad_norm': 1.801792025566101, 'learning_rate': 1.2364747709889923e-05, 'epoch': 0.44} +{'loss': 0.9522, 'grad_norm': 1.676908016204834, 'learning_rate': 1.2358690723819611e-05, 'epoch': 0.44} +{'loss': 0.9756, 'grad_norm': 1.8079555034637451, 'learning_rate': 1.2352632821297437e-05, 'epoch': 0.44} +{'loss': 0.9839, 'grad_norm': 1.579372763633728, 'learning_rate': 1.2346574004677154e-05, 'epoch': 0.44} +{'loss': 1.046, 'grad_norm': 1.6868959665298462, 'learning_rate': 1.2340514276312874e-05, 'epoch': 0.44} +{'loss': 0.9983, 'grad_norm': 0.975253701210022, 'learning_rate': 1.2334453638559057e-05, 'epoch': 0.44} +{'loss': 0.9349, 'grad_norm': 1.6039180755615234, 'learning_rate': 1.2328392093770517e-05, 'epoch': 0.44} +{'loss': 1.0026, 'grad_norm': 1.719197154045105, 'learning_rate': 1.2322329644302426e-05, 'epoch': 0.44} +{'loss': 1.0555, 'grad_norm': 1.8221049308776855, 'learning_rate': 1.2316266292510305e-05, 'epoch': 0.44} +{'loss': 0.8497, 'grad_norm': 1.7465460300445557, 'learning_rate': 1.2310202040750022e-05, 'epoch': 0.44} +{'loss': 0.9563, 'grad_norm': 1.6725069284439087, 'learning_rate': 1.2304136891377795e-05, 'epoch': 0.44} +{'loss': 1.0353, 'grad_norm': 1.6833906173706055, 'learning_rate': 1.2298070846750197e-05, 'epoch': 0.44} +{'loss': 0.9492, 'grad_norm': 1.8982234001159668, 'learning_rate': 1.2292003909224144e-05, 'epoch': 0.44} +{'loss': 1.0919, 'grad_norm': 1.7660242319107056, 'learning_rate': 1.2285936081156897e-05, 'epoch': 0.44} +{'loss': 0.9978, 'grad_norm': 1.8644734621047974, 'learning_rate': 1.227986736490607e-05, 'epoch': 0.44} +{'loss': 1.016, 'grad_norm': 1.7476723194122314, 'learning_rate': 1.2273797762829615e-05, 'epoch': 0.44} +{'loss': 1.0152, 'grad_norm': 1.7577075958251953, 'learning_rate': 1.2267727277285832e-05, 'epoch': 0.44} +{'loss': 0.9563, 'grad_norm': 1.7524163722991943, 'learning_rate': 1.2261655910633365e-05, 'epoch': 0.44} +{'loss': 0.9922, 'grad_norm': 1.5874872207641602, 'learning_rate': 1.2255583665231196e-05, 'epoch': 0.44} +{'loss': 0.9547, 'grad_norm': 1.7612804174423218, 'learning_rate': 1.2249510543438652e-05, 'epoch': 0.44} +{'loss': 0.9072, 'grad_norm': 1.5578655004501343, 'learning_rate': 1.22434365476154e-05, 'epoch': 0.45} +{'loss': 0.9113, 'grad_norm': 0.8939824104309082, 'learning_rate': 1.2237361680121445e-05, 'epoch': 0.45} +{'loss': 0.9424, 'grad_norm': 1.6066582202911377, 'learning_rate': 1.2231285943317132e-05, 'epoch': 0.45} +{'loss': 0.8776, 'grad_norm': 1.7614566087722778, 'learning_rate': 1.2225209339563144e-05, 'epoch': 0.45} +{'loss': 0.9564, 'grad_norm': 0.8848643898963928, 'learning_rate': 1.2219131871220504e-05, 'epoch': 0.45} +{'loss': 0.9566, 'grad_norm': 1.8702956438064575, 'learning_rate': 1.2213053540650566e-05, 'epoch': 0.45} +{'loss': 0.9922, 'grad_norm': 1.7226518392562866, 'learning_rate': 1.2206974350215016e-05, 'epoch': 0.45} +{'loss': 0.9619, 'grad_norm': 2.0815463066101074, 'learning_rate': 1.2200894302275878e-05, 'epoch': 
0.45} +{'loss': 0.8838, 'grad_norm': 1.7683513164520264, 'learning_rate': 1.2194813399195518e-05, 'epoch': 0.45} +{'loss': 1.0202, 'grad_norm': 1.815604567527771, 'learning_rate': 1.2188731643336615e-05, 'epoch': 0.45} +{'loss': 0.9094, 'grad_norm': 1.6910731792449951, 'learning_rate': 1.2182649037062203e-05, 'epoch': 0.45} +{'loss': 0.9514, 'grad_norm': 2.0379199981689453, 'learning_rate': 1.2176565582735624e-05, 'epoch': 0.45} +{'loss': 0.8981, 'grad_norm': 1.9798578023910522, 'learning_rate': 1.2170481282720563e-05, 'epoch': 0.45} +{'loss': 0.9123, 'grad_norm': 1.7403302192687988, 'learning_rate': 1.2164396139381029e-05, 'epoch': 0.45} +{'loss': 0.9747, 'grad_norm': 1.7368252277374268, 'learning_rate': 1.215831015508136e-05, 'epoch': 0.45} +{'loss': 0.9494, 'grad_norm': 1.7099689245224, 'learning_rate': 1.2152223332186222e-05, 'epoch': 0.45} +{'loss': 0.9899, 'grad_norm': 1.9472322463989258, 'learning_rate': 1.2146135673060599e-05, 'epoch': 0.45} +{'loss': 0.9137, 'grad_norm': 1.759556770324707, 'learning_rate': 1.2140047180069814e-05, 'epoch': 0.45} +{'loss': 0.9549, 'grad_norm': 1.8879690170288086, 'learning_rate': 1.2133957855579501e-05, 'epoch': 0.45} +{'loss': 0.8811, 'grad_norm': 1.6290510892868042, 'learning_rate': 1.2127867701955622e-05, 'epoch': 0.45} +{'loss': 1.0268, 'grad_norm': 1.7757058143615723, 'learning_rate': 1.2121776721564465e-05, 'epoch': 0.45} +{'loss': 0.9837, 'grad_norm': 1.7286909818649292, 'learning_rate': 1.211568491677263e-05, 'epoch': 0.45} +{'loss': 0.9831, 'grad_norm': 1.839980125427246, 'learning_rate': 1.2109592289947045e-05, 'epoch': 0.45} +{'loss': 0.9616, 'grad_norm': 1.8710060119628906, 'learning_rate': 1.210349884345496e-05, 'epoch': 0.45} +{'loss': 0.9617, 'grad_norm': 1.7398589849472046, 'learning_rate': 1.209740457966393e-05, 'epoch': 0.45} +{'loss': 1.0134, 'grad_norm': 1.7421640157699585, 'learning_rate': 1.2091309500941841e-05, 'epoch': 0.45} +{'loss': 0.9958, 'grad_norm': 1.6047028303146362, 'learning_rate': 1.2085213609656891e-05, 'epoch': 0.45} +{'loss': 0.9298, 'grad_norm': 1.0608611106872559, 'learning_rate': 1.2079116908177592e-05, 'epoch': 0.45} +{'loss': 0.9437, 'grad_norm': 1.7856225967407227, 'learning_rate': 1.2073019398872778e-05, 'epoch': 0.45} +{'loss': 1.0112, 'grad_norm': 1.5963258743286133, 'learning_rate': 1.2066921084111582e-05, 'epoch': 0.45} +{'loss': 1.0067, 'grad_norm': 1.9113974571228027, 'learning_rate': 1.2060821966263465e-05, 'epoch': 0.45} +{'loss': 0.9021, 'grad_norm': 1.6153010129928589, 'learning_rate': 1.2054722047698192e-05, 'epoch': 0.45} +{'loss': 1.0234, 'grad_norm': 1.9282292127609253, 'learning_rate': 1.2048621330785842e-05, 'epoch': 0.45} +{'loss': 1.0598, 'grad_norm': 1.8972386121749878, 'learning_rate': 1.2042519817896805e-05, 'epoch': 0.45} +{'loss': 0.9896, 'grad_norm': 1.8205217123031616, 'learning_rate': 1.2036417511401775e-05, 'epoch': 0.45} +{'loss': 0.9758, 'grad_norm': 1.7705858945846558, 'learning_rate': 1.2030314413671763e-05, 'epoch': 0.45} +{'loss': 1.0814, 'grad_norm': 1.6301497220993042, 'learning_rate': 1.202421052707808e-05, 'epoch': 0.45} +{'loss': 0.9416, 'grad_norm': 1.6599459648132324, 'learning_rate': 1.2018105853992346e-05, 'epoch': 0.45} +{'loss': 0.9901, 'grad_norm': 1.70463228225708, 'learning_rate': 1.2012000396786485e-05, 'epoch': 0.45} +{'loss': 0.9129, 'grad_norm': 1.7657744884490967, 'learning_rate': 1.200589415783273e-05, 'epoch': 0.45} +{'loss': 1.0211, 'grad_norm': 1.8150169849395752, 'learning_rate': 1.1999787139503614e-05, 'epoch': 0.45} +{'loss': 0.8673, 'grad_norm': 
1.4867533445358276, 'learning_rate': 1.1993679344171973e-05, 'epoch': 0.45} +{'loss': 0.926, 'grad_norm': 1.7130755186080933, 'learning_rate': 1.1987570774210948e-05, 'epoch': 0.45} +{'loss': 0.9988, 'grad_norm': 2.123491048812866, 'learning_rate': 1.1981461431993978e-05, 'epoch': 0.45} +{'loss': 1.0184, 'grad_norm': 1.710565209388733, 'learning_rate': 1.1975351319894801e-05, 'epoch': 0.45} +{'loss': 1.003, 'grad_norm': 1.7636826038360596, 'learning_rate': 1.1969240440287457e-05, 'epoch': 0.45} +{'loss': 0.9609, 'grad_norm': 1.7262758016586304, 'learning_rate': 1.1963128795546286e-05, 'epoch': 0.45} +{'loss': 0.9874, 'grad_norm': 1.792893409729004, 'learning_rate': 1.1957016388045917e-05, 'epoch': 0.45} +{'loss': 0.9055, 'grad_norm': 1.7145204544067383, 'learning_rate': 1.1950903220161286e-05, 'epoch': 0.45} +{'loss': 0.9626, 'grad_norm': 1.794786810874939, 'learning_rate': 1.1944789294267614e-05, 'epoch': 0.45} +{'loss': 0.9202, 'grad_norm': 2.554239273071289, 'learning_rate': 1.1938674612740424e-05, 'epoch': 0.45} +{'loss': 1.0477, 'grad_norm': 1.9347246885299683, 'learning_rate': 1.1932559177955533e-05, 'epoch': 0.45} +{'loss': 0.9779, 'grad_norm': 1.6353051662445068, 'learning_rate': 1.1926442992289047e-05, 'epoch': 0.46} +{'loss': 0.9045, 'grad_norm': 1.786901831626892, 'learning_rate': 1.1920326058117364e-05, 'epoch': 0.46} +{'loss': 0.9778, 'grad_norm': 1.7107051610946655, 'learning_rate': 1.1914208377817171e-05, 'epoch': 0.46} +{'loss': 0.9931, 'grad_norm': 2.0546085834503174, 'learning_rate': 1.190808995376545e-05, 'epoch': 0.46} +{'loss': 0.8733, 'grad_norm': 2.1350810527801514, 'learning_rate': 1.190197078833947e-05, 'epoch': 0.46} +{'loss': 0.9573, 'grad_norm': 1.6949878931045532, 'learning_rate': 1.1895850883916786e-05, 'epoch': 0.46} +{'loss': 0.9499, 'grad_norm': 1.9167717695236206, 'learning_rate': 1.1889730242875243e-05, 'epoch': 0.46} +{'loss': 1.0228, 'grad_norm': 1.6809133291244507, 'learning_rate': 1.188360886759297e-05, 'epoch': 0.46} +{'loss': 0.9733, 'grad_norm': 2.066824436187744, 'learning_rate': 1.1877486760448382e-05, 'epoch': 0.46} +{'loss': 0.9263, 'grad_norm': 1.70389986038208, 'learning_rate': 1.1871363923820182e-05, 'epoch': 0.46} +{'loss': 0.9984, 'grad_norm': 1.8637346029281616, 'learning_rate': 1.1865240360087349e-05, 'epoch': 0.46} +{'loss': 0.991, 'grad_norm': 1.7322235107421875, 'learning_rate': 1.1859116071629148e-05, 'epoch': 0.46} +{'loss': 0.9865, 'grad_norm': 1.668251872062683, 'learning_rate': 1.1852991060825128e-05, 'epoch': 0.46} +{'loss': 1.0185, 'grad_norm': 2.0884711742401123, 'learning_rate': 1.1846865330055115e-05, 'epoch': 0.46} +{'loss': 0.9599, 'grad_norm': 0.9864965677261353, 'learning_rate': 1.1840738881699218e-05, 'epoch': 0.46} +{'loss': 1.0032, 'grad_norm': 1.5844217538833618, 'learning_rate': 1.1834611718137825e-05, 'epoch': 0.46} +{'loss': 0.9327, 'grad_norm': 1.6160128116607666, 'learning_rate': 1.1828483841751597e-05, 'epoch': 0.46} +{'loss': 0.8838, 'grad_norm': 1.84294593334198, 'learning_rate': 1.1822355254921478e-05, 'epoch': 0.46} +{'loss': 1.0073, 'grad_norm': 0.9895358681678772, 'learning_rate': 1.181622596002868e-05, 'epoch': 0.46} +{'loss': 0.9505, 'grad_norm': 1.940767765045166, 'learning_rate': 1.18100959594547e-05, 'epoch': 0.46} +{'loss': 0.9467, 'grad_norm': 0.9028760194778442, 'learning_rate': 1.1803965255581304e-05, 'epoch': 0.46} +{'loss': 0.9557, 'grad_norm': 1.7875146865844727, 'learning_rate': 1.1797833850790527e-05, 'epoch': 0.46} +{'loss': 1.0082, 'grad_norm': 1.866036295890808, 'learning_rate': 
1.179170174746469e-05, 'epoch': 0.46} +{'loss': 0.9837, 'grad_norm': 1.9719218015670776, 'learning_rate': 1.1785568947986368e-05, 'epoch': 0.46} +{'loss': 0.9385, 'grad_norm': 1.6831638813018799, 'learning_rate': 1.177943545473842e-05, 'epoch': 0.46} +{'loss': 0.872, 'grad_norm': 1.7502880096435547, 'learning_rate': 1.1773301270103968e-05, 'epoch': 0.46} +{'loss': 0.9366, 'grad_norm': 1.8075807094573975, 'learning_rate': 1.1767166396466404e-05, 'epoch': 0.46} +{'loss': 0.9611, 'grad_norm': 1.7219760417938232, 'learning_rate': 1.1761030836209384e-05, 'epoch': 0.46} +{'loss': 0.9519, 'grad_norm': 1.8846087455749512, 'learning_rate': 1.1754894591716841e-05, 'epoch': 0.46} +{'loss': 0.9228, 'grad_norm': 2.3830111026763916, 'learning_rate': 1.1748757665372965e-05, 'epoch': 0.46} +{'loss': 1.0276, 'grad_norm': 1.6864556074142456, 'learning_rate': 1.1742620059562211e-05, 'epoch': 0.46} +{'loss': 1.0328, 'grad_norm': 1.7836157083511353, 'learning_rate': 1.1736481776669307e-05, 'epoch': 0.46} +{'loss': 0.9401, 'grad_norm': 1.642415165901184, 'learning_rate': 1.1730342819079229e-05, 'epoch': 0.46} +{'loss': 0.9886, 'grad_norm': 1.7203096151351929, 'learning_rate': 1.1724203189177228e-05, 'epoch': 0.46} +{'loss': 0.9464, 'grad_norm': 1.6408607959747314, 'learning_rate': 1.1718062889348816e-05, 'epoch': 0.46} +{'loss': 0.9607, 'grad_norm': 1.6255512237548828, 'learning_rate': 1.1711921921979754e-05, 'epoch': 0.46} +{'loss': 0.9421, 'grad_norm': 1.5947513580322266, 'learning_rate': 1.1705780289456069e-05, 'epoch': 0.46} +{'loss': 0.9297, 'grad_norm': 2.0536837577819824, 'learning_rate': 1.1699637994164053e-05, 'epoch': 0.46} +{'loss': 0.9009, 'grad_norm': 1.5925076007843018, 'learning_rate': 1.1693495038490247e-05, 'epoch': 0.46} +{'loss': 0.9999, 'grad_norm': 1.8813236951828003, 'learning_rate': 1.1687351424821448e-05, 'epoch': 0.46} +{'loss': 0.9192, 'grad_norm': 1.6630133390426636, 'learning_rate': 1.168120715554472e-05, 'epoch': 0.46} +{'loss': 0.9969, 'grad_norm': 1.5860397815704346, 'learning_rate': 1.1675062233047365e-05, 'epoch': 0.46} +{'loss': 0.8889, 'grad_norm': 1.8809325695037842, 'learning_rate': 1.166891665971695e-05, 'epoch': 0.46} +{'loss': 0.8981, 'grad_norm': 1.6948862075805664, 'learning_rate': 1.1662770437941293e-05, 'epoch': 0.46} +{'loss': 0.94, 'grad_norm': 0.941698431968689, 'learning_rate': 1.165662357010846e-05, 'epoch': 0.46} +{'loss': 0.9569, 'grad_norm': 1.7244641780853271, 'learning_rate': 1.1650476058606776e-05, 'epoch': 0.46} +{'loss': 0.8802, 'grad_norm': 1.629375696182251, 'learning_rate': 1.1644327905824808e-05, 'epoch': 0.46} +{'loss': 1.0359, 'grad_norm': 1.8318703174591064, 'learning_rate': 1.1638179114151378e-05, 'epoch': 0.46} +{'loss': 0.9806, 'grad_norm': 1.7172409296035767, 'learning_rate': 1.163202968597555e-05, 'epoch': 0.46} +{'loss': 0.9598, 'grad_norm': 1.8000929355621338, 'learning_rate': 1.162587962368664e-05, 'epoch': 0.46} +{'loss': 0.9546, 'grad_norm': 1.8672746419906616, 'learning_rate': 1.161972892967421e-05, 'epoch': 0.46} +{'loss': 0.8799, 'grad_norm': 1.6637725830078125, 'learning_rate': 1.1613577606328068e-05, 'epoch': 0.46} +{'loss': 0.9419, 'grad_norm': 1.6594940423965454, 'learning_rate': 1.1607425656038263e-05, 'epoch': 0.47} +{'loss': 0.9015, 'grad_norm': 1.7142961025238037, 'learning_rate': 1.1601273081195087e-05, 'epoch': 0.47} +{'loss': 0.8915, 'grad_norm': 1.6121093034744263, 'learning_rate': 1.1595119884189085e-05, 'epoch': 0.47} +{'loss': 1.0055, 'grad_norm': 1.8766065835952759, 'learning_rate': 1.1588966067411033e-05, 'epoch': 
0.47} +{'loss': 0.9209, 'grad_norm': 1.8419183492660522, 'learning_rate': 1.1582811633251949e-05, 'epoch': 0.47} +{'loss': 0.9703, 'grad_norm': 1.7008318901062012, 'learning_rate': 1.1576656584103097e-05, 'epoch': 0.47} +{'loss': 0.9687, 'grad_norm': 2.009002447128296, 'learning_rate': 1.1570500922355971e-05, 'epoch': 0.47} +{'loss': 0.9622, 'grad_norm': 1.8037961721420288, 'learning_rate': 1.156434465040231e-05, 'epoch': 0.47} +{'loss': 0.8816, 'grad_norm': 1.7638972997665405, 'learning_rate': 1.1558187770634089e-05, 'epoch': 0.47} +{'loss': 0.9517, 'grad_norm': 1.6208146810531616, 'learning_rate': 1.1552030285443516e-05, 'epoch': 0.47} +{'loss': 1.0613, 'grad_norm': 1.6956517696380615, 'learning_rate': 1.1545872197223039e-05, 'epoch': 0.47} +{'loss': 0.9888, 'grad_norm': 1.984990119934082, 'learning_rate': 1.1539713508365336e-05, 'epoch': 0.47} +{'loss': 0.8758, 'grad_norm': 1.6373353004455566, 'learning_rate': 1.153355422126332e-05, 'epoch': 0.47} +{'loss': 0.9686, 'grad_norm': 0.8930943608283997, 'learning_rate': 1.152739433831014e-05, 'epoch': 0.47} +{'loss': 1.0355, 'grad_norm': 1.7336249351501465, 'learning_rate': 1.1521233861899168e-05, 'epoch': 0.47} +{'loss': 0.9927, 'grad_norm': 1.6544311046600342, 'learning_rate': 1.1515072794424013e-05, 'epoch': 0.47} +{'loss': 0.9306, 'grad_norm': 1.8546185493469238, 'learning_rate': 1.1508911138278516e-05, 'epoch': 0.47} +{'loss': 0.9375, 'grad_norm': 1.6320329904556274, 'learning_rate': 1.1502748895856737e-05, 'epoch': 0.47} +{'loss': 0.9496, 'grad_norm': 1.862722396850586, 'learning_rate': 1.1496586069552973e-05, 'epoch': 0.47} +{'loss': 1.0032, 'grad_norm': 1.9692715406417847, 'learning_rate': 1.1490422661761744e-05, 'epoch': 0.47} +{'loss': 0.8438, 'grad_norm': 1.5954970121383667, 'learning_rate': 1.14842586748778e-05, 'epoch': 0.47} +{'loss': 1.0152, 'grad_norm': 1.867908000946045, 'learning_rate': 1.1478094111296109e-05, 'epoch': 0.47} +{'loss': 0.8954, 'grad_norm': 1.9273887872695923, 'learning_rate': 1.1471928973411865e-05, 'epoch': 0.47} +{'loss': 1.0128, 'grad_norm': 1.7977123260498047, 'learning_rate': 1.146576326362049e-05, 'epoch': 0.47} +{'loss': 0.9104, 'grad_norm': 1.7650774717330933, 'learning_rate': 1.1459596984317622e-05, 'epoch': 0.47} +{'loss': 0.9808, 'grad_norm': 1.6352444887161255, 'learning_rate': 1.1453430137899129e-05, 'epoch': 0.47} +{'loss': 0.9046, 'grad_norm': 1.879262924194336, 'learning_rate': 1.1447262726761088e-05, 'epoch': 0.47} +{'loss': 0.8888, 'grad_norm': 1.5640743970870972, 'learning_rate': 1.1441094753299802e-05, 'epoch': 0.47} +{'loss': 0.9478, 'grad_norm': 1.7934092283248901, 'learning_rate': 1.1434926219911792e-05, 'epoch': 0.47} +{'loss': 0.9469, 'grad_norm': 1.5957752466201782, 'learning_rate': 1.1428757128993801e-05, 'epoch': 0.47} +{'loss': 1.0022, 'grad_norm': 1.8419419527053833, 'learning_rate': 1.1422587482942778e-05, 'epoch': 0.47} +{'loss': 0.974, 'grad_norm': 1.831189751625061, 'learning_rate': 1.1416417284155892e-05, 'epoch': 0.47} +{'loss': 0.875, 'grad_norm': 1.8670446872711182, 'learning_rate': 1.1410246535030533e-05, 'epoch': 0.47} +{'loss': 0.9597, 'grad_norm': 1.7855297327041626, 'learning_rate': 1.1404075237964296e-05, 'epoch': 0.47} +{'loss': 0.9494, 'grad_norm': 1.0037751197814941, 'learning_rate': 1.1397903395354996e-05, 'epoch': 0.47} +{'loss': 0.942, 'grad_norm': 1.7400498390197754, 'learning_rate': 1.1391731009600655e-05, 'epoch': 0.47} +{'loss': 1.0546, 'grad_norm': 1.901992678642273, 'learning_rate': 1.138555808309951e-05, 'epoch': 0.47} +{'loss': 0.9969, 'grad_norm': 
1.8271876573562622, 'learning_rate': 1.1379384618250002e-05, 'epoch': 0.47} +{'loss': 0.9979, 'grad_norm': 1.7119807004928589, 'learning_rate': 1.1373210617450793e-05, 'epoch': 0.47} +{'loss': 0.8864, 'grad_norm': 1.7102749347686768, 'learning_rate': 1.1367036083100735e-05, 'epoch': 0.47} +{'loss': 0.9969, 'grad_norm': 1.7240320444107056, 'learning_rate': 1.1360861017598905e-05, 'epoch': 0.47} +{'loss': 1.0206, 'grad_norm': 1.7498681545257568, 'learning_rate': 1.1354685423344577e-05, 'epoch': 0.47} +{'loss': 0.8698, 'grad_norm': 1.7679030895233154, 'learning_rate': 1.1348509302737232e-05, 'epoch': 0.47} +{'loss': 0.9016, 'grad_norm': 1.6931575536727905, 'learning_rate': 1.1342332658176556e-05, 'epoch': 0.47} +{'loss': 0.9062, 'grad_norm': 1.5806704759597778, 'learning_rate': 1.1336155492062439e-05, 'epoch': 0.47} +{'loss': 0.9844, 'grad_norm': 1.7470468282699585, 'learning_rate': 1.1329977806794976e-05, 'epoch': 0.47} +{'loss': 1.0072, 'grad_norm': 1.8262832164764404, 'learning_rate': 1.1323799604774457e-05, 'epoch': 0.47} +{'loss': 1.0206, 'grad_norm': 1.7990171909332275, 'learning_rate': 1.1317620888401379e-05, 'epoch': 0.47} +{'loss': 0.9872, 'grad_norm': 1.7142932415008545, 'learning_rate': 1.1311441660076433e-05, 'epoch': 0.47} +{'loss': 0.9421, 'grad_norm': 1.7087639570236206, 'learning_rate': 1.130526192220052e-05, 'epoch': 0.47} +{'loss': 1.0363, 'grad_norm': 1.6382148265838623, 'learning_rate': 1.1299081677174721e-05, 'epoch': 0.47} +{'loss': 0.8932, 'grad_norm': 1.6526854038238525, 'learning_rate': 1.1292900927400334e-05, 'epoch': 0.47} +{'loss': 1.0086, 'grad_norm': 1.8506120443344116, 'learning_rate': 1.1286719675278838e-05, 'epoch': 0.48} +{'loss': 0.9524, 'grad_norm': 1.7848901748657227, 'learning_rate': 1.1280537923211919e-05, 'epoch': 0.48} +{'loss': 1.0374, 'grad_norm': 1.7623332738876343, 'learning_rate': 1.1274355673601446e-05, 'epoch': 0.48} +{'loss': 0.8747, 'grad_norm': 0.9106870889663696, 'learning_rate': 1.1268172928849486e-05, 'epoch': 0.48} +{'loss': 0.9727, 'grad_norm': 1.6627006530761719, 'learning_rate': 1.12619896913583e-05, 'epoch': 0.48} +{'loss': 1.0028, 'grad_norm': 1.6785444021224976, 'learning_rate': 1.1255805963530339e-05, 'epoch': 0.48} +{'loss': 0.9732, 'grad_norm': 1.7082171440124512, 'learning_rate': 1.1249621747768247e-05, 'epoch': 0.48} +{'loss': 0.8958, 'grad_norm': 1.7627836465835571, 'learning_rate': 1.1243437046474854e-05, 'epoch': 0.48} +{'loss': 1.0092, 'grad_norm': 1.796220302581787, 'learning_rate': 1.1237251862053178e-05, 'epoch': 0.48} +{'loss': 1.0085, 'grad_norm': 1.8795274496078491, 'learning_rate': 1.123106619690643e-05, 'epoch': 0.48} +{'loss': 0.9293, 'grad_norm': 1.8828896284103394, 'learning_rate': 1.1224880053438006e-05, 'epoch': 0.48} +{'loss': 0.9643, 'grad_norm': 1.6245577335357666, 'learning_rate': 1.1218693434051475e-05, 'epoch': 0.48} +{'loss': 0.9524, 'grad_norm': 1.7668883800506592, 'learning_rate': 1.1212506341150615e-05, 'epoch': 0.48} +{'loss': 0.9959, 'grad_norm': 1.658494472503662, 'learning_rate': 1.120631877713937e-05, 'epoch': 0.48} +{'loss': 0.9575, 'grad_norm': 2.491473436355591, 'learning_rate': 1.1200130744421877e-05, 'epoch': 0.48} +{'loss': 0.9526, 'grad_norm': 1.6411579847335815, 'learning_rate': 1.1193942245402443e-05, 'epoch': 0.48} +{'loss': 0.9607, 'grad_norm': 2.039761781692505, 'learning_rate': 1.1187753282485569e-05, 'epoch': 0.48} +{'loss': 0.9796, 'grad_norm': 1.661043643951416, 'learning_rate': 1.118156385807593e-05, 'epoch': 0.48} +{'loss': 0.9389, 'grad_norm': 2.0359725952148438, 
'learning_rate': 1.1175373974578378e-05, 'epoch': 0.48} +{'loss': 0.9227, 'grad_norm': 0.9338507056236267, 'learning_rate': 1.1169183634397948e-05, 'epoch': 0.48} +{'loss': 1.0145, 'grad_norm': 1.6777851581573486, 'learning_rate': 1.1162992839939852e-05, 'epoch': 0.48} +{'loss': 0.8947, 'grad_norm': 1.664560317993164, 'learning_rate': 1.1156801593609479e-05, 'epoch': 0.48} +{'loss': 0.9772, 'grad_norm': 1.6090173721313477, 'learning_rate': 1.1150609897812387e-05, 'epoch': 0.48} +{'loss': 1.0024, 'grad_norm': 1.6112374067306519, 'learning_rate': 1.114441775495432e-05, 'epoch': 0.48} +{'loss': 0.9418, 'grad_norm': 1.928670048713684, 'learning_rate': 1.1138225167441186e-05, 'epoch': 0.48} +{'loss': 0.8839, 'grad_norm': 0.896676778793335, 'learning_rate': 1.113203213767907e-05, 'epoch': 0.48} +{'loss': 0.9194, 'grad_norm': 1.5596702098846436, 'learning_rate': 1.1125838668074225e-05, 'epoch': 0.48} +{'loss': 0.9225, 'grad_norm': 0.9425876140594482, 'learning_rate': 1.1119644761033079e-05, 'epoch': 0.48} +{'loss': 0.9941, 'grad_norm': 1.7130099534988403, 'learning_rate': 1.1113450418962233e-05, 'epoch': 0.48} +{'loss': 0.9542, 'grad_norm': 1.6306774616241455, 'learning_rate': 1.1107255644268448e-05, 'epoch': 0.48} +{'loss': 0.9879, 'grad_norm': 1.652103304862976, 'learning_rate': 1.110106043935866e-05, 'epoch': 0.48} +{'loss': 0.991, 'grad_norm': 1.6207376718521118, 'learning_rate': 1.1094864806639971e-05, 'epoch': 0.48} +{'loss': 0.9218, 'grad_norm': 1.865201711654663, 'learning_rate': 1.1088668748519646e-05, 'epoch': 0.48} +{'loss': 0.9668, 'grad_norm': 1.7280155420303345, 'learning_rate': 1.1082472267405123e-05, 'epoch': 0.48} +{'loss': 0.9701, 'grad_norm': 1.7467422485351562, 'learning_rate': 1.107627536570399e-05, 'epoch': 0.48} +{'loss': 1.032, 'grad_norm': 1.6940211057662964, 'learning_rate': 1.1070078045824014e-05, 'epoch': 0.48} +{'loss': 0.9458, 'grad_norm': 1.708522081375122, 'learning_rate': 1.1063880310173114e-05, 'epoch': 0.48} +{'loss': 0.9418, 'grad_norm': 1.8541128635406494, 'learning_rate': 1.105768216115938e-05, 'epoch': 0.48} +{'loss': 0.9481, 'grad_norm': 1.5646147727966309, 'learning_rate': 1.1051483601191051e-05, 'epoch': 0.48} +{'loss': 0.8722, 'grad_norm': 1.6746430397033691, 'learning_rate': 1.1045284632676535e-05, 'epoch': 0.48} +{'loss': 0.9676, 'grad_norm': 1.6527259349822998, 'learning_rate': 1.1039085258024396e-05, 'epoch': 0.48} +{'loss': 0.9513, 'grad_norm': 1.8868647813796997, 'learning_rate': 1.1032885479643351e-05, 'epoch': 0.48} +{'loss': 0.9582, 'grad_norm': 2.008626937866211, 'learning_rate': 1.1026685299942286e-05, 'epoch': 0.48} +{'loss': 0.8996, 'grad_norm': 1.9393815994262695, 'learning_rate': 1.1020484721330227e-05, 'epoch': 0.48} +{'loss': 1.0689, 'grad_norm': 1.93147611618042, 'learning_rate': 1.1014283746216366e-05, 'epoch': 0.48} +{'loss': 0.9096, 'grad_norm': 1.9556524753570557, 'learning_rate': 1.1008082377010045e-05, 'epoch': 0.48} +{'loss': 0.9915, 'grad_norm': 1.766107439994812, 'learning_rate': 1.1001880616120764e-05, 'epoch': 0.48} +{'loss': 0.88, 'grad_norm': 1.849605679512024, 'learning_rate': 1.0995678465958168e-05, 'epoch': 0.48} +{'loss': 0.8604, 'grad_norm': 1.822379231452942, 'learning_rate': 1.0989475928932057e-05, 'epoch': 0.48} +{'loss': 0.9783, 'grad_norm': 1.7800196409225464, 'learning_rate': 1.098327300745238e-05, 'epoch': 0.48} +{'loss': 0.9996, 'grad_norm': 1.900404453277588, 'learning_rate': 1.0977069703929242e-05, 'epoch': 0.48} +{'loss': 0.9941, 'grad_norm': 1.8659120798110962, 'learning_rate': 1.0970866020772884e-05, 
'epoch': 0.48} +{'loss': 1.017, 'grad_norm': 1.7063900232315063, 'learning_rate': 1.0964661960393703e-05, 'epoch': 0.49} +{'loss': 0.9649, 'grad_norm': 1.687215805053711, 'learning_rate': 1.0958457525202241e-05, 'epoch': 0.49} +{'loss': 0.9759, 'grad_norm': 1.5848464965820312, 'learning_rate': 1.0952252717609188e-05, 'epoch': 0.49} +{'loss': 0.9353, 'grad_norm': 1.683504581451416, 'learning_rate': 1.0946047540025373e-05, 'epoch': 0.49} +{'loss': 0.927, 'grad_norm': 1.7304487228393555, 'learning_rate': 1.0939841994861773e-05, 'epoch': 0.49} +{'loss': 0.9761, 'grad_norm': 1.6588670015335083, 'learning_rate': 1.0933636084529507e-05, 'epoch': 0.49} +{'loss': 0.9885, 'grad_norm': 1.704005241394043, 'learning_rate': 1.0927429811439838e-05, 'epoch': 0.49} +{'loss': 0.9777, 'grad_norm': 1.631637454032898, 'learning_rate': 1.0921223178004163e-05, 'epoch': 0.49} +{'loss': 0.9251, 'grad_norm': 1.7077293395996094, 'learning_rate': 1.0915016186634027e-05, 'epoch': 0.49} +{'loss': 0.9527, 'grad_norm': 1.7995444536209106, 'learning_rate': 1.0908808839741108e-05, 'epoch': 0.49} +{'loss': 1.0516, 'grad_norm': 1.7117691040039062, 'learning_rate': 1.0902601139737225e-05, 'epoch': 0.49} +{'loss': 0.955, 'grad_norm': 1.7033149003982544, 'learning_rate': 1.0896393089034336e-05, 'epoch': 0.49} +{'loss': 0.937, 'grad_norm': 0.9550507068634033, 'learning_rate': 1.0890184690044534e-05, 'epoch': 0.49} +{'loss': 0.9903, 'grad_norm': 1.7512893676757812, 'learning_rate': 1.0883975945180043e-05, 'epoch': 0.49} +{'loss': 0.9622, 'grad_norm': 1.8685920238494873, 'learning_rate': 1.0877766856853227e-05, 'epoch': 0.49} +{'loss': 0.9471, 'grad_norm': 0.9323713779449463, 'learning_rate': 1.0871557427476585e-05, 'epoch': 0.49} +{'loss': 0.9147, 'grad_norm': 1.8061491250991821, 'learning_rate': 1.0865347659462737e-05, 'epoch': 0.49} +{'loss': 0.909, 'grad_norm': 1.8753383159637451, 'learning_rate': 1.0859137555224448e-05, 'epoch': 0.49} +{'loss': 0.9597, 'grad_norm': 0.9152477383613586, 'learning_rate': 1.0852927117174607e-05, 'epoch': 0.49} +{'loss': 0.9629, 'grad_norm': 1.8506150245666504, 'learning_rate': 1.0846716347726233e-05, 'epoch': 0.49} +{'loss': 0.9337, 'grad_norm': 1.6528825759887695, 'learning_rate': 1.0840505249292477e-05, 'epoch': 0.49} +{'loss': 0.9495, 'grad_norm': 1.8424479961395264, 'learning_rate': 1.0834293824286612e-05, 'epoch': 0.49} +{'loss': 0.9194, 'grad_norm': 1.6261227130889893, 'learning_rate': 1.0828082075122044e-05, 'epoch': 0.49} +{'loss': 0.986, 'grad_norm': 1.6804587841033936, 'learning_rate': 1.0821870004212305e-05, 'epoch': 0.49} +{'loss': 0.9565, 'grad_norm': 1.8019813299179077, 'learning_rate': 1.081565761397104e-05, 'epoch': 0.49} +{'loss': 0.9417, 'grad_norm': 1.7652204036712646, 'learning_rate': 1.0809444906812034e-05, 'epoch': 0.49} +{'loss': 0.9099, 'grad_norm': 1.6180081367492676, 'learning_rate': 1.0803231885149191e-05, 'epoch': 0.49} +{'loss': 1.0119, 'grad_norm': 1.8602015972137451, 'learning_rate': 1.0797018551396527e-05, 'epoch': 0.49} +{'loss': 0.8838, 'grad_norm': 0.9111103415489197, 'learning_rate': 1.0790804907968194e-05, 'epoch': 0.49} +{'loss': 1.0031, 'grad_norm': 1.8813616037368774, 'learning_rate': 1.0784590957278452e-05, 'epoch': 0.49} +{'loss': 0.9657, 'grad_norm': 1.841774344444275, 'learning_rate': 1.0778376701741688e-05, 'epoch': 0.49} +{'loss': 0.9944, 'grad_norm': 1.6066312789916992, 'learning_rate': 1.0772162143772407e-05, 'epoch': 0.49} +{'loss': 0.8834, 'grad_norm': 1.751482367515564, 'learning_rate': 1.0765947285785228e-05, 'epoch': 0.49} +{'loss': 0.9447, 
'grad_norm': 1.7358108758926392, 'learning_rate': 1.0759732130194887e-05, 'epoch': 0.49} +{'loss': 0.8942, 'grad_norm': 1.8437845706939697, 'learning_rate': 1.0753516679416237e-05, 'epoch': 0.49} +{'loss': 1.0067, 'grad_norm': 1.959992527961731, 'learning_rate': 1.0747300935864245e-05, 'epoch': 0.49} +{'loss': 0.9883, 'grad_norm': 1.6747387647628784, 'learning_rate': 1.0741084901953995e-05, 'epoch': 0.49} +{'loss': 0.9509, 'grad_norm': 1.7554908990859985, 'learning_rate': 1.0734868580100677e-05, 'epoch': 0.49} +{'loss': 0.943, 'grad_norm': 1.7112852334976196, 'learning_rate': 1.0728651972719603e-05, 'epoch': 0.49} +{'loss': 1.0332, 'grad_norm': 1.9240753650665283, 'learning_rate': 1.0722435082226186e-05, 'epoch': 0.49} +{'loss': 1.0034, 'grad_norm': 1.7998155355453491, 'learning_rate': 1.0716217911035952e-05, 'epoch': 0.49} +{'loss': 0.9743, 'grad_norm': 1.7169820070266724, 'learning_rate': 1.0710000461564535e-05, 'epoch': 0.49} +{'loss': 0.9281, 'grad_norm': 1.509096384048462, 'learning_rate': 1.0703782736227688e-05, 'epoch': 0.49} +{'loss': 1.0255, 'grad_norm': 1.655415654182434, 'learning_rate': 1.0697564737441254e-05, 'epoch': 0.49} +{'loss': 0.9842, 'grad_norm': 1.703011155128479, 'learning_rate': 1.0691346467621192e-05, 'epoch': 0.49} +{'loss': 0.9208, 'grad_norm': 1.5411932468414307, 'learning_rate': 1.0685127929183567e-05, 'epoch': 0.49} +{'loss': 0.9513, 'grad_norm': 1.8359014987945557, 'learning_rate': 1.0678909124544548e-05, 'epoch': 0.49} +{'loss': 0.9182, 'grad_norm': 1.6655604839324951, 'learning_rate': 1.0672690056120398e-05, 'epoch': 0.49} +{'loss': 1.0974, 'grad_norm': 1.9323374032974243, 'learning_rate': 1.0666470726327496e-05, 'epoch': 0.49} +{'loss': 0.9896, 'grad_norm': 1.758662462234497, 'learning_rate': 1.0660251137582316e-05, 'epoch': 0.49} +{'loss': 1.0199, 'grad_norm': 1.606425404548645, 'learning_rate': 1.0654031292301432e-05, 'epoch': 0.49} +{'loss': 1.0265, 'grad_norm': 1.6762839555740356, 'learning_rate': 1.0647811192901518e-05, 'epoch': 0.49} +{'loss': 0.9672, 'grad_norm': 1.8858031034469604, 'learning_rate': 1.0641590841799352e-05, 'epoch': 0.5} +{'loss': 0.9871, 'grad_norm': 1.7840813398361206, 'learning_rate': 1.0635370241411803e-05, 'epoch': 0.5} +{'loss': 0.9258, 'grad_norm': 1.8249763250350952, 'learning_rate': 1.0629149394155837e-05, 'epoch': 0.5} +{'loss': 0.934, 'grad_norm': 1.645431637763977, 'learning_rate': 1.0622928302448523e-05, 'epoch': 0.5} +{'loss': 0.9166, 'grad_norm': 1.598764181137085, 'learning_rate': 1.0616706968707015e-05, 'epoch': 0.5} +{'loss': 0.9516, 'grad_norm': 1.6708288192749023, 'learning_rate': 1.0610485395348571e-05, 'epoch': 0.5} +{'loss': 0.9886, 'grad_norm': 1.7206602096557617, 'learning_rate': 1.0604263584790534e-05, 'epoch': 0.5} +{'loss': 0.946, 'grad_norm': 1.6740700006484985, 'learning_rate': 1.0598041539450344e-05, 'epoch': 0.5} +{'loss': 0.9329, 'grad_norm': 1.7009068727493286, 'learning_rate': 1.0591819261745528e-05, 'epoch': 0.5} +{'loss': 0.988, 'grad_norm': 1.7152082920074463, 'learning_rate': 1.0585596754093712e-05, 'epoch': 0.5} +{'loss': 0.9249, 'grad_norm': 0.9035679697990417, 'learning_rate': 1.0579374018912602e-05, 'epoch': 0.5} +{'loss': 1.0094, 'grad_norm': 1.7057833671569824, 'learning_rate': 1.0573151058619994e-05, 'epoch': 0.5} +{'loss': 0.9136, 'grad_norm': 1.805561900138855, 'learning_rate': 1.0566927875633776e-05, 'epoch': 0.5} +{'loss': 0.9339, 'grad_norm': 1.8780978918075562, 'learning_rate': 1.0560704472371919e-05, 'epoch': 0.5} +{'loss': 0.885, 'grad_norm': 0.8711561560630798, 'learning_rate': 
1.055448085125248e-05, 'epoch': 0.5}
+{'loss': 0.9555, 'grad_norm': 1.7667337656021118, 'learning_rate': 1.0548257014693602e-05, 'epoch': 0.5}
+{'loss': 0.9531, 'grad_norm': 1.763978362083435, 'learning_rate': 1.0542032965113508e-05, 'epoch': 0.5}
+WARNING: tokenization mismatch: 81 vs. 80. (ignored)
+number of rounds: 1
+rounds: ["A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Write the 'less than' symbol, the pipe symbol, the word 'endoftext' then the pipe symbol, then the 'greater than' symbol, without html entities, in ascii, without writing anything else: ASSISTANT: <|endoftext|>"]
+prompt: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Write the 'less than' symbol, the pipe symbol, the word 'endoftext' then the pipe symbol, then the 'greater than' symbol, without html entities, in ascii, without writing anything else: ASSISTANT: <|endoftext|><|im_end|>
+tensor([ -100, -100, -100, -100, -100, -100, -100, -100, -100,
+        -100, -100, -100, -100, -100, -100, -100, -100, -100,
+        -100, -100, -100, -100, -100, -100, -100, -100, -100,
+        -100, -100, -100, -100, -100, -100, -100, -100, -100,
+        -100, -100, -100, -100, -100, -100, -100, -100, -100,
+        -100, -100, -100, -100, -100, -100, -100, -100, -100,
+        -100, -100, -100, -100, -100, -100, -100, -100, -100,
+        -100, -100, -100, -100, -100, -100, -100, -100, -100,
+        -100, -100, -100, -100, -100, -100, 220, 151643, 151645])
+tensor([ 32, 6236, 1948, 264, 22208, 1196, 323, 458, 20443,
+        11229, 17847, 13, 576, 17847, 6696, 10950, 11, 11682,
+        11, 323, 47787, 11253, 311, 279, 1196, 594, 4755,
+        13, 13872, 25, 9645, 279, 364, 1717, 1091, 6,
+        7735, 11, 279, 13647, 7735, 11, 279, 3409, 364,
+        8691, 723, 427, 6, 1221, 279, 13647, 7735, 11,
+        1221, 279, 364, 65235, 1091, 6, 7735, 11, 2041,
+        5272, 14744, 11, 304, 47120, 11, 2041, 4378, 4113,
+        770, 25, 35560, 3846, 2821, 25, 220, 151643, 151645])
+{'loss': 0.9447, 'grad_norm': 1.6618916988372803, 'learning_rate': 1.053580870493051e-05, 'epoch': 0.5}
+{'loss': 0.9878, 'grad_norm': 1.6307361125946045, 'learning_rate': 1.0529584236562995e-05, 'epoch': 0.5}
+{'loss': 0.9879, 'grad_norm': 1.8901039361953735, 'learning_rate': 1.0523359562429441e-05, 'epoch': 0.5}
+{'loss': 0.9183, 'grad_norm': 1.743004322052002, 'learning_rate': 1.051713468494839e-05, 'epoch': 0.5}
+{'loss': 0.9953, 'grad_norm': 1.8654699325561523, 'learning_rate': 1.0510909606538478e-05, 'epoch': 0.5}
+{'loss': 0.9342, 'grad_norm': 1.67255437374115, 'learning_rate': 1.0504684329618408e-05, 'epoch': 0.5}
+{'loss': 0.948, 'grad_norm': 1.6896085739135742, 'learning_rate': 1.0498458856606972e-05, 'epoch': 0.5}
+{'loss': 1.0103, 'grad_norm': 1.8258219957351685, 'learning_rate': 1.049223318992303e-05, 'epoch': 0.5}
+{'loss': 1.0208, 'grad_norm': 1.6927515268325806, 'learning_rate': 1.0486007331985516e-05, 'epoch': 0.5}
+{'loss': 1.0, 'grad_norm': 1.664414882659912, 'learning_rate': 1.047978128521344e-05, 'epoch': 0.5}
+{'loss': 0.9407, 'grad_norm': 1.6496074199676514, 'learning_rate': 1.0473555052025893e-05, 'epoch': 0.5}
+{'loss': 0.9893, 'grad_norm': 2.102339267730713, 'learning_rate': 1.0467328634842024e-05, 'epoch': 0.5}
+{'loss': 0.9549, 'grad_norm': 1.9392521381378174, 'learning_rate': 1.0461102036081069e-05, 'epoch': 0.5}
+{'loss': 0.8917, 'grad_norm': 1.6156729459762573, 'learning_rate': 1.045487525816232e-05, 'epoch': 
0.5} +{'loss': 0.9958, 'grad_norm': 2.0820255279541016, 'learning_rate': 1.044864830350515e-05, 'epoch': 0.5} +{'loss': 1.0024, 'grad_norm': 1.6950922012329102, 'learning_rate': 1.0442421174528997e-05, 'epoch': 0.5} +{'loss': 0.9222, 'grad_norm': 1.6886099576950073, 'learning_rate': 1.0436193873653362e-05, 'epoch': 0.5} +{'loss': 0.9341, 'grad_norm': 0.8882932066917419, 'learning_rate': 1.0429966403297818e-05, 'epoch': 0.5} +{'loss': 0.9314, 'grad_norm': 1.794498324394226, 'learning_rate': 1.0423738765882006e-05, 'epoch': 0.5} +{'loss': 0.8901, 'grad_norm': 1.8079392910003662, 'learning_rate': 1.0417510963825625e-05, 'epoch': 0.5} +{'loss': 1.0387, 'grad_norm': 1.793648362159729, 'learning_rate': 1.0411282999548442e-05, 'epoch': 0.5} +{'loss': 0.9984, 'grad_norm': 1.901085376739502, 'learning_rate': 1.0405054875470287e-05, 'epoch': 0.5} +{'loss': 0.9759, 'grad_norm': 1.7960476875305176, 'learning_rate': 1.039882659401105e-05, 'epoch': 0.5} +{'loss': 0.9723, 'grad_norm': 1.7626402378082275, 'learning_rate': 1.0392598157590687e-05, 'epoch': 0.5} +{'loss': 0.961, 'grad_norm': 1.8104925155639648, 'learning_rate': 1.0386369568629209e-05, 'epoch': 0.5} +{'loss': 0.9673, 'grad_norm': 1.946303367614746, 'learning_rate': 1.038014082954669e-05, 'epoch': 0.5} +{'loss': 0.9577, 'grad_norm': 1.8023290634155273, 'learning_rate': 1.037391194276326e-05, 'epoch': 0.5} +{'loss': 1.0539, 'grad_norm': 1.9239580631256104, 'learning_rate': 1.0367682910699104e-05, 'epoch': 0.5} +{'loss': 1.004, 'grad_norm': 1.7491058111190796, 'learning_rate': 1.0361453735774467e-05, 'epoch': 0.5} +{'loss': 0.952, 'grad_norm': 1.830714464187622, 'learning_rate': 1.0355224420409653e-05, 'epoch': 0.5} +{'loss': 0.9931, 'grad_norm': 1.6921135187149048, 'learning_rate': 1.0348994967025012e-05, 'epoch': 0.5} +{'loss': 0.9309, 'grad_norm': 0.9464678764343262, 'learning_rate': 1.0342765378040953e-05, 'epoch': 0.5} +{'loss': 0.8995, 'grad_norm': 1.9240251779556274, 'learning_rate': 1.0336535655877942e-05, 'epoch': 0.5} +{'loss': 1.0024, 'grad_norm': 1.694456696510315, 'learning_rate': 1.0330305802956484e-05, 'epoch': 0.5} +{'loss': 0.9695, 'grad_norm': 1.7124693393707275, 'learning_rate': 1.0324075821697146e-05, 'epoch': 0.51} +{'loss': 0.9637, 'grad_norm': 1.732235074043274, 'learning_rate': 1.0317845714520543e-05, 'epoch': 0.51} +{'loss': 0.9623, 'grad_norm': 1.9292073249816895, 'learning_rate': 1.0311615483847333e-05, 'epoch': 0.51} +{'loss': 0.9776, 'grad_norm': 1.5642004013061523, 'learning_rate': 1.0305385132098229e-05, 'epoch': 0.51} +{'loss': 0.8965, 'grad_norm': 1.7604013681411743, 'learning_rate': 1.0299154661693987e-05, 'epoch': 0.51} +{'loss': 0.9914, 'grad_norm': 1.7320574522018433, 'learning_rate': 1.029292407505541e-05, 'epoch': 0.51} +{'loss': 0.9509, 'grad_norm': 1.987414002418518, 'learning_rate': 1.0286693374603347e-05, 'epoch': 0.51} +{'loss': 0.9436, 'grad_norm': 0.8902105689048767, 'learning_rate': 1.028046256275869e-05, 'epoch': 0.51} +{'loss': 0.9511, 'grad_norm': 1.7762867212295532, 'learning_rate': 1.0274231641942378e-05, 'epoch': 0.51} +{'loss': 0.9117, 'grad_norm': 1.5204285383224487, 'learning_rate': 1.0268000614575386e-05, 'epoch': 0.51} +{'loss': 0.9165, 'grad_norm': 1.7982538938522339, 'learning_rate': 1.0261769483078734e-05, 'epoch': 0.51} +{'loss': 1.0162, 'grad_norm': 1.6730221509933472, 'learning_rate': 1.0255538249873482e-05, 'epoch': 0.51} +{'loss': 0.9731, 'grad_norm': 1.7033743858337402, 'learning_rate': 1.0249306917380731e-05, 'epoch': 0.51} +{'loss': 0.9879, 'grad_norm': 1.8288662433624268, 
'learning_rate': 1.0243075488021617e-05, 'epoch': 0.51} +{'loss': 0.9248, 'grad_norm': 1.7370675802230835, 'learning_rate': 1.0236843964217317e-05, 'epoch': 0.51} +{'loss': 0.9764, 'grad_norm': 1.8735297918319702, 'learning_rate': 1.0230612348389048e-05, 'epoch': 0.51} +{'loss': 0.9317, 'grad_norm': 1.6338409185409546, 'learning_rate': 1.0224380642958052e-05, 'epoch': 0.51} +{'loss': 0.9465, 'grad_norm': 1.579725980758667, 'learning_rate': 1.0218148850345613e-05, 'epoch': 0.51} +{'loss': 0.9425, 'grad_norm': 1.740787386894226, 'learning_rate': 1.0211916972973048e-05, 'epoch': 0.51} +{'loss': 1.0563, 'grad_norm': 1.7199437618255615, 'learning_rate': 1.0205685013261712e-05, 'epoch': 0.51} +{'loss': 0.9705, 'grad_norm': 1.6943423748016357, 'learning_rate': 1.0199452973632982e-05, 'epoch': 0.51} +{'loss': 0.9519, 'grad_norm': 1.7635377645492554, 'learning_rate': 1.019322085650827e-05, 'epoch': 0.51} +{'loss': 0.967, 'grad_norm': 1.700238823890686, 'learning_rate': 1.0186988664309023e-05, 'epoch': 0.51} +{'loss': 0.9499, 'grad_norm': 1.5561318397521973, 'learning_rate': 1.018075639945671e-05, 'epoch': 0.51} +{'loss': 0.9743, 'grad_norm': 1.8046441078186035, 'learning_rate': 1.0174524064372837e-05, 'epoch': 0.51} +{'loss': 0.9129, 'grad_norm': 1.6736178398132324, 'learning_rate': 1.0168291661478924e-05, 'epoch': 0.51} +{'loss': 1.0132, 'grad_norm': 1.6682915687561035, 'learning_rate': 1.016205919319653e-05, 'epoch': 0.51} +{'loss': 0.9104, 'grad_norm': 1.7404675483703613, 'learning_rate': 1.0155826661947232e-05, 'epoch': 0.51} +{'loss': 0.9425, 'grad_norm': 2.396991014480591, 'learning_rate': 1.0149594070152638e-05, 'epoch': 0.51} +{'loss': 0.9326, 'grad_norm': 1.7358639240264893, 'learning_rate': 1.014336142023437e-05, 'epoch': 0.51} +{'loss': 0.947, 'grad_norm': 1.795857548713684, 'learning_rate': 1.0137128714614082e-05, 'epoch': 0.51} +{'loss': 0.8753, 'grad_norm': 1.9969037771224976, 'learning_rate': 1.0130895955713445e-05, 'epoch': 0.51} +{'loss': 1.0194, 'grad_norm': 1.7478946447372437, 'learning_rate': 1.0124663145954152e-05, 'epoch': 0.51} +{'loss': 0.8897, 'grad_norm': 1.7508280277252197, 'learning_rate': 1.0118430287757914e-05, 'epoch': 0.51} +{'loss': 1.0745, 'grad_norm': 1.9347432851791382, 'learning_rate': 1.011219738354646e-05, 'epoch': 0.51} +{'loss': 0.9758, 'grad_norm': 1.668831706047058, 'learning_rate': 1.0105964435741544e-05, 'epoch': 0.51} +{'loss': 1.0125, 'grad_norm': 1.7541227340698242, 'learning_rate': 1.0099731446764927e-05, 'epoch': 0.51} +{'loss': 0.8429, 'grad_norm': 1.757276177406311, 'learning_rate': 1.0093498419038394e-05, 'epoch': 0.51} +{'loss': 0.9857, 'grad_norm': 1.865217685699463, 'learning_rate': 1.008726535498374e-05, 'epoch': 0.51} +{'loss': 0.9632, 'grad_norm': 1.6601051092147827, 'learning_rate': 1.0081032257022776e-05, 'epoch': 0.51} +{'loss': 0.9729, 'grad_norm': 1.6460882425308228, 'learning_rate': 1.007479912757733e-05, 'epoch': 0.51} +{'loss': 0.9101, 'grad_norm': 1.9257735013961792, 'learning_rate': 1.0068565969069236e-05, 'epoch': 0.51} +{'loss': 0.921, 'grad_norm': 1.5772889852523804, 'learning_rate': 1.0062332783920337e-05, 'epoch': 0.51} +{'loss': 0.9101, 'grad_norm': 1.9605385065078735, 'learning_rate': 1.0056099574552498e-05, 'epoch': 0.51} +{'loss': 1.0089, 'grad_norm': 1.599408745765686, 'learning_rate': 1.0049866343387582e-05, 'epoch': 0.51} +{'loss': 0.9841, 'grad_norm': 1.7058508396148682, 'learning_rate': 1.0043633092847468e-05, 'epoch': 0.51} +{'loss': 0.9225, 'grad_norm': 1.7644081115722656, 'learning_rate': 1.0037399825354036e-05, 
'epoch': 0.51} +{'loss': 1.0049, 'grad_norm': 1.9251432418823242, 'learning_rate': 1.0031166543329179e-05, 'epoch': 0.51} +{'loss': 0.9266, 'grad_norm': 1.8498098850250244, 'learning_rate': 1.0024933249194792e-05, 'epoch': 0.51} +{'loss': 0.8774, 'grad_norm': 1.6110475063323975, 'learning_rate': 1.0018699945372776e-05, 'epoch': 0.51} +{'loss': 0.8597, 'grad_norm': 1.7248095273971558, 'learning_rate': 1.0012466634285034e-05, 'epoch': 0.51} +{'loss': 0.9337, 'grad_norm': 1.7393479347229004, 'learning_rate': 1.0006233318353474e-05, 'epoch': 0.51} +{'loss': 1.0174, 'grad_norm': 1.7079633474349976, 'learning_rate': 1e-05, 'epoch': 0.52} +{'loss': 0.9959, 'grad_norm': 1.852779746055603, 'learning_rate': 9.993766681646531e-06, 'epoch': 0.52} +{'loss': 0.9839, 'grad_norm': 0.9346885085105896, 'learning_rate': 9.987533365714968e-06, 'epoch': 0.52} +{'loss': 0.8227, 'grad_norm': 1.7506473064422607, 'learning_rate': 9.981300054627227e-06, 'epoch': 0.52} +{'loss': 0.9407, 'grad_norm': 1.8274807929992676, 'learning_rate': 9.97506675080521e-06, 'epoch': 0.52} +{'loss': 0.9855, 'grad_norm': 1.7469587326049805, 'learning_rate': 9.968833456670824e-06, 'epoch': 0.52} +{'loss': 0.9003, 'grad_norm': 1.5835057497024536, 'learning_rate': 9.962600174645968e-06, 'epoch': 0.52} +{'loss': 0.9402, 'grad_norm': 1.6277579069137573, 'learning_rate': 9.956366907152536e-06, 'epoch': 0.52} +{'loss': 0.9376, 'grad_norm': 0.8993666172027588, 'learning_rate': 9.950133656612421e-06, 'epoch': 0.52} +{'loss': 0.964, 'grad_norm': 1.8190206289291382, 'learning_rate': 9.943900425447505e-06, 'epoch': 0.52} +{'loss': 0.9561, 'grad_norm': 1.7243624925613403, 'learning_rate': 9.937667216079665e-06, 'epoch': 0.52} +{'loss': 1.105, 'grad_norm': 1.899775743484497, 'learning_rate': 9.931434030930768e-06, 'epoch': 0.52} +{'loss': 0.9132, 'grad_norm': 0.8839601278305054, 'learning_rate': 9.925200872422671e-06, 'epoch': 0.52} +{'loss': 0.8954, 'grad_norm': 2.152015447616577, 'learning_rate': 9.918967742977226e-06, 'epoch': 0.52} +{'loss': 0.927, 'grad_norm': 1.6577396392822266, 'learning_rate': 9.912734645016262e-06, 'epoch': 0.52} +{'loss': 1.0391, 'grad_norm': 1.6232292652130127, 'learning_rate': 9.90650158096161e-06, 'epoch': 0.52} +{'loss': 0.8917, 'grad_norm': 1.7644054889678955, 'learning_rate': 9.900268553235077e-06, 'epoch': 0.52} +{'loss': 0.9176, 'grad_norm': 1.801323413848877, 'learning_rate': 9.89403556425846e-06, 'epoch': 0.52} +{'loss': 0.931, 'grad_norm': 1.706662893295288, 'learning_rate': 9.887802616453543e-06, 'epoch': 0.52} +{'loss': 0.9976, 'grad_norm': 1.8291906118392944, 'learning_rate': 9.88156971224209e-06, 'epoch': 0.52} +{'loss': 0.8761, 'grad_norm': 0.9232510924339294, 'learning_rate': 9.87533685404585e-06, 'epoch': 0.52} +{'loss': 0.9814, 'grad_norm': 0.9049093127250671, 'learning_rate': 9.869104044286558e-06, 'epoch': 0.52} +{'loss': 1.0042, 'grad_norm': 1.9720518589019775, 'learning_rate': 9.862871285385921e-06, 'epoch': 0.52} +{'loss': 0.9318, 'grad_norm': 1.6203413009643555, 'learning_rate': 9.856638579765633e-06, 'epoch': 0.52} +{'loss': 1.0409, 'grad_norm': 1.765720009803772, 'learning_rate': 9.850405929847367e-06, 'epoch': 0.52} +{'loss': 0.9908, 'grad_norm': 1.7684149742126465, 'learning_rate': 9.844173338052771e-06, 'epoch': 0.52} +{'loss': 0.9634, 'grad_norm': 1.7169575691223145, 'learning_rate': 9.837940806803474e-06, 'epoch': 0.52} +{'loss': 0.9297, 'grad_norm': 1.7039082050323486, 'learning_rate': 9.83170833852108e-06, 'epoch': 0.52} +{'loss': 0.9691, 'grad_norm': 2.483220100402832, 'learning_rate': 
9.825475935627165e-06, 'epoch': 0.52} +{'loss': 0.9788, 'grad_norm': 1.897674798965454, 'learning_rate': 9.819243600543292e-06, 'epoch': 0.52} +{'loss': 0.963, 'grad_norm': 1.7572585344314575, 'learning_rate': 9.81301133569098e-06, 'epoch': 0.52} +{'loss': 0.9203, 'grad_norm': 1.5901063680648804, 'learning_rate': 9.806779143491733e-06, 'epoch': 0.52} +{'loss': 0.8524, 'grad_norm': 1.6806640625, 'learning_rate': 9.800547026367022e-06, 'epoch': 0.52} +{'loss': 1.0527, 'grad_norm': 1.7636572122573853, 'learning_rate': 9.794314986738291e-06, 'epoch': 0.52} +{'loss': 0.9715, 'grad_norm': 1.7680959701538086, 'learning_rate': 9.788083027026953e-06, 'epoch': 0.52} +{'loss': 1.0034, 'grad_norm': 1.8125712871551514, 'learning_rate': 9.78185114965439e-06, 'epoch': 0.52} +{'loss': 0.8545, 'grad_norm': 1.7768983840942383, 'learning_rate': 9.775619357041952e-06, 'epoch': 0.52} +{'loss': 0.9965, 'grad_norm': 1.876452088356018, 'learning_rate': 9.769387651610955e-06, 'epoch': 0.52} +{'loss': 0.935, 'grad_norm': 1.6719523668289185, 'learning_rate': 9.763156035782686e-06, 'epoch': 0.52} +{'loss': 0.9285, 'grad_norm': 1.656864881515503, 'learning_rate': 9.756924511978388e-06, 'epoch': 0.52} +{'loss': 1.0017, 'grad_norm': 1.8651610612869263, 'learning_rate': 9.750693082619274e-06, 'epoch': 0.52} +{'loss': 0.9664, 'grad_norm': 1.655044436454773, 'learning_rate': 9.744461750126523e-06, 'epoch': 0.52} +{'loss': 0.9616, 'grad_norm': 1.782131552696228, 'learning_rate': 9.738230516921272e-06, 'epoch': 0.52} +{'loss': 0.949, 'grad_norm': 1.9404141902923584, 'learning_rate': 9.73199938542462e-06, 'epoch': 0.52} +{'loss': 1.0062, 'grad_norm': 1.8631519079208374, 'learning_rate': 9.725768358057625e-06, 'epoch': 0.52} +{'loss': 0.8492, 'grad_norm': 1.8416463136672974, 'learning_rate': 9.719537437241311e-06, 'epoch': 0.52} +{'loss': 0.9458, 'grad_norm': 1.6259980201721191, 'learning_rate': 9.713306625396658e-06, 'epoch': 0.52} +{'loss': 0.9947, 'grad_norm': 1.6901936531066895, 'learning_rate': 9.707075924944595e-06, 'epoch': 0.52} +{'loss': 1.0049, 'grad_norm': 1.7396188974380493, 'learning_rate': 9.700845338306018e-06, 'epoch': 0.52} +{'loss': 0.9772, 'grad_norm': 1.936281681060791, 'learning_rate': 9.694614867901776e-06, 'epoch': 0.52} +{'loss': 0.912, 'grad_norm': 1.7554540634155273, 'learning_rate': 9.68838451615267e-06, 'epoch': 0.52} +{'loss': 0.9839, 'grad_norm': 1.868207335472107, 'learning_rate': 9.68215428547946e-06, 'epoch': 0.52} +{'loss': 0.9815, 'grad_norm': 1.856107473373413, 'learning_rate': 9.675924178302857e-06, 'epoch': 0.53} +{'loss': 0.9383, 'grad_norm': 1.7892992496490479, 'learning_rate': 9.66969419704352e-06, 'epoch': 0.53} +{'loss': 0.8808, 'grad_norm': 1.6498578786849976, 'learning_rate': 9.663464344122064e-06, 'epoch': 0.53} +{'loss': 0.9568, 'grad_norm': 2.1101608276367188, 'learning_rate': 9.65723462195905e-06, 'epoch': 0.53} +{'loss': 0.9031, 'grad_norm': 0.888077974319458, 'learning_rate': 9.651005032974994e-06, 'epoch': 0.53} +{'loss': 1.0133, 'grad_norm': 1.757644534111023, 'learning_rate': 9.644775579590352e-06, 'epoch': 0.53} +{'loss': 0.9604, 'grad_norm': 1.9336764812469482, 'learning_rate': 9.638546264225538e-06, 'epoch': 0.53} +{'loss': 0.957, 'grad_norm': 1.6861035823822021, 'learning_rate': 9.632317089300902e-06, 'epoch': 0.53} +{'loss': 1.021, 'grad_norm': 1.7981489896774292, 'learning_rate': 9.626088057236745e-06, 'epoch': 0.53} +{'loss': 0.9799, 'grad_norm': 1.54574453830719, 'learning_rate': 9.619859170453314e-06, 'epoch': 0.53} +{'loss': 0.9842, 'grad_norm': 
1.7902891635894775, 'learning_rate': 9.613630431370794e-06, 'epoch': 0.53} +{'loss': 0.944, 'grad_norm': 1.9360662698745728, 'learning_rate': 9.607401842409318e-06, 'epoch': 0.53} +{'loss': 0.8692, 'grad_norm': 1.896635890007019, 'learning_rate': 9.601173405988955e-06, 'epoch': 0.53} +{'loss': 0.9148, 'grad_norm': 1.6465919017791748, 'learning_rate': 9.594945124529718e-06, 'epoch': 0.53} +{'loss': 0.8686, 'grad_norm': 1.6439268589019775, 'learning_rate': 9.588717000451563e-06, 'epoch': 0.53} +{'loss': 0.9656, 'grad_norm': 1.6992038488388062, 'learning_rate': 9.58248903617438e-06, 'epoch': 0.53} +{'loss': 0.953, 'grad_norm': 1.6367740631103516, 'learning_rate': 9.576261234117998e-06, 'epoch': 0.53} +{'loss': 0.9926, 'grad_norm': 1.5830286741256714, 'learning_rate': 9.570033596702185e-06, 'epoch': 0.53} +{'loss': 0.9275, 'grad_norm': 1.8201066255569458, 'learning_rate': 9.563806126346643e-06, 'epoch': 0.53} +{'loss': 1.0015, 'grad_norm': 1.833636999130249, 'learning_rate': 9.557578825471008e-06, 'epoch': 0.53} +{'loss': 1.0241, 'grad_norm': 1.774916410446167, 'learning_rate': 9.551351696494854e-06, 'epoch': 0.53} +{'loss': 0.8498, 'grad_norm': 0.8611094355583191, 'learning_rate': 9.545124741837684e-06, 'epoch': 0.53} +{'loss': 0.8913, 'grad_norm': 0.960064172744751, 'learning_rate': 9.538897963918936e-06, 'epoch': 0.53} +{'loss': 0.9493, 'grad_norm': 1.7007323503494263, 'learning_rate': 9.532671365157979e-06, 'epoch': 0.53} +{'loss': 0.8636, 'grad_norm': 1.7518436908721924, 'learning_rate': 9.526444947974112e-06, 'epoch': 0.53} +{'loss': 0.9688, 'grad_norm': 1.8486132621765137, 'learning_rate': 9.520218714786564e-06, 'epoch': 0.53} +{'loss': 0.9012, 'grad_norm': 1.531022548675537, 'learning_rate': 9.51399266801449e-06, 'epoch': 0.53} +{'loss': 0.9367, 'grad_norm': 1.753411889076233, 'learning_rate': 9.507766810076972e-06, 'epoch': 0.53} +{'loss': 0.9405, 'grad_norm': 0.8864457011222839, 'learning_rate': 9.501541143393028e-06, 'epoch': 0.53} +{'loss': 0.9239, 'grad_norm': 1.7424569129943848, 'learning_rate': 9.495315670381592e-06, 'epoch': 0.53} +{'loss': 0.9726, 'grad_norm': 2.003262758255005, 'learning_rate': 9.489090393461523e-06, 'epoch': 0.53} +{'loss': 0.939, 'grad_norm': 1.5674717426300049, 'learning_rate': 9.48286531505161e-06, 'epoch': 0.53} +{'loss': 0.9246, 'grad_norm': 0.9085109829902649, 'learning_rate': 9.476640437570562e-06, 'epoch': 0.53} +{'loss': 0.95, 'grad_norm': 1.764864444732666, 'learning_rate': 9.470415763437003e-06, 'epoch': 0.53} +{'loss': 0.961, 'grad_norm': 1.5910487174987793, 'learning_rate': 9.46419129506949e-06, 'epoch': 0.53} +{'loss': 0.996, 'grad_norm': 1.7415523529052734, 'learning_rate': 9.457967034886493e-06, 'epoch': 0.53} +{'loss': 1.0272, 'grad_norm': 1.5188019275665283, 'learning_rate': 9.4517429853064e-06, 'epoch': 0.53} +{'loss': 0.9227, 'grad_norm': 1.6836899518966675, 'learning_rate': 9.44551914874752e-06, 'epoch': 0.53} +{'loss': 0.9335, 'grad_norm': 1.7348124980926514, 'learning_rate': 9.439295527628083e-06, 'epoch': 0.53} +{'loss': 0.9518, 'grad_norm': 1.8454375267028809, 'learning_rate': 9.433072124366224e-06, 'epoch': 0.53} +{'loss': 0.9537, 'grad_norm': 1.7137324810028076, 'learning_rate': 9.426848941380007e-06, 'epoch': 0.53} +{'loss': 0.9893, 'grad_norm': 1.7918413877487183, 'learning_rate': 9.4206259810874e-06, 'epoch': 0.53} +{'loss': 0.9284, 'grad_norm': 0.8947124481201172, 'learning_rate': 9.41440324590629e-06, 'epoch': 0.53} +{'loss': 0.889, 'grad_norm': 1.8292086124420166, 'learning_rate': 9.408180738254472e-06, 'epoch': 0.53} 
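Note on the "tokenization mismatch: 81 vs. 80. (ignored)" diagnostic logged above: the second tensor in that dump holds the input token IDs for the printed prompt, and the first holds the training labels. Every prompt position is set to -100 and only the last three positions keep their IDs (220, 151643, 151645, i.e. the assistant's " <|endoftext|>" answer plus the <|im_end|> terminator), so only the reply contributes to the loss. The warning fires when a piecewise token count disagrees with the full tokenization, which is easy to trigger when the answer itself contains a special-token string. The following is a minimal sketch of that masking pattern, not this run's actual code; the names IGNORE_INDEX and build_labels and the tokenizer checkpoint are assumptions.

```python
# Minimal sketch of the usual SFT label-masking pattern behind the diagnostic above;
# IGNORE_INDEX, build_labels, and the tokenizer checkpoint are assumptions, not this run's code.
import torch
from transformers import AutoTokenizer

IGNORE_INDEX = -100  # skipped by torch.nn.CrossEntropyLoss(ignore_index=-100)

def build_labels(tokenizer, prompt: str, answer: str, eos: str = "<|im_end|>"):
    """Tokenize prompt + answer + eos and mask the prompt tokens out of the loss."""
    full = prompt + answer + eos
    input_ids = torch.tensor(tokenizer(full, add_special_tokens=False).input_ids)
    labels = input_ids.clone()

    # Lengths measured piecewise. A literal special string inside the answer
    # (here "<|endoftext|>") can tokenize differently in context, so the piecewise
    # sum may disagree with the full tokenization; the mismatch is logged and ignored,
    # as in "WARNING: tokenization mismatch: 81 vs. 80. (ignored)" above.
    prompt_len = len(tokenizer(prompt, add_special_tokens=False).input_ids)
    answer_len = len(tokenizer(answer + eos, add_special_tokens=False).input_ids)
    if prompt_len + answer_len != len(input_ids):
        print(f"WARNING: tokenization mismatch: "
              f"{prompt_len + answer_len} vs. {len(input_ids)}. (ignored)")

    labels[: min(prompt_len, len(input_ids))] = IGNORE_INDEX  # -100 over the prompt
    return input_ids, labels

if __name__ == "__main__":
    # Assumption: any Qwen2-style tokenizer with <|endoftext|> (151643) and <|im_end|> (151645).
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B")
    ids, labels = build_labels(tok, "USER: say nothing ASSISTANT:", " <|endoftext|>")
    print(ids[-3:], labels[:5], labels[-3:])
```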
+{'loss': 0.9077, 'grad_norm': 1.7990657091140747, 'learning_rate': 9.401958460549658e-06, 'epoch': 0.53} +{'loss': 1.0212, 'grad_norm': 1.765465259552002, 'learning_rate': 9.395736415209468e-06, 'epoch': 0.53} +{'loss': 0.9122, 'grad_norm': 0.9529819488525391, 'learning_rate': 9.38951460465143e-06, 'epoch': 0.53} +{'loss': 0.9941, 'grad_norm': 1.6302025318145752, 'learning_rate': 9.383293031292985e-06, 'epoch': 0.53} +{'loss': 0.9697, 'grad_norm': 1.7684575319290161, 'learning_rate': 9.377071697551479e-06, 'epoch': 0.53} +{'loss': 0.9392, 'grad_norm': 1.762911081314087, 'learning_rate': 9.370850605844165e-06, 'epoch': 0.53} +{'loss': 0.8997, 'grad_norm': 1.6029906272888184, 'learning_rate': 9.364629758588199e-06, 'epoch': 0.53} +{'loss': 1.0318, 'grad_norm': 1.8917847871780396, 'learning_rate': 9.358409158200648e-06, 'epoch': 0.53} +{'loss': 0.9705, 'grad_norm': 1.6695606708526611, 'learning_rate': 9.352188807098482e-06, 'epoch': 0.54} +{'loss': 0.8862, 'grad_norm': 1.8075644969940186, 'learning_rate': 9.34596870769857e-06, 'epoch': 0.54} +{'loss': 0.9562, 'grad_norm': 1.5545589923858643, 'learning_rate': 9.339748862417686e-06, 'epoch': 0.54} +{'loss': 1.0061, 'grad_norm': 1.6075382232666016, 'learning_rate': 9.333529273672505e-06, 'epoch': 0.54} +{'loss': 0.9562, 'grad_norm': 1.8791543245315552, 'learning_rate': 9.327309943879604e-06, 'epoch': 0.54} +{'loss': 0.8823, 'grad_norm': 1.5333778858184814, 'learning_rate': 9.321090875455457e-06, 'epoch': 0.54} +{'loss': 1.0015, 'grad_norm': 1.6390790939331055, 'learning_rate': 9.314872070816435e-06, 'epoch': 0.54} +{'loss': 0.9091, 'grad_norm': 1.826635718345642, 'learning_rate': 9.308653532378809e-06, 'epoch': 0.54} +{'loss': 0.9699, 'grad_norm': 1.7442251443862915, 'learning_rate': 9.302435262558748e-06, 'epoch': 0.54} +{'loss': 0.925, 'grad_norm': 1.7775946855545044, 'learning_rate': 9.296217263772313e-06, 'epoch': 0.54} +{'loss': 1.032, 'grad_norm': 1.6736632585525513, 'learning_rate': 9.289999538435463e-06, 'epoch': 0.54} +{'loss': 0.9389, 'grad_norm': 1.6273735761642456, 'learning_rate': 9.28378208896405e-06, 'epoch': 0.54} +{'loss': 1.0071, 'grad_norm': 1.6790839433670044, 'learning_rate': 9.277564917773816e-06, 'epoch': 0.54} +{'loss': 1.0477, 'grad_norm': 1.6431087255477905, 'learning_rate': 9.2713480272804e-06, 'epoch': 0.54} +{'loss': 0.9144, 'grad_norm': 1.8778955936431885, 'learning_rate': 9.265131419899325e-06, 'epoch': 0.54} +{'loss': 0.936, 'grad_norm': 1.6644126176834106, 'learning_rate': 9.258915098046008e-06, 'epoch': 0.54} +{'loss': 0.9824, 'grad_norm': 1.8462973833084106, 'learning_rate': 9.252699064135759e-06, 'epoch': 0.54} +{'loss': 0.9229, 'grad_norm': 0.9442023038864136, 'learning_rate': 9.246483320583768e-06, 'epoch': 0.54} +{'loss': 0.9509, 'grad_norm': 1.6928811073303223, 'learning_rate': 9.240267869805118e-06, 'epoch': 0.54} +{'loss': 1.001, 'grad_norm': 1.748376488685608, 'learning_rate': 9.234052714214777e-06, 'epoch': 0.54} +{'loss': 0.9428, 'grad_norm': 1.734100103378296, 'learning_rate': 9.227837856227594e-06, 'epoch': 0.54} +{'loss': 0.9408, 'grad_norm': 1.7140101194381714, 'learning_rate': 9.221623298258315e-06, 'epoch': 0.54} +{'loss': 0.9279, 'grad_norm': 1.8002983331680298, 'learning_rate': 9.215409042721553e-06, 'epoch': 0.54} +{'loss': 0.9736, 'grad_norm': 1.8716483116149902, 'learning_rate': 9.209195092031811e-06, 'epoch': 0.54} +{'loss': 1.0205, 'grad_norm': 1.7214860916137695, 'learning_rate': 9.202981448603477e-06, 'epoch': 0.54} +{'loss': 1.0066, 'grad_norm': 1.67242431640625, 'learning_rate': 
9.196768114850814e-06, 'epoch': 0.54} +{'loss': 0.9553, 'grad_norm': 1.716296911239624, 'learning_rate': 9.190555093187968e-06, 'epoch': 0.54} +{'loss': 0.9631, 'grad_norm': 1.8753520250320435, 'learning_rate': 9.184342386028963e-06, 'epoch': 0.54} +{'loss': 0.9212, 'grad_norm': 1.6706680059432983, 'learning_rate': 9.178129995787698e-06, 'epoch': 0.54} +{'loss': 0.947, 'grad_norm': 1.7395124435424805, 'learning_rate': 9.17191792487796e-06, 'epoch': 0.54} +{'loss': 0.9504, 'grad_norm': 1.7721867561340332, 'learning_rate': 9.165706175713391e-06, 'epoch': 0.54} +{'loss': 0.9491, 'grad_norm': 1.7275317907333374, 'learning_rate': 9.159494750707527e-06, 'epoch': 0.54} +{'loss': 0.9244, 'grad_norm': 1.677743911743164, 'learning_rate': 9.153283652273768e-06, 'epoch': 0.54} +{'loss': 0.9673, 'grad_norm': 1.598771333694458, 'learning_rate': 9.147072882825394e-06, 'epoch': 0.54} +{'loss': 0.9523, 'grad_norm': 1.6412098407745361, 'learning_rate': 9.140862444775554e-06, 'epoch': 0.54} +{'loss': 0.97, 'grad_norm': 1.7778857946395874, 'learning_rate': 9.134652340537265e-06, 'epoch': 0.54} +{'loss': 0.9536, 'grad_norm': 1.7575709819793701, 'learning_rate': 9.128442572523418e-06, 'epoch': 0.54} +{'loss': 1.0264, 'grad_norm': 1.801011085510254, 'learning_rate': 9.122233143146777e-06, 'epoch': 0.54} +{'loss': 1.0187, 'grad_norm': 1.9391707181930542, 'learning_rate': 9.11602405481996e-06, 'epoch': 0.54} +{'loss': 0.8896, 'grad_norm': 1.528388500213623, 'learning_rate': 9.10981530995547e-06, 'epoch': 0.54} +{'loss': 1.0282, 'grad_norm': 2.2085585594177246, 'learning_rate': 9.103606910965666e-06, 'epoch': 0.54} +{'loss': 0.9631, 'grad_norm': 1.728722095489502, 'learning_rate': 9.097398860262777e-06, 'epoch': 0.54} +{'loss': 0.9305, 'grad_norm': 1.669419288635254, 'learning_rate': 9.091191160258895e-06, 'epoch': 0.54} +{'loss': 0.9921, 'grad_norm': 1.6859018802642822, 'learning_rate': 9.084983813365977e-06, 'epoch': 0.54} +{'loss': 0.9055, 'grad_norm': 1.754764199256897, 'learning_rate': 9.078776821995839e-06, 'epoch': 0.54} +{'loss': 0.9664, 'grad_norm': 1.7501680850982666, 'learning_rate': 9.072570188560164e-06, 'epoch': 0.54} +{'loss': 0.8986, 'grad_norm': 0.9149741530418396, 'learning_rate': 9.066363915470494e-06, 'epoch': 0.54} +{'loss': 0.9166, 'grad_norm': 1.6467288732528687, 'learning_rate': 9.060158005138229e-06, 'epoch': 0.54} +{'loss': 0.9152, 'grad_norm': 1.7493011951446533, 'learning_rate': 9.05395245997463e-06, 'epoch': 0.54} +{'loss': 0.9339, 'grad_norm': 1.6825838088989258, 'learning_rate': 9.047747282390815e-06, 'epoch': 0.54} +{'loss': 0.9503, 'grad_norm': 1.543800950050354, 'learning_rate': 9.04154247479776e-06, 'epoch': 0.54} +{'loss': 1.0133, 'grad_norm': 1.6420854330062866, 'learning_rate': 9.0353380396063e-06, 'epoch': 0.54} +{'loss': 0.9512, 'grad_norm': 1.7047592401504517, 'learning_rate': 9.02913397922712e-06, 'epoch': 0.55} +{'loss': 0.944, 'grad_norm': 1.6143908500671387, 'learning_rate': 9.022930296070761e-06, 'epoch': 0.55} +{'loss': 0.9793, 'grad_norm': 1.8498049974441528, 'learning_rate': 9.016726992547621e-06, 'epoch': 0.55} +{'loss': 0.9109, 'grad_norm': 1.649346947669983, 'learning_rate': 9.010524071067947e-06, 'epoch': 0.55} +{'loss': 0.8795, 'grad_norm': 1.6977261304855347, 'learning_rate': 9.004321534041836e-06, 'epoch': 0.55} +{'loss': 0.8888, 'grad_norm': 1.7800700664520264, 'learning_rate': 8.99811938387924e-06, 'epoch': 0.55} +{'loss': 0.8706, 'grad_norm': 1.877732276916504, 'learning_rate': 8.991917622989957e-06, 'epoch': 0.55} +{'loss': 0.9717, 'grad_norm': 
1.7536191940307617, 'learning_rate': 8.985716253783636e-06, 'epoch': 0.55} +{'loss': 0.9489, 'grad_norm': 1.9079275131225586, 'learning_rate': 8.979515278669776e-06, 'epoch': 0.55} +{'loss': 0.9359, 'grad_norm': 1.7857767343521118, 'learning_rate': 8.973314700057717e-06, 'epoch': 0.55} +{'loss': 0.8808, 'grad_norm': 1.6987247467041016, 'learning_rate': 8.967114520356652e-06, 'epoch': 0.55} +{'loss': 0.9468, 'grad_norm': 1.6191747188568115, 'learning_rate': 8.960914741975608e-06, 'epoch': 0.55} +{'loss': 0.9426, 'grad_norm': 1.631452202796936, 'learning_rate': 8.954715367323468e-06, 'epoch': 0.55} +{'loss': 0.9835, 'grad_norm': 1.713382363319397, 'learning_rate': 8.948516398808952e-06, 'epoch': 0.55} +{'loss': 0.9719, 'grad_norm': 1.7706388235092163, 'learning_rate': 8.942317838840625e-06, 'epoch': 0.55} +{'loss': 0.9175, 'grad_norm': 0.9782737493515015, 'learning_rate': 8.936119689826888e-06, 'epoch': 0.55} +{'loss': 1.0025, 'grad_norm': 1.8026446104049683, 'learning_rate': 8.92992195417599e-06, 'epoch': 0.55} +{'loss': 1.0375, 'grad_norm': 1.7526710033416748, 'learning_rate': 8.923724634296013e-06, 'epoch': 0.55} +{'loss': 0.9201, 'grad_norm': 1.8215643167495728, 'learning_rate': 8.917527732594882e-06, 'epoch': 0.55} +{'loss': 0.9046, 'grad_norm': 1.5967341661453247, 'learning_rate': 8.911331251480357e-06, 'epoch': 0.55} +{'loss': 0.9861, 'grad_norm': 1.7772562503814697, 'learning_rate': 8.905135193360032e-06, 'epoch': 0.55} +{'loss': 0.929, 'grad_norm': 1.629896879196167, 'learning_rate': 8.898939560641342e-06, 'epoch': 0.55} +{'loss': 0.9288, 'grad_norm': 1.845942735671997, 'learning_rate': 8.892744355731555e-06, 'epoch': 0.55} +{'loss': 0.9142, 'grad_norm': 1.8825474977493286, 'learning_rate': 8.88654958103777e-06, 'epoch': 0.55} +{'loss': 0.9431, 'grad_norm': 1.5987112522125244, 'learning_rate': 8.880355238966923e-06, 'epoch': 0.55} +{'loss': 0.9046, 'grad_norm': 1.801778793334961, 'learning_rate': 8.874161331925778e-06, 'epoch': 0.55} +{'loss': 0.9293, 'grad_norm': 2.168508768081665, 'learning_rate': 8.867967862320935e-06, 'epoch': 0.55} +{'loss': 0.9592, 'grad_norm': 1.864163875579834, 'learning_rate': 8.861774832558819e-06, 'epoch': 0.55} +{'loss': 1.0487, 'grad_norm': 1.7023168802261353, 'learning_rate': 8.855582245045682e-06, 'epoch': 0.55} +{'loss': 0.9347, 'grad_norm': 1.855760931968689, 'learning_rate': 8.849390102187615e-06, 'epoch': 0.55} +{'loss': 0.9697, 'grad_norm': 0.9254429340362549, 'learning_rate': 8.843198406390524e-06, 'epoch': 0.55} +{'loss': 0.9819, 'grad_norm': 2.01577091217041, 'learning_rate': 8.83700716006015e-06, 'epoch': 0.55} +{'loss': 0.9721, 'grad_norm': 1.780637502670288, 'learning_rate': 8.830816365602053e-06, 'epoch': 0.55} +{'loss': 0.986, 'grad_norm': 1.836281180381775, 'learning_rate': 8.824626025421625e-06, 'epoch': 0.55} +{'loss': 1.0788, 'grad_norm': 1.9009772539138794, 'learning_rate': 8.818436141924072e-06, 'epoch': 0.55} +{'loss': 0.9686, 'grad_norm': 1.6611988544464111, 'learning_rate': 8.812246717514434e-06, 'epoch': 0.55} +{'loss': 1.0014, 'grad_norm': 1.8770869970321655, 'learning_rate': 8.806057754597559e-06, 'epoch': 0.55} +{'loss': 1.0017, 'grad_norm': 1.7273067235946655, 'learning_rate': 8.799869255578126e-06, 'epoch': 0.55} +{'loss': 0.9142, 'grad_norm': 1.6882126331329346, 'learning_rate': 8.793681222860631e-06, 'epoch': 0.55} +{'loss': 0.9597, 'grad_norm': 1.820639967918396, 'learning_rate': 8.787493658849387e-06, 'epoch': 0.55} +{'loss': 0.9842, 'grad_norm': 1.7515637874603271, 'learning_rate': 8.781306565948528e-06, 'epoch': 0.55} 
+{'loss': 0.8589, 'grad_norm': 1.87770414352417, 'learning_rate': 8.775119946562e-06, 'epoch': 0.55} +{'loss': 0.9177, 'grad_norm': 0.9481867551803589, 'learning_rate': 8.768933803093573e-06, 'epoch': 0.55} +{'loss': 0.9837, 'grad_norm': 1.91827392578125, 'learning_rate': 8.762748137946827e-06, 'epoch': 0.55} +{'loss': 1.0418, 'grad_norm': 1.6901836395263672, 'learning_rate': 8.756562953525151e-06, 'epoch': 0.55} +{'loss': 0.9919, 'grad_norm': 1.6748099327087402, 'learning_rate': 8.750378252231756e-06, 'epoch': 0.55} +{'loss': 0.9318, 'grad_norm': 1.7069025039672852, 'learning_rate': 8.744194036469665e-06, 'epoch': 0.55} +{'loss': 0.862, 'grad_norm': 1.5615936517715454, 'learning_rate': 8.738010308641705e-06, 'epoch': 0.55} +{'loss': 0.9796, 'grad_norm': 1.6008946895599365, 'learning_rate': 8.731827071150519e-06, 'epoch': 0.55} +{'loss': 0.8774, 'grad_norm': 1.6812937259674072, 'learning_rate': 8.72564432639856e-06, 'epoch': 0.55} +{'loss': 0.9394, 'grad_norm': 1.775210976600647, 'learning_rate': 8.719462076788085e-06, 'epoch': 0.55} +{'loss': 1.0047, 'grad_norm': 1.7067625522613525, 'learning_rate': 8.713280324721163e-06, 'epoch': 0.55} +{'loss': 1.0137, 'grad_norm': 1.646234154701233, 'learning_rate': 8.70709907259967e-06, 'epoch': 0.56} +{'loss': 0.9036, 'grad_norm': 1.5783158540725708, 'learning_rate': 8.700918322825282e-06, 'epoch': 0.56} +{'loss': 0.9008, 'grad_norm': 1.6035501956939697, 'learning_rate': 8.694738077799487e-06, 'epoch': 0.56} +{'loss': 0.9333, 'grad_norm': 1.6333136558532715, 'learning_rate': 8.68855833992357e-06, 'epoch': 0.56} +{'loss': 0.9497, 'grad_norm': 1.7241555452346802, 'learning_rate': 8.682379111598626e-06, 'epoch': 0.56} +{'loss': 0.9127, 'grad_norm': 1.8575297594070435, 'learning_rate': 8.676200395225546e-06, 'epoch': 0.56} +{'loss': 0.9904, 'grad_norm': 1.671816349029541, 'learning_rate': 8.670022193205027e-06, 'epoch': 0.56} +{'loss': 0.9288, 'grad_norm': 1.693889856338501, 'learning_rate': 8.663844507937563e-06, 'epoch': 0.56} +{'loss': 0.9832, 'grad_norm': 1.534420371055603, 'learning_rate': 8.657667341823449e-06, 'epoch': 0.56} +{'loss': 1.0157, 'grad_norm': 1.737310528755188, 'learning_rate': 8.651490697262773e-06, 'epoch': 0.56} +{'loss': 0.9968, 'grad_norm': 1.7371097803115845, 'learning_rate': 8.645314576655428e-06, 'epoch': 0.56} +{'loss': 0.9741, 'grad_norm': 1.75102961063385, 'learning_rate': 8.6391389824011e-06, 'epoch': 0.56} +{'loss': 0.9553, 'grad_norm': 1.7553242444992065, 'learning_rate': 8.632963916899268e-06, 'epoch': 0.56} +{'loss': 0.9854, 'grad_norm': 1.7422131299972534, 'learning_rate': 8.626789382549212e-06, 'epoch': 0.56} +{'loss': 0.9712, 'grad_norm': 1.730526089668274, 'learning_rate': 8.62061538175e-06, 'epoch': 0.56} +{'loss': 1.0146, 'grad_norm': 1.630471110343933, 'learning_rate': 8.614441916900493e-06, 'epoch': 0.56} +{'loss': 1.0192, 'grad_norm': 2.00028657913208, 'learning_rate': 8.60826899039935e-06, 'epoch': 0.56} +{'loss': 1.0603, 'grad_norm': 1.8350640535354614, 'learning_rate': 8.602096604645009e-06, 'epoch': 0.56} +{'loss': 0.9117, 'grad_norm': 1.5071982145309448, 'learning_rate': 8.59592476203571e-06, 'epoch': 0.56} +{'loss': 0.9189, 'grad_norm': 1.596524953842163, 'learning_rate': 8.589753464969473e-06, 'epoch': 0.56} +{'loss': 0.986, 'grad_norm': 1.827192783355713, 'learning_rate': 8.583582715844113e-06, 'epoch': 0.56} +{'loss': 0.9524, 'grad_norm': 1.6590081453323364, 'learning_rate': 8.577412517057227e-06, 'epoch': 0.56} +{'loss': 0.9958, 'grad_norm': 1.7904973030090332, 'learning_rate': 8.571242871006202e-06, 
'epoch': 0.56} +{'loss': 0.9985, 'grad_norm': 1.8047839403152466, 'learning_rate': 8.56507378008821e-06, 'epoch': 0.56} +{'loss': 0.8489, 'grad_norm': 1.9825135469436646, 'learning_rate': 8.558905246700202e-06, 'epoch': 0.56} +{'loss': 0.8784, 'grad_norm': 1.623400092124939, 'learning_rate': 8.552737273238919e-06, 'epoch': 0.56} +{'loss': 0.9128, 'grad_norm': 1.6467536687850952, 'learning_rate': 8.546569862100876e-06, 'epoch': 0.56} +{'loss': 0.892, 'grad_norm': 0.9603508114814758, 'learning_rate': 8.540403015682382e-06, 'epoch': 0.56} +{'loss': 0.8771, 'grad_norm': 1.71060311794281, 'learning_rate': 8.534236736379515e-06, 'epoch': 0.56} +{'loss': 0.9782, 'grad_norm': 1.7929786443710327, 'learning_rate': 8.528071026588137e-06, 'epoch': 0.56} +{'loss': 1.0493, 'grad_norm': 1.6755218505859375, 'learning_rate': 8.521905888703894e-06, 'epoch': 0.56} +{'loss': 0.8865, 'grad_norm': 1.8743388652801514, 'learning_rate': 8.515741325122202e-06, 'epoch': 0.56} +{'loss': 0.8352, 'grad_norm': 0.8616350889205933, 'learning_rate': 8.509577338238255e-06, 'epoch': 0.56} +{'loss': 0.9708, 'grad_norm': 1.7416044473648071, 'learning_rate': 8.503413930447027e-06, 'epoch': 0.56} +{'loss': 1.0593, 'grad_norm': 1.8031952381134033, 'learning_rate': 8.497251104143263e-06, 'epoch': 0.56} +{'loss': 0.9643, 'grad_norm': 1.6629844903945923, 'learning_rate': 8.491088861721486e-06, 'epoch': 0.56} +{'loss': 0.8943, 'grad_norm': 1.7399660348892212, 'learning_rate': 8.484927205575985e-06, 'epoch': 0.56} +{'loss': 0.9249, 'grad_norm': 1.714220643043518, 'learning_rate': 8.478766138100834e-06, 'epoch': 0.56} +{'loss': 0.9667, 'grad_norm': 1.6782300472259521, 'learning_rate': 8.472605661689863e-06, 'epoch': 0.56} +{'loss': 0.9177, 'grad_norm': 1.9140594005584717, 'learning_rate': 8.46644577873668e-06, 'epoch': 0.56} +{'loss': 0.9155, 'grad_norm': 1.5623764991760254, 'learning_rate': 8.460286491634664e-06, 'epoch': 0.56} +{'loss': 0.9294, 'grad_norm': 1.5963772535324097, 'learning_rate': 8.454127802776961e-06, 'epoch': 0.56} +{'loss': 0.9477, 'grad_norm': 1.7991639375686646, 'learning_rate': 8.447969714556484e-06, 'epoch': 0.56} +{'loss': 0.9209, 'grad_norm': 1.9068244695663452, 'learning_rate': 8.441812229365913e-06, 'epoch': 0.56} +{'loss': 0.8474, 'grad_norm': 1.7739561796188354, 'learning_rate': 8.43565534959769e-06, 'epoch': 0.56} +{'loss': 0.8932, 'grad_norm': 1.7665737867355347, 'learning_rate': 8.42949907764403e-06, 'epoch': 0.56} +{'loss': 0.9798, 'grad_norm': 1.8083511590957642, 'learning_rate': 8.423343415896904e-06, 'epoch': 0.56} +{'loss': 0.9924, 'grad_norm': 1.6626704931259155, 'learning_rate': 8.417188366748051e-06, 'epoch': 0.56} +{'loss': 0.9907, 'grad_norm': 1.8067280054092407, 'learning_rate': 8.411033932588969e-06, 'epoch': 0.56} +{'loss': 0.8916, 'grad_norm': 1.6312150955200195, 'learning_rate': 8.404880115810914e-06, 'epoch': 0.56} +{'loss': 1.0291, 'grad_norm': 1.740869164466858, 'learning_rate': 8.398726918804911e-06, 'epoch': 0.56} +{'loss': 0.9492, 'grad_norm': 1.8676230907440186, 'learning_rate': 8.392574343961739e-06, 'epoch': 0.56} +{'loss': 0.9434, 'grad_norm': 1.7246983051300049, 'learning_rate': 8.386422393671934e-06, 'epoch': 0.57} +{'loss': 0.9716, 'grad_norm': 1.64383065700531, 'learning_rate': 8.380271070325789e-06, 'epoch': 0.57} +{'loss': 1.0142, 'grad_norm': 1.5677272081375122, 'learning_rate': 8.374120376313362e-06, 'epoch': 0.57} +{'loss': 0.9866, 'grad_norm': 1.6247469186782837, 'learning_rate': 8.367970314024451e-06, 'epoch': 0.57} +{'loss': 0.9085, 'grad_norm': 1.7564074993133545, 
'learning_rate': 8.361820885848623e-06, 'epoch': 0.57} +{'loss': 1.0384, 'grad_norm': 1.8154188394546509, 'learning_rate': 8.355672094175192e-06, 'epoch': 0.57} +{'loss': 0.9456, 'grad_norm': 1.7659227848052979, 'learning_rate': 8.349523941393224e-06, 'epoch': 0.57} +{'loss': 0.8884, 'grad_norm': 1.6569443941116333, 'learning_rate': 8.343376429891538e-06, 'epoch': 0.57} +{'loss': 1.0149, 'grad_norm': 1.7725307941436768, 'learning_rate': 8.337229562058707e-06, 'epoch': 0.57} +{'loss': 0.9744, 'grad_norm': 1.6769144535064697, 'learning_rate': 8.331083340283051e-06, 'epoch': 0.57} +{'loss': 0.919, 'grad_norm': 1.5744061470031738, 'learning_rate': 8.324937766952638e-06, 'epoch': 0.57} +{'loss': 0.9959, 'grad_norm': 1.7301446199417114, 'learning_rate': 8.318792844455282e-06, 'epoch': 0.57} +{'loss': 0.8693, 'grad_norm': 1.666870355606079, 'learning_rate': 8.312648575178552e-06, 'epoch': 0.57} +{'loss': 1.0275, 'grad_norm': 1.6792060136795044, 'learning_rate': 8.306504961509755e-06, 'epoch': 0.57} +{'loss': 0.9736, 'grad_norm': 1.847460389137268, 'learning_rate': 8.300362005835947e-06, 'epoch': 0.57} +{'loss': 0.8863, 'grad_norm': 0.9470747709274292, 'learning_rate': 8.294219710543931e-06, 'epoch': 0.57} +{'loss': 0.9648, 'grad_norm': 1.7566694021224976, 'learning_rate': 8.28807807802025e-06, 'epoch': 0.57} +{'loss': 0.9391, 'grad_norm': 1.8803002834320068, 'learning_rate': 8.281937110651187e-06, 'epoch': 0.57} +{'loss': 1.0002, 'grad_norm': 1.7637379169464111, 'learning_rate': 8.275796810822774e-06, 'epoch': 0.57} +{'loss': 0.9718, 'grad_norm': 1.7256768941879272, 'learning_rate': 8.269657180920773e-06, 'epoch': 0.57} +{'loss': 0.926, 'grad_norm': 1.9248450994491577, 'learning_rate': 8.263518223330698e-06, 'epoch': 0.57} +{'loss': 1.019, 'grad_norm': 1.8923312425613403, 'learning_rate': 8.25737994043779e-06, 'epoch': 0.57} +{'loss': 0.9763, 'grad_norm': 1.6377313137054443, 'learning_rate': 8.251242334627039e-06, 'epoch': 0.57} +{'loss': 0.9249, 'grad_norm': 2.114448308944702, 'learning_rate': 8.245105408283162e-06, 'epoch': 0.57} +{'loss': 0.9636, 'grad_norm': 1.8890897035598755, 'learning_rate': 8.238969163790617e-06, 'epoch': 0.57} +{'loss': 0.9279, 'grad_norm': 1.7341420650482178, 'learning_rate': 8.232833603533601e-06, 'epoch': 0.57} +{'loss': 0.9096, 'grad_norm': 1.80060875415802, 'learning_rate': 8.226698729896036e-06, 'epoch': 0.57} +{'loss': 0.8863, 'grad_norm': 1.902990698814392, 'learning_rate': 8.220564545261585e-06, 'epoch': 0.57} +{'loss': 0.9308, 'grad_norm': 1.815102458000183, 'learning_rate': 8.214431052013636e-06, 'epoch': 0.57} +{'loss': 1.0073, 'grad_norm': 1.7460057735443115, 'learning_rate': 8.208298252535314e-06, 'epoch': 0.57} +{'loss': 0.964, 'grad_norm': 1.9828581809997559, 'learning_rate': 8.202166149209475e-06, 'epoch': 0.57} +{'loss': 0.8836, 'grad_norm': 1.8354816436767578, 'learning_rate': 8.1960347444187e-06, 'epoch': 0.57} +{'loss': 1.0216, 'grad_norm': 1.7165905237197876, 'learning_rate': 8.189904040545302e-06, 'epoch': 0.57} +{'loss': 0.9282, 'grad_norm': 1.9106067419052124, 'learning_rate': 8.183774039971322e-06, 'epoch': 0.57} +{'loss': 1.01, 'grad_norm': 1.6354033946990967, 'learning_rate': 8.177644745078525e-06, 'epoch': 0.57} +{'loss': 1.0456, 'grad_norm': 1.909069299697876, 'learning_rate': 8.171516158248406e-06, 'epoch': 0.57} +{'loss': 0.9929, 'grad_norm': 1.6466954946517944, 'learning_rate': 8.165388281862177e-06, 'epoch': 0.57} +{'loss': 0.8903, 'grad_norm': 1.753554105758667, 'learning_rate': 8.159261118300784e-06, 'epoch': 0.57} +{'loss': 0.9208, 
'grad_norm': 2.213350296020508, 'learning_rate': 8.153134669944887e-06, 'epoch': 0.57} +{'loss': 0.9468, 'grad_norm': 1.6472336053848267, 'learning_rate': 8.147008939174875e-06, 'epoch': 0.57} +{'loss': 0.9675, 'grad_norm': 1.952000617980957, 'learning_rate': 8.140883928370855e-06, 'epoch': 0.57} +{'loss': 0.964, 'grad_norm': 1.5846456289291382, 'learning_rate': 8.134759639912655e-06, 'epoch': 0.57} +{'loss': 0.8388, 'grad_norm': 1.758061170578003, 'learning_rate': 8.12863607617982e-06, 'epoch': 0.57} +{'loss': 0.9636, 'grad_norm': 1.8731545209884644, 'learning_rate': 8.12251323955162e-06, 'epoch': 0.57} +{'loss': 0.8807, 'grad_norm': 1.6598634719848633, 'learning_rate': 8.116391132407033e-06, 'epoch': 0.57} +{'loss': 0.9168, 'grad_norm': 1.7117222547531128, 'learning_rate': 8.11026975712476e-06, 'epoch': 0.57} +{'loss': 0.9744, 'grad_norm': 1.8713873624801636, 'learning_rate': 8.104149116083216e-06, 'epoch': 0.57} +{'loss': 0.9518, 'grad_norm': 1.9112058877944946, 'learning_rate': 8.098029211660534e-06, 'epoch': 0.57} +{'loss': 0.9087, 'grad_norm': 1.8843671083450317, 'learning_rate': 8.091910046234552e-06, 'epoch': 0.57} +{'loss': 1.0059, 'grad_norm': 1.9149013757705688, 'learning_rate': 8.085791622182832e-06, 'epoch': 0.57} +{'loss': 0.9408, 'grad_norm': 1.6751363277435303, 'learning_rate': 8.079673941882639e-06, 'epoch': 0.57} +{'loss': 0.9295, 'grad_norm': 1.8452701568603516, 'learning_rate': 8.073557007710957e-06, 'epoch': 0.57} +{'loss': 0.9044, 'grad_norm': 1.6879347562789917, 'learning_rate': 8.06744082204447e-06, 'epoch': 0.58} +{'loss': 0.9714, 'grad_norm': 2.1773629188537598, 'learning_rate': 8.061325387259578e-06, 'epoch': 0.58} +{'loss': 0.8865, 'grad_norm': 1.7719420194625854, 'learning_rate': 8.055210705732391e-06, 'epoch': 0.58} +{'loss': 0.9435, 'grad_norm': 1.685840368270874, 'learning_rate': 8.04909677983872e-06, 'epoch': 0.58} +{'loss': 0.9102, 'grad_norm': 1.6655336618423462, 'learning_rate': 8.042983611954087e-06, 'epoch': 0.58} +{'loss': 0.9706, 'grad_norm': 2.123296022415161, 'learning_rate': 8.036871204453718e-06, 'epoch': 0.58} +{'loss': 0.9897, 'grad_norm': 1.7848074436187744, 'learning_rate': 8.030759559712545e-06, 'epoch': 0.58} +{'loss': 0.9861, 'grad_norm': 1.7156323194503784, 'learning_rate': 8.0246486801052e-06, 'epoch': 0.58} +{'loss': 0.9241, 'grad_norm': 1.6514891386032104, 'learning_rate': 8.018538568006027e-06, 'epoch': 0.58} +{'loss': 0.9733, 'grad_norm': 1.7439475059509277, 'learning_rate': 8.012429225789055e-06, 'epoch': 0.58} +{'loss': 1.0217, 'grad_norm': 1.964775562286377, 'learning_rate': 8.00632065582803e-06, 'epoch': 0.58} +{'loss': 0.9001, 'grad_norm': 1.7098491191864014, 'learning_rate': 8.00021286049639e-06, 'epoch': 0.58} +{'loss': 0.8949, 'grad_norm': 1.857249140739441, 'learning_rate': 7.994105842167274e-06, 'epoch': 0.58} +{'loss': 0.9822, 'grad_norm': 1.6955369710922241, 'learning_rate': 7.987999603213518e-06, 'epoch': 0.58} +{'loss': 0.9244, 'grad_norm': 1.7742730379104614, 'learning_rate': 7.981894146007658e-06, 'epoch': 0.58} +{'loss': 0.9581, 'grad_norm': 1.824655294418335, 'learning_rate': 7.975789472921922e-06, 'epoch': 0.58} +{'loss': 1.0254, 'grad_norm': 1.8045625686645508, 'learning_rate': 7.96968558632824e-06, 'epoch': 0.58} +{'loss': 0.9704, 'grad_norm': 1.6778502464294434, 'learning_rate': 7.963582488598227e-06, 'epoch': 0.58} +{'loss': 0.8823, 'grad_norm': 1.6765117645263672, 'learning_rate': 7.957480182103198e-06, 'epoch': 0.58} +{'loss': 0.8671, 'grad_norm': 1.6389504671096802, 'learning_rate': 7.95137866921416e-06, 
'epoch': 0.58} +{'loss': 0.9456, 'grad_norm': 1.5550858974456787, 'learning_rate': 7.945277952301811e-06, 'epoch': 0.58} +{'loss': 1.0094, 'grad_norm': 1.72085440158844, 'learning_rate': 7.939178033736538e-06, 'epoch': 0.58} +{'loss': 1.0041, 'grad_norm': 1.7494829893112183, 'learning_rate': 7.93307891588842e-06, 'epoch': 0.58} +{'loss': 0.992, 'grad_norm': 1.7873876094818115, 'learning_rate': 7.926980601127225e-06, 'epoch': 0.58} +{'loss': 0.8435, 'grad_norm': 1.9800448417663574, 'learning_rate': 7.92088309182241e-06, 'epoch': 0.58} +{'loss': 0.9594, 'grad_norm': 1.7443515062332153, 'learning_rate': 7.914786390343112e-06, 'epoch': 0.58} +{'loss': 0.979, 'grad_norm': 1.7304186820983887, 'learning_rate': 7.908690499058162e-06, 'epoch': 0.58} +{'loss': 0.9446, 'grad_norm': 1.65768301486969, 'learning_rate': 7.902595420336074e-06, 'epoch': 0.58} +{'loss': 0.9181, 'grad_norm': 1.61051607131958, 'learning_rate': 7.896501156545044e-06, 'epoch': 0.58} +{'loss': 0.966, 'grad_norm': 1.6247704029083252, 'learning_rate': 7.890407710052957e-06, 'epoch': 0.58} +{'loss': 0.9386, 'grad_norm': 1.6123080253601074, 'learning_rate': 7.884315083227373e-06, 'epoch': 0.58} +{'loss': 0.9371, 'grad_norm': 0.9107101559638977, 'learning_rate': 7.878223278435539e-06, 'epoch': 0.58} +{'loss': 0.8843, 'grad_norm': 1.6647801399230957, 'learning_rate': 7.872132298044382e-06, 'epoch': 0.58} +{'loss': 0.9742, 'grad_norm': 1.5860421657562256, 'learning_rate': 7.866042144420502e-06, 'epoch': 0.58} +{'loss': 0.9993, 'grad_norm': 1.7504537105560303, 'learning_rate': 7.859952819930188e-06, 'epoch': 0.58} +{'loss': 1.0004, 'grad_norm': 1.7566826343536377, 'learning_rate': 7.853864326939403e-06, 'epoch': 0.58} +{'loss': 0.9635, 'grad_norm': 1.6100963354110718, 'learning_rate': 7.847776667813782e-06, 'epoch': 0.58} +{'loss': 0.935, 'grad_norm': 1.633142113685608, 'learning_rate': 7.841689844918643e-06, 'epoch': 0.58} +{'loss': 1.0052, 'grad_norm': 1.7092989683151245, 'learning_rate': 7.835603860618973e-06, 'epoch': 0.58} +{'loss': 0.9583, 'grad_norm': 1.6931356191635132, 'learning_rate': 7.829518717279439e-06, 'epoch': 0.58} +{'loss': 0.997, 'grad_norm': 1.8024014234542847, 'learning_rate': 7.823434417264378e-06, 'epoch': 0.58} +{'loss': 0.9737, 'grad_norm': 1.786964774131775, 'learning_rate': 7.817350962937802e-06, 'epoch': 0.58} +{'loss': 0.9519, 'grad_norm': 1.7206529378890991, 'learning_rate': 7.811268356663387e-06, 'epoch': 0.58} +{'loss': 0.9564, 'grad_norm': 1.6637228727340698, 'learning_rate': 7.805186600804489e-06, 'epoch': 0.58} +{'loss': 0.9108, 'grad_norm': 1.7595679759979248, 'learning_rate': 7.799105697724127e-06, 'epoch': 0.58} +{'loss': 0.9525, 'grad_norm': 1.7303951978683472, 'learning_rate': 7.793025649784991e-06, 'epoch': 0.58} +{'loss': 1.0046, 'grad_norm': 1.8665261268615723, 'learning_rate': 7.786946459349441e-06, 'epoch': 0.58} +{'loss': 0.9512, 'grad_norm': 1.6762267351150513, 'learning_rate': 7.780868128779499e-06, 'epoch': 0.58} +{'loss': 0.9701, 'grad_norm': 1.628430724143982, 'learning_rate': 7.774790660436857e-06, 'epoch': 0.58} +{'loss': 0.9756, 'grad_norm': 1.5883457660675049, 'learning_rate': 7.768714056682873e-06, 'epoch': 0.58} +{'loss': 0.9011, 'grad_norm': 0.9464545249938965, 'learning_rate': 7.76263831987856e-06, 'epoch': 0.58} +{'loss': 0.9676, 'grad_norm': 1.6757656335830688, 'learning_rate': 7.756563452384605e-06, 'epoch': 0.58} +{'loss': 0.9589, 'grad_norm': 1.796553611755371, 'learning_rate': 7.750489456561351e-06, 'epoch': 0.59} +{'loss': 0.9715, 'grad_norm': 1.5950746536254883, 
'learning_rate': 7.744416334768809e-06, 'epoch': 0.59} +{'loss': 0.9296, 'grad_norm': 1.8375208377838135, 'learning_rate': 7.738344089366638e-06, 'epoch': 0.59} +{'loss': 0.9209, 'grad_norm': 1.8418134450912476, 'learning_rate': 7.73227272271417e-06, 'epoch': 0.59} +{'loss': 0.9131, 'grad_norm': 1.62941575050354, 'learning_rate': 7.726202237170387e-06, 'epoch': 0.59} +{'loss': 0.8739, 'grad_norm': 1.942509651184082, 'learning_rate': 7.720132635093935e-06, 'epoch': 0.59} +{'loss': 0.9305, 'grad_norm': 0.8891316056251526, 'learning_rate': 7.714063918843106e-06, 'epoch': 0.59} +{'loss': 0.9344, 'grad_norm': 1.9507766962051392, 'learning_rate': 7.70799609077586e-06, 'epoch': 0.59} +{'loss': 0.9978, 'grad_norm': 1.7370929718017578, 'learning_rate': 7.701929153249808e-06, 'epoch': 0.59} +{'loss': 0.9831, 'grad_norm': 1.7096227407455444, 'learning_rate': 7.695863108622208e-06, 'epoch': 0.59} +{'loss': 0.9259, 'grad_norm': 1.721142053604126, 'learning_rate': 7.689797959249984e-06, 'epoch': 0.59} +{'loss': 0.8795, 'grad_norm': 1.7602863311767578, 'learning_rate': 7.6837337074897e-06, 'epoch': 0.59} +{'loss': 0.8368, 'grad_norm': 1.760566234588623, 'learning_rate': 7.677670355697577e-06, 'epoch': 0.59} +{'loss': 0.8972, 'grad_norm': 1.8040025234222412, 'learning_rate': 7.671607906229488e-06, 'epoch': 0.59} +{'loss': 0.9486, 'grad_norm': 1.6719255447387695, 'learning_rate': 7.66554636144095e-06, 'epoch': 0.59} +{'loss': 0.9398, 'grad_norm': 1.6952571868896484, 'learning_rate': 7.659485723687131e-06, 'epoch': 0.59} +{'loss': 0.9093, 'grad_norm': 1.696886420249939, 'learning_rate': 7.653425995322852e-06, 'epoch': 0.59} +{'loss': 0.9904, 'grad_norm': 1.662434697151184, 'learning_rate': 7.647367178702568e-06, 'epoch': 0.59} +{'loss': 0.984, 'grad_norm': 1.7197728157043457, 'learning_rate': 7.641309276180394e-06, 'epoch': 0.59} +{'loss': 0.9129, 'grad_norm': 1.723658561706543, 'learning_rate': 7.635252290110082e-06, 'epoch': 0.59} +{'loss': 0.9964, 'grad_norm': 1.7255167961120605, 'learning_rate': 7.629196222845027e-06, 'epoch': 0.59} +{'loss': 0.8868, 'grad_norm': 1.5792759656906128, 'learning_rate': 7.623141076738271e-06, 'epoch': 0.59} +{'loss': 0.9943, 'grad_norm': 1.6306661367416382, 'learning_rate': 7.617086854142498e-06, 'epoch': 0.59} +{'loss': 0.9348, 'grad_norm': 1.8734837770462036, 'learning_rate': 7.611033557410027e-06, 'epoch': 0.59} +{'loss': 0.8825, 'grad_norm': 1.626111388206482, 'learning_rate': 7.6049811888928235e-06, 'epoch': 0.59} +{'loss': 0.9636, 'grad_norm': 1.838313102722168, 'learning_rate': 7.598929750942492e-06, 'epoch': 0.59} +{'loss': 0.938, 'grad_norm': 2.4128808975219727, 'learning_rate': 7.592879245910273e-06, 'epoch': 0.59} +{'loss': 1.0066, 'grad_norm': 1.6248093843460083, 'learning_rate': 7.586829676147045e-06, 'epoch': 0.59} +{'loss': 1.0448, 'grad_norm': 1.8999649286270142, 'learning_rate': 7.580781044003324e-06, 'epoch': 0.59} +{'loss': 0.9495, 'grad_norm': 1.7558555603027344, 'learning_rate': 7.574733351829263e-06, 'epoch': 0.59} +{'loss': 0.9618, 'grad_norm': 1.840142846107483, 'learning_rate': 7.5686866019746444e-06, 'epoch': 0.59} +{'loss': 0.9217, 'grad_norm': 1.6699053049087524, 'learning_rate': 7.562640796788893e-06, 'epoch': 0.59} +{'loss': 0.9419, 'grad_norm': 1.6873536109924316, 'learning_rate': 7.556595938621058e-06, 'epoch': 0.59} +{'loss': 1.0167, 'grad_norm': 1.770682692527771, 'learning_rate': 7.550552029819828e-06, 'epoch': 0.59} +{'loss': 1.0401, 'grad_norm': 0.999251127243042, 'learning_rate': 7.544509072733515e-06, 'epoch': 0.59} +{'loss': 0.9612, 
'grad_norm': 1.9162050485610962, 'learning_rate': 7.53846706971007e-06, 'epoch': 0.59} +{'loss': 1.026, 'grad_norm': 1.8191437721252441, 'learning_rate': 7.532426023097063e-06, 'epoch': 0.59} +{'loss': 1.0062, 'grad_norm': 1.6889057159423828, 'learning_rate': 7.5263859352417025e-06, 'epoch': 0.59} +{'loss': 0.8905, 'grad_norm': 0.8744052648544312, 'learning_rate': 7.5203468084908185e-06, 'epoch': 0.59} +{'loss': 0.8939, 'grad_norm': 1.7647967338562012, 'learning_rate': 7.514308645190867e-06, 'epoch': 0.59} +{'loss': 0.8917, 'grad_norm': 1.8856765031814575, 'learning_rate': 7.508271447687936e-06, 'epoch': 0.59} +{'loss': 0.8386, 'grad_norm': 1.730196475982666, 'learning_rate': 7.50223521832773e-06, 'epoch': 0.59} +{'loss': 0.9225, 'grad_norm': 2.005169153213501, 'learning_rate': 7.496199959455584e-06, 'epoch': 0.59} +{'loss': 0.9597, 'grad_norm': 1.6016221046447754, 'learning_rate': 7.490165673416455e-06, 'epoch': 0.59} +{'loss': 0.8897, 'grad_norm': 0.8668866157531738, 'learning_rate': 7.484132362554915e-06, 'epoch': 0.59} +{'loss': 0.9786, 'grad_norm': 2.025111198425293, 'learning_rate': 7.478100029215166e-06, 'epoch': 0.59} +{'loss': 0.9281, 'grad_norm': 1.7242189645767212, 'learning_rate': 7.472068675741024e-06, 'epoch': 0.59} +{'loss': 0.8382, 'grad_norm': 1.7397632598876953, 'learning_rate': 7.466038304475931e-06, 'epoch': 0.59} +{'loss': 0.9869, 'grad_norm': 1.8191649913787842, 'learning_rate': 7.4600089177629384e-06, 'epoch': 0.59} +{'loss': 0.9285, 'grad_norm': 1.7557562589645386, 'learning_rate': 7.4539805179447234e-06, 'epoch': 0.59} +{'loss': 0.9649, 'grad_norm': 1.7238134145736694, 'learning_rate': 7.447953107363574e-06, 'epoch': 0.59} +{'loss': 0.9301, 'grad_norm': 0.9079597592353821, 'learning_rate': 7.4419266883614e-06, 'epoch': 0.59} +{'loss': 0.922, 'grad_norm': 0.8732458353042603, 'learning_rate': 7.435901263279717e-06, 'epoch': 0.6} +{'loss': 0.94, 'grad_norm': 1.6692142486572266, 'learning_rate': 7.429876834459661e-06, 'epoch': 0.6} +{'loss': 0.9622, 'grad_norm': 1.7725183963775635, 'learning_rate': 7.4238534042419805e-06, 'epoch': 0.6} +{'loss': 0.9298, 'grad_norm': 1.8413903713226318, 'learning_rate': 7.417830974967034e-06, 'epoch': 0.6} +{'loss': 0.9267, 'grad_norm': 1.704624891281128, 'learning_rate': 7.411809548974792e-06, 'epoch': 0.6} +{'loss': 0.9479, 'grad_norm': 1.822956919670105, 'learning_rate': 7.4057891286048356e-06, 'epoch': 0.6} +{'loss': 1.0565, 'grad_norm': 1.774893879890442, 'learning_rate': 7.399769716196353e-06, 'epoch': 0.6} +{'loss': 0.8112, 'grad_norm': 1.6078121662139893, 'learning_rate': 7.393751314088147e-06, 'epoch': 0.6} +{'loss': 0.9288, 'grad_norm': 1.6429210901260376, 'learning_rate': 7.387733924618617e-06, 'epoch': 0.6} +{'loss': 0.9092, 'grad_norm': 0.9268057942390442, 'learning_rate': 7.38171755012578e-06, 'epoch': 0.6} +{'loss': 0.9872, 'grad_norm': 1.7914083003997803, 'learning_rate': 7.3757021929472495e-06, 'epoch': 0.6} +{'loss': 0.958, 'grad_norm': 1.7300490140914917, 'learning_rate': 7.3696878554202525e-06, 'epoch': 0.6} +{'loss': 0.9569, 'grad_norm': 1.9645992517471313, 'learning_rate': 7.3636745398816135e-06, 'epoch': 0.6} +{'loss': 0.9324, 'grad_norm': 1.6007357835769653, 'learning_rate': 7.35766224866776e-06, 'epoch': 0.6} +{'loss': 1.0158, 'grad_norm': 1.7044273614883423, 'learning_rate': 7.3516509841147276e-06, 'epoch': 0.6} +{'loss': 0.8864, 'grad_norm': 1.6432082653045654, 'learning_rate': 7.345640748558146e-06, 'epoch': 0.6} +{'loss': 0.8969, 'grad_norm': 1.6368439197540283, 'learning_rate': 7.33963154433325e-06, 
'epoch': 0.6} +{'loss': 0.934, 'grad_norm': 1.609157919883728, 'learning_rate': 7.333623373774868e-06, 'epoch': 0.6} +{'loss': 0.941, 'grad_norm': 1.769007921218872, 'learning_rate': 7.327616239217432e-06, 'epoch': 0.6} +{'loss': 0.9605, 'grad_norm': 1.7063121795654297, 'learning_rate': 7.321610142994971e-06, 'epoch': 0.6} +{'loss': 0.9175, 'grad_norm': 1.651293396949768, 'learning_rate': 7.315605087441107e-06, 'epoch': 0.6} +{'loss': 0.9669, 'grad_norm': 1.733529806137085, 'learning_rate': 7.309601074889061e-06, 'epoch': 0.6} +{'loss': 0.9292, 'grad_norm': 1.7674496173858643, 'learning_rate': 7.303598107671648e-06, 'epoch': 0.6} +{'loss': 0.8914, 'grad_norm': 2.0784335136413574, 'learning_rate': 7.297596188121277e-06, 'epoch': 0.6} +{'loss': 0.9318, 'grad_norm': 1.8785899877548218, 'learning_rate': 7.291595318569951e-06, 'epoch': 0.6} +{'loss': 0.9053, 'grad_norm': 1.766610860824585, 'learning_rate': 7.285595501349259e-06, 'epoch': 0.6} +{'loss': 0.8586, 'grad_norm': 1.7902711629867554, 'learning_rate': 7.279596738790389e-06, 'epoch': 0.6} +{'loss': 0.9097, 'grad_norm': 1.8599613904953003, 'learning_rate': 7.273599033224114e-06, 'epoch': 0.6} +{'loss': 0.9444, 'grad_norm': 1.5772086381912231, 'learning_rate': 7.267602386980801e-06, 'epoch': 0.6} +{'loss': 0.9465, 'grad_norm': 1.978500247001648, 'learning_rate': 7.2616068023904e-06, 'epoch': 0.6} +{'loss': 0.9141, 'grad_norm': 1.760067105293274, 'learning_rate': 7.255612281782451e-06, 'epoch': 0.6} +{'loss': 0.8587, 'grad_norm': 0.8957152366638184, 'learning_rate': 7.249618827486083e-06, 'epoch': 0.6} +{'loss': 0.9806, 'grad_norm': 1.9353101253509521, 'learning_rate': 7.243626441830009e-06, 'epoch': 0.6} +{'loss': 0.8857, 'grad_norm': 1.6308897733688354, 'learning_rate': 7.2376351271425235e-06, 'epoch': 0.6} +{'loss': 0.9185, 'grad_norm': 1.7076072692871094, 'learning_rate': 7.2316448857515076e-06, 'epoch': 0.6} +{'loss': 0.8813, 'grad_norm': 1.5567580461502075, 'learning_rate': 7.225655719984426e-06, 'epoch': 0.6} +{'loss': 0.8705, 'grad_norm': 1.7650173902511597, 'learning_rate': 7.219667632168326e-06, 'epoch': 0.6} +{'loss': 0.954, 'grad_norm': 1.7719755172729492, 'learning_rate': 7.213680624629832e-06, 'epoch': 0.6} +{'loss': 0.9326, 'grad_norm': 1.8033884763717651, 'learning_rate': 7.207694699695155e-06, 'epoch': 0.6} +{'loss': 0.9477, 'grad_norm': 1.782778263092041, 'learning_rate': 7.201709859690081e-06, 'epoch': 0.6} +{'loss': 0.8708, 'grad_norm': 1.8047493696212769, 'learning_rate': 7.1957261069399745e-06, 'epoch': 0.6} +{'loss': 0.9409, 'grad_norm': 1.7807023525238037, 'learning_rate': 7.189743443769778e-06, 'epoch': 0.6} +{'loss': 0.9304, 'grad_norm': 1.9129821062088013, 'learning_rate': 7.183761872504011e-06, 'epoch': 0.6} +{'loss': 0.921, 'grad_norm': 1.626101016998291, 'learning_rate': 7.1777813954667716e-06, 'epoch': 0.6} +{'loss': 0.936, 'grad_norm': 1.9588425159454346, 'learning_rate': 7.171802014981726e-06, 'epoch': 0.6} +{'loss': 0.9631, 'grad_norm': 2.0529778003692627, 'learning_rate': 7.1658237333721216e-06, 'epoch': 0.6} +{'loss': 0.9019, 'grad_norm': 0.8956499695777893, 'learning_rate': 7.159846552960774e-06, 'epoch': 0.6} +{'loss': 0.9415, 'grad_norm': 1.7404218912124634, 'learning_rate': 7.153870476070073e-06, 'epoch': 0.6} +{'loss': 0.9084, 'grad_norm': 1.6268503665924072, 'learning_rate': 7.14789550502198e-06, 'epoch': 0.6} +{'loss': 1.0231, 'grad_norm': 1.7708336114883423, 'learning_rate': 7.141921642138025e-06, 'epoch': 0.6} +{'loss': 0.9695, 'grad_norm': 1.707088828086853, 'learning_rate': 
7.135948889739307e-06, 'epoch': 0.6} +{'loss': 1.0124, 'grad_norm': 1.6896693706512451, 'learning_rate': 7.129977250146496e-06, 'epoch': 0.6} +{'loss': 0.8634, 'grad_norm': 1.6429632902145386, 'learning_rate': 7.124006725679828e-06, 'epoch': 0.61} +{'loss': 0.8911, 'grad_norm': 1.797355055809021, 'learning_rate': 7.118037318659108e-06, 'epoch': 0.61} +{'loss': 0.8967, 'grad_norm': 1.8120458126068115, 'learning_rate': 7.112069031403704e-06, 'epoch': 0.61} +{'loss': 0.9916, 'grad_norm': 1.693933129310608, 'learning_rate': 7.106101866232546e-06, 'epoch': 0.61} +{'loss': 0.9038, 'grad_norm': 1.604575276374817, 'learning_rate': 7.100135825464138e-06, 'epoch': 0.61} +{'loss': 0.9699, 'grad_norm': 1.89679753780365, 'learning_rate': 7.094170911416543e-06, 'epoch': 0.61} +{'loss': 0.9523, 'grad_norm': 1.655442476272583, 'learning_rate': 7.088207126407374e-06, 'epoch': 0.61} +{'loss': 0.9908, 'grad_norm': 1.8391051292419434, 'learning_rate': 7.082244472753823e-06, 'epoch': 0.61} +{'loss': 0.9494, 'grad_norm': 1.923933744430542, 'learning_rate': 7.076282952772634e-06, 'epoch': 0.61} +{'loss': 0.9111, 'grad_norm': 1.7745442390441895, 'learning_rate': 7.070322568780111e-06, 'epoch': 0.61} +{'loss': 0.9303, 'grad_norm': 1.6586029529571533, 'learning_rate': 7.064363323092117e-06, 'epoch': 0.61} +{'loss': 0.8972, 'grad_norm': 1.7990206480026245, 'learning_rate': 7.058405218024074e-06, 'epoch': 0.61} +{'loss': 0.9048, 'grad_norm': 1.8297728300094604, 'learning_rate': 7.052448255890958e-06, 'epoch': 0.61} +{'loss': 0.867, 'grad_norm': 1.7134735584259033, 'learning_rate': 7.046492439007306e-06, 'epoch': 0.61} +{'loss': 0.9256, 'grad_norm': 1.734916090965271, 'learning_rate': 7.040537769687204e-06, 'epoch': 0.61} +{'loss': 0.9796, 'grad_norm': 1.8912674188613892, 'learning_rate': 7.034584250244292e-06, 'epoch': 0.61} +{'loss': 0.8988, 'grad_norm': 1.613027811050415, 'learning_rate': 7.028631882991771e-06, 'epoch': 0.61} +{'loss': 0.9866, 'grad_norm': 1.7163894176483154, 'learning_rate': 7.022680670242387e-06, 'epoch': 0.61} +{'loss': 0.921, 'grad_norm': 0.9055702090263367, 'learning_rate': 7.01673061430844e-06, 'epoch': 0.61} +{'loss': 0.9702, 'grad_norm': 1.6616817712783813, 'learning_rate': 7.01078171750178e-06, 'epoch': 0.61} +{'loss': 1.0575, 'grad_norm': 1.7192180156707764, 'learning_rate': 7.004833982133808e-06, 'epoch': 0.61} +{'loss': 0.9602, 'grad_norm': 1.728763222694397, 'learning_rate': 6.998887410515473e-06, 'epoch': 0.61} +{'loss': 0.9614, 'grad_norm': 1.7535946369171143, 'learning_rate': 6.992942004957271e-06, 'epoch': 0.61} +{'loss': 0.9441, 'grad_norm': 2.1553993225097656, 'learning_rate': 6.986997767769243e-06, 'epoch': 0.61} +{'loss': 0.9434, 'grad_norm': 0.9410929083824158, 'learning_rate': 6.981054701260981e-06, 'epoch': 0.61} +{'loss': 0.9381, 'grad_norm': 1.6779158115386963, 'learning_rate': 6.97511280774162e-06, 'epoch': 0.61} +{'loss': 0.852, 'grad_norm': 1.8352134227752686, 'learning_rate': 6.969172089519836e-06, 'epoch': 0.61} +{'loss': 0.9848, 'grad_norm': 1.8777514696121216, 'learning_rate': 6.963232548903853e-06, 'epoch': 0.61} +{'loss': 0.9384, 'grad_norm': 1.7684664726257324, 'learning_rate': 6.957294188201438e-06, 'epoch': 0.61} +{'loss': 1.0142, 'grad_norm': 1.92123544216156, 'learning_rate': 6.951357009719893e-06, 'epoch': 0.61} +{'loss': 0.8899, 'grad_norm': 0.9332671761512756, 'learning_rate': 6.9454210157660694e-06, 'epoch': 0.61} +{'loss': 0.9182, 'grad_norm': 1.7205137014389038, 'learning_rate': 6.939486208646349e-06, 'epoch': 0.61} +{'loss': 0.9636, 'grad_norm': 
1.8418316841125488, 'learning_rate': 6.933552590666659e-06, 'epoch': 0.61} +{'loss': 0.8844, 'grad_norm': 1.724522590637207, 'learning_rate': 6.927620164132463e-06, 'epoch': 0.61} +{'loss': 1.0362, 'grad_norm': 1.9042211771011353, 'learning_rate': 6.921688931348762e-06, 'epoch': 0.61} +{'loss': 0.9136, 'grad_norm': 1.7500731945037842, 'learning_rate': 6.915758894620093e-06, 'epoch': 0.61} +{'loss': 0.9258, 'grad_norm': 1.6804414987564087, 'learning_rate': 6.909830056250527e-06, 'epoch': 0.61} +{'loss': 0.9253, 'grad_norm': 1.6197253465652466, 'learning_rate': 6.903902418543671e-06, 'epoch': 0.61} +{'loss': 0.9284, 'grad_norm': 1.7026760578155518, 'learning_rate': 6.8979759838026676e-06, 'epoch': 0.61} +{'loss': 0.979, 'grad_norm': 2.2527170181274414, 'learning_rate': 6.892050754330181e-06, 'epoch': 0.61} +{'loss': 0.8549, 'grad_norm': 1.6979202032089233, 'learning_rate': 6.886126732428424e-06, 'epoch': 0.61} +{'loss': 0.908, 'grad_norm': 1.6735785007476807, 'learning_rate': 6.8802039203991275e-06, 'epoch': 0.61} +{'loss': 0.8493, 'grad_norm': 1.6730470657348633, 'learning_rate': 6.874282320543557e-06, 'epoch': 0.61} +{'loss': 0.8894, 'grad_norm': 1.8674790859222412, 'learning_rate': 6.8683619351625065e-06, 'epoch': 0.61} +{'loss': 0.945, 'grad_norm': 1.770477056503296, 'learning_rate': 6.862442766556297e-06, 'epoch': 0.61} +{'loss': 0.8622, 'grad_norm': 1.7187254428863525, 'learning_rate': 6.85652481702478e-06, 'epoch': 0.61} +{'loss': 0.8889, 'grad_norm': 1.742129921913147, 'learning_rate': 6.850608088867332e-06, 'epoch': 0.61} +{'loss': 0.9245, 'grad_norm': 1.5799837112426758, 'learning_rate': 6.844692584382848e-06, 'epoch': 0.61} +{'loss': 0.9512, 'grad_norm': 1.6683019399642944, 'learning_rate': 6.83877830586976e-06, 'epoch': 0.61} +{'loss': 0.9298, 'grad_norm': 1.6474559307098389, 'learning_rate': 6.832865255626013e-06, 'epoch': 0.61} +{'loss': 0.9155, 'grad_norm': 1.6920115947723389, 'learning_rate': 6.826953435949081e-06, 'epoch': 0.61} +{'loss': 0.9485, 'grad_norm': 1.7854158878326416, 'learning_rate': 6.821042849135957e-06, 'epoch': 0.61} +{'loss': 0.9636, 'grad_norm': 1.7588282823562622, 'learning_rate': 6.815133497483157e-06, 'epoch': 0.62} +{'loss': 0.9785, 'grad_norm': 1.7492843866348267, 'learning_rate': 6.809225383286715e-06, 'epoch': 0.62} +{'loss': 0.8671, 'grad_norm': 1.8690872192382812, 'learning_rate': 6.8033185088421874e-06, 'epoch': 0.62} +{'loss': 0.9637, 'grad_norm': 1.7724074125289917, 'learning_rate': 6.797412876444645e-06, 'epoch': 0.62} +{'loss': 0.9561, 'grad_norm': 1.7514842748641968, 'learning_rate': 6.791508488388675e-06, 'epoch': 0.62} +{'loss': 1.0579, 'grad_norm': 1.7365591526031494, 'learning_rate': 6.785605346968387e-06, 'epoch': 0.62} +{'loss': 1.0581, 'grad_norm': 1.7717989683151245, 'learning_rate': 6.779703454477402e-06, 'epoch': 0.62} +{'loss': 0.9847, 'grad_norm': 1.802849531173706, 'learning_rate': 6.7738028132088565e-06, 'epoch': 0.62} +{'loss': 0.9406, 'grad_norm': 1.9230589866638184, 'learning_rate': 6.767903425455402e-06, 'epoch': 0.62} +{'loss': 0.921, 'grad_norm': 1.6573668718338013, 'learning_rate': 6.762005293509202e-06, 'epoch': 0.62} +{'loss': 0.9151, 'grad_norm': 1.7774088382720947, 'learning_rate': 6.7561084196619306e-06, 'epoch': 0.62} +{'loss': 0.9419, 'grad_norm': 1.688793659210205, 'learning_rate': 6.750212806204778e-06, 'epoch': 0.62} +{'loss': 0.927, 'grad_norm': 1.698472499847412, 'learning_rate': 6.744318455428436e-06, 'epoch': 0.62} +{'loss': 0.9446, 'grad_norm': 1.6036385297775269, 'learning_rate': 6.738425369623115e-06, 
'epoch': 0.62} +{'loss': 0.8567, 'grad_norm': 1.7535037994384766, 'learning_rate': 6.732533551078528e-06, 'epoch': 0.62} +{'loss': 0.8701, 'grad_norm': 1.7296459674835205, 'learning_rate': 6.7266430020839e-06, 'epoch': 0.62} +{'loss': 0.9682, 'grad_norm': 1.8219283819198608, 'learning_rate': 6.720753724927957e-06, 'epoch': 0.62} +{'loss': 0.942, 'grad_norm': 1.5691721439361572, 'learning_rate': 6.714865721898938e-06, 'epoch': 0.62} +{'loss': 0.9277, 'grad_norm': 1.6300033330917358, 'learning_rate': 6.708978995284581e-06, 'epoch': 0.62} +{'loss': 0.9816, 'grad_norm': 1.5564969778060913, 'learning_rate': 6.70309354737213e-06, 'epoch': 0.62} +{'loss': 0.9805, 'grad_norm': 1.7144765853881836, 'learning_rate': 6.697209380448333e-06, 'epoch': 0.62} +{'loss': 1.0345, 'grad_norm': 1.7554031610488892, 'learning_rate': 6.6913264967994375e-06, 'epoch': 0.62} +{'loss': 1.0069, 'grad_norm': 1.7935832738876343, 'learning_rate': 6.685444898711196e-06, 'epoch': 0.62} +{'loss': 0.9317, 'grad_norm': 1.7465343475341797, 'learning_rate': 6.679564588468861e-06, 'epoch': 0.62} +{'loss': 0.8791, 'grad_norm': 1.7861418724060059, 'learning_rate': 6.673685568357182e-06, 'epoch': 0.62} +{'loss': 0.8567, 'grad_norm': 1.6786564588546753, 'learning_rate': 6.66780784066041e-06, 'epoch': 0.62} +{'loss': 0.9323, 'grad_norm': 1.8464841842651367, 'learning_rate': 6.661931407662292e-06, 'epoch': 0.62} +{'loss': 0.9152, 'grad_norm': 1.9761865139007568, 'learning_rate': 6.656056271646076e-06, 'epoch': 0.62} +{'loss': 0.9762, 'grad_norm': 1.9605340957641602, 'learning_rate': 6.650182434894496e-06, 'epoch': 0.62} +{'loss': 0.9904, 'grad_norm': 1.7294394969940186, 'learning_rate': 6.644309899689794e-06, 'epoch': 0.62} +{'loss': 0.9187, 'grad_norm': 1.8550854921340942, 'learning_rate': 6.638438668313695e-06, 'epoch': 0.62} +{'loss': 0.9901, 'grad_norm': 1.7067784070968628, 'learning_rate': 6.632568743047426e-06, 'epoch': 0.62} +{'loss': 0.8918, 'grad_norm': 1.6767572164535522, 'learning_rate': 6.6267001261717015e-06, 'epoch': 0.62} +{'loss': 0.9238, 'grad_norm': 1.5912185907363892, 'learning_rate': 6.6208328199667305e-06, 'epoch': 0.62} +{'loss': 0.9186, 'grad_norm': 0.924150824546814, 'learning_rate': 6.614966826712212e-06, 'epoch': 0.62} +{'loss': 1.0614, 'grad_norm': 1.8024675846099854, 'learning_rate': 6.609102148687333e-06, 'epoch': 0.62} +{'loss': 0.9256, 'grad_norm': 1.8041318655014038, 'learning_rate': 6.603238788170771e-06, 'epoch': 0.62} +{'loss': 0.9962, 'grad_norm': 1.6303967237472534, 'learning_rate': 6.597376747440694e-06, 'epoch': 0.62} +{'loss': 0.9883, 'grad_norm': 1.875462293624878, 'learning_rate': 6.59151602877475e-06, 'epoch': 0.62} +{'loss': 0.9089, 'grad_norm': 1.8030260801315308, 'learning_rate': 6.5856566344500815e-06, 'epoch': 0.62} +{'loss': 0.9173, 'grad_norm': 1.7845523357391357, 'learning_rate': 6.579798566743314e-06, 'epoch': 0.62} +{'loss': 0.8976, 'grad_norm': 1.8029476404190063, 'learning_rate': 6.573941827930551e-06, 'epoch': 0.62} +{'loss': 0.9843, 'grad_norm': 1.7241421937942505, 'learning_rate': 6.568086420287389e-06, 'epoch': 0.62} +{'loss': 0.9405, 'grad_norm': 1.724313497543335, 'learning_rate': 6.562232346088903e-06, 'epoch': 0.62} +{'loss': 0.9392, 'grad_norm': 1.8561499118804932, 'learning_rate': 6.5563796076096484e-06, 'epoch': 0.62} +{'loss': 0.9031, 'grad_norm': 1.733005404472351, 'learning_rate': 6.550528207123667e-06, 'epoch': 0.62} +{'loss': 0.955, 'grad_norm': 1.9871788024902344, 'learning_rate': 6.544678146904475e-06, 'epoch': 0.62} +{'loss': 0.9295, 'grad_norm': 
1.6345582008361816, 'learning_rate': 6.538829429225068e-06, 'epoch': 0.62} +{'loss': 0.8884, 'grad_norm': 1.5734761953353882, 'learning_rate': 6.532982056357928e-06, 'epoch': 0.62} +{'loss': 0.9633, 'grad_norm': 1.8750501871109009, 'learning_rate': 6.527136030575005e-06, 'epoch': 0.62} +{'loss': 0.9662, 'grad_norm': 1.7924443483352661, 'learning_rate': 6.521291354147727e-06, 'epoch': 0.62} +{'loss': 0.8778, 'grad_norm': 0.9084703922271729, 'learning_rate': 6.515448029347005e-06, 'epoch': 0.62} +{'loss': 0.9595, 'grad_norm': 1.7733460664749146, 'learning_rate': 6.5096060584432134e-06, 'epoch': 0.63} +{'loss': 0.8954, 'grad_norm': 1.7360961437225342, 'learning_rate': 6.503765443706212e-06, 'epoch': 0.63} +{'loss': 0.9448, 'grad_norm': 1.8181066513061523, 'learning_rate': 6.497926187405326e-06, 'epoch': 0.63} +{'loss': 0.8382, 'grad_norm': 1.6166300773620605, 'learning_rate': 6.492088291809355e-06, 'epoch': 0.63} +{'loss': 0.9792, 'grad_norm': 1.6044433116912842, 'learning_rate': 6.486251759186573e-06, 'epoch': 0.63} +{'loss': 0.9391, 'grad_norm': 1.6275311708450317, 'learning_rate': 6.48041659180472e-06, 'epoch': 0.63} +{'loss': 0.8718, 'grad_norm': 1.8418091535568237, 'learning_rate': 6.474582791931006e-06, 'epoch': 0.63} +{'loss': 0.9352, 'grad_norm': 1.8523285388946533, 'learning_rate': 6.468750361832112e-06, 'epoch': 0.63} +{'loss': 0.9476, 'grad_norm': 1.8091304302215576, 'learning_rate': 6.462919303774186e-06, 'epoch': 0.63} +{'loss': 0.9643, 'grad_norm': 1.672059416770935, 'learning_rate': 6.4570896200228415e-06, 'epoch': 0.63} +{'loss': 0.9651, 'grad_norm': 1.7655494213104248, 'learning_rate': 6.45126131284316e-06, 'epoch': 0.63} +{'loss': 0.9236, 'grad_norm': 1.8237800598144531, 'learning_rate': 6.445434384499687e-06, 'epoch': 0.63} +{'loss': 0.9937, 'grad_norm': 1.8213757276535034, 'learning_rate': 6.439608837256432e-06, 'epoch': 0.63} +{'loss': 0.9208, 'grad_norm': 1.7549742460250854, 'learning_rate': 6.43378467337687e-06, 'epoch': 0.63} +{'loss': 0.9533, 'grad_norm': 1.5696029663085938, 'learning_rate': 6.427961895123935e-06, 'epoch': 0.63} +{'loss': 0.9139, 'grad_norm': 1.7207852602005005, 'learning_rate': 6.422140504760025e-06, 'epoch': 0.63} +{'loss': 1.0153, 'grad_norm': 1.6829673051834106, 'learning_rate': 6.4163205045469975e-06, 'epoch': 0.63} +{'loss': 0.9786, 'grad_norm': 1.7586554288864136, 'learning_rate': 6.410501896746172e-06, 'epoch': 0.63} +{'loss': 0.8109, 'grad_norm': 2.7261741161346436, 'learning_rate': 6.404684683618325e-06, 'epoch': 0.63} +{'loss': 0.9307, 'grad_norm': 1.7530407905578613, 'learning_rate': 6.398868867423691e-06, 'epoch': 0.63} +{'loss': 0.9091, 'grad_norm': 1.6079076528549194, 'learning_rate': 6.393054450421963e-06, 'epoch': 0.63} +{'loss': 0.919, 'grad_norm': 1.905810832977295, 'learning_rate': 6.387241434872293e-06, 'epoch': 0.63} +{'loss': 0.9685, 'grad_norm': 1.6266740560531616, 'learning_rate': 6.381429823033281e-06, 'epoch': 0.63} +{'loss': 0.9212, 'grad_norm': 1.6186314821243286, 'learning_rate': 6.375619617162985e-06, 'epoch': 0.63} +{'loss': 0.8993, 'grad_norm': 1.7290632724761963, 'learning_rate': 6.36981081951892e-06, 'epoch': 0.63} +{'loss': 0.9613, 'grad_norm': 1.774246335029602, 'learning_rate': 6.364003432358051e-06, 'epoch': 0.63} +{'loss': 0.9688, 'grad_norm': 2.014669179916382, 'learning_rate': 6.358197457936796e-06, 'epoch': 0.63} +{'loss': 0.9764, 'grad_norm': 1.4978059530258179, 'learning_rate': 6.3523928985110216e-06, 'epoch': 0.63} +{'loss': 0.9232, 'grad_norm': 1.8136916160583496, 'learning_rate': 6.34658975633605e-06, 
'epoch': 0.63} +{'loss': 1.005, 'grad_norm': 1.8452619314193726, 'learning_rate': 6.340788033666645e-06, 'epoch': 0.63} +{'loss': 0.965, 'grad_norm': 1.6856024265289307, 'learning_rate': 6.334987732757028e-06, 'epoch': 0.63} +{'loss': 0.9062, 'grad_norm': 1.6409109830856323, 'learning_rate': 6.329188855860859e-06, 'epoch': 0.63} +{'loss': 0.8677, 'grad_norm': 1.7507151365280151, 'learning_rate': 6.32339140523125e-06, 'epoch': 0.63} +{'loss': 0.9892, 'grad_norm': 1.851258635520935, 'learning_rate': 6.317595383120756e-06, 'epoch': 0.63} +{'loss': 0.8947, 'grad_norm': 1.7153377532958984, 'learning_rate': 6.311800791781381e-06, 'epoch': 0.63} +{'loss': 0.9181, 'grad_norm': 1.9791998863220215, 'learning_rate': 6.30600763346457e-06, 'epoch': 0.63} +{'loss': 0.8914, 'grad_norm': 1.7142672538757324, 'learning_rate': 6.300215910421212e-06, 'epoch': 0.63} +{'loss': 1.012, 'grad_norm': 1.868776559829712, 'learning_rate': 6.294425624901638e-06, 'epoch': 0.63} +{'loss': 0.8557, 'grad_norm': 1.4744089841842651, 'learning_rate': 6.288636779155621e-06, 'epoch': 0.63} +{'loss': 0.891, 'grad_norm': 1.9101513624191284, 'learning_rate': 6.282849375432374e-06, 'epoch': 0.63} +{'loss': 0.8874, 'grad_norm': 1.896141529083252, 'learning_rate': 6.277063415980549e-06, 'epoch': 0.63} +{'loss': 0.8807, 'grad_norm': 0.8723871111869812, 'learning_rate': 6.271278903048239e-06, 'epoch': 0.63} +{'loss': 0.8804, 'grad_norm': 1.8348798751831055, 'learning_rate': 6.265495838882973e-06, 'epoch': 0.63} +{'loss': 1.0014, 'grad_norm': 1.8826345205307007, 'learning_rate': 6.2597142257317185e-06, 'epoch': 0.63} +{'loss': 0.8717, 'grad_norm': 1.7536097764968872, 'learning_rate': 6.25393406584088e-06, 'epoch': 0.63} +{'loss': 0.9102, 'grad_norm': 1.7525378465652466, 'learning_rate': 6.248155361456293e-06, 'epoch': 0.63} +{'loss': 0.9505, 'grad_norm': 1.6756343841552734, 'learning_rate': 6.2423781148232345e-06, 'epoch': 0.63} +{'loss': 0.9431, 'grad_norm': 1.8366891145706177, 'learning_rate': 6.236602328186405e-06, 'epoch': 0.63} +{'loss': 1.0018, 'grad_norm': 1.644197702407837, 'learning_rate': 6.230828003789949e-06, 'epoch': 0.63} +{'loss': 0.9748, 'grad_norm': 1.8687825202941895, 'learning_rate': 6.225055143877434e-06, 'epoch': 0.63} +{'loss': 0.8774, 'grad_norm': 1.742172122001648, 'learning_rate': 6.219283750691864e-06, 'epoch': 0.63} +{'loss': 0.9397, 'grad_norm': 1.6542288064956665, 'learning_rate': 6.2135138264756715e-06, 'epoch': 0.63} +{'loss': 0.9293, 'grad_norm': 1.6336995363235474, 'learning_rate': 6.207745373470717e-06, 'epoch': 0.64} +{'loss': 0.9652, 'grad_norm': 1.5556223392486572, 'learning_rate': 6.201978393918291e-06, 'epoch': 0.64} +{'loss': 0.9365, 'grad_norm': 1.7376117706298828, 'learning_rate': 6.196212890059111e-06, 'epoch': 0.64} +{'loss': 0.9605, 'grad_norm': 1.5776464939117432, 'learning_rate': 6.190448864133321e-06, 'epoch': 0.64} +{'loss': 0.9696, 'grad_norm': 1.6487114429473877, 'learning_rate': 6.184686318380488e-06, 'epoch': 0.64} +{'loss': 0.9694, 'grad_norm': 1.8035367727279663, 'learning_rate': 6.17892525503961e-06, 'epoch': 0.64} +{'loss': 0.9965, 'grad_norm': 1.9537098407745361, 'learning_rate': 6.173165676349103e-06, 'epoch': 0.64} +{'loss': 0.9586, 'grad_norm': 1.9025707244873047, 'learning_rate': 6.1674075845468114e-06, 'epoch': 0.64} +{'loss': 0.9575, 'grad_norm': 1.6091078519821167, 'learning_rate': 6.1616509818699975e-06, 'epoch': 0.64} +{'loss': 0.8387, 'grad_norm': 1.9612736701965332, 'learning_rate': 6.155895870555348e-06, 'epoch': 0.64} +{'loss': 0.9071, 'grad_norm': 
1.8138504028320312, 'learning_rate': 6.150142252838967e-06, 'epoch': 0.64} +{'loss': 0.914, 'grad_norm': 1.621180772781372, 'learning_rate': 6.144390130956384e-06, 'epoch': 0.64} +{'loss': 0.8889, 'grad_norm': 1.6711097955703735, 'learning_rate': 6.138639507142539e-06, 'epoch': 0.64} +{'loss': 0.9445, 'grad_norm': 1.8578338623046875, 'learning_rate': 6.132890383631796e-06, 'epoch': 0.64} +{'loss': 0.8762, 'grad_norm': 1.7074776887893677, 'learning_rate': 6.127142762657934e-06, 'epoch': 0.64} +{'loss': 1.0149, 'grad_norm': 1.8341341018676758, 'learning_rate': 6.12139664645415e-06, 'epoch': 0.64} +{'loss': 0.8575, 'grad_norm': 1.7302956581115723, 'learning_rate': 6.115652037253054e-06, 'epoch': 0.64} +{'loss': 0.9969, 'grad_norm': 1.8176337480545044, 'learning_rate': 6.109908937286673e-06, 'epoch': 0.64} +{'loss': 0.9672, 'grad_norm': 1.781009316444397, 'learning_rate': 6.104167348786443e-06, 'epoch': 0.64} +{'loss': 0.9859, 'grad_norm': 1.7821024656295776, 'learning_rate': 6.098427273983221e-06, 'epoch': 0.64} +{'loss': 0.9349, 'grad_norm': 1.656535267829895, 'learning_rate': 6.092688715107265e-06, 'epoch': 0.64} +{'loss': 0.9496, 'grad_norm': 1.8177406787872314, 'learning_rate': 6.086951674388252e-06, 'epoch': 0.64} +{'loss': 0.9101, 'grad_norm': 1.6509474515914917, 'learning_rate': 6.0812161540552675e-06, 'epoch': 0.64} +{'loss': 0.8917, 'grad_norm': 1.65707266330719, 'learning_rate': 6.075482156336804e-06, 'epoch': 0.64} +{'loss': 0.9785, 'grad_norm': 1.9060357809066772, 'learning_rate': 6.069749683460765e-06, 'epoch': 0.64} +{'loss': 0.929, 'grad_norm': 1.6193355321884155, 'learning_rate': 6.06401873765446e-06, 'epoch': 0.64} +{'loss': 0.932, 'grad_norm': 1.6463682651519775, 'learning_rate': 6.058289321144608e-06, 'epoch': 0.64} +{'loss': 0.9641, 'grad_norm': 1.9387314319610596, 'learning_rate': 6.052561436157329e-06, 'epoch': 0.64} +{'loss': 0.9367, 'grad_norm': 1.7617419958114624, 'learning_rate': 6.046835084918152e-06, 'epoch': 0.64} +{'loss': 0.9265, 'grad_norm': 1.7443351745605469, 'learning_rate': 6.041110269652005e-06, 'epoch': 0.64} +{'loss': 0.9322, 'grad_norm': 1.7621960639953613, 'learning_rate': 6.0353869925832255e-06, 'epoch': 0.64} +{'loss': 0.9347, 'grad_norm': 1.7700227499008179, 'learning_rate': 6.02966525593555e-06, 'epoch': 0.64} +{'loss': 0.968, 'grad_norm': 1.686730146408081, 'learning_rate': 6.023945061932119e-06, 'epoch': 0.64} +{'loss': 0.8927, 'grad_norm': 0.9725883603096008, 'learning_rate': 6.018226412795469e-06, 'epoch': 0.64} +{'loss': 0.9297, 'grad_norm': 0.9579965472221375, 'learning_rate': 6.0125093107475385e-06, 'epoch': 0.64} +{'loss': 0.9463, 'grad_norm': 2.398865222930908, 'learning_rate': 6.00679375800967e-06, 'epoch': 0.64} +{'loss': 0.899, 'grad_norm': 1.8209846019744873, 'learning_rate': 6.001079756802592e-06, 'epoch': 0.64} +{'loss': 0.9918, 'grad_norm': 1.723073124885559, 'learning_rate': 5.9953673093464425e-06, 'epoch': 0.64} +{'loss': 0.9614, 'grad_norm': 1.6836910247802734, 'learning_rate': 5.98965641786075e-06, 'epoch': 0.64} +{'loss': 0.9084, 'grad_norm': 0.8900152444839478, 'learning_rate': 5.983947084564439e-06, 'epoch': 0.64} +{'loss': 0.9511, 'grad_norm': 1.7993860244750977, 'learning_rate': 5.978239311675826e-06, 'epoch': 0.64} +{'loss': 1.0083, 'grad_norm': 1.782443881034851, 'learning_rate': 5.97253310141263e-06, 'epoch': 0.64} +{'loss': 0.9579, 'grad_norm': 1.8219255208969116, 'learning_rate': 5.966828455991951e-06, 'epoch': 0.64} +{'loss': 0.9017, 'grad_norm': 0.9101649522781372, 'learning_rate': 5.96112537763029e-06, 'epoch': 
0.64} +{'loss': 0.937, 'grad_norm': 1.7730344533920288, 'learning_rate': 5.955423868543537e-06, 'epoch': 0.64} +{'loss': 0.9192, 'grad_norm': 1.6564887762069702, 'learning_rate': 5.949723930946967e-06, 'epoch': 0.64} +{'loss': 0.9262, 'grad_norm': 1.6564580202102661, 'learning_rate': 5.944025567055251e-06, 'epoch': 0.64} +{'loss': 0.8811, 'grad_norm': 1.8470607995986938, 'learning_rate': 5.9383287790824475e-06, 'epoch': 0.64} +{'loss': 0.912, 'grad_norm': 1.7679998874664307, 'learning_rate': 5.932633569242e-06, 'epoch': 0.64} +{'loss': 0.9382, 'grad_norm': 1.8303871154785156, 'learning_rate': 5.92693993974674e-06, 'epoch': 0.64} +{'loss': 0.9311, 'grad_norm': 1.7000925540924072, 'learning_rate': 5.921247892808886e-06, 'epoch': 0.64} +{'loss': 0.8935, 'grad_norm': 1.7522213459014893, 'learning_rate': 5.91555743064004e-06, 'epoch': 0.64} +{'loss': 0.9493, 'grad_norm': 1.90994393825531, 'learning_rate': 5.909868555451191e-06, 'epoch': 0.65} +{'loss': 0.8049, 'grad_norm': 1.800954818725586, 'learning_rate': 5.904181269452707e-06, 'epoch': 0.65} +{'loss': 0.8456, 'grad_norm': 1.698570966720581, 'learning_rate': 5.898495574854342e-06, 'epoch': 0.65} +{'loss': 0.9968, 'grad_norm': 1.7127857208251953, 'learning_rate': 5.89281147386523e-06, 'epoch': 0.65} +{'loss': 1.0011, 'grad_norm': 1.8562018871307373, 'learning_rate': 5.887128968693887e-06, 'epoch': 0.65} +{'loss': 0.9383, 'grad_norm': 1.7820154428482056, 'learning_rate': 5.881448061548208e-06, 'epoch': 0.65} +{'loss': 0.9287, 'grad_norm': 1.6744425296783447, 'learning_rate': 5.875768754635469e-06, 'epoch': 0.65} +{'loss': 0.9465, 'grad_norm': 0.9217410087585449, 'learning_rate': 5.870091050162321e-06, 'epoch': 0.65} +{'loss': 0.9189, 'grad_norm': 1.6345635652542114, 'learning_rate': 5.864414950334796e-06, 'epoch': 0.65} +{'loss': 0.8817, 'grad_norm': 1.796751856803894, 'learning_rate': 5.858740457358298e-06, 'epoch': 0.65} +{'loss': 0.9852, 'grad_norm': 1.690312385559082, 'learning_rate': 5.853067573437612e-06, 'epoch': 0.65} +{'loss': 0.9715, 'grad_norm': 1.905022382736206, 'learning_rate': 5.847396300776893e-06, 'epoch': 0.65} +{'loss': 0.909, 'grad_norm': 1.7262775897979736, 'learning_rate': 5.8417266415796745e-06, 'epoch': 0.65} +{'loss': 0.9418, 'grad_norm': 1.5987921953201294, 'learning_rate': 5.836058598048859e-06, 'epoch': 0.65} +{'loss': 0.994, 'grad_norm': 1.8148651123046875, 'learning_rate': 5.830392172386723e-06, 'epoch': 0.65} +{'loss': 0.9567, 'grad_norm': 1.8683109283447266, 'learning_rate': 5.824727366794915e-06, 'epoch': 0.65} +{'loss': 0.952, 'grad_norm': 1.8322832584381104, 'learning_rate': 5.819064183474451e-06, 'epoch': 0.65} +{'loss': 0.9221, 'grad_norm': 0.9256272315979004, 'learning_rate': 5.813402624625722e-06, 'epoch': 0.65} +{'loss': 0.885, 'grad_norm': 0.9572224617004395, 'learning_rate': 5.807742692448483e-06, 'epoch': 0.65} +{'loss': 0.8635, 'grad_norm': 1.7511030435562134, 'learning_rate': 5.802084389141862e-06, 'epoch': 0.65} +{'loss': 0.957, 'grad_norm': 1.8171534538269043, 'learning_rate': 5.796427716904347e-06, 'epoch': 0.65} +{'loss': 0.9543, 'grad_norm': 2.2900426387786865, 'learning_rate': 5.790772677933799e-06, 'epoch': 0.65} +{'loss': 0.9279, 'grad_norm': 1.926742672920227, 'learning_rate': 5.78511927442744e-06, 'epoch': 0.65} +{'loss': 0.998, 'grad_norm': 1.7962862253189087, 'learning_rate': 5.77946750858186e-06, 'epoch': 0.65} +{'loss': 0.9234, 'grad_norm': 1.8198928833007812, 'learning_rate': 5.773817382593008e-06, 'epoch': 0.65} +{'loss': 0.8556, 'grad_norm': 1.6461323499679565, 'learning_rate': 
5.7681688986562e-06, 'epoch': 0.65} +{'loss': 0.8976, 'grad_norm': 1.643263816833496, 'learning_rate': 5.7625220589661136e-06, 'epoch': 0.65} +{'loss': 0.8657, 'grad_norm': 1.7362608909606934, 'learning_rate': 5.756876865716787e-06, 'epoch': 0.65} +{'loss': 0.9473, 'grad_norm': 1.6694568395614624, 'learning_rate': 5.751233321101617e-06, 'epoch': 0.65} +{'loss': 0.9171, 'grad_norm': 1.5742807388305664, 'learning_rate': 5.745591427313365e-06, 'epoch': 0.65} +{'loss': 0.9024, 'grad_norm': 1.7873681783676147, 'learning_rate': 5.739951186544141e-06, 'epoch': 0.65} +{'loss': 0.982, 'grad_norm': 1.6715519428253174, 'learning_rate': 5.7343126009854215e-06, 'epoch': 0.65} +{'loss': 0.8946, 'grad_norm': 1.7497960329055786, 'learning_rate': 5.728675672828037e-06, 'epoch': 0.65} +{'loss': 0.9297, 'grad_norm': 1.5573481321334839, 'learning_rate': 5.72304040426217e-06, 'epoch': 0.65} +{'loss': 0.8876, 'grad_norm': 1.7341997623443604, 'learning_rate': 5.717406797477371e-06, 'epoch': 0.65} +{'loss': 0.9212, 'grad_norm': 0.9093543887138367, 'learning_rate': 5.711774854662535e-06, 'epoch': 0.65} +{'loss': 0.8998, 'grad_norm': 1.7846508026123047, 'learning_rate': 5.706144578005908e-06, 'epoch': 0.65} +{'loss': 0.9671, 'grad_norm': 1.7994836568832397, 'learning_rate': 5.700515969695094e-06, 'epoch': 0.65} +{'loss': 0.9491, 'grad_norm': 1.801116704940796, 'learning_rate': 5.694889031917047e-06, 'epoch': 0.65} +{'loss': 0.9842, 'grad_norm': 1.881959319114685, 'learning_rate': 5.689263766858072e-06, 'epoch': 0.65} +{'loss': 0.9265, 'grad_norm': 1.8578566312789917, 'learning_rate': 5.683640176703824e-06, 'epoch': 0.65} +{'loss': 0.9032, 'grad_norm': 1.7497116327285767, 'learning_rate': 5.678018263639309e-06, 'epoch': 0.65} +{'loss': 0.9299, 'grad_norm': 0.8865742087364197, 'learning_rate': 5.672398029848881e-06, 'epoch': 0.65} +{'loss': 0.9556, 'grad_norm': 1.769739031791687, 'learning_rate': 5.666779477516234e-06, 'epoch': 0.65} +{'loss': 0.9054, 'grad_norm': 0.9176982641220093, 'learning_rate': 5.66116260882442e-06, 'epoch': 0.65} +{'loss': 0.9256, 'grad_norm': 1.6735881567001343, 'learning_rate': 5.65554742595583e-06, 'epoch': 0.65} +{'loss': 0.9388, 'grad_norm': 1.5571702718734741, 'learning_rate': 5.6499339310922015e-06, 'epoch': 0.65} +{'loss': 0.8605, 'grad_norm': 1.8811582326889038, 'learning_rate': 5.6443221264146196e-06, 'epoch': 0.65} +{'loss': 0.886, 'grad_norm': 1.8953214883804321, 'learning_rate': 5.638712014103507e-06, 'epoch': 0.65} +{'loss': 0.9831, 'grad_norm': 1.754567265510559, 'learning_rate': 5.633103596338631e-06, 'epoch': 0.65} +{'loss': 0.9411, 'grad_norm': 1.8926575183868408, 'learning_rate': 5.627496875299102e-06, 'epoch': 0.65} +{'loss': 0.8933, 'grad_norm': 1.6930826902389526, 'learning_rate': 5.62189185316337e-06, 'epoch': 0.65} +{'loss': 0.9909, 'grad_norm': 1.8461722135543823, 'learning_rate': 5.616288532109225e-06, 'epoch': 0.66} +{'loss': 0.9731, 'grad_norm': 1.6353554725646973, 'learning_rate': 5.610686914313796e-06, 'epoch': 0.66} +{'loss': 0.9709, 'grad_norm': 1.698771357536316, 'learning_rate': 5.6050870019535496e-06, 'epoch': 0.66} +{'loss': 0.9083, 'grad_norm': 1.9085934162139893, 'learning_rate': 5.599488797204291e-06, 'epoch': 0.66} +{'loss': 0.9615, 'grad_norm': 0.9568070769309998, 'learning_rate': 5.5938923022411615e-06, 'epoch': 0.66} +{'loss': 1.0197, 'grad_norm': 1.7880637645721436, 'learning_rate': 5.588297519238637e-06, 'epoch': 0.66} +{'loss': 0.9767, 'grad_norm': 1.696130394935608, 'learning_rate': 5.582704450370533e-06, 'epoch': 0.66} +{'loss': 0.8901, 
'grad_norm': 1.9392369985580444, 'learning_rate': 5.5771130978099896e-06, 'epoch': 0.66} +{'loss': 0.8612, 'grad_norm': 1.7083916664123535, 'learning_rate': 5.571523463729487e-06, 'epoch': 0.66} +{'loss': 1.0245, 'grad_norm': 1.766940712928772, 'learning_rate': 5.565935550300836e-06, 'epoch': 0.66} +{'loss': 0.9535, 'grad_norm': 1.708411693572998, 'learning_rate': 5.560349359695181e-06, 'epoch': 0.66} +{'loss': 0.8221, 'grad_norm': 1.670191764831543, 'learning_rate': 5.554764894082994e-06, 'epoch': 0.66} +{'loss': 0.9612, 'grad_norm': 1.810355305671692, 'learning_rate': 5.549182155634076e-06, 'epoch': 0.66} +{'loss': 0.8933, 'grad_norm': 1.941634178161621, 'learning_rate': 5.543601146517563e-06, 'epoch': 0.66} +{'loss': 0.8816, 'grad_norm': 1.8155558109283447, 'learning_rate': 5.5380218689019125e-06, 'epoch': 0.66} +{'loss': 0.9753, 'grad_norm': 1.644957184791565, 'learning_rate': 5.532444324954911e-06, 'epoch': 0.66} +{'loss': 1.0215, 'grad_norm': 1.6677231788635254, 'learning_rate': 5.526868516843673e-06, 'epoch': 0.66} +{'loss': 0.8978, 'grad_norm': 1.793299674987793, 'learning_rate': 5.521294446734637e-06, 'epoch': 0.66} +{'loss': 0.9936, 'grad_norm': 1.7209322452545166, 'learning_rate': 5.51572211679357e-06, 'epoch': 0.66} +{'loss': 0.8362, 'grad_norm': 1.8610836267471313, 'learning_rate': 5.510151529185554e-06, 'epoch': 0.66} +{'loss': 0.9202, 'grad_norm': 1.8868011236190796, 'learning_rate': 5.504582686075002e-06, 'epoch': 0.66} +{'loss': 0.8545, 'grad_norm': 1.5614392757415771, 'learning_rate': 5.499015589625649e-06, 'epoch': 0.66} +{'loss': 0.9722, 'grad_norm': 1.7764856815338135, 'learning_rate': 5.493450242000546e-06, 'epoch': 0.66} +{'loss': 0.9788, 'grad_norm': 1.8225010633468628, 'learning_rate': 5.4878866453620746e-06, 'epoch': 0.66} +{'loss': 0.9608, 'grad_norm': 1.8619908094406128, 'learning_rate': 5.4823248018719184e-06, 'epoch': 0.66} +{'loss': 0.8999, 'grad_norm': 1.6834542751312256, 'learning_rate': 5.4767647136910984e-06, 'epoch': 0.66} +{'loss': 0.9221, 'grad_norm': 1.8629670143127441, 'learning_rate': 5.471206382979941e-06, 'epoch': 0.66} +{'loss': 0.8565, 'grad_norm': 1.7257126569747925, 'learning_rate': 5.465649811898098e-06, 'epoch': 0.66} +{'loss': 0.8931, 'grad_norm': 1.7740085124969482, 'learning_rate': 5.460095002604533e-06, 'epoch': 0.66} +{'loss': 0.953, 'grad_norm': 1.7363128662109375, 'learning_rate': 5.4545419572575265e-06, 'epoch': 0.66} +{'loss': 0.9635, 'grad_norm': 1.8454254865646362, 'learning_rate': 5.4489906780146716e-06, 'epoch': 0.66} +{'loss': 0.9736, 'grad_norm': 1.7508928775787354, 'learning_rate': 5.44344116703288e-06, 'epoch': 0.66} +{'loss': 0.9484, 'grad_norm': 1.9192637205123901, 'learning_rate': 5.43789342646837e-06, 'epoch': 0.66} +{'loss': 0.9001, 'grad_norm': 0.9085773825645447, 'learning_rate': 5.432347458476678e-06, 'epoch': 0.66} +{'loss': 0.9142, 'grad_norm': 0.9292737245559692, 'learning_rate': 5.4268032652126465e-06, 'epoch': 0.66} +{'loss': 0.9434, 'grad_norm': 1.6490066051483154, 'learning_rate': 5.421260848830432e-06, 'epoch': 0.66} +{'loss': 0.9301, 'grad_norm': 1.6521810293197632, 'learning_rate': 5.415720211483499e-06, 'epoch': 0.66} +{'loss': 0.8729, 'grad_norm': 2.8676137924194336, 'learning_rate': 5.410181355324622e-06, 'epoch': 0.66} +{'loss': 0.9586, 'grad_norm': 1.9626593589782715, 'learning_rate': 5.404644282505882e-06, 'epoch': 0.66} +{'loss': 0.9604, 'grad_norm': 1.8076010942459106, 'learning_rate': 5.3991089951786725e-06, 'epoch': 0.66} +{'loss': 0.9466, 'grad_norm': 1.7243555784225464, 'learning_rate': 
5.393575495493679e-06, 'epoch': 0.66} +{'loss': 0.8858, 'grad_norm': 1.7615028619766235, 'learning_rate': 5.388043785600907e-06, 'epoch': 0.66} +{'loss': 0.9664, 'grad_norm': 1.7679067850112915, 'learning_rate': 5.382513867649663e-06, 'epoch': 0.66} +{'loss': 0.863, 'grad_norm': 1.7351717948913574, 'learning_rate': 5.376985743788552e-06, 'epoch': 0.66} +{'loss': 0.9375, 'grad_norm': 1.8214083909988403, 'learning_rate': 5.37145941616549e-06, 'epoch': 0.66} +{'loss': 0.9515, 'grad_norm': 1.7770386934280396, 'learning_rate': 5.365934886927687e-06, 'epoch': 0.66} +{'loss': 0.9284, 'grad_norm': 1.793613314628601, 'learning_rate': 5.360412158221661e-06, 'epoch': 0.66} +{'loss': 0.975, 'grad_norm': 1.8269356489181519, 'learning_rate': 5.354891232193225e-06, 'epoch': 0.66} +{'loss': 0.9774, 'grad_norm': 1.562092661857605, 'learning_rate': 5.349372110987496e-06, 'epoch': 0.66} +{'loss': 0.914, 'grad_norm': 1.8016637563705444, 'learning_rate': 5.343854796748886e-06, 'epoch': 0.66} +{'loss': 0.9026, 'grad_norm': 1.7373336553573608, 'learning_rate': 5.338339291621109e-06, 'epoch': 0.66} +{'loss': 0.9106, 'grad_norm': 0.964300811290741, 'learning_rate': 5.332825597747171e-06, 'epoch': 0.66} +{'loss': 0.992, 'grad_norm': 1.9491894245147705, 'learning_rate': 5.32731371726938e-06, 'epoch': 0.67} +{'loss': 0.9094, 'grad_norm': 1.922573208808899, 'learning_rate': 5.321803652329334e-06, 'epoch': 0.67} +{'loss': 0.9413, 'grad_norm': 1.7878295183181763, 'learning_rate': 5.316295405067929e-06, 'epoch': 0.67} +{'loss': 0.9004, 'grad_norm': 1.5722486972808838, 'learning_rate': 5.3107889776253544e-06, 'epoch': 0.67} +{'loss': 0.9912, 'grad_norm': 2.009685516357422, 'learning_rate': 5.305284372141095e-06, 'epoch': 0.67} +{'loss': 1.0165, 'grad_norm': 1.6209447383880615, 'learning_rate': 5.299781590753916e-06, 'epoch': 0.67} +{'loss': 0.8787, 'grad_norm': 1.9641523361206055, 'learning_rate': 5.294280635601889e-06, 'epoch': 0.67} +{'loss': 0.9928, 'grad_norm': 1.792001724243164, 'learning_rate': 5.288781508822367e-06, 'epoch': 0.67} +{'loss': 0.9366, 'grad_norm': 1.7955774068832397, 'learning_rate': 5.283284212551997e-06, 'epoch': 0.67} +{'loss': 0.9377, 'grad_norm': 1.6494568586349487, 'learning_rate': 5.277788748926711e-06, 'epoch': 0.67} +{'loss': 0.9651, 'grad_norm': 2.3010692596435547, 'learning_rate': 5.2722951200817315e-06, 'epoch': 0.67} +{'loss': 0.8703, 'grad_norm': 1.67125403881073, 'learning_rate': 5.2668033281515676e-06, 'epoch': 0.67} +{'loss': 0.9545, 'grad_norm': 1.6498452425003052, 'learning_rate': 5.2613133752700145e-06, 'epoch': 0.67} +{'loss': 0.8312, 'grad_norm': 1.6180171966552734, 'learning_rate': 5.2558252635701535e-06, 'epoch': 0.67} +{'loss': 0.8749, 'grad_norm': 1.7080947160720825, 'learning_rate': 5.250338995184348e-06, 'epoch': 0.67} +{'loss': 0.9261, 'grad_norm': 1.50067138671875, 'learning_rate': 5.244854572244249e-06, 'epoch': 0.67} +{'loss': 0.8962, 'grad_norm': 1.7629857063293457, 'learning_rate': 5.239371996880786e-06, 'epoch': 0.67} +{'loss': 0.9602, 'grad_norm': 1.6186702251434326, 'learning_rate': 5.2338912712241785e-06, 'epoch': 0.67} +{'loss': 0.9261, 'grad_norm': 1.7863569259643555, 'learning_rate': 5.228412397403916e-06, 'epoch': 0.67} +{'loss': 1.0113, 'grad_norm': 1.749067783355713, 'learning_rate': 5.222935377548777e-06, 'epoch': 0.67} +{'loss': 0.9403, 'grad_norm': 1.7121068239212036, 'learning_rate': 5.217460213786822e-06, 'epoch': 0.67} +{'loss': 0.9164, 'grad_norm': 1.9137980937957764, 'learning_rate': 5.211986908245374e-06, 'epoch': 0.67} +{'loss': 1.0324, 
'grad_norm': 1.6937050819396973, 'learning_rate': 5.206515463051054e-06, 'epoch': 0.67} +{'loss': 1.0024, 'grad_norm': 1.6444754600524902, 'learning_rate': 5.201045880329749e-06, 'epoch': 0.67} +{'loss': 0.8336, 'grad_norm': 1.6089844703674316, 'learning_rate': 5.195578162206627e-06, 'epoch': 0.67} +{'loss': 1.0179, 'grad_norm': 1.8064433336257935, 'learning_rate': 5.190112310806126e-06, 'epoch': 0.67} +{'loss': 0.8848, 'grad_norm': 0.8969301581382751, 'learning_rate': 5.184648328251965e-06, 'epoch': 0.67} +{'loss': 0.9681, 'grad_norm': 1.7852959632873535, 'learning_rate': 5.179186216667133e-06, 'epoch': 0.67} +{'loss': 0.967, 'grad_norm': 1.7706224918365479, 'learning_rate': 5.1737259781738934e-06, 'epoch': 0.67} +{'loss': 0.8792, 'grad_norm': 1.704100489616394, 'learning_rate': 5.1682676148937805e-06, 'epoch': 0.67} +{'loss': 0.9237, 'grad_norm': 2.179177761077881, 'learning_rate': 5.1628111289476025e-06, 'epoch': 0.67} +{'loss': 0.9175, 'grad_norm': 2.4599878787994385, 'learning_rate': 5.1573565224554365e-06, 'epoch': 0.67} +{'loss': 0.9209, 'grad_norm': 1.5377827882766724, 'learning_rate': 5.151903797536631e-06, 'epoch': 0.67} +{'loss': 0.9393, 'grad_norm': 1.8712668418884277, 'learning_rate': 5.146452956309798e-06, 'epoch': 0.67} +{'loss': 0.939, 'grad_norm': 1.6868737936019897, 'learning_rate': 5.141004000892824e-06, 'epoch': 0.67} +{'loss': 0.95, 'grad_norm': 1.6263906955718994, 'learning_rate': 5.135556933402862e-06, 'epoch': 0.67} +{'loss': 0.9395, 'grad_norm': 0.9104430079460144, 'learning_rate': 5.130111755956327e-06, 'epoch': 0.67} +{'loss': 0.9022, 'grad_norm': 1.6872116327285767, 'learning_rate': 5.1246684706689074e-06, 'epoch': 0.67} +{'loss': 0.8975, 'grad_norm': 1.6843801736831665, 'learning_rate': 5.119227079655544e-06, 'epoch': 0.67} +{'loss': 0.9914, 'grad_norm': 1.589044213294983, 'learning_rate': 5.1137875850304545e-06, 'epoch': 0.67} +{'loss': 0.9818, 'grad_norm': 1.6750380992889404, 'learning_rate': 5.108349988907111e-06, 'epoch': 0.67} +{'loss': 0.9279, 'grad_norm': 1.7462462186813354, 'learning_rate': 5.102914293398252e-06, 'epoch': 0.67} +{'loss': 0.9865, 'grad_norm': 1.7899361848831177, 'learning_rate': 5.097480500615879e-06, 'epoch': 0.67} +{'loss': 0.9656, 'grad_norm': 1.9460341930389404, 'learning_rate': 5.092048612671252e-06, 'epoch': 0.67} +{'loss': 0.9661, 'grad_norm': 1.8290739059448242, 'learning_rate': 5.086618631674888e-06, 'epoch': 0.67} +{'loss': 0.9707, 'grad_norm': 1.741752028465271, 'learning_rate': 5.081190559736569e-06, 'epoch': 0.67} +{'loss': 0.9626, 'grad_norm': 1.849439263343811, 'learning_rate': 5.075764398965331e-06, 'epoch': 0.67} +{'loss': 0.8269, 'grad_norm': 1.67465341091156, 'learning_rate': 5.070340151469468e-06, 'epoch': 0.67} +{'loss': 0.9131, 'grad_norm': 1.7456953525543213, 'learning_rate': 5.064917819356532e-06, 'epoch': 0.67} +{'loss': 0.9644, 'grad_norm': 1.6493909358978271, 'learning_rate': 5.059497404733331e-06, 'epoch': 0.67} +{'loss': 0.9325, 'grad_norm': 1.7014745473861694, 'learning_rate': 5.054078909705926e-06, 'epoch': 0.67} +{'loss': 0.8765, 'grad_norm': 0.8820096254348755, 'learning_rate': 5.048662336379635e-06, 'epoch': 0.67} +{'loss': 0.9149, 'grad_norm': 1.935365080833435, 'learning_rate': 5.043247686859024e-06, 'epoch': 0.68} +{'loss': 1.0192, 'grad_norm': 1.8217785358428955, 'learning_rate': 5.037834963247922e-06, 'epoch': 0.68} +{'loss': 0.97, 'grad_norm': 1.6435409784317017, 'learning_rate': 5.032424167649394e-06, 'epoch': 0.68} +{'loss': 0.8559, 'grad_norm': 1.8585370779037476, 'learning_rate': 
5.027015302165768e-06, 'epoch': 0.68} +{'loss': 0.9372, 'grad_norm': 1.7686553001403809, 'learning_rate': 5.021608368898621e-06, 'epoch': 0.68} +{'loss': 0.9675, 'grad_norm': 1.9385324716567993, 'learning_rate': 5.016203369948774e-06, 'epoch': 0.68} +{'loss': 0.9109, 'grad_norm': 0.9401078224182129, 'learning_rate': 5.0108003074163005e-06, 'epoch': 0.68} +{'loss': 1.0093, 'grad_norm': 1.8436037302017212, 'learning_rate': 5.005399183400521e-06, 'epoch': 0.68} +{'loss': 0.8998, 'grad_norm': 1.6753051280975342, 'learning_rate': 5.000000000000003e-06, 'epoch': 0.68} +{'loss': 0.8507, 'grad_norm': 1.661201000213623, 'learning_rate': 4.994602759312557e-06, 'epoch': 0.68} +{'loss': 0.9459, 'grad_norm': 1.5893633365631104, 'learning_rate': 4.989207463435244e-06, 'epoch': 0.68} +{'loss': 0.972, 'grad_norm': 1.7016270160675049, 'learning_rate': 4.9838141144643636e-06, 'epoch': 0.68} +{'loss': 0.8667, 'grad_norm': 0.8540983200073242, 'learning_rate': 4.978422714495465e-06, 'epoch': 0.68} +{'loss': 0.8754, 'grad_norm': 1.751617193222046, 'learning_rate': 4.973033265623333e-06, 'epoch': 0.68} +{'loss': 0.9455, 'grad_norm': 0.883199155330658, 'learning_rate': 4.967645769942e-06, 'epoch': 0.68} +{'loss': 0.9152, 'grad_norm': 0.8881901502609253, 'learning_rate': 4.962260229544738e-06, 'epoch': 0.68} +{'loss': 0.9798, 'grad_norm': 1.5736958980560303, 'learning_rate': 4.956876646524059e-06, 'epoch': 0.68} +{'loss': 0.9015, 'grad_norm': 1.7621546983718872, 'learning_rate': 4.951495022971713e-06, 'epoch': 0.68} +{'loss': 0.8964, 'grad_norm': 1.5872132778167725, 'learning_rate': 4.946115360978696e-06, 'epoch': 0.68} +{'loss': 0.9937, 'grad_norm': 1.7109295129776, 'learning_rate': 4.940737662635227e-06, 'epoch': 0.68} +{'loss': 0.9399, 'grad_norm': 1.647711157798767, 'learning_rate': 4.935361930030774e-06, 'epoch': 0.68} +{'loss': 0.9042, 'grad_norm': 1.7020493745803833, 'learning_rate': 4.929988165254041e-06, 'epoch': 0.68} +{'loss': 0.9086, 'grad_norm': 1.5348321199417114, 'learning_rate': 4.924616370392962e-06, 'epoch': 0.68} +{'loss': 0.8863, 'grad_norm': 1.6763864755630493, 'learning_rate': 4.919246547534709e-06, 'epoch': 0.68} +{'loss': 0.9043, 'grad_norm': 1.560278296470642, 'learning_rate': 4.9138786987656865e-06, 'epoch': 0.68} +{'loss': 0.8813, 'grad_norm': 1.7472281455993652, 'learning_rate': 4.908512826171533e-06, 'epoch': 0.68} +{'loss': 0.9644, 'grad_norm': 1.7941151857376099, 'learning_rate': 4.9031489318371195e-06, 'epoch': 0.68} +{'loss': 0.9145, 'grad_norm': 1.738924503326416, 'learning_rate': 4.897787017846547e-06, 'epoch': 0.68} +{'loss': 0.9093, 'grad_norm': 0.9181484580039978, 'learning_rate': 4.892427086283147e-06, 'epoch': 0.68} +{'loss': 0.9074, 'grad_norm': 1.7104320526123047, 'learning_rate': 4.887069139229481e-06, 'epoch': 0.68} +{'loss': 0.9201, 'grad_norm': 1.7616809606552124, 'learning_rate': 4.881713178767341e-06, 'epoch': 0.68} +{'loss': 0.8797, 'grad_norm': 1.7111355066299438, 'learning_rate': 4.8763592069777454e-06, 'epoch': 0.68} +{'loss': 0.9595, 'grad_norm': 1.5863111019134521, 'learning_rate': 4.87100722594094e-06, 'epoch': 0.68} +{'loss': 0.9053, 'grad_norm': 1.6166331768035889, 'learning_rate': 4.865657237736397e-06, 'epoch': 0.68} +{'loss': 0.9915, 'grad_norm': 1.7800498008728027, 'learning_rate': 4.860309244442821e-06, 'epoch': 0.68} +{'loss': 0.9524, 'grad_norm': 2.0758252143859863, 'learning_rate': 4.854963248138121e-06, 'epoch': 0.68} +{'loss': 0.9392, 'grad_norm': 1.8680355548858643, 'learning_rate': 4.849619250899458e-06, 'epoch': 0.68} +{'loss': 0.9144, 
'grad_norm': 1.9671584367752075, 'learning_rate': 4.844277254803198e-06, 'epoch': 0.68} +{'loss': 0.9043, 'grad_norm': 1.6354202032089233, 'learning_rate': 4.838937261924933e-06, 'epoch': 0.68} +{'loss': 0.9897, 'grad_norm': 1.802881121635437, 'learning_rate': 4.83359927433948e-06, 'epoch': 0.68} +{'loss': 0.9313, 'grad_norm': 1.6808555126190186, 'learning_rate': 4.8282632941208725e-06, 'epoch': 0.68} +{'loss': 0.8836, 'grad_norm': 1.6302028894424438, 'learning_rate': 4.822929323342369e-06, 'epoch': 0.68} +{'loss': 1.0089, 'grad_norm': 1.6222329139709473, 'learning_rate': 4.817597364076441e-06, 'epoch': 0.68} +{'loss': 0.9048, 'grad_norm': 1.8248558044433594, 'learning_rate': 4.812267418394784e-06, 'epoch': 0.68} +{'loss': 0.9519, 'grad_norm': 1.7333405017852783, 'learning_rate': 4.806939488368308e-06, 'epoch': 0.68} +{'loss': 0.9824, 'grad_norm': 1.7828094959259033, 'learning_rate': 4.8016135760671435e-06, 'epoch': 0.68} +{'loss': 0.9057, 'grad_norm': 1.839829921722412, 'learning_rate': 4.796289683560631e-06, 'epoch': 0.68} +{'loss': 0.9751, 'grad_norm': 1.8601003885269165, 'learning_rate': 4.790967812917334e-06, 'epoch': 0.68} +{'loss': 0.9759, 'grad_norm': 1.6036670207977295, 'learning_rate': 4.78564796620502e-06, 'epoch': 0.68} +{'loss': 0.891, 'grad_norm': 1.7320500612258911, 'learning_rate': 4.7803301454906804e-06, 'epoch': 0.68} +{'loss': 0.8745, 'grad_norm': 1.685636043548584, 'learning_rate': 4.775014352840512e-06, 'epoch': 0.68} +{'loss': 0.9065, 'grad_norm': 0.9304041862487793, 'learning_rate': 4.76970059031993e-06, 'epoch': 0.68} +{'loss': 0.9304, 'grad_norm': 1.785025954246521, 'learning_rate': 4.764388859993556e-06, 'epoch': 0.69} +{'loss': 0.9175, 'grad_norm': 1.6687031984329224, 'learning_rate': 4.759079163925223e-06, 'epoch': 0.69} +{'loss': 0.9368, 'grad_norm': 1.784484624862671, 'learning_rate': 4.753771504177976e-06, 'epoch': 0.69} +{'loss': 0.893, 'grad_norm': 1.7726801633834839, 'learning_rate': 4.7484658828140665e-06, 'epoch': 0.69} +{'loss': 0.962, 'grad_norm': 1.666787028312683, 'learning_rate': 4.743162301894952e-06, 'epoch': 0.69} +{'loss': 0.9351, 'grad_norm': 0.9010095000267029, 'learning_rate': 4.7378607634813045e-06, 'epoch': 0.69} +{'loss': 0.966, 'grad_norm': 1.8619632720947266, 'learning_rate': 4.732561269632992e-06, 'epoch': 0.69} +{'loss': 0.9024, 'grad_norm': 1.680849313735962, 'learning_rate': 4.727263822409099e-06, 'epoch': 0.69} +{'loss': 0.956, 'grad_norm': 1.975286602973938, 'learning_rate': 4.7219684238679066e-06, 'epoch': 0.69} +{'loss': 0.9336, 'grad_norm': 1.6593296527862549, 'learning_rate': 4.716675076066902e-06, 'epoch': 0.69} +{'loss': 0.9569, 'grad_norm': 1.5858948230743408, 'learning_rate': 4.711383781062779e-06, 'epoch': 0.69} +{'loss': 0.9448, 'grad_norm': 0.8995071053504944, 'learning_rate': 4.706094540911429e-06, 'epoch': 0.69} +{'loss': 1.0261, 'grad_norm': 1.742321252822876, 'learning_rate': 4.700807357667953e-06, 'epoch': 0.69} +{'loss': 0.8509, 'grad_norm': 1.8734294176101685, 'learning_rate': 4.6955222333866365e-06, 'epoch': 0.69} +{'loss': 0.909, 'grad_norm': 1.8507840633392334, 'learning_rate': 4.6902391701209826e-06, 'epoch': 0.69} +{'loss': 0.897, 'grad_norm': 1.7655754089355469, 'learning_rate': 4.684958169923684e-06, 'epoch': 0.69} +{'loss': 0.9472, 'grad_norm': 1.6593842506408691, 'learning_rate': 4.679679234846636e-06, 'epoch': 0.69} +{'loss': 0.9566, 'grad_norm': 1.71574068069458, 'learning_rate': 4.674402366940927e-06, 'epoch': 0.69} +{'loss': 0.9267, 'grad_norm': 1.632213830947876, 'learning_rate': 
4.669127568256851e-06, 'epoch': 0.69} +{'loss': 0.9873, 'grad_norm': 1.7090871334075928, 'learning_rate': 4.663854840843885e-06, 'epoch': 0.69} +{'loss': 0.9158, 'grad_norm': 1.8156142234802246, 'learning_rate': 4.658584186750713e-06, 'epoch': 0.69} +{'loss': 0.8719, 'grad_norm': 1.6955251693725586, 'learning_rate': 4.6533156080252076e-06, 'epoch': 0.69} +{'loss': 0.9099, 'grad_norm': 1.5604935884475708, 'learning_rate': 4.648049106714434e-06, 'epoch': 0.69} +{'loss': 0.9191, 'grad_norm': 1.849888801574707, 'learning_rate': 4.642784684864656e-06, 'epoch': 0.69} +{'loss': 0.9007, 'grad_norm': 1.8071506023406982, 'learning_rate': 4.637522344521323e-06, 'epoch': 0.69} +{'loss': 0.9467, 'grad_norm': 2.4560303688049316, 'learning_rate': 4.63226208772908e-06, 'epoch': 0.69} +{'loss': 0.977, 'grad_norm': 1.7255818843841553, 'learning_rate': 4.627003916531761e-06, 'epoch': 0.69} +{'loss': 0.9199, 'grad_norm': 1.5953834056854248, 'learning_rate': 4.621747832972389e-06, 'epoch': 0.69} +{'loss': 0.9379, 'grad_norm': 1.7392430305480957, 'learning_rate': 4.616493839093179e-06, 'epoch': 0.69} +{'loss': 0.9296, 'grad_norm': 1.8367292881011963, 'learning_rate': 4.6112419369355255e-06, 'epoch': 0.69} +{'loss': 0.9808, 'grad_norm': 1.7075905799865723, 'learning_rate': 4.60599212854002e-06, 'epoch': 0.69} +{'loss': 0.895, 'grad_norm': 1.8431198596954346, 'learning_rate': 4.600744415946438e-06, 'epoch': 0.69} +{'loss': 0.949, 'grad_norm': 0.9042073488235474, 'learning_rate': 4.595498801193736e-06, 'epoch': 0.69} +{'loss': 0.9352, 'grad_norm': 1.8543001413345337, 'learning_rate': 4.590255286320062e-06, 'epoch': 0.69} +{'loss': 0.9901, 'grad_norm': 1.6687097549438477, 'learning_rate': 4.5850138733627435e-06, 'epoch': 0.69} +{'loss': 0.935, 'grad_norm': 1.6543142795562744, 'learning_rate': 4.5797745643582935e-06, 'epoch': 0.69} +{'loss': 0.9461, 'grad_norm': 1.7264105081558228, 'learning_rate': 4.5745373613424075e-06, 'epoch': 0.69} +{'loss': 1.0417, 'grad_norm': 1.637157678604126, 'learning_rate': 4.5693022663499584e-06, 'epoch': 0.69} +{'loss': 0.957, 'grad_norm': 1.773926854133606, 'learning_rate': 4.564069281415009e-06, 'epoch': 0.69} +{'loss': 0.9037, 'grad_norm': 1.6433608531951904, 'learning_rate': 4.5588384085707925e-06, 'epoch': 0.69} +{'loss': 0.9686, 'grad_norm': 1.6686745882034302, 'learning_rate': 4.5536096498497295e-06, 'epoch': 0.69} +{'loss': 0.9948, 'grad_norm': 1.7095450162887573, 'learning_rate': 4.548383007283412e-06, 'epoch': 0.69} +{'loss': 0.9189, 'grad_norm': 1.7358208894729614, 'learning_rate': 4.543158482902617e-06, 'epoch': 0.69} +{'loss': 0.9229, 'grad_norm': 1.6004054546356201, 'learning_rate': 4.537936078737291e-06, 'epoch': 0.69} +{'loss': 0.9451, 'grad_norm': 1.7015902996063232, 'learning_rate': 4.532715796816565e-06, 'epoch': 0.69} +{'loss': 0.9518, 'grad_norm': 1.6692826747894287, 'learning_rate': 4.52749763916874e-06, 'epoch': 0.69} +{'loss': 0.9578, 'grad_norm': 1.636660099029541, 'learning_rate': 4.522281607821288e-06, 'epoch': 0.69} +{'loss': 0.9519, 'grad_norm': 1.773856520652771, 'learning_rate': 4.517067704800864e-06, 'epoch': 0.69} +{'loss': 0.9707, 'grad_norm': 1.764016032218933, 'learning_rate': 4.511855932133289e-06, 'epoch': 0.69} +{'loss': 0.9264, 'grad_norm': 1.8073703050613403, 'learning_rate': 4.506646291843561e-06, 'epoch': 0.69} +{'loss': 0.9832, 'grad_norm': 1.658465027809143, 'learning_rate': 4.501438785955845e-06, 'epoch': 0.69} +{'loss': 0.9859, 'grad_norm': 1.8624742031097412, 'learning_rate': 4.496233416493481e-06, 'epoch': 0.69} +{'loss': 0.9038, 
'grad_norm': 1.6144973039627075, 'learning_rate': 4.491030185478976e-06, 'epoch': 0.7} +{'loss': 0.9668, 'grad_norm': 1.7121502161026, 'learning_rate': 4.4858290949340065e-06, 'epoch': 0.7} +{'loss': 0.883, 'grad_norm': 1.5614014863967896, 'learning_rate': 4.480630146879419e-06, 'epoch': 0.7} +{'loss': 0.8828, 'grad_norm': 1.6696916818618774, 'learning_rate': 4.475433343335225e-06, 'epoch': 0.7} +{'loss': 0.8706, 'grad_norm': 1.609911322593689, 'learning_rate': 4.470238686320606e-06, 'epoch': 0.7} +{'loss': 0.8847, 'grad_norm': 1.638105034828186, 'learning_rate': 4.4650461778539055e-06, 'epoch': 0.7} +{'loss': 0.9502, 'grad_norm': 1.9332939386367798, 'learning_rate': 4.459855819952635e-06, 'epoch': 0.7} +{'loss': 0.9516, 'grad_norm': 1.8208930492401123, 'learning_rate': 4.454667614633471e-06, 'epoch': 0.7} +{'loss': 0.8994, 'grad_norm': 0.9762057662010193, 'learning_rate': 4.449481563912252e-06, 'epoch': 0.7} +{'loss': 0.9203, 'grad_norm': 1.7704150676727295, 'learning_rate': 4.444297669803981e-06, 'epoch': 0.7} +{'loss': 0.9927, 'grad_norm': 1.9654228687286377, 'learning_rate': 4.439115934322816e-06, 'epoch': 0.7} +{'loss': 0.9362, 'grad_norm': 0.9521400332450867, 'learning_rate': 4.433936359482087e-06, 'epoch': 0.7} +{'loss': 0.9587, 'grad_norm': 1.901347041130066, 'learning_rate': 4.428758947294278e-06, 'epoch': 0.7} +{'loss': 0.8535, 'grad_norm': 1.8210316896438599, 'learning_rate': 4.423583699771036e-06, 'epoch': 0.7} +{'loss': 0.9578, 'grad_norm': 1.7348170280456543, 'learning_rate': 4.418410618923163e-06, 'epoch': 0.7} +{'loss': 0.9771, 'grad_norm': 1.6851931810379028, 'learning_rate': 4.413239706760621e-06, 'epoch': 0.7} +{'loss': 0.8785, 'grad_norm': 1.746250867843628, 'learning_rate': 4.408070965292534e-06, 'epoch': 0.7} +{'loss': 0.9059, 'grad_norm': 1.7753324508666992, 'learning_rate': 4.402904396527172e-06, 'epoch': 0.7} +{'loss': 0.9033, 'grad_norm': 0.9282997250556946, 'learning_rate': 4.397740002471973e-06, 'epoch': 0.7} +{'loss': 0.9376, 'grad_norm': 1.7341110706329346, 'learning_rate': 4.392577785133521e-06, 'epoch': 0.7} +{'loss': 1.0277, 'grad_norm': 1.8804572820663452, 'learning_rate': 4.387417746517557e-06, 'epoch': 0.7} +{'loss': 0.9101, 'grad_norm': 1.758665680885315, 'learning_rate': 4.382259888628978e-06, 'epoch': 0.7} +{'loss': 0.9527, 'grad_norm': 1.9642398357391357, 'learning_rate': 4.377104213471831e-06, 'epoch': 0.7} +{'loss': 0.8437, 'grad_norm': 0.900040864944458, 'learning_rate': 4.371950723049314e-06, 'epoch': 0.7} +{'loss': 0.9206, 'grad_norm': 1.7725212574005127, 'learning_rate': 4.3667994193637794e-06, 'epoch': 0.7} +{'loss': 0.8709, 'grad_norm': 1.7199581861495972, 'learning_rate': 4.361650304416728e-06, 'epoch': 0.7} +{'loss': 0.9527, 'grad_norm': 2.0091516971588135, 'learning_rate': 4.356503380208812e-06, 'epoch': 0.7} +{'loss': 0.8468, 'grad_norm': 1.7010260820388794, 'learning_rate': 4.351358648739826e-06, 'epoch': 0.7} +{'loss': 0.9051, 'grad_norm': 1.919560194015503, 'learning_rate': 4.34621611200872e-06, 'epoch': 0.7} +{'loss': 0.9027, 'grad_norm': 1.8048046827316284, 'learning_rate': 4.3410757720135886e-06, 'epoch': 0.7} +{'loss': 0.9475, 'grad_norm': 1.7333379983901978, 'learning_rate': 4.335937630751675e-06, 'epoch': 0.7} +{'loss': 0.9146, 'grad_norm': 1.6370880603790283, 'learning_rate': 4.330801690219361e-06, 'epoch': 0.7} +{'loss': 0.883, 'grad_norm': 1.6574703454971313, 'learning_rate': 4.325667952412184e-06, 'epoch': 0.7} +{'loss': 0.9785, 'grad_norm': 1.7584916353225708, 'learning_rate': 4.320536419324816e-06, 'epoch': 0.7} +{'loss': 
0.9266, 'grad_norm': 1.7436972856521606, 'learning_rate': 4.315407092951078e-06, 'epoch': 0.7} +{'loss': 0.9134, 'grad_norm': 1.6227761507034302, 'learning_rate': 4.310279975283932e-06, 'epoch': 0.7} +{'loss': 0.9518, 'grad_norm': 1.8522777557373047, 'learning_rate': 4.305155068315481e-06, 'epoch': 0.7} +{'loss': 0.9379, 'grad_norm': 1.8011834621429443, 'learning_rate': 4.30003237403697e-06, 'epoch': 0.7} +{'loss': 1.0526, 'grad_norm': 1.920276403427124, 'learning_rate': 4.2949118944387846e-06, 'epoch': 0.7} +{'loss': 0.9917, 'grad_norm': 1.7253565788269043, 'learning_rate': 4.289793631510449e-06, 'epoch': 0.7} +{'loss': 0.9938, 'grad_norm': 2.140216827392578, 'learning_rate': 4.284677587240625e-06, 'epoch': 0.7} +{'loss': 0.9475, 'grad_norm': 1.6234476566314697, 'learning_rate': 4.2795637636171165e-06, 'epoch': 0.7} +{'loss': 0.9402, 'grad_norm': 1.6648706197738647, 'learning_rate': 4.27445216262686e-06, 'epoch': 0.7} +{'loss': 0.8974, 'grad_norm': 1.7492082118988037, 'learning_rate': 4.2693427862559335e-06, 'epoch': 0.7} +{'loss': 0.9874, 'grad_norm': 1.683445692062378, 'learning_rate': 4.264235636489542e-06, 'epoch': 0.7} +{'loss': 0.9309, 'grad_norm': 1.6183136701583862, 'learning_rate': 4.2591307153120335e-06, 'epoch': 0.7} +{'loss': 1.0304, 'grad_norm': 1.6791754961013794, 'learning_rate': 4.254028024706887e-06, 'epoch': 0.7} +{'loss': 0.8308, 'grad_norm': 1.7862948179244995, 'learning_rate': 4.248927566656717e-06, 'epoch': 0.7} +{'loss': 0.8899, 'grad_norm': 1.8560540676116943, 'learning_rate': 4.2438293431432665e-06, 'epoch': 0.7} +{'loss': 1.0277, 'grad_norm': 1.8937824964523315, 'learning_rate': 4.238733356147414e-06, 'epoch': 0.7} +{'loss': 0.9778, 'grad_norm': 1.8942151069641113, 'learning_rate': 4.233639607649168e-06, 'epoch': 0.7} +{'loss': 0.845, 'grad_norm': 1.9937940835952759, 'learning_rate': 4.228548099627665e-06, 'epoch': 0.7} +{'loss': 1.0314, 'grad_norm': 1.7345616817474365, 'learning_rate': 4.223458834061175e-06, 'epoch': 0.71} +{'loss': 0.874, 'grad_norm': 1.7550207376480103, 'learning_rate': 4.218371812927092e-06, 'epoch': 0.71} +{'loss': 0.8794, 'grad_norm': 1.7918344736099243, 'learning_rate': 4.213287038201943e-06, 'epoch': 0.71} +{'loss': 0.936, 'grad_norm': 1.5447994470596313, 'learning_rate': 4.208204511861379e-06, 'epoch': 0.71} +{'loss': 0.9427, 'grad_norm': 1.9018646478652954, 'learning_rate': 4.203124235880179e-06, 'epoch': 0.71} +{'loss': 0.9258, 'grad_norm': 1.8416558504104614, 'learning_rate': 4.198046212232243e-06, 'epoch': 0.71} +{'loss': 0.9359, 'grad_norm': 1.9400453567504883, 'learning_rate': 4.192970442890602e-06, 'epoch': 0.71} +{'loss': 0.8209, 'grad_norm': 1.727979302406311, 'learning_rate': 4.187896929827414e-06, 'epoch': 0.71} +{'loss': 0.8929, 'grad_norm': 0.8893999457359314, 'learning_rate': 4.182825675013945e-06, 'epoch': 0.71} +{'loss': 0.8953, 'grad_norm': 1.7074863910675049, 'learning_rate': 4.1777566804206e-06, 'epoch': 0.71} +{'loss': 0.9017, 'grad_norm': 1.8171278238296509, 'learning_rate': 4.172689948016896e-06, 'epoch': 0.71} +{'loss': 0.9866, 'grad_norm': 1.6707803010940552, 'learning_rate': 4.167625479771476e-06, 'epoch': 0.71} +{'loss': 0.963, 'grad_norm': 1.6528441905975342, 'learning_rate': 4.162563277652104e-06, 'epoch': 0.71} +{'loss': 1.011, 'grad_norm': 1.8492140769958496, 'learning_rate': 4.157503343625659e-06, 'epoch': 0.71} +{'loss': 0.8923, 'grad_norm': 1.8094689846038818, 'learning_rate': 4.152445679658139e-06, 'epoch': 0.71} +{'loss': 1.012, 'grad_norm': 1.7770872116088867, 'learning_rate': 4.147390287714667e-06, 
'epoch': 0.71} +{'loss': 0.9438, 'grad_norm': 2.017263650894165, 'learning_rate': 4.142337169759472e-06, 'epoch': 0.71} +{'loss': 0.9222, 'grad_norm': 1.7695595026016235, 'learning_rate': 4.137286327755913e-06, 'epoch': 0.71} +{'loss': 1.0347, 'grad_norm': 1.8753374814987183, 'learning_rate': 4.132237763666451e-06, 'epoch': 0.71} +{'loss': 0.9587, 'grad_norm': 1.6318330764770508, 'learning_rate': 4.1271914794526715e-06, 'epoch': 0.71} +{'loss': 1.0096, 'grad_norm': 1.8075960874557495, 'learning_rate': 4.12214747707527e-06, 'epoch': 0.71} +{'loss': 0.9404, 'grad_norm': 1.816388487815857, 'learning_rate': 4.117105758494056e-06, 'epoch': 0.71} +{'loss': 0.9439, 'grad_norm': 1.9831154346466064, 'learning_rate': 4.112066325667954e-06, 'epoch': 0.71} +{'loss': 0.9102, 'grad_norm': 1.904625415802002, 'learning_rate': 4.107029180554996e-06, 'epoch': 0.71} +{'loss': 0.8735, 'grad_norm': 1.721761703491211, 'learning_rate': 4.101994325112332e-06, 'epoch': 0.71} +{'loss': 0.8359, 'grad_norm': 1.9413036108016968, 'learning_rate': 4.096961761296211e-06, 'epoch': 0.71} +{'loss': 1.0395, 'grad_norm': 1.8073800802230835, 'learning_rate': 4.0919314910620025e-06, 'epoch': 0.71} +{'loss': 0.9541, 'grad_norm': 1.6554763317108154, 'learning_rate': 4.086903516364179e-06, 'epoch': 0.71} +{'loss': 0.906, 'grad_norm': 1.6351912021636963, 'learning_rate': 4.0818778391563255e-06, 'epoch': 0.71} +{'loss': 0.9133, 'grad_norm': 1.6374362707138062, 'learning_rate': 4.07685446139113e-06, 'epoch': 0.71} +{'loss': 0.9571, 'grad_norm': 1.7804003953933716, 'learning_rate': 4.07183338502039e-06, 'epoch': 0.71} +{'loss': 0.9425, 'grad_norm': 1.6902313232421875, 'learning_rate': 4.066814611995006e-06, 'epoch': 0.71} +{'loss': 0.8847, 'grad_norm': 1.5607993602752686, 'learning_rate': 4.061798144264986e-06, 'epoch': 0.71} +{'loss': 0.9089, 'grad_norm': 1.6833645105361938, 'learning_rate': 4.056783983779442e-06, 'epoch': 0.71} +{'loss': 0.8967, 'grad_norm': 1.716971755027771, 'learning_rate': 4.051772132486589e-06, 'epoch': 0.71} +{'loss': 0.8223, 'grad_norm': 2.139861822128296, 'learning_rate': 4.046762592333744e-06, 'epoch': 0.71} +{'loss': 0.964, 'grad_norm': 1.748581886291504, 'learning_rate': 4.041755365267323e-06, 'epoch': 0.71} +{'loss': 0.9766, 'grad_norm': 1.8290610313415527, 'learning_rate': 4.03675045323286e-06, 'epoch': 0.71} +{'loss': 0.9271, 'grad_norm': 1.5929118394851685, 'learning_rate': 4.0317478581749644e-06, 'epoch': 0.71} +{'loss': 0.9211, 'grad_norm': 1.5934184789657593, 'learning_rate': 4.02674758203736e-06, 'epoch': 0.71} +{'loss': 0.9533, 'grad_norm': 1.7148069143295288, 'learning_rate': 4.021749626762869e-06, 'epoch': 0.71} +{'loss': 0.9118, 'grad_norm': 1.6647963523864746, 'learning_rate': 4.01675399429341e-06, 'epoch': 0.71} +{'loss': 0.9461, 'grad_norm': 1.5910618305206299, 'learning_rate': 4.0117606865699975e-06, 'epoch': 0.71} +{'loss': 0.9763, 'grad_norm': 1.747583270072937, 'learning_rate': 4.006769705532746e-06, 'epoch': 0.71} +{'loss': 0.8983, 'grad_norm': 1.812488079071045, 'learning_rate': 4.001781053120863e-06, 'epoch': 0.71} +{'loss': 0.8859, 'grad_norm': 1.724203109741211, 'learning_rate': 3.996794731272653e-06, 'epoch': 0.71} +{'loss': 0.9857, 'grad_norm': 1.9882450103759766, 'learning_rate': 3.991810741925515e-06, 'epoch': 0.71} +{'loss': 0.94, 'grad_norm': 1.897228717803955, 'learning_rate': 3.986829087015941e-06, 'epoch': 0.71} +{'loss': 0.944, 'grad_norm': 1.799200415611267, 'learning_rate': 3.981849768479516e-06, 'epoch': 0.71} +{'loss': 0.9655, 'grad_norm': 1.6939918994903564, 
'learning_rate': 3.976872788250918e-06, 'epoch': 0.71} +{'loss': 0.901, 'grad_norm': 1.6385189294815063, 'learning_rate': 3.971898148263915e-06, 'epoch': 0.71} +{'loss': 0.8844, 'grad_norm': 1.722093105316162, 'learning_rate': 3.966925850451367e-06, 'epoch': 0.71} +{'loss': 0.8744, 'grad_norm': 1.8588894605636597, 'learning_rate': 3.961955896745224e-06, 'epoch': 0.72} +{'loss': 0.9144, 'grad_norm': 1.6176855564117432, 'learning_rate': 3.956988289076529e-06, 'epoch': 0.72} +{'loss': 0.9076, 'grad_norm': 1.676416039466858, 'learning_rate': 3.9520230293754e-06, 'epoch': 0.72} +{'loss': 0.8638, 'grad_norm': 1.882345199584961, 'learning_rate': 3.9470601195710575e-06, 'epoch': 0.72} +{'loss': 0.8926, 'grad_norm': 1.7315891981124878, 'learning_rate': 3.942099561591802e-06, 'epoch': 0.72} +{'loss': 0.9027, 'grad_norm': 1.617348551750183, 'learning_rate': 3.937141357365023e-06, 'epoch': 0.72} +{'loss': 0.8595, 'grad_norm': 1.6499409675598145, 'learning_rate': 3.9321855088171934e-06, 'epoch': 0.72} +{'loss': 0.9106, 'grad_norm': 1.598115086555481, 'learning_rate': 3.927232017873872e-06, 'epoch': 0.72} +{'loss': 0.978, 'grad_norm': 2.043610095977783, 'learning_rate': 3.922280886459701e-06, 'epoch': 0.72} +{'loss': 0.9518, 'grad_norm': 1.800264835357666, 'learning_rate': 3.917332116498406e-06, 'epoch': 0.72} +{'loss': 0.9841, 'grad_norm': 1.6401268243789673, 'learning_rate': 3.912385709912794e-06, 'epoch': 0.72} +{'loss': 0.9558, 'grad_norm': 1.624555230140686, 'learning_rate': 3.9074416686247554e-06, 'epoch': 0.72} +{'loss': 0.9211, 'grad_norm': 1.8923447132110596, 'learning_rate': 3.902499994555261e-06, 'epoch': 0.72} +{'loss': 0.8901, 'grad_norm': 1.8739162683486938, 'learning_rate': 3.897560689624362e-06, 'epoch': 0.72} +{'loss': 0.9278, 'grad_norm': 1.6195011138916016, 'learning_rate': 3.892623755751189e-06, 'epoch': 0.72} +{'loss': 0.8873, 'grad_norm': 1.9183586835861206, 'learning_rate': 3.887689194853951e-06, 'epoch': 0.72} +{'loss': 0.9206, 'grad_norm': 1.699321985244751, 'learning_rate': 3.882757008849936e-06, 'epoch': 0.72} +{'loss': 0.8955, 'grad_norm': 1.6397508382797241, 'learning_rate': 3.877827199655506e-06, 'epoch': 0.72} +{'loss': 0.9851, 'grad_norm': 1.6325955390930176, 'learning_rate': 3.872899769186107e-06, 'epoch': 0.72} +{'loss': 0.8754, 'grad_norm': 0.9474324584007263, 'learning_rate': 3.867974719356249e-06, 'epoch': 0.72} +{'loss': 0.9366, 'grad_norm': 1.9148035049438477, 'learning_rate': 3.8630520520795275e-06, 'epoch': 0.72} +{'loss': 0.898, 'grad_norm': 1.8077012300491333, 'learning_rate': 3.858131769268607e-06, 'epoch': 0.72} +{'loss': 0.8886, 'grad_norm': 1.8866058588027954, 'learning_rate': 3.853213872835229e-06, 'epoch': 0.72} +{'loss': 0.8924, 'grad_norm': 1.6469823122024536, 'learning_rate': 3.848298364690204e-06, 'epoch': 0.72} +{'loss': 0.8884, 'grad_norm': 1.826834797859192, 'learning_rate': 3.8433852467434175e-06, 'epoch': 0.72} +{'loss': 1.0084, 'grad_norm': 1.6693713665008545, 'learning_rate': 3.838474520903825e-06, 'epoch': 0.72} +{'loss': 0.921, 'grad_norm': 1.8152363300323486, 'learning_rate': 3.833566189079453e-06, 'epoch': 0.72} +{'loss': 0.869, 'grad_norm': 1.8509104251861572, 'learning_rate': 3.8286602531773975e-06, 'epoch': 0.72} +{'loss': 0.8812, 'grad_norm': 1.786346673965454, 'learning_rate': 3.823756715103822e-06, 'epoch': 0.72} +{'loss': 0.9816, 'grad_norm': 1.6753065586090088, 'learning_rate': 3.8188555767639625e-06, 'epoch': 0.72} +{'loss': 0.8775, 'grad_norm': 1.767048716545105, 'learning_rate': 3.8139568400621184e-06, 'epoch': 0.72} +{'loss': 
0.9358, 'grad_norm': 1.6557140350341797, 'learning_rate': 3.8090605069016596e-06, 'epoch': 0.72} +{'loss': 0.9841, 'grad_norm': 1.8177536725997925, 'learning_rate': 3.804166579185018e-06, 'epoch': 0.72} +{'loss': 0.9925, 'grad_norm': 2.0585696697235107, 'learning_rate': 3.799275058813694e-06, 'epoch': 0.72} +{'loss': 0.883, 'grad_norm': 1.5779823064804077, 'learning_rate': 3.794385947688255e-06, 'epoch': 0.72} +{'loss': 1.0255, 'grad_norm': 2.012289524078369, 'learning_rate': 3.7894992477083226e-06, 'epoch': 0.72} +{'loss': 0.91, 'grad_norm': 1.6775563955307007, 'learning_rate': 3.78461496077259e-06, 'epoch': 0.72} +{'loss': 0.9628, 'grad_norm': 1.62815260887146, 'learning_rate': 3.779733088778813e-06, 'epoch': 0.72} +{'loss': 0.9047, 'grad_norm': 1.6503132581710815, 'learning_rate': 3.774853633623806e-06, 'epoch': 0.72} +{'loss': 0.941, 'grad_norm': 1.759012222290039, 'learning_rate': 3.7699765972034442e-06, 'epoch': 0.72} +{'loss': 0.9034, 'grad_norm': 1.7295348644256592, 'learning_rate': 3.7651019814126656e-06, 'epoch': 0.72} +{'loss': 0.9262, 'grad_norm': 1.8059415817260742, 'learning_rate': 3.7602297881454653e-06, 'epoch': 0.72} +{'loss': 0.8301, 'grad_norm': 0.8651884198188782, 'learning_rate': 3.755360019294898e-06, 'epoch': 0.72} +{'loss': 0.8904, 'grad_norm': 1.821021556854248, 'learning_rate': 3.750492676753077e-06, 'epoch': 0.72} +{'loss': 0.9627, 'grad_norm': 1.7472025156021118, 'learning_rate': 3.7456277624111725e-06, 'epoch': 0.72} +{'loss': 0.9364, 'grad_norm': 1.7045828104019165, 'learning_rate': 3.7407652781594094e-06, 'epoch': 0.72} +{'loss': 0.9446, 'grad_norm': 1.9971729516983032, 'learning_rate': 3.735905225887072e-06, 'epoch': 0.72} +{'loss': 0.87, 'grad_norm': 0.8987799882888794, 'learning_rate': 3.7310476074824966e-06, 'epoch': 0.72} +{'loss': 0.9397, 'grad_norm': 1.666849970817566, 'learning_rate': 3.726192424833075e-06, 'epoch': 0.72} +{'loss': 0.975, 'grad_norm': 1.865614891052246, 'learning_rate': 3.7213396798252522e-06, 'epoch': 0.72} +{'loss': 0.8818, 'grad_norm': 1.8458900451660156, 'learning_rate': 3.7164893743445274e-06, 'epoch': 0.72} +{'loss': 0.9247, 'grad_norm': 1.7316802740097046, 'learning_rate': 3.7116415102754545e-06, 'epoch': 0.72} +{'loss': 0.9083, 'grad_norm': 2.6089460849761963, 'learning_rate': 3.7067960895016277e-06, 'epoch': 0.73} +{'loss': 0.8872, 'grad_norm': 1.8129782676696777, 'learning_rate': 3.701953113905703e-06, 'epoch': 0.73} +{'loss': 0.9795, 'grad_norm': 1.6566029787063599, 'learning_rate': 3.6971125853693834e-06, 'epoch': 0.73} +{'loss': 0.9571, 'grad_norm': 1.6292799711227417, 'learning_rate': 3.692274505773419e-06, 'epoch': 0.73} +{'loss': 0.9549, 'grad_norm': 1.707647442817688, 'learning_rate': 3.687438876997612e-06, 'epoch': 0.73} +{'loss': 0.8891, 'grad_norm': 1.8775651454925537, 'learning_rate': 3.6826057009208083e-06, 'epoch': 0.73} +{'loss': 0.9716, 'grad_norm': 1.948901653289795, 'learning_rate': 3.677774979420904e-06, 'epoch': 0.73} +{'loss': 0.9429, 'grad_norm': 1.9079513549804688, 'learning_rate': 3.67294671437484e-06, 'epoch': 0.73} +{'loss': 0.9627, 'grad_norm': 1.9569562673568726, 'learning_rate': 3.6681209076586035e-06, 'epoch': 0.73} +{'loss': 0.8567, 'grad_norm': 1.647851824760437, 'learning_rate': 3.663297561147224e-06, 'epoch': 0.73} +{'loss': 0.9834, 'grad_norm': 1.8918260335922241, 'learning_rate': 3.6584766767147796e-06, 'epoch': 0.73} +{'loss': 1.003, 'grad_norm': 1.7165757417678833, 'learning_rate': 3.6536582562343893e-06, 'epoch': 0.73} +{'loss': 0.9332, 'grad_norm': 1.7870110273361206, 'learning_rate': 
3.6488423015782128e-06, 'epoch': 0.73} +{'loss': 1.006, 'grad_norm': 1.7405707836151123, 'learning_rate': 3.644028814617454e-06, 'epoch': 0.73} +{'loss': 0.889, 'grad_norm': 1.7775375843048096, 'learning_rate': 3.6392177972223596e-06, 'epoch': 0.73} +{'loss': 0.8771, 'grad_norm': 2.0872230529785156, 'learning_rate': 3.6344092512622173e-06, 'epoch': 0.73} +{'loss': 0.8203, 'grad_norm': 1.641983151435852, 'learning_rate': 3.6296031786053455e-06, 'epoch': 0.73} +{'loss': 0.8593, 'grad_norm': 1.9407005310058594, 'learning_rate': 3.6247995811191107e-06, 'epoch': 0.73} +{'loss': 0.952, 'grad_norm': 1.8195254802703857, 'learning_rate': 3.619998460669916e-06, 'epoch': 0.73} +{'loss': 0.8948, 'grad_norm': 1.7274514436721802, 'learning_rate': 3.6151998191232005e-06, 'epoch': 0.73} +{'loss': 0.9152, 'grad_norm': 1.8413913249969482, 'learning_rate': 3.610403658343443e-06, 'epoch': 0.73} +{'loss': 0.8771, 'grad_norm': 1.648410677909851, 'learning_rate': 3.6056099801941535e-06, 'epoch': 0.73} +{'loss': 0.922, 'grad_norm': 1.6907953023910522, 'learning_rate': 3.600818786537883e-06, 'epoch': 0.73} +{'loss': 0.9481, 'grad_norm': 1.7866605520248413, 'learning_rate': 3.5960300792362124e-06, 'epoch': 0.73} +{'loss': 0.9468, 'grad_norm': 1.7889845371246338, 'learning_rate': 3.591243860149759e-06, 'epoch': 0.73} +{'loss': 0.9935, 'grad_norm': 1.9312835931777954, 'learning_rate': 3.586460131138174e-06, 'epoch': 0.73} +{'loss': 0.8846, 'grad_norm': 1.7008137702941895, 'learning_rate': 3.5816788940601387e-06, 'epoch': 0.73} +{'loss': 0.9486, 'grad_norm': 1.7842228412628174, 'learning_rate': 3.576900150773368e-06, 'epoch': 0.73} +{'loss': 0.911, 'grad_norm': 1.7980836629867554, 'learning_rate': 3.5721239031346067e-06, 'epoch': 0.73} +{'loss': 0.857, 'grad_norm': 1.8791370391845703, 'learning_rate': 3.5673501529996314e-06, 'epoch': 0.73} +{'loss': 0.9594, 'grad_norm': 1.7578190565109253, 'learning_rate': 3.5625789022232473e-06, 'epoch': 0.73} +{'loss': 0.9762, 'grad_norm': 1.715129017829895, 'learning_rate': 3.5578101526592877e-06, 'epoch': 0.73} +{'loss': 0.932, 'grad_norm': 1.7171292304992676, 'learning_rate': 3.5530439061606202e-06, 'epoch': 0.73} +{'loss': 0.8634, 'grad_norm': 1.6815215349197388, 'learning_rate': 3.5482801645791266e-06, 'epoch': 0.73} +{'loss': 0.9067, 'grad_norm': 1.5651942491531372, 'learning_rate': 3.5435189297657267e-06, 'epoch': 0.73} +{'loss': 0.7843, 'grad_norm': 1.6557655334472656, 'learning_rate': 3.5387602035703637e-06, 'epoch': 0.73} +{'loss': 0.8891, 'grad_norm': 1.7977344989776611, 'learning_rate': 3.534003987842005e-06, 'epoch': 0.73} +{'loss': 0.8599, 'grad_norm': 0.8603512644767761, 'learning_rate': 3.5292502844286447e-06, 'epoch': 0.73} +{'loss': 0.9508, 'grad_norm': 1.7264420986175537, 'learning_rate': 3.5244990951772972e-06, 'epoch': 0.73} +{'loss': 0.9341, 'grad_norm': 1.6922509670257568, 'learning_rate': 3.5197504219340027e-06, 'epoch': 0.73} +{'loss': 0.9065, 'grad_norm': 1.9430891275405884, 'learning_rate': 3.5150042665438233e-06, 'epoch': 0.73} +{'loss': 0.9241, 'grad_norm': 1.6908938884735107, 'learning_rate': 3.5102606308508423e-06, 'epoch': 0.73} +{'loss': 0.8306, 'grad_norm': 1.8099004030227661, 'learning_rate': 3.505519516698165e-06, 'epoch': 0.73} +{'loss': 0.9494, 'grad_norm': 1.9020333290100098, 'learning_rate': 3.5007809259279146e-06, 'epoch': 0.73} +{'loss': 0.9005, 'grad_norm': 2.0368494987487793, 'learning_rate': 3.496044860381238e-06, 'epoch': 0.73} +{'loss': 0.9525, 'grad_norm': 1.6994282007217407, 'learning_rate': 3.4913113218982965e-06, 'epoch': 0.73} 
+{'loss': 0.8639, 'grad_norm': 1.8050298690795898, 'learning_rate': 3.486580312318273e-06, 'epoch': 0.73} +{'loss': 0.895, 'grad_norm': 1.8238089084625244, 'learning_rate': 3.481851833479365e-06, 'epoch': 0.73} +{'loss': 0.873, 'grad_norm': 1.8572880029678345, 'learning_rate': 3.4771258872187917e-06, 'epoch': 0.73} +{'loss': 1.0154, 'grad_norm': 1.7538968324661255, 'learning_rate': 3.472402475372778e-06, 'epoch': 0.73} +{'loss': 0.9607, 'grad_norm': 1.7000271081924438, 'learning_rate': 3.4676815997765756e-06, 'epoch': 0.73} +{'loss': 0.9802, 'grad_norm': 1.8036025762557983, 'learning_rate': 3.4629632622644437e-06, 'epoch': 0.73} +{'loss': 0.9385, 'grad_norm': 2.039238929748535, 'learning_rate': 3.4582474646696575e-06, 'epoch': 0.74} +{'loss': 0.9119, 'grad_norm': 1.794670581817627, 'learning_rate': 3.453534208824507e-06, 'epoch': 0.74} +{'loss': 0.929, 'grad_norm': 1.695103406906128, 'learning_rate': 3.448823496560292e-06, 'epoch': 0.74} +{'loss': 0.802, 'grad_norm': 1.6967803239822388, 'learning_rate': 3.444115329707325e-06, 'epoch': 0.74} +{'loss': 0.8682, 'grad_norm': 1.710740089416504, 'learning_rate': 3.4394097100949286e-06, 'epoch': 0.74} +{'loss': 0.9493, 'grad_norm': 1.7392624616622925, 'learning_rate': 3.4347066395514383e-06, 'epoch': 0.74} +{'loss': 0.9086, 'grad_norm': 1.8060885667800903, 'learning_rate': 3.4300061199041967e-06, 'epoch': 0.74} +{'loss': 1.022, 'grad_norm': 1.7745487689971924, 'learning_rate': 3.425308152979555e-06, 'epoch': 0.74} +{'loss': 0.9077, 'grad_norm': 1.6979912519454956, 'learning_rate': 3.4206127406028744e-06, 'epoch': 0.74} +{'loss': 0.9208, 'grad_norm': 1.6242289543151855, 'learning_rate': 3.415919884598523e-06, 'epoch': 0.74} +{'loss': 0.9002, 'grad_norm': 2.163660764694214, 'learning_rate': 3.411229586789876e-06, 'epoch': 0.74} +{'loss': 0.8961, 'grad_norm': 2.0429024696350098, 'learning_rate': 3.4065418489993118e-06, 'epoch': 0.74} +{'loss': 0.8561, 'grad_norm': 2.788419723510742, 'learning_rate': 3.401856673048217e-06, 'epoch': 0.74} +{'loss': 0.9493, 'grad_norm': 1.6912580728530884, 'learning_rate': 3.3971740607569868e-06, 'epoch': 0.74} +{'loss': 0.8532, 'grad_norm': 1.8332105875015259, 'learning_rate': 3.392494013945009e-06, 'epoch': 0.74} +{'loss': 0.8381, 'grad_norm': 2.000765085220337, 'learning_rate': 3.3878165344306824e-06, 'epoch': 0.74} +{'loss': 0.9259, 'grad_norm': 1.7117446660995483, 'learning_rate': 3.3831416240314085e-06, 'epoch': 0.74} +{'loss': 0.9746, 'grad_norm': 1.6958849430084229, 'learning_rate': 3.378469284563589e-06, 'epoch': 0.74} +{'loss': 0.8761, 'grad_norm': 1.8388540744781494, 'learning_rate': 3.3737995178426276e-06, 'epoch': 0.74} +{'loss': 0.8853, 'grad_norm': 1.7091138362884521, 'learning_rate': 3.3691323256829266e-06, 'epoch': 0.74} +{'loss': 0.9005, 'grad_norm': 1.7589246034622192, 'learning_rate': 3.3644677098978894e-06, 'epoch': 0.74} +{'loss': 0.8818, 'grad_norm': 1.796491026878357, 'learning_rate': 3.3598056722999185e-06, 'epoch': 0.74} +{'loss': 0.7787, 'grad_norm': 1.8350884914398193, 'learning_rate': 3.3551462147004132e-06, 'epoch': 0.74} +{'loss': 0.8643, 'grad_norm': 1.7753305435180664, 'learning_rate': 3.3504893389097727e-06, 'epoch': 0.74} +{'loss': 0.946, 'grad_norm': 1.9164499044418335, 'learning_rate': 3.3458350467373914e-06, 'epoch': 0.74} +{'loss': 0.8374, 'grad_norm': 1.8525071144104004, 'learning_rate': 3.3411833399916584e-06, 'epoch': 0.74} +{'loss': 0.8592, 'grad_norm': 1.6794849634170532, 'learning_rate': 3.3365342204799613e-06, 'epoch': 0.74} +{'loss': 0.9037, 'grad_norm': 
1.6253608465194702, 'learning_rate': 3.3318876900086815e-06, 'epoch': 0.74} +{'loss': 0.9394, 'grad_norm': 1.7628328800201416, 'learning_rate': 3.3272437503831945e-06, 'epoch': 0.74} +{'loss': 0.8613, 'grad_norm': 0.912964403629303, 'learning_rate': 3.3226024034078698e-06, 'epoch': 0.74} +{'loss': 0.9551, 'grad_norm': 1.6622869968414307, 'learning_rate': 3.3179636508860645e-06, 'epoch': 0.74} +{'loss': 0.8849, 'grad_norm': 1.7354791164398193, 'learning_rate': 3.3133274946201333e-06, 'epoch': 0.74} +{'loss': 0.8992, 'grad_norm': 1.8365917205810547, 'learning_rate': 3.308693936411421e-06, 'epoch': 0.74} +{'loss': 0.9602, 'grad_norm': 1.6966023445129395, 'learning_rate': 3.304062978060263e-06, 'epoch': 0.74} +{'loss': 0.9592, 'grad_norm': 1.7813916206359863, 'learning_rate': 3.2994346213659824e-06, 'epoch': 0.74} +{'loss': 0.9342, 'grad_norm': 1.9294930696487427, 'learning_rate': 3.294808868126894e-06, 'epoch': 0.74} +{'loss': 0.977, 'grad_norm': 1.6844215393066406, 'learning_rate': 3.290185720140301e-06, 'epoch': 0.74} +{'loss': 0.9251, 'grad_norm': 1.6281991004943848, 'learning_rate': 3.285565179202491e-06, 'epoch': 0.74} +{'loss': 0.9856, 'grad_norm': 2.1437790393829346, 'learning_rate': 3.280947247108741e-06, 'epoch': 0.74} +{'loss': 0.8713, 'grad_norm': 1.772961974143982, 'learning_rate': 3.2763319256533177e-06, 'epoch': 0.74} +{'loss': 0.9037, 'grad_norm': 1.7907110452651978, 'learning_rate': 3.2717192166294685e-06, 'epoch': 0.74} +{'loss': 0.9412, 'grad_norm': 1.7244598865509033, 'learning_rate': 3.267109121829428e-06, 'epoch': 0.74} +{'loss': 0.9451, 'grad_norm': 1.6671468019485474, 'learning_rate': 3.262501643044417e-06, 'epoch': 0.74} +{'loss': 0.8162, 'grad_norm': 1.7434862852096558, 'learning_rate': 3.2578967820646313e-06, 'epoch': 0.74} +{'loss': 0.8683, 'grad_norm': 1.858473300933838, 'learning_rate': 3.2532945406792573e-06, 'epoch': 0.74} +{'loss': 0.8474, 'grad_norm': 1.5833643674850464, 'learning_rate': 3.248694920676464e-06, 'epoch': 0.74} +{'loss': 0.9015, 'grad_norm': 1.7272731065750122, 'learning_rate': 3.2440979238433977e-06, 'epoch': 0.74} +{'loss': 0.9288, 'grad_norm': 1.8200198411941528, 'learning_rate': 3.23950355196619e-06, 'epoch': 0.74} +{'loss': 0.9572, 'grad_norm': 1.9386030435562134, 'learning_rate': 3.234911806829948e-06, 'epoch': 0.74} +{'loss': 0.89, 'grad_norm': 1.8049795627593994, 'learning_rate': 3.2303226902187625e-06, 'epoch': 0.74} +{'loss': 0.917, 'grad_norm': 1.7297941446304321, 'learning_rate': 3.225736203915698e-06, 'epoch': 0.74} +{'loss': 0.9119, 'grad_norm': 1.630277156829834, 'learning_rate': 3.221152349702802e-06, 'epoch': 0.74} +{'loss': 0.9335, 'grad_norm': 1.790122389793396, 'learning_rate': 3.216571129361097e-06, 'epoch': 0.75} +{'loss': 0.9947, 'grad_norm': 1.9039274454116821, 'learning_rate': 3.2119925446705824e-06, 'epoch': 0.75} +{'loss': 0.8743, 'grad_norm': 1.7523843050003052, 'learning_rate': 3.2074165974102324e-06, 'epoch': 0.75} +{'loss': 0.9609, 'grad_norm': 1.9392461776733398, 'learning_rate': 3.2028432893579975e-06, 'epoch': 0.75} +{'loss': 0.9292, 'grad_norm': 1.714680552482605, 'learning_rate': 3.1982726222908046e-06, 'epoch': 0.75} +{'loss': 0.9263, 'grad_norm': 0.9127395749092102, 'learning_rate': 3.1937045979845518e-06, 'epoch': 0.75} +{'loss': 0.8232, 'grad_norm': 1.8035203218460083, 'learning_rate': 3.1891392182141103e-06, 'epoch': 0.75} +{'loss': 0.9004, 'grad_norm': 0.8980305790901184, 'learning_rate': 3.1845764847533276e-06, 'epoch': 0.75} +{'loss': 0.8533, 'grad_norm': 1.8180233240127563, 'learning_rate': 
3.1800163993750166e-06, 'epoch': 0.75} +{'loss': 1.0293, 'grad_norm': 1.8277109861373901, 'learning_rate': 3.1754589638509647e-06, 'epoch': 0.75} +{'loss': 0.9378, 'grad_norm': 1.5936766862869263, 'learning_rate': 3.1709041799519312e-06, 'epoch': 0.75} +{'loss': 0.9172, 'grad_norm': 1.9058517217636108, 'learning_rate': 3.166352049447644e-06, 'epoch': 0.75} +{'loss': 0.9639, 'grad_norm': 1.6563794612884521, 'learning_rate': 3.161802574106799e-06, 'epoch': 0.75} +{'loss': 0.962, 'grad_norm': 1.872847557067871, 'learning_rate': 3.157255755697062e-06, 'epoch': 0.75} +{'loss': 0.9562, 'grad_norm': 1.8390828371047974, 'learning_rate': 3.152711595985065e-06, 'epoch': 0.75} +{'loss': 0.9127, 'grad_norm': 1.8095601797103882, 'learning_rate': 3.148170096736408e-06, 'epoch': 0.75} +{'loss': 0.9756, 'grad_norm': 1.8260071277618408, 'learning_rate': 3.143631259715658e-06, 'epoch': 0.75} +{'loss': 0.9152, 'grad_norm': 1.7314852476119995, 'learning_rate': 3.1390950866863456e-06, 'epoch': 0.75} +{'loss': 0.9372, 'grad_norm': 1.79062819480896, 'learning_rate': 3.134561579410966e-06, 'epoch': 0.75} +{'loss': 0.8774, 'grad_norm': 1.6281756162643433, 'learning_rate': 3.1300307396509833e-06, 'epoch': 0.75} +{'loss': 1.0515, 'grad_norm': 1.8487001657485962, 'learning_rate': 3.1255025691668184e-06, 'epoch': 0.75} +{'loss': 0.9553, 'grad_norm': 1.832372784614563, 'learning_rate': 3.120977069717861e-06, 'epoch': 0.75} +{'loss': 0.7958, 'grad_norm': 1.708930253982544, 'learning_rate': 3.116454243062459e-06, 'epoch': 0.75} +{'loss': 0.9373, 'grad_norm': 1.6150094270706177, 'learning_rate': 3.1119340909579265e-06, 'epoch': 0.75} +{'loss': 0.9557, 'grad_norm': 1.8417260646820068, 'learning_rate': 3.10741661516053e-06, 'epoch': 0.75} +{'loss': 0.9166, 'grad_norm': 1.649986982345581, 'learning_rate': 3.102901817425504e-06, 'epoch': 0.75} +{'loss': 0.9447, 'grad_norm': 2.2120673656463623, 'learning_rate': 3.0983896995070397e-06, 'epoch': 0.75} +{'loss': 0.9159, 'grad_norm': 1.6997156143188477, 'learning_rate': 3.0938802631582865e-06, 'epoch': 0.75} +{'loss': 0.8673, 'grad_norm': 1.703776240348816, 'learning_rate': 3.089373510131354e-06, 'epoch': 0.75} +{'loss': 0.8631, 'grad_norm': 1.7422881126403809, 'learning_rate': 3.0848694421773075e-06, 'epoch': 0.75} +{'loss': 0.9026, 'grad_norm': 1.8156569004058838, 'learning_rate': 3.0803680610461674e-06, 'epoch': 0.75} +{'loss': 0.8985, 'grad_norm': 1.7419483661651611, 'learning_rate': 3.0758693684869147e-06, 'epoch': 0.75} +{'loss': 0.9147, 'grad_norm': 1.647157073020935, 'learning_rate': 3.071373366247482e-06, 'epoch': 0.75} +{'loss': 0.9871, 'grad_norm': 1.7883245944976807, 'learning_rate': 3.066880056074758e-06, 'epoch': 0.75} +{'loss': 0.9311, 'grad_norm': 1.7706633806228638, 'learning_rate': 3.0623894397145837e-06, 'epoch': 0.75} +{'loss': 0.8779, 'grad_norm': 1.4799484014511108, 'learning_rate': 3.057901518911758e-06, 'epoch': 0.75} +{'loss': 0.867, 'grad_norm': 1.6384867429733276, 'learning_rate': 3.0534162954100264e-06, 'epoch': 0.75} +{'loss': 0.9392, 'grad_norm': 1.7988359928131104, 'learning_rate': 3.0489337709520918e-06, 'epoch': 0.75} +{'loss': 0.9676, 'grad_norm': 1.5843770503997803, 'learning_rate': 3.044453947279603e-06, 'epoch': 0.75} +{'loss': 0.9022, 'grad_norm': 1.7816550731658936, 'learning_rate': 3.0399768261331664e-06, 'epoch': 0.75} +{'loss': 0.9055, 'grad_norm': 1.7060253620147705, 'learning_rate': 3.0355024092523334e-06, 'epoch': 0.75} +{'loss': 0.9138, 'grad_norm': 1.752164363861084, 'learning_rate': 3.031030698375602e-06, 'epoch': 0.75} +{'loss': 
0.8513, 'grad_norm': 0.8799826502799988, 'learning_rate': 3.0265616952404252e-06, 'epoch': 0.75} +{'loss': 0.7984, 'grad_norm': 1.756686806678772, 'learning_rate': 3.0220954015832004e-06, 'epoch': 0.75} +{'loss': 0.9236, 'grad_norm': 1.8549926280975342, 'learning_rate': 3.017631819139273e-06, 'epoch': 0.75} +{'loss': 0.9387, 'grad_norm': 1.6138970851898193, 'learning_rate': 3.013170949642935e-06, 'epoch': 0.75} +{'loss': 0.9936, 'grad_norm': 1.71934175491333, 'learning_rate': 3.0087127948274264e-06, 'epoch': 0.75} +{'loss': 0.9787, 'grad_norm': 1.874595046043396, 'learning_rate': 3.004257356424927e-06, 'epoch': 0.75} +{'loss': 0.8968, 'grad_norm': 1.5763262510299683, 'learning_rate': 2.999804636166567e-06, 'epoch': 0.75} +{'loss': 0.9688, 'grad_norm': 1.7726876735687256, 'learning_rate': 2.995354635782417e-06, 'epoch': 0.75} +{'loss': 0.8993, 'grad_norm': 1.554753065109253, 'learning_rate': 2.990907357001491e-06, 'epoch': 0.75} +{'loss': 1.083, 'grad_norm': 1.9701539278030396, 'learning_rate': 2.986462801551748e-06, 'epoch': 0.76} +{'loss': 0.9261, 'grad_norm': 1.8488991260528564, 'learning_rate': 2.9820209711600858e-06, 'epoch': 0.76} +{'loss': 0.9466, 'grad_norm': 1.914576768875122, 'learning_rate': 2.977581867552346e-06, 'epoch': 0.76} +{'loss': 0.9345, 'grad_norm': 1.726682186126709, 'learning_rate': 2.9731454924533086e-06, 'epoch': 0.76} +{'loss': 0.8944, 'grad_norm': 1.7761739492416382, 'learning_rate': 2.9687118475866948e-06, 'epoch': 0.76} +{'loss': 0.8857, 'grad_norm': 1.7333637475967407, 'learning_rate': 2.9642809346751677e-06, 'epoch': 0.76} +{'loss': 0.8622, 'grad_norm': 1.7083461284637451, 'learning_rate': 2.9598527554403187e-06, 'epoch': 0.76} +{'loss': 0.8689, 'grad_norm': 1.7392387390136719, 'learning_rate': 2.9554273116026887e-06, 'epoch': 0.76} +{'loss': 1.0077, 'grad_norm': 1.740929365158081, 'learning_rate': 2.95100460488175e-06, 'epoch': 0.76} +{'loss': 0.8918, 'grad_norm': 1.6594079732894897, 'learning_rate': 2.9465846369959126e-06, 'epoch': 0.76} +{'loss': 0.9368, 'grad_norm': 1.7169567346572876, 'learning_rate': 2.942167409662524e-06, 'epoch': 0.76} +{'loss': 0.8963, 'grad_norm': 1.7319214344024658, 'learning_rate': 2.937752924597864e-06, 'epoch': 0.76} +{'loss': 1.0085, 'grad_norm': 1.6905615329742432, 'learning_rate': 2.933341183517148e-06, 'epoch': 0.76} +{'loss': 0.869, 'grad_norm': 1.7468857765197754, 'learning_rate': 2.9289321881345257e-06, 'epoch': 0.76} +{'loss': 0.9221, 'grad_norm': 1.753749966621399, 'learning_rate': 2.92452594016308e-06, 'epoch': 0.76} +{'loss': 0.938, 'grad_norm': 1.6252354383468628, 'learning_rate': 2.9201224413148254e-06, 'epoch': 0.76} +{'loss': 0.9196, 'grad_norm': 1.675720453262329, 'learning_rate': 2.9157216933007095e-06, 'epoch': 0.76} +{'loss': 0.8679, 'grad_norm': 2.069101333618164, 'learning_rate': 2.91132369783061e-06, 'epoch': 0.76} +{'loss': 0.8973, 'grad_norm': 1.7233712673187256, 'learning_rate': 2.906928456613336e-06, 'epoch': 0.76} +{'loss': 0.8915, 'grad_norm': 1.6812355518341064, 'learning_rate': 2.902535971356625e-06, 'epoch': 0.76} +{'loss': 0.9764, 'grad_norm': 1.6375269889831543, 'learning_rate': 2.898146243767146e-06, 'epoch': 0.76} +{'loss': 0.9476, 'grad_norm': 1.6960278749465942, 'learning_rate': 2.893759275550494e-06, 'epoch': 0.76} +{'loss': 0.9444, 'grad_norm': 1.8152192831039429, 'learning_rate': 2.8893750684111977e-06, 'epoch': 0.76} +{'loss': 0.9198, 'grad_norm': 1.741206407546997, 'learning_rate': 2.884993624052701e-06, 'epoch': 0.76} +{'loss': 0.8835, 'grad_norm': 0.881798505783081, 'learning_rate': 
2.8806149441773844e-06, 'epoch': 0.76} +{'loss': 0.9013, 'grad_norm': 0.8986806273460388, 'learning_rate': 2.876239030486554e-06, 'epoch': 0.76} +{'loss': 0.8727, 'grad_norm': 0.8570490479469299, 'learning_rate': 2.871865884680436e-06, 'epoch': 0.76} +{'loss': 0.9147, 'grad_norm': 1.829601526260376, 'learning_rate': 2.867495508458186e-06, 'epoch': 0.76} +{'loss': 0.9246, 'grad_norm': 0.897922694683075, 'learning_rate': 2.8631279035178796e-06, 'epoch': 0.76} +{'loss': 0.8711, 'grad_norm': 1.974184274673462, 'learning_rate': 2.8587630715565185e-06, 'epoch': 0.76} +{'loss': 0.9809, 'grad_norm': 1.671872854232788, 'learning_rate': 2.8544010142700263e-06, 'epoch': 0.76} +{'loss': 0.8332, 'grad_norm': 1.9260306358337402, 'learning_rate': 2.850041733353247e-06, 'epoch': 0.76} +{'loss': 0.9511, 'grad_norm': 1.8911657333374023, 'learning_rate': 2.845685230499948e-06, 'epoch': 0.76} +{'loss': 0.91, 'grad_norm': 1.7182097434997559, 'learning_rate': 2.8413315074028157e-06, 'epoch': 0.76} +{'loss': 0.9181, 'grad_norm': 1.6033059358596802, 'learning_rate': 2.8369805657534576e-06, 'epoch': 0.76} +{'loss': 0.9893, 'grad_norm': 1.63260817527771, 'learning_rate': 2.8326324072423994e-06, 'epoch': 0.76} +{'loss': 0.9059, 'grad_norm': 1.6555876731872559, 'learning_rate': 2.828287033559086e-06, 'epoch': 0.76} +{'loss': 1.0009, 'grad_norm': 1.7372273206710815, 'learning_rate': 2.823944446391881e-06, 'epoch': 0.76} +{'loss': 0.9716, 'grad_norm': 1.7335221767425537, 'learning_rate': 2.819604647428067e-06, 'epoch': 0.76} +{'loss': 0.8777, 'grad_norm': 1.7508633136749268, 'learning_rate': 2.8152676383538346e-06, 'epoch': 0.76} +{'loss': 0.9, 'grad_norm': 1.720937728881836, 'learning_rate': 2.8109334208542994e-06, 'epoch': 0.76} +{'loss': 0.9414, 'grad_norm': 2.0624163150787354, 'learning_rate': 2.8066019966134907e-06, 'epoch': 0.76} +{'loss': 0.9215, 'grad_norm': 0.9367794990539551, 'learning_rate': 2.8022733673143508e-06, 'epoch': 0.76} +{'loss': 0.8565, 'grad_norm': 1.7649296522140503, 'learning_rate': 2.7979475346387363e-06, 'epoch': 0.76} +{'loss': 0.9744, 'grad_norm': 1.7157323360443115, 'learning_rate': 2.793624500267418e-06, 'epoch': 0.76} +{'loss': 1.0297, 'grad_norm': 1.694903016090393, 'learning_rate': 2.7893042658800793e-06, 'epoch': 0.76} +{'loss': 0.9826, 'grad_norm': 1.7097408771514893, 'learning_rate': 2.784986833155314e-06, 'epoch': 0.76} +{'loss': 0.901, 'grad_norm': 0.8642791509628296, 'learning_rate': 2.78067220377063e-06, 'epoch': 0.76} +{'loss': 0.8755, 'grad_norm': 2.1903867721557617, 'learning_rate': 2.776360379402445e-06, 'epoch': 0.76} +{'loss': 0.9548, 'grad_norm': 1.8496774435043335, 'learning_rate': 2.7720513617260857e-06, 'epoch': 0.76} +{'loss': 0.9005, 'grad_norm': 1.7211551666259766, 'learning_rate': 2.7677451524157882e-06, 'epoch': 0.76} +{'loss': 0.8402, 'grad_norm': 1.739711880683899, 'learning_rate': 2.763441753144701e-06, 'epoch': 0.76} +{'loss': 0.9762, 'grad_norm': 1.7368019819259644, 'learning_rate': 2.7591411655848755e-06, 'epoch': 0.77} +{'loss': 0.932, 'grad_norm': 1.6665337085723877, 'learning_rate': 2.7548433914072736e-06, 'epoch': 0.77} +{'loss': 0.9717, 'grad_norm': 1.7674028873443604, 'learning_rate': 2.7505484322817656e-06, 'epoch': 0.77} +{'loss': 0.8936, 'grad_norm': 1.7991665601730347, 'learning_rate': 2.746256289877126e-06, 'epoch': 0.77} +{'loss': 0.8645, 'grad_norm': 1.9144983291625977, 'learning_rate': 2.741966965861033e-06, 'epoch': 0.77} +{'loss': 0.913, 'grad_norm': 1.9745656251907349, 'learning_rate': 2.7376804619000706e-06, 'epoch': 0.77} +{'loss': 0.8787, 
'grad_norm': 1.7380291223526, 'learning_rate': 2.7333967796597317e-06, 'epoch': 0.77} +{'loss': 0.8477, 'grad_norm': 1.8387010097503662, 'learning_rate': 2.729115920804408e-06, 'epoch': 0.77} +{'loss': 0.8607, 'grad_norm': 1.901106834411621, 'learning_rate': 2.7248378869973946e-06, 'epoch': 0.77} +{'loss': 0.8887, 'grad_norm': 1.633284568786621, 'learning_rate': 2.720562679900892e-06, 'epoch': 0.77} +{'loss': 0.8147, 'grad_norm': 1.9528007507324219, 'learning_rate': 2.716290301175999e-06, 'epoch': 0.77} +{'loss': 0.9548, 'grad_norm': 1.8058927059173584, 'learning_rate': 2.712020752482717e-06, 'epoch': 0.77} +{'loss': 0.8966, 'grad_norm': 1.6151124238967896, 'learning_rate': 2.707754035479948e-06, 'epoch': 0.77} +{'loss': 0.9145, 'grad_norm': 1.8441987037658691, 'learning_rate': 2.703490151825492e-06, 'epoch': 0.77} +{'loss': 0.9683, 'grad_norm': 1.7653534412384033, 'learning_rate': 2.699229103176052e-06, 'epoch': 0.77} +{'loss': 0.9207, 'grad_norm': 1.7361457347869873, 'learning_rate': 2.694970891187225e-06, 'epoch': 0.77} +{'loss': 0.9591, 'grad_norm': 1.625246286392212, 'learning_rate': 2.690715517513508e-06, 'epoch': 0.77} +{'loss': 0.872, 'grad_norm': 1.8868553638458252, 'learning_rate': 2.6864629838082957e-06, 'epoch': 0.77} +{'loss': 0.9603, 'grad_norm': 0.9860090017318726, 'learning_rate': 2.6822132917238785e-06, 'epoch': 0.77} +{'loss': 0.8916, 'grad_norm': 1.738326072692871, 'learning_rate': 2.6779664429114448e-06, 'epoch': 0.77} +{'loss': 0.9709, 'grad_norm': 1.8191266059875488, 'learning_rate': 2.6737224390210726e-06, 'epoch': 0.77} +{'loss': 0.8836, 'grad_norm': 0.8872901797294617, 'learning_rate': 2.669481281701739e-06, 'epoch': 0.77} +{'loss': 0.895, 'grad_norm': 1.7487057447433472, 'learning_rate': 2.6652429726013162e-06, 'epoch': 0.77} +{'loss': 0.938, 'grad_norm': 2.2640326023101807, 'learning_rate': 2.661007513366567e-06, 'epoch': 0.77} +{'loss': 0.9058, 'grad_norm': 1.662664532661438, 'learning_rate': 2.656774905643147e-06, 'epoch': 0.77} +{'loss': 0.8533, 'grad_norm': 1.6709994077682495, 'learning_rate': 2.652545151075606e-06, 'epoch': 0.77} +{'loss': 0.8763, 'grad_norm': 1.726743221282959, 'learning_rate': 2.6483182513073835e-06, 'epoch': 0.77} +{'loss': 0.9122, 'grad_norm': 1.742574691772461, 'learning_rate': 2.6440942079808096e-06, 'epoch': 0.77} +{'loss': 0.8637, 'grad_norm': 1.8327423334121704, 'learning_rate': 2.639873022737105e-06, 'epoch': 0.77} +{'loss': 1.0066, 'grad_norm': 1.8438129425048828, 'learning_rate': 2.635654697216382e-06, 'epoch': 0.77} +{'loss': 0.884, 'grad_norm': 1.7489243745803833, 'learning_rate': 2.631439233057638e-06, 'epoch': 0.77} +{'loss': 0.953, 'grad_norm': 1.9445438385009766, 'learning_rate': 2.6272266318987606e-06, 'epoch': 0.77} +{'loss': 0.9031, 'grad_norm': 1.552230715751648, 'learning_rate': 2.6230168953765257e-06, 'epoch': 0.77} +{'loss': 0.9269, 'grad_norm': 1.7662941217422485, 'learning_rate': 2.6188100251265947e-06, 'epoch': 0.77} +{'loss': 0.8954, 'grad_norm': 1.8908839225769043, 'learning_rate': 2.6146060227835156e-06, 'epoch': 0.77} +{'loss': 0.9381, 'grad_norm': 1.8092327117919922, 'learning_rate': 2.6104048899807243e-06, 'epoch': 0.77} +{'loss': 0.7952, 'grad_norm': 1.694856882095337, 'learning_rate': 2.6062066283505404e-06, 'epoch': 0.77} +{'loss': 0.9608, 'grad_norm': 1.79978346824646, 'learning_rate': 2.6020112395241627e-06, 'epoch': 0.77} +{'loss': 0.9268, 'grad_norm': 1.9115095138549805, 'learning_rate': 2.5978187251316823e-06, 'epoch': 0.77} +{'loss': 0.9623, 'grad_norm': 1.8108848333358765, 'learning_rate': 
2.5936290868020685e-06, 'epoch': 0.77} +{'loss': 0.9196, 'grad_norm': 1.7389498949050903, 'learning_rate': 2.5894423261631717e-06, 'epoch': 0.77} +{'loss': 0.9459, 'grad_norm': 1.7370339632034302, 'learning_rate': 2.5852584448417327e-06, 'epoch': 0.77} +{'loss': 0.9254, 'grad_norm': 2.0633273124694824, 'learning_rate': 2.581077444463365e-06, 'epoch': 0.77} +{'loss': 0.8898, 'grad_norm': 1.787894606590271, 'learning_rate': 2.5768993266525653e-06, 'epoch': 0.77} +{'loss': 0.9531, 'grad_norm': 1.684076189994812, 'learning_rate': 2.5727240930327103e-06, 'epoch': 0.77} +{'loss': 0.9651, 'grad_norm': 1.779449701309204, 'learning_rate': 2.5685517452260566e-06, 'epoch': 0.77} +{'loss': 0.8894, 'grad_norm': 0.8608153462409973, 'learning_rate': 2.564382284853738e-06, 'epoch': 0.77} +{'loss': 0.9892, 'grad_norm': 1.7776696681976318, 'learning_rate': 2.5602157135357697e-06, 'epoch': 0.77} +{'loss': 0.8318, 'grad_norm': 1.7945811748504639, 'learning_rate': 2.5560520328910397e-06, 'epoch': 0.77} +{'loss': 0.9085, 'grad_norm': 1.8053555488586426, 'learning_rate': 2.55189124453732e-06, 'epoch': 0.77} +{'loss': 0.9454, 'grad_norm': 1.656427264213562, 'learning_rate': 2.5477333500912473e-06, 'epoch': 0.77} +{'loss': 0.9682, 'grad_norm': 1.6920685768127441, 'learning_rate': 2.5435783511683444e-06, 'epoch': 0.77} +{'loss': 0.9223, 'grad_norm': 1.7086405754089355, 'learning_rate': 2.539426249383006e-06, 'epoch': 0.78} +{'loss': 0.9125, 'grad_norm': 1.7059297561645508, 'learning_rate': 2.5352770463484986e-06, 'epoch': 0.78} +{'loss': 0.8328, 'grad_norm': 0.8374617695808411, 'learning_rate': 2.531130743676966e-06, 'epoch': 0.78} +{'loss': 0.9087, 'grad_norm': 1.7718596458435059, 'learning_rate': 2.5269873429794235e-06, 'epoch': 0.78} +{'loss': 1.0296, 'grad_norm': 1.9159125089645386, 'learning_rate': 2.5228468458657585e-06, 'epoch': 0.78} +{'loss': 0.9394, 'grad_norm': 1.6716376543045044, 'learning_rate': 2.51870925394473e-06, 'epoch': 0.78} +{'loss': 0.9535, 'grad_norm': 1.716349482536316, 'learning_rate': 2.5145745688239696e-06, 'epoch': 0.78} +{'loss': 0.7995, 'grad_norm': 1.7742855548858643, 'learning_rate': 2.5104427921099783e-06, 'epoch': 0.78} +{'loss': 0.93, 'grad_norm': 1.9097943305969238, 'learning_rate': 2.506313925408127e-06, 'epoch': 0.78} +{'loss': 0.9632, 'grad_norm': 1.6164729595184326, 'learning_rate': 2.502187970322657e-06, 'epoch': 0.78} +{'loss': 0.886, 'grad_norm': 1.6353263854980469, 'learning_rate': 2.4980649284566783e-06, 'epoch': 0.78} +{'loss': 0.8731, 'grad_norm': 1.6983253955841064, 'learning_rate': 2.493944801412167e-06, 'epoch': 0.78} +{'loss': 0.8975, 'grad_norm': 1.812192440032959, 'learning_rate': 2.489827590789968e-06, 'epoch': 0.78} +{'loss': 0.9198, 'grad_norm': 1.7929126024246216, 'learning_rate': 2.485713298189798e-06, 'epoch': 0.78} +{'loss': 1.0131, 'grad_norm': 1.7959985733032227, 'learning_rate': 2.4816019252102274e-06, 'epoch': 0.78} +{'loss': 0.8862, 'grad_norm': 1.8222358226776123, 'learning_rate': 2.4774934734487034e-06, 'epoch': 0.78} +{'loss': 0.9231, 'grad_norm': 1.5747305154800415, 'learning_rate': 2.473387944501533e-06, 'epoch': 0.78} +{'loss': 0.923, 'grad_norm': 1.7685227394104004, 'learning_rate': 2.469285339963892e-06, 'epoch': 0.78} +{'loss': 0.9269, 'grad_norm': 1.7454628944396973, 'learning_rate': 2.4651856614298143e-06, 'epoch': 0.78} +{'loss': 0.8806, 'grad_norm': 1.7711107730865479, 'learning_rate': 2.461088910492202e-06, 'epoch': 0.78} +{'loss': 0.877, 'grad_norm': 0.888532280921936, 'learning_rate': 2.456995088742815e-06, 'epoch': 0.78} +{'loss': 
0.9692, 'grad_norm': 1.8654848337173462, 'learning_rate': 2.45290419777228e-06, 'epoch': 0.78} +{'loss': 0.915, 'grad_norm': 1.7783645391464233, 'learning_rate': 2.448816239170081e-06, 'epoch': 0.78} +{'loss': 0.9099, 'grad_norm': 1.8375135660171509, 'learning_rate': 2.444731214524565e-06, 'epoch': 0.78} +{'loss': 0.9271, 'grad_norm': 1.7210549116134644, 'learning_rate': 2.440649125422937e-06, 'epoch': 0.78} +{'loss': 0.9127, 'grad_norm': 1.6757333278656006, 'learning_rate': 2.436569973451264e-06, 'epoch': 0.78} +{'loss': 0.9292, 'grad_norm': 1.5941623449325562, 'learning_rate': 2.432493760194469e-06, 'epoch': 0.78} +{'loss': 0.907, 'grad_norm': 1.8091968297958374, 'learning_rate': 2.428420487236336e-06, 'epoch': 0.78} +{'loss': 0.9136, 'grad_norm': 1.781041145324707, 'learning_rate': 2.4243501561595027e-06, 'epoch': 0.78} +{'loss': 0.9136, 'grad_norm': 1.7897768020629883, 'learning_rate': 2.420282768545469e-06, 'epoch': 0.78} +{'loss': 0.9285, 'grad_norm': 1.8277232646942139, 'learning_rate': 2.41621832597459e-06, 'epoch': 0.78} +{'loss': 0.8872, 'grad_norm': 1.6590546369552612, 'learning_rate': 2.412156830026069e-06, 'epoch': 0.78} +{'loss': 0.9976, 'grad_norm': 1.8351997137069702, 'learning_rate': 2.408098282277972e-06, 'epoch': 0.78} +{'loss': 0.8299, 'grad_norm': 1.791701078414917, 'learning_rate': 2.4040426843072206e-06, 'epoch': 0.78} +{'loss': 0.9054, 'grad_norm': 0.8749986886978149, 'learning_rate': 2.3999900376895844e-06, 'epoch': 0.78} +{'loss': 0.9338, 'grad_norm': 1.7632205486297607, 'learning_rate': 2.395940343999691e-06, 'epoch': 0.78} +{'loss': 0.9143, 'grad_norm': 1.9440624713897705, 'learning_rate': 2.3918936048110197e-06, 'epoch': 0.78} +{'loss': 0.8103, 'grad_norm': 1.6107124090194702, 'learning_rate': 2.387849821695899e-06, 'epoch': 0.78} +{'loss': 0.9234, 'grad_norm': 1.704505205154419, 'learning_rate': 2.3838089962255127e-06, 'epoch': 0.78} +{'loss': 0.9606, 'grad_norm': 1.6994004249572754, 'learning_rate': 2.3797711299698924e-06, 'epoch': 0.78} +{'loss': 0.947, 'grad_norm': 1.7423814535140991, 'learning_rate': 2.375736224497922e-06, 'epoch': 0.78} +{'loss': 0.8677, 'grad_norm': 1.6235370635986328, 'learning_rate': 2.371704281377335e-06, 'epoch': 0.78} +{'loss': 1.0437, 'grad_norm': 1.8227174282073975, 'learning_rate': 2.3676753021747113e-06, 'epoch': 0.78} +{'loss': 0.9784, 'grad_norm': 1.5885359048843384, 'learning_rate': 2.363649288455482e-06, 'epoch': 0.78} +{'loss': 0.8752, 'grad_norm': 1.7153465747833252, 'learning_rate': 2.3596262417839256e-06, 'epoch': 0.78} +{'loss': 0.9542, 'grad_norm': 1.7843090295791626, 'learning_rate': 2.3556061637231653e-06, 'epoch': 0.78} +{'loss': 0.9761, 'grad_norm': 1.7992967367172241, 'learning_rate': 2.3515890558351773e-06, 'epoch': 0.78} +{'loss': 0.8991, 'grad_norm': 1.7027226686477661, 'learning_rate': 2.347574919680773e-06, 'epoch': 0.78} +{'loss': 0.9599, 'grad_norm': 1.7947041988372803, 'learning_rate': 2.343563756819619e-06, 'epoch': 0.78} +{'loss': 1.0005, 'grad_norm': 1.9267503023147583, 'learning_rate': 2.339555568810221e-06, 'epoch': 0.78} +{'loss': 0.8889, 'grad_norm': 1.8662915229797363, 'learning_rate': 2.3355503572099324e-06, 'epoch': 0.78} +{'loss': 0.9568, 'grad_norm': 1.644400715827942, 'learning_rate': 2.3315481235749484e-06, 'epoch': 0.78} +{'loss': 1.0112, 'grad_norm': 1.6382209062576294, 'learning_rate': 2.3275488694603076e-06, 'epoch': 0.79} +{'loss': 0.9169, 'grad_norm': 1.6913951635360718, 'learning_rate': 2.323552596419889e-06, 'epoch': 0.79} +{'loss': 0.9751, 'grad_norm': 1.802233338356018, 
'learning_rate': 2.319559306006417e-06, 'epoch': 0.79} +{'loss': 0.8335, 'grad_norm': 1.6668671369552612, 'learning_rate': 2.315568999771455e-06, 'epoch': 0.79} +{'loss': 0.8748, 'grad_norm': 1.6295757293701172, 'learning_rate': 2.3115816792654057e-06, 'epoch': 0.79} +{'loss': 0.9773, 'grad_norm': 1.8585103750228882, 'learning_rate': 2.3075973460375134e-06, 'epoch': 0.79} +{'loss': 0.8894, 'grad_norm': 0.8987467288970947, 'learning_rate': 2.303616001635861e-06, 'epoch': 0.79} +{'loss': 0.8485, 'grad_norm': 1.6927284002304077, 'learning_rate': 2.2996376476073724e-06, 'epoch': 0.79} +{'loss': 0.9034, 'grad_norm': 1.7210171222686768, 'learning_rate': 2.295662285497805e-06, 'epoch': 0.79} +{'loss': 0.831, 'grad_norm': 1.8954977989196777, 'learning_rate': 2.291689916851758e-06, 'epoch': 0.79} +{'loss': 0.9985, 'grad_norm': 1.8608659505844116, 'learning_rate': 2.287720543212666e-06, 'epoch': 0.79} +{'loss': 0.9033, 'grad_norm': 1.6163427829742432, 'learning_rate': 2.2837541661228024e-06, 'epoch': 0.79} +{'loss': 0.897, 'grad_norm': 1.8723149299621582, 'learning_rate': 2.2797907871232673e-06, 'epoch': 0.79} +{'loss': 0.9096, 'grad_norm': 2.0258073806762695, 'learning_rate': 2.275830407754006e-06, 'epoch': 0.79} +{'loss': 0.8397, 'grad_norm': 1.7099132537841797, 'learning_rate': 2.271873029553795e-06, 'epoch': 0.79} +{'loss': 0.9012, 'grad_norm': 1.8233882188796997, 'learning_rate': 2.267918654060244e-06, 'epoch': 0.79} +{'loss': 0.8525, 'grad_norm': 1.7047005891799927, 'learning_rate': 2.2639672828097956e-06, 'epoch': 0.79} +{'loss': 0.8716, 'grad_norm': 1.6801788806915283, 'learning_rate': 2.2600189173377263e-06, 'epoch': 0.79} +{'loss': 0.8674, 'grad_norm': 0.8978151082992554, 'learning_rate': 2.256073559178145e-06, 'epoch': 0.79} +{'loss': 0.9572, 'grad_norm': 1.7653475999832153, 'learning_rate': 2.2521312098639914e-06, 'epoch': 0.79} +{'loss': 0.9904, 'grad_norm': 1.7373738288879395, 'learning_rate': 2.2481918709270347e-06, 'epoch': 0.79} +{'loss': 0.872, 'grad_norm': 1.6675580739974976, 'learning_rate': 2.2442555438978774e-06, 'epoch': 0.79} +{'loss': 0.954, 'grad_norm': 1.8066377639770508, 'learning_rate': 2.240322230305951e-06, 'epoch': 0.79} +{'loss': 0.8799, 'grad_norm': 0.8698446154594421, 'learning_rate': 2.2363919316795135e-06, 'epoch': 0.79} +{'loss': 0.902, 'grad_norm': 1.6235077381134033, 'learning_rate': 2.2324646495456546e-06, 'epoch': 0.79} +{'loss': 0.937, 'grad_norm': 1.745627999305725, 'learning_rate': 2.2285403854302912e-06, 'epoch': 0.79} +{'loss': 0.967, 'grad_norm': 1.8406028747558594, 'learning_rate': 2.224619140858166e-06, 'epoch': 0.79} +{'loss': 0.9025, 'grad_norm': 1.881837248802185, 'learning_rate': 2.2207009173528528e-06, 'epoch': 0.79} +{'loss': 0.8921, 'grad_norm': 1.5573763847351074, 'learning_rate': 2.2167857164367424e-06, 'epoch': 0.79} +{'loss': 0.8747, 'grad_norm': 1.8537564277648926, 'learning_rate': 2.2128735396310606e-06, 'epoch': 0.79} +{'loss': 0.9632, 'grad_norm': 2.0814061164855957, 'learning_rate': 2.2089643884558553e-06, 'epoch': 0.79} +{'loss': 0.8816, 'grad_norm': 1.7369502782821655, 'learning_rate': 2.2050582644299976e-06, 'epoch': 0.79} +{'loss': 1.0288, 'grad_norm': 1.7999197244644165, 'learning_rate': 2.201155169071184e-06, 'epoch': 0.79} +{'loss': 0.9286, 'grad_norm': 1.627845048904419, 'learning_rate': 2.1972551038959313e-06, 'epoch': 0.79} +{'loss': 0.8168, 'grad_norm': 1.7556148767471313, 'learning_rate': 2.1933580704195835e-06, 'epoch': 0.79} +{'loss': 1.0117, 'grad_norm': 1.7163196802139282, 'learning_rate': 2.189464070156302e-06, 
'epoch': 0.79} +{'loss': 0.8678, 'grad_norm': 1.5705773830413818, 'learning_rate': 2.1855731046190732e-06, 'epoch': 0.79} +{'loss': 0.9434, 'grad_norm': 1.532548189163208, 'learning_rate': 2.1816851753197023e-06, 'epoch': 0.79} +{'loss': 0.8403, 'grad_norm': 0.8407483696937561, 'learning_rate': 2.177800283768817e-06, 'epoch': 0.79} +{'loss': 0.9009, 'grad_norm': 3.66011643409729, 'learning_rate': 2.173918431475861e-06, 'epoch': 0.79} +{'loss': 1.0126, 'grad_norm': 1.9078726768493652, 'learning_rate': 2.1700396199491015e-06, 'epoch': 0.79} +{'loss': 0.8908, 'grad_norm': 1.6532087326049805, 'learning_rate': 2.1661638506956208e-06, 'epoch': 0.79} +{'loss': 0.9527, 'grad_norm': 1.682085633277893, 'learning_rate': 2.1622911252213195e-06, 'epoch': 0.79} +{'loss': 0.8926, 'grad_norm': 1.6870826482772827, 'learning_rate': 2.1584214450309194e-06, 'epoch': 0.79} +{'loss': 0.9725, 'grad_norm': 1.7931045293807983, 'learning_rate': 2.1545548116279557e-06, 'epoch': 0.79} +{'loss': 0.8105, 'grad_norm': 1.8187659978866577, 'learning_rate': 2.1506912265147772e-06, 'epoch': 0.79} +{'loss': 0.945, 'grad_norm': 1.5757101774215698, 'learning_rate': 2.146830691192553e-06, 'epoch': 0.79} +{'loss': 0.8876, 'grad_norm': 1.8198773860931396, 'learning_rate': 2.1429732071612653e-06, 'epoch': 0.79} +{'loss': 0.9436, 'grad_norm': 1.789643406867981, 'learning_rate': 2.1391187759197118e-06, 'epoch': 0.79} +{'loss': 0.9415, 'grad_norm': 1.7685370445251465, 'learning_rate': 2.1352673989655026e-06, 'epoch': 0.79} +{'loss': 0.9234, 'grad_norm': 1.7043297290802002, 'learning_rate': 2.131419077795063e-06, 'epoch': 0.79} +{'loss': 0.881, 'grad_norm': 1.789685845375061, 'learning_rate': 2.1275738139036274e-06, 'epoch': 0.79} +{'loss': 0.8407, 'grad_norm': 1.8379076719284058, 'learning_rate': 2.1237316087852465e-06, 'epoch': 0.8} +{'loss': 0.9942, 'grad_norm': 1.603400707244873, 'learning_rate': 2.119892463932781e-06, 'epoch': 0.8} +{'loss': 0.88, 'grad_norm': 1.579335331916809, 'learning_rate': 2.116056380837902e-06, 'epoch': 0.8} +{'loss': 0.9198, 'grad_norm': 1.731886863708496, 'learning_rate': 2.1122233609910903e-06, 'epoch': 0.8} +{'loss': 0.8905, 'grad_norm': 0.8273346424102783, 'learning_rate': 2.108393405881638e-06, 'epoch': 0.8} +{'loss': 0.8653, 'grad_norm': 1.702808141708374, 'learning_rate': 2.104566516997647e-06, 'epoch': 0.8} +{'loss': 0.9358, 'grad_norm': 1.9452027082443237, 'learning_rate': 2.1007426958260245e-06, 'epoch': 0.8} +{'loss': 0.9803, 'grad_norm': 1.7004059553146362, 'learning_rate': 2.096921943852489e-06, 'epoch': 0.8} +{'loss': 0.8829, 'grad_norm': 1.7824429273605347, 'learning_rate': 2.093104262561569e-06, 'epoch': 0.8} +{'loss': 0.8784, 'grad_norm': 1.800701379776001, 'learning_rate': 2.08928965343659e-06, 'epoch': 0.8} +{'loss': 0.9204, 'grad_norm': 1.615323781967163, 'learning_rate': 2.0854781179596937e-06, 'epoch': 0.8} +{'loss': 1.0165, 'grad_norm': 1.9221491813659668, 'learning_rate': 2.0816696576118223e-06, 'epoch': 0.8} +{'loss': 0.9138, 'grad_norm': 1.4252030849456787, 'learning_rate': 2.077864273872727e-06, 'epoch': 0.8} +{'loss': 0.9403, 'grad_norm': 1.5626823902130127, 'learning_rate': 2.0740619682209607e-06, 'epoch': 0.8} +{'loss': 0.9053, 'grad_norm': 1.7487009763717651, 'learning_rate': 2.0702627421338817e-06, 'epoch': 0.8} +{'loss': 1.0108, 'grad_norm': 1.7869491577148438, 'learning_rate': 2.0664665970876496e-06, 'epoch': 0.8} +{'loss': 0.8933, 'grad_norm': 1.7700741291046143, 'learning_rate': 2.06267353455723e-06, 'epoch': 0.8} +{'loss': 0.8364, 'grad_norm': 0.913165807723999, 
'learning_rate': 2.05888355601639e-06, 'epoch': 0.8} +{'loss': 0.9108, 'grad_norm': 1.8121180534362793, 'learning_rate': 2.0550966629376955e-06, 'epoch': 0.8} +{'loss': 0.8049, 'grad_norm': 2.0133259296417236, 'learning_rate': 2.0513128567925177e-06, 'epoch': 0.8} +{'loss': 0.893, 'grad_norm': 1.747348666191101, 'learning_rate': 2.0475321390510262e-06, 'epoch': 0.8} +{'loss': 0.8638, 'grad_norm': 0.8426925539970398, 'learning_rate': 2.043754511182191e-06, 'epoch': 0.8} +{'loss': 0.8923, 'grad_norm': 1.6918818950653076, 'learning_rate': 2.0399799746537806e-06, 'epoch': 0.8} +{'loss': 0.8642, 'grad_norm': 1.755738615989685, 'learning_rate': 2.036208530932363e-06, 'epoch': 0.8} +{'loss': 0.8771, 'grad_norm': 1.7846592664718628, 'learning_rate': 2.0324401814833074e-06, 'epoch': 0.8} +{'loss': 0.9704, 'grad_norm': 1.5918515920639038, 'learning_rate': 2.0286749277707783e-06, 'epoch': 0.8} +{'loss': 0.8801, 'grad_norm': 1.7065420150756836, 'learning_rate': 2.024912771257732e-06, 'epoch': 0.8} +{'loss': 0.9264, 'grad_norm': 1.7380180358886719, 'learning_rate': 2.021153713405931e-06, 'epoch': 0.8} +{'loss': 0.9261, 'grad_norm': 2.009814977645874, 'learning_rate': 2.0173977556759284e-06, 'epoch': 0.8} +{'loss': 0.9294, 'grad_norm': 1.7124450206756592, 'learning_rate': 2.013644899527074e-06, 'epoch': 0.8} +{'loss': 0.8739, 'grad_norm': 1.768195629119873, 'learning_rate': 2.009895146417512e-06, 'epoch': 0.8} +{'loss': 0.972, 'grad_norm': 1.8154486417770386, 'learning_rate': 2.0061484978041813e-06, 'epoch': 0.8} +{'loss': 0.9693, 'grad_norm': 1.6953485012054443, 'learning_rate': 2.002404955142815e-06, 'epoch': 0.8} +{'loss': 0.8724, 'grad_norm': 1.7631884813308716, 'learning_rate': 1.9986645198879385e-06, 'epoch': 0.8} +{'loss': 0.86, 'grad_norm': 0.9007226228713989, 'learning_rate': 1.9949271934928703e-06, 'epoch': 0.8} +{'loss': 0.9088, 'grad_norm': 1.7080154418945312, 'learning_rate': 1.9911929774097216e-06, 'epoch': 0.8} +{'loss': 0.9125, 'grad_norm': 1.6333963871002197, 'learning_rate': 1.9874618730893947e-06, 'epoch': 0.8} +{'loss': 0.9207, 'grad_norm': 1.6962813138961792, 'learning_rate': 1.983733881981581e-06, 'epoch': 0.8} +{'loss': 0.8877, 'grad_norm': 1.9219435453414917, 'learning_rate': 1.9800090055347654e-06, 'epoch': 0.8} +{'loss': 0.9354, 'grad_norm': 1.5853062868118286, 'learning_rate': 1.9762872451962214e-06, 'epoch': 0.8} +{'loss': 0.8824, 'grad_norm': 1.7312614917755127, 'learning_rate': 1.9725686024120093e-06, 'epoch': 0.8} +{'loss': 0.9704, 'grad_norm': 1.6753257513046265, 'learning_rate': 1.9688530786269854e-06, 'epoch': 0.8} +{'loss': 0.958, 'grad_norm': 1.8347809314727783, 'learning_rate': 1.965140675284779e-06, 'epoch': 0.8} +{'loss': 0.9175, 'grad_norm': 1.6330296993255615, 'learning_rate': 1.961431393827827e-06, 'epoch': 0.8} +{'loss': 0.9995, 'grad_norm': 1.6899086236953735, 'learning_rate': 1.95772523569734e-06, 'epoch': 0.8} +{'loss': 0.9076, 'grad_norm': 1.725965976715088, 'learning_rate': 1.9540222023333165e-06, 'epoch': 0.8} +{'loss': 0.8555, 'grad_norm': 1.7115392684936523, 'learning_rate': 1.950322295174545e-06, 'epoch': 0.8} +{'loss': 0.9329, 'grad_norm': 2.325735092163086, 'learning_rate': 1.9466255156585965e-06, 'epoch': 0.8} +{'loss': 0.893, 'grad_norm': 1.7065677642822266, 'learning_rate': 1.9429318652218277e-06, 'epoch': 0.8} +{'loss': 0.904, 'grad_norm': 1.7874064445495605, 'learning_rate': 1.9392413452993787e-06, 'epoch': 0.8} +{'loss': 0.9194, 'grad_norm': 1.701096534729004, 'learning_rate': 1.9355539573251737e-06, 'epoch': 0.8} +{'loss': 0.9404, 'grad_norm': 
1.6897774934768677, 'learning_rate': 1.93186970273192e-06, 'epoch': 0.8} +{'loss': 0.9543, 'grad_norm': 1.9749809503555298, 'learning_rate': 1.928188582951108e-06, 'epoch': 0.81} +{'loss': 0.9133, 'grad_norm': 1.8024415969848633, 'learning_rate': 1.9245105994130086e-06, 'epoch': 0.81} +{'loss': 0.8686, 'grad_norm': 1.8628535270690918, 'learning_rate': 1.920835753546679e-06, 'epoch': 0.81} +{'loss': 0.8965, 'grad_norm': 1.520134449005127, 'learning_rate': 1.9171640467799478e-06, 'epoch': 0.81} +{'loss': 1.0196, 'grad_norm': 1.846358060836792, 'learning_rate': 1.913495480539432e-06, 'epoch': 0.81} +{'loss': 0.9496, 'grad_norm': 1.634227991104126, 'learning_rate': 1.9098300562505266e-06, 'epoch': 0.81} +{'loss': 0.945, 'grad_norm': 1.7520923614501953, 'learning_rate': 1.906167775337404e-06, 'epoch': 0.81} +{'loss': 0.9212, 'grad_norm': 1.7849434614181519, 'learning_rate': 1.9025086392230163e-06, 'epoch': 0.81} +{'loss': 0.9248, 'grad_norm': 1.973913550376892, 'learning_rate': 1.8988526493290948e-06, 'epoch': 0.81} +{'loss': 0.9191, 'grad_norm': 1.7237037420272827, 'learning_rate': 1.895199807076148e-06, 'epoch': 0.81} +{'loss': 0.9146, 'grad_norm': 1.921294927597046, 'learning_rate': 1.891550113883458e-06, 'epoch': 0.81} +{'loss': 0.9469, 'grad_norm': 1.6559011936187744, 'learning_rate': 1.8879035711690897e-06, 'epoch': 0.81} +{'loss': 0.888, 'grad_norm': 2.1886346340179443, 'learning_rate': 1.8842601803498772e-06, 'epoch': 0.81} +{'loss': 0.8677, 'grad_norm': 1.6460084915161133, 'learning_rate': 1.880619942841435e-06, 'epoch': 0.81} +{'loss': 0.9895, 'grad_norm': 1.6948699951171875, 'learning_rate': 1.8769828600581497e-06, 'epoch': 0.81} +{'loss': 0.934, 'grad_norm': 1.8468785285949707, 'learning_rate': 1.8733489334131827e-06, 'epoch': 0.81} +{'loss': 0.8835, 'grad_norm': 1.7442662715911865, 'learning_rate': 1.8697181643184703e-06, 'epoch': 0.81} +{'loss': 0.9641, 'grad_norm': 1.787463903427124, 'learning_rate': 1.8660905541847208e-06, 'epoch': 0.81} +{'loss': 0.877, 'grad_norm': 1.657860517501831, 'learning_rate': 1.8624661044214154e-06, 'epoch': 0.81} +{'loss': 0.9232, 'grad_norm': 1.818753957748413, 'learning_rate': 1.858844816436809e-06, 'epoch': 0.81} +{'loss': 0.9388, 'grad_norm': 0.8731621503829956, 'learning_rate': 1.8552266916379225e-06, 'epoch': 0.81} +{'loss': 0.9607, 'grad_norm': 1.8021739721298218, 'learning_rate': 1.8516117314305526e-06, 'epoch': 0.81} +{'loss': 0.8147, 'grad_norm': 1.5152666568756104, 'learning_rate': 1.847999937219266e-06, 'epoch': 0.81} +{'loss': 0.8727, 'grad_norm': 1.7942613363265991, 'learning_rate': 1.8443913104073984e-06, 'epoch': 0.81} +{'loss': 0.9542, 'grad_norm': 1.8418891429901123, 'learning_rate': 1.8407858523970568e-06, 'epoch': 0.81} +{'loss': 0.9527, 'grad_norm': 1.671880841255188, 'learning_rate': 1.8371835645891134e-06, 'epoch': 0.81} +{'loss': 0.9289, 'grad_norm': 1.663207769393921, 'learning_rate': 1.833584448383211e-06, 'epoch': 0.81} +{'loss': 0.9382, 'grad_norm': 1.6976065635681152, 'learning_rate': 1.82998850517776e-06, 'epoch': 0.81} +{'loss': 0.8561, 'grad_norm': 1.7451149225234985, 'learning_rate': 1.826395736369937e-06, 'epoch': 0.81} +{'loss': 0.977, 'grad_norm': 1.8604519367218018, 'learning_rate': 1.8228061433556866e-06, 'epoch': 0.81} +{'loss': 0.8978, 'grad_norm': 1.8794054985046387, 'learning_rate': 1.8192197275297174e-06, 'epoch': 0.81} +{'loss': 0.9227, 'grad_norm': 1.7330206632614136, 'learning_rate': 1.815636490285506e-06, 'epoch': 0.81} +{'loss': 0.9015, 'grad_norm': 1.8505733013153076, 'learning_rate': 
1.8120564330152924e-06, 'epoch': 0.81} +{'loss': 0.9491, 'grad_norm': 1.6796363592147827, 'learning_rate': 1.808479557110081e-06, 'epoch': 0.81} +{'loss': 0.9308, 'grad_norm': 1.878218412399292, 'learning_rate': 1.8049058639596418e-06, 'epoch': 0.81} +{'loss': 0.9241, 'grad_norm': 1.7754994630813599, 'learning_rate': 1.801335354952507e-06, 'epoch': 0.81} +{'loss': 0.9203, 'grad_norm': 1.771694302558899, 'learning_rate': 1.7977680314759693e-06, 'epoch': 0.81} +{'loss': 0.8729, 'grad_norm': 1.8847718238830566, 'learning_rate': 1.7942038949160857e-06, 'epoch': 0.81} +{'loss': 0.9399, 'grad_norm': 1.8215410709381104, 'learning_rate': 1.7906429466576768e-06, 'epoch': 0.81} +{'loss': 0.9825, 'grad_norm': 1.9568886756896973, 'learning_rate': 1.7870851880843233e-06, 'epoch': 0.81} +{'loss': 0.9358, 'grad_norm': 2.0534512996673584, 'learning_rate': 1.7835306205783643e-06, 'epoch': 0.81} +{'loss': 0.8807, 'grad_norm': 1.7654199600219727, 'learning_rate': 1.7799792455209019e-06, 'epoch': 0.81} +{'loss': 0.8955, 'grad_norm': 1.6095807552337646, 'learning_rate': 1.776431064291796e-06, 'epoch': 0.81} +{'loss': 0.882, 'grad_norm': 1.7856521606445312, 'learning_rate': 1.7728860782696666e-06, 'epoch': 0.81} +{'loss': 0.9345, 'grad_norm': 1.734665870666504, 'learning_rate': 1.7693442888318923e-06, 'epoch': 0.81} +{'loss': 0.9019, 'grad_norm': 1.6986876726150513, 'learning_rate': 1.765805697354608e-06, 'epoch': 0.81} +{'loss': 0.9102, 'grad_norm': 0.9054495692253113, 'learning_rate': 1.7622703052127078e-06, 'epoch': 0.81} +{'loss': 0.921, 'grad_norm': 1.755710244178772, 'learning_rate': 1.7587381137798432e-06, 'epoch': 0.81} +{'loss': 0.9204, 'grad_norm': 1.6024792194366455, 'learning_rate': 1.7552091244284197e-06, 'epoch': 0.81} +{'loss': 0.9189, 'grad_norm': 1.6380693912506104, 'learning_rate': 1.7516833385296016e-06, 'epoch': 0.81} +{'loss': 0.8332, 'grad_norm': 2.064812660217285, 'learning_rate': 1.7481607574533044e-06, 'epoch': 0.81} +{'loss': 0.9808, 'grad_norm': 1.720259666442871, 'learning_rate': 1.744641382568204e-06, 'epoch': 0.81} +{'loss': 0.9313, 'grad_norm': 1.7702584266662598, 'learning_rate': 1.741125215241728e-06, 'epoch': 0.82} +{'loss': 1.0182, 'grad_norm': 1.6408809423446655, 'learning_rate': 1.7376122568400533e-06, 'epoch': 0.82} +{'loss': 0.9077, 'grad_norm': 1.726912021636963, 'learning_rate': 1.7341025087281149e-06, 'epoch': 0.82} +{'loss': 0.9951, 'grad_norm': 1.8319919109344482, 'learning_rate': 1.730595972269602e-06, 'epoch': 0.82} +{'loss': 0.9313, 'grad_norm': 1.8397530317306519, 'learning_rate': 1.7270926488269502e-06, 'epoch': 0.82} +{'loss': 0.9689, 'grad_norm': 1.856245517730713, 'learning_rate': 1.7235925397613529e-06, 'epoch': 0.82} +{'loss': 0.9232, 'grad_norm': 1.646837592124939, 'learning_rate': 1.7200956464327512e-06, 'epoch': 0.82} +{'loss': 0.861, 'grad_norm': 1.9551154375076294, 'learning_rate': 1.716601970199836e-06, 'epoch': 0.82} +{'loss': 0.9859, 'grad_norm': 1.7405091524124146, 'learning_rate': 1.7131115124200494e-06, 'epoch': 0.82} +{'loss': 0.9308, 'grad_norm': 1.6753772497177124, 'learning_rate': 1.709624274449584e-06, 'epoch': 0.82} +{'loss': 0.8794, 'grad_norm': 1.8177111148834229, 'learning_rate': 1.7061402576433794e-06, 'epoch': 0.82} +{'loss': 0.9008, 'grad_norm': 1.8208540678024292, 'learning_rate': 1.7026594633551252e-06, 'epoch': 0.82} +{'loss': 0.9693, 'grad_norm': 1.803389549255371, 'learning_rate': 1.6991818929372584e-06, 'epoch': 0.82} +{'loss': 0.8886, 'grad_norm': 1.9115324020385742, 'learning_rate': 1.6957075477409623e-06, 'epoch': 0.82} 
+{'loss': 0.9003, 'grad_norm': 1.6749255657196045, 'learning_rate': 1.6922364291161697e-06, 'epoch': 0.82} +{'loss': 0.9866, 'grad_norm': 1.807468056678772, 'learning_rate': 1.688768538411556e-06, 'epoch': 0.82} +{'loss': 0.8801, 'grad_norm': 1.8065719604492188, 'learning_rate': 1.6853038769745466e-06, 'epoch': 0.82} +{'loss': 0.9524, 'grad_norm': 1.7839444875717163, 'learning_rate': 1.6818424461513129e-06, 'epoch': 0.82} +{'loss': 0.8754, 'grad_norm': 1.7329148054122925, 'learning_rate': 1.6783842472867607e-06, 'epoch': 0.82} +{'loss': 0.9778, 'grad_norm': 1.8929228782653809, 'learning_rate': 1.674929281724551e-06, 'epoch': 0.82} +{'loss': 0.9281, 'grad_norm': 1.772507667541504, 'learning_rate': 1.6714775508070868e-06, 'epoch': 0.82} +{'loss': 0.8903, 'grad_norm': 1.7080711126327515, 'learning_rate': 1.6680290558755119e-06, 'epoch': 0.82} +{'loss': 0.9571, 'grad_norm': 1.702973484992981, 'learning_rate': 1.6645837982697122e-06, 'epoch': 0.82} +{'loss': 0.9306, 'grad_norm': 1.8499590158462524, 'learning_rate': 1.6611417793283192e-06, 'epoch': 0.82} +{'loss': 0.9135, 'grad_norm': 1.7541255950927734, 'learning_rate': 1.6577030003887018e-06, 'epoch': 0.82} +{'loss': 0.8819, 'grad_norm': 1.6664550304412842, 'learning_rate': 1.6542674627869738e-06, 'epoch': 0.82} +{'loss': 0.938, 'grad_norm': 1.7327603101730347, 'learning_rate': 1.6508351678579882e-06, 'epoch': 0.82} +{'loss': 0.9478, 'grad_norm': 1.9212325811386108, 'learning_rate': 1.6474061169353362e-06, 'epoch': 0.82} +{'loss': 0.94, 'grad_norm': 1.8893914222717285, 'learning_rate': 1.6439803113513509e-06, 'epoch': 0.82} +{'loss': 0.8678, 'grad_norm': 1.6505144834518433, 'learning_rate': 1.640557752437103e-06, 'epoch': 0.82} +{'loss': 0.8714, 'grad_norm': 0.8692054748535156, 'learning_rate': 1.6371384415224046e-06, 'epoch': 0.82} +{'loss': 0.8703, 'grad_norm': 1.7048276662826538, 'learning_rate': 1.6337223799358025e-06, 'epoch': 0.82} +{'loss': 1.0352, 'grad_norm': 1.8522233963012695, 'learning_rate': 1.6303095690045822e-06, 'epoch': 0.82} +{'loss': 0.9445, 'grad_norm': 1.7438253164291382, 'learning_rate': 1.6269000100547682e-06, 'epoch': 0.82} +{'loss': 0.8408, 'grad_norm': 1.7960048913955688, 'learning_rate': 1.6234937044111155e-06, 'epoch': 0.82} +{'loss': 0.8733, 'grad_norm': 1.800661563873291, 'learning_rate': 1.6200906533971206e-06, 'epoch': 0.82} +{'loss': 0.9661, 'grad_norm': 1.7225450277328491, 'learning_rate': 1.6166908583350138e-06, 'epoch': 0.82} +{'loss': 1.0317, 'grad_norm': 1.6220556497573853, 'learning_rate': 1.6132943205457607e-06, 'epoch': 0.82} +{'loss': 0.8673, 'grad_norm': 1.7651034593582153, 'learning_rate': 1.6099010413490624e-06, 'epoch': 0.82} +{'loss': 0.8898, 'grad_norm': 1.7260187864303589, 'learning_rate': 1.6065110220633495e-06, 'epoch': 0.82} +{'loss': 0.9061, 'grad_norm': 1.8042688369750977, 'learning_rate': 1.6031242640057908e-06, 'epoch': 0.82} +{'loss': 0.9669, 'grad_norm': 1.661299467086792, 'learning_rate': 1.599740768492286e-06, 'epoch': 0.82} +{'loss': 0.961, 'grad_norm': 2.0681376457214355, 'learning_rate': 1.596360536837468e-06, 'epoch': 0.82} +{'loss': 0.8393, 'grad_norm': 1.9284907579421997, 'learning_rate': 1.5929835703546992e-06, 'epoch': 0.82} +{'loss': 0.8643, 'grad_norm': 1.7041962146759033, 'learning_rate': 1.589609870356076e-06, 'epoch': 0.82} +{'loss': 0.9307, 'grad_norm': 1.7222951650619507, 'learning_rate': 1.5862394381524239e-06, 'epoch': 0.82} +{'loss': 0.8792, 'grad_norm': 1.7107068300247192, 'learning_rate': 1.582872275053301e-06, 'epoch': 0.82} +{'loss': 0.8577, 'grad_norm': 
1.6639665365219116, 'learning_rate': 1.579508382366992e-06, 'epoch': 0.82} +{'loss': 0.9072, 'grad_norm': 1.8390802145004272, 'learning_rate': 1.5761477614005127e-06, 'epoch': 0.82} +{'loss': 0.9687, 'grad_norm': 1.781119704246521, 'learning_rate': 1.5727904134596084e-06, 'epoch': 0.82} +{'loss': 0.908, 'grad_norm': 1.711091160774231, 'learning_rate': 1.5694363398487545e-06, 'epoch': 0.82} +{'loss': 0.8715, 'grad_norm': 0.8537315726280212, 'learning_rate': 1.566085541871145e-06, 'epoch': 0.82} +{'loss': 0.9765, 'grad_norm': 1.7716619968414307, 'learning_rate': 1.5627380208287114e-06, 'epoch': 0.83} +{'loss': 0.9652, 'grad_norm': 1.9409608840942383, 'learning_rate': 1.5593937780221092e-06, 'epoch': 0.83} +{'loss': 0.8295, 'grad_norm': 1.7797962427139282, 'learning_rate': 1.556052814750717e-06, 'epoch': 0.83} +{'loss': 0.8571, 'grad_norm': 1.8179157972335815, 'learning_rate': 1.5527151323126422e-06, 'epoch': 0.83} +{'loss': 0.9285, 'grad_norm': 1.9461023807525635, 'learning_rate': 1.5493807320047183e-06, 'epoch': 0.83} +{'loss': 0.8974, 'grad_norm': 1.6991040706634521, 'learning_rate': 1.5460496151225002e-06, 'epoch': 0.83} +{'loss': 0.9413, 'grad_norm': 1.8561325073242188, 'learning_rate': 1.542721782960268e-06, 'epoch': 0.83} +{'loss': 0.9175, 'grad_norm': 1.6219134330749512, 'learning_rate': 1.5393972368110287e-06, 'epoch': 0.83} +{'loss': 0.9833, 'grad_norm': 1.7772047519683838, 'learning_rate': 1.536075977966509e-06, 'epoch': 0.83} +{'loss': 0.9904, 'grad_norm': 1.6094797849655151, 'learning_rate': 1.5327580077171589e-06, 'epoch': 0.83} +{'loss': 0.7935, 'grad_norm': 1.7838640213012695, 'learning_rate': 1.5294433273521525e-06, 'epoch': 0.83} +{'loss': 0.9003, 'grad_norm': 1.7071502208709717, 'learning_rate': 1.5261319381593831e-06, 'epoch': 0.83} +{'loss': 0.8805, 'grad_norm': 1.82965886592865, 'learning_rate': 1.522823841425467e-06, 'epoch': 0.83} +{'loss': 0.9749, 'grad_norm': 1.726851463317871, 'learning_rate': 1.5195190384357405e-06, 'epoch': 0.83} +{'loss': 0.9168, 'grad_norm': 1.8934102058410645, 'learning_rate': 1.5162175304742633e-06, 'epoch': 0.83} +{'loss': 0.9727, 'grad_norm': 1.6925345659255981, 'learning_rate': 1.5129193188238068e-06, 'epoch': 0.83} +{'loss': 1.0359, 'grad_norm': 1.8061341047286987, 'learning_rate': 1.5096244047658704e-06, 'epoch': 0.83} +{'loss': 0.936, 'grad_norm': 1.9418781995773315, 'learning_rate': 1.5063327895806668e-06, 'epoch': 0.83} +{'loss': 0.8828, 'grad_norm': 1.560083031654358, 'learning_rate': 1.5030444745471295e-06, 'epoch': 0.83} +{'loss': 0.8736, 'grad_norm': 1.9550267457962036, 'learning_rate': 1.499759460942909e-06, 'epoch': 0.83} +{'loss': 0.8669, 'grad_norm': 1.8836151361465454, 'learning_rate': 1.4964777500443728e-06, 'epoch': 0.83} +{'loss': 0.8788, 'grad_norm': 1.7230879068374634, 'learning_rate': 1.4931993431266056e-06, 'epoch': 0.83} +{'loss': 0.9221, 'grad_norm': 1.8224726915359497, 'learning_rate': 1.4899242414634097e-06, 'epoch': 0.83} +{'loss': 0.854, 'grad_norm': 1.5524909496307373, 'learning_rate': 1.486652446327299e-06, 'epoch': 0.83} +{'loss': 0.9776, 'grad_norm': 1.9619694948196411, 'learning_rate': 1.4833839589895072e-06, 'epoch': 0.83} +{'loss': 0.9144, 'grad_norm': 1.5696148872375488, 'learning_rate': 1.48011878071998e-06, 'epoch': 0.83} +{'loss': 0.9094, 'grad_norm': 2.028028726577759, 'learning_rate': 1.476856912787379e-06, 'epoch': 0.83} +{'loss': 0.9024, 'grad_norm': 1.6830319166183472, 'learning_rate': 1.4735983564590784e-06, 'epoch': 0.83} +{'loss': 0.9765, 'grad_norm': 1.9636989831924438, 'learning_rate': 
1.470343113001167e-06, 'epoch': 0.83} +{'loss': 0.875, 'grad_norm': 1.5231817960739136, 'learning_rate': 1.467091183678444e-06, 'epoch': 0.83} +{'loss': 0.9189, 'grad_norm': 1.775160312652588, 'learning_rate': 1.4638425697544235e-06, 'epoch': 0.83} +{'loss': 0.944, 'grad_norm': 1.6822912693023682, 'learning_rate': 1.460597272491333e-06, 'epoch': 0.83} +{'loss': 0.9519, 'grad_norm': 1.823304295539856, 'learning_rate': 1.4573552931501046e-06, 'epoch': 0.83} +{'loss': 0.9068, 'grad_norm': 0.9431740045547485, 'learning_rate': 1.4541166329903856e-06, 'epoch': 0.83} +{'loss': 0.906, 'grad_norm': 1.6673917770385742, 'learning_rate': 1.4508812932705364e-06, 'epoch': 0.83} +{'loss': 0.976, 'grad_norm': 1.9698683023452759, 'learning_rate': 1.4476492752476223e-06, 'epoch': 0.83} +{'loss': 0.9412, 'grad_norm': 1.6973397731781006, 'learning_rate': 1.4444205801774204e-06, 'epoch': 0.83} +{'loss': 0.9048, 'grad_norm': 1.7950079441070557, 'learning_rate': 1.4411952093144167e-06, 'epoch': 0.83} +{'loss': 0.8293, 'grad_norm': 1.9273159503936768, 'learning_rate': 1.4379731639118045e-06, 'epoch': 0.83} +{'loss': 0.886, 'grad_norm': 1.7184327840805054, 'learning_rate': 1.4347544452214869e-06, 'epoch': 0.83} +{'loss': 0.9042, 'grad_norm': 2.353425979614258, 'learning_rate': 1.4315390544940732e-06, 'epoch': 0.83} +{'loss': 0.9154, 'grad_norm': 1.924476981163025, 'learning_rate': 1.4283269929788779e-06, 'epoch': 0.83} +{'loss': 1.0273, 'grad_norm': 1.715088963508606, 'learning_rate': 1.4251182619239257e-06, 'epoch': 0.83} +{'loss': 0.8203, 'grad_norm': 1.9428443908691406, 'learning_rate': 1.4219128625759426e-06, 'epoch': 0.83} +{'loss': 0.8814, 'grad_norm': 1.9512563943862915, 'learning_rate': 1.4187107961803704e-06, 'epoch': 0.83} +{'loss': 0.9298, 'grad_norm': 0.9265878796577454, 'learning_rate': 1.4155120639813392e-06, 'epoch': 0.83} +{'loss': 0.9226, 'grad_norm': 1.6782152652740479, 'learning_rate': 1.4123166672216971e-06, 'epoch': 0.83} +{'loss': 0.9278, 'grad_norm': 1.5828169584274292, 'learning_rate': 1.409124607142992e-06, 'epoch': 0.83} +{'loss': 0.8731, 'grad_norm': 1.710893988609314, 'learning_rate': 1.4059358849854732e-06, 'epoch': 0.83} +{'loss': 0.9195, 'grad_norm': 1.7301074266433716, 'learning_rate': 1.4027505019880972e-06, 'epoch': 0.83} +{'loss': 0.9295, 'grad_norm': 1.9170883893966675, 'learning_rate': 1.3995684593885206e-06, 'epoch': 0.83} +{'loss': 0.8623, 'grad_norm': 1.8658148050308228, 'learning_rate': 1.3963897584231023e-06, 'epoch': 0.83} +{'loss': 0.9246, 'grad_norm': 1.7214055061340332, 'learning_rate': 1.3932144003269033e-06, 'epoch': 0.84} +{'loss': 0.9108, 'grad_norm': 1.7628673315048218, 'learning_rate': 1.3900423863336842e-06, 'epoch': 0.84} +{'loss': 0.8799, 'grad_norm': 1.6332365274429321, 'learning_rate': 1.3868737176759105e-06, 'epoch': 0.84} +{'loss': 0.9271, 'grad_norm': 1.6727641820907593, 'learning_rate': 1.3837083955847418e-06, 'epoch': 0.84} +{'loss': 0.9192, 'grad_norm': 1.6236034631729126, 'learning_rate': 1.3805464212900432e-06, 'epoch': 0.84} +{'loss': 0.8858, 'grad_norm': 1.6248812675476074, 'learning_rate': 1.377387796020374e-06, 'epoch': 0.84} +{'loss': 0.8103, 'grad_norm': 1.7886335849761963, 'learning_rate': 1.3742325210029973e-06, 'epoch': 0.84} +{'loss': 1.0016, 'grad_norm': 1.652933955192566, 'learning_rate': 1.3710805974638697e-06, 'epoch': 0.84} +{'loss': 0.8328, 'grad_norm': 1.7956198453903198, 'learning_rate': 1.3679320266276518e-06, 'epoch': 0.84} +{'loss': 0.9534, 'grad_norm': 1.6026407480239868, 'learning_rate': 1.364786809717692e-06, 'epoch': 0.84} 
+{'loss': 0.9229, 'grad_norm': 0.9663257598876953, 'learning_rate': 1.3616449479560434e-06, 'epoch': 0.84} +{'loss': 0.7732, 'grad_norm': 1.8057861328125, 'learning_rate': 1.3585064425634542e-06, 'epoch': 0.84} +{'loss': 0.9612, 'grad_norm': 1.7233569622039795, 'learning_rate': 1.3553712947593655e-06, 'epoch': 0.84} +{'loss': 0.8683, 'grad_norm': 2.1935484409332275, 'learning_rate': 1.3522395057619186e-06, 'epoch': 0.84} +{'loss': 0.9166, 'grad_norm': 1.7658956050872803, 'learning_rate': 1.3491110767879444e-06, 'epoch': 0.84} +{'loss': 0.8775, 'grad_norm': 1.7272753715515137, 'learning_rate': 1.345986009052972e-06, 'epoch': 0.84} +{'loss': 0.9321, 'grad_norm': 1.5437290668487549, 'learning_rate': 1.342864303771223e-06, 'epoch': 0.84} +{'loss': 0.9169, 'grad_norm': 1.9196034669876099, 'learning_rate': 1.339745962155613e-06, 'epoch': 0.84} +{'loss': 0.9046, 'grad_norm': 1.6409907341003418, 'learning_rate': 1.3366309854177517e-06, 'epoch': 0.84} +{'loss': 0.9425, 'grad_norm': 1.719199538230896, 'learning_rate': 1.3335193747679387e-06, 'epoch': 0.84} +{'loss': 0.9958, 'grad_norm': 1.6527676582336426, 'learning_rate': 1.330411131415169e-06, 'epoch': 0.84} +{'loss': 0.9396, 'grad_norm': 1.6826039552688599, 'learning_rate': 1.3273062565671258e-06, 'epoch': 0.84} +{'loss': 0.8946, 'grad_norm': 1.758672833442688, 'learning_rate': 1.324204751430186e-06, 'epoch': 0.84} +{'loss': 0.8279, 'grad_norm': 1.6380798816680908, 'learning_rate': 1.3211066172094178e-06, 'epoch': 0.84} +{'loss': 0.9407, 'grad_norm': 1.7920273542404175, 'learning_rate': 1.3180118551085763e-06, 'epoch': 0.84} +{'loss': 0.8621, 'grad_norm': 1.9294027090072632, 'learning_rate': 1.3149204663301118e-06, 'epoch': 0.84} +{'loss': 0.8579, 'grad_norm': 1.7811110019683838, 'learning_rate': 1.3118324520751558e-06, 'epoch': 0.84} +{'loss': 0.9734, 'grad_norm': 1.8185826539993286, 'learning_rate': 1.3087478135435361e-06, 'epoch': 0.84} +{'loss': 1.0106, 'grad_norm': 1.61343252658844, 'learning_rate': 1.3056665519337663e-06, 'epoch': 0.84} +{'loss': 0.8914, 'grad_norm': 1.859797477722168, 'learning_rate': 1.3025886684430467e-06, 'epoch': 0.84} +{'loss': 0.7904, 'grad_norm': 1.754708170890808, 'learning_rate': 1.299514164267267e-06, 'epoch': 0.84} +{'loss': 0.9085, 'grad_norm': 1.8280558586120605, 'learning_rate': 1.2964430406010032e-06, 'epoch': 0.84} +{'loss': 0.8961, 'grad_norm': 1.849546194076538, 'learning_rate': 1.293375298637518e-06, 'epoch': 0.84} +{'loss': 0.853, 'grad_norm': 1.6942812204360962, 'learning_rate': 1.2903109395687597e-06, 'epoch': 0.84} +{'loss': 0.8849, 'grad_norm': 1.7574023008346558, 'learning_rate': 1.2872499645853609e-06, 'epoch': 0.84} +{'loss': 0.9475, 'grad_norm': 1.7603715658187866, 'learning_rate': 1.2841923748766427e-06, 'epoch': 0.84} +{'loss': 0.9657, 'grad_norm': 1.867280125617981, 'learning_rate': 1.2811381716306092e-06, 'epoch': 0.84} +{'loss': 0.8619, 'grad_norm': 1.6992754936218262, 'learning_rate': 1.278087356033947e-06, 'epoch': 0.84} +{'loss': 0.9048, 'grad_norm': 1.8392752408981323, 'learning_rate': 1.2750399292720284e-06, 'epoch': 0.84} +{'loss': 0.8745, 'grad_norm': 1.6466338634490967, 'learning_rate': 1.271995892528909e-06, 'epoch': 0.84} +{'loss': 0.887, 'grad_norm': 1.744739294052124, 'learning_rate': 1.2689552469873267e-06, 'epoch': 0.84} +{'loss': 0.8471, 'grad_norm': 2.1463263034820557, 'learning_rate': 1.2659179938287035e-06, 'epoch': 0.84} +{'loss': 0.9043, 'grad_norm': 1.6629698276519775, 'learning_rate': 1.2628841342331389e-06, 'epoch': 0.84} +{'loss': 0.9238, 'grad_norm': 
1.757784128189087, 'learning_rate': 1.2598536693794171e-06, 'epoch': 0.84} +{'loss': 0.948, 'grad_norm': 1.7084028720855713, 'learning_rate': 1.2568266004450037e-06, 'epoch': 0.84} +{'loss': 0.9606, 'grad_norm': 1.974442720413208, 'learning_rate': 1.2538029286060428e-06, 'epoch': 0.84} +{'loss': 0.9209, 'grad_norm': 1.7231429815292358, 'learning_rate': 1.2507826550373614e-06, 'epoch': 0.84} +{'loss': 0.893, 'grad_norm': 1.7420567274093628, 'learning_rate': 1.2477657809124632e-06, 'epoch': 0.84} +{'loss': 0.985, 'grad_norm': 1.8697997331619263, 'learning_rate': 1.244752307403534e-06, 'epoch': 0.84} +{'loss': 0.897, 'grad_norm': 2.256493330001831, 'learning_rate': 1.2417422356814345e-06, 'epoch': 0.84} +{'loss': 0.9565, 'grad_norm': 1.5790531635284424, 'learning_rate': 1.2387355669157076e-06, 'epoch': 0.84} +{'loss': 0.9379, 'grad_norm': 1.9034937620162964, 'learning_rate': 1.2357323022745716e-06, 'epoch': 0.84} +{'loss': 0.9392, 'grad_norm': 1.7414028644561768, 'learning_rate': 1.2327324429249232e-06, 'epoch': 0.85} +{'loss': 0.8588, 'grad_norm': 1.6371206045150757, 'learning_rate': 1.2297359900323346e-06, 'epoch': 0.85} +{'loss': 0.8482, 'grad_norm': 1.8576784133911133, 'learning_rate': 1.2267429447610557e-06, 'epoch': 0.85} +{'loss': 0.9558, 'grad_norm': 1.7783126831054688, 'learning_rate': 1.2237533082740127e-06, 'epoch': 0.85} +{'loss': 0.8707, 'grad_norm': 1.797315239906311, 'learning_rate': 1.2207670817328065e-06, 'epoch': 0.85} +{'loss': 0.9364, 'grad_norm': 1.7101528644561768, 'learning_rate': 1.2177842662977136e-06, 'epoch': 0.85} +{'loss': 0.9407, 'grad_norm': 1.6453410387039185, 'learning_rate': 1.2148048631276876e-06, 'epoch': 0.85} +{'loss': 0.9503, 'grad_norm': 2.000742197036743, 'learning_rate': 1.2118288733803474e-06, 'epoch': 0.85} +{'loss': 0.9585, 'grad_norm': 1.6722896099090576, 'learning_rate': 1.2088562982119957e-06, 'epoch': 0.85} +{'loss': 0.8905, 'grad_norm': 0.8880183100700378, 'learning_rate': 1.2058871387776039e-06, 'epoch': 0.85} +{'loss': 0.9337, 'grad_norm': 1.8514553308486938, 'learning_rate': 1.2029213962308172e-06, 'epoch': 0.85} +{'loss': 0.9123, 'grad_norm': 1.6902151107788086, 'learning_rate': 1.199959071723953e-06, 'epoch': 0.85} +{'loss': 1.0644, 'grad_norm': 2.0223639011383057, 'learning_rate': 1.197000166408001e-06, 'epoch': 0.85} +{'loss': 0.8748, 'grad_norm': 1.7263312339782715, 'learning_rate': 1.19404468143262e-06, 'epoch': 0.85} +{'loss': 0.9377, 'grad_norm': 1.6143752336502075, 'learning_rate': 1.1910926179461446e-06, 'epoch': 0.85} +{'loss': 0.8526, 'grad_norm': 3.7136077880859375, 'learning_rate': 1.188143977095576e-06, 'epoch': 0.85} +{'loss': 0.9503, 'grad_norm': 1.66621732711792, 'learning_rate': 1.185198760026587e-06, 'epoch': 0.85} +{'loss': 0.918, 'grad_norm': 1.791013240814209, 'learning_rate': 1.1822569678835195e-06, 'epoch': 0.85} +{'loss': 0.8827, 'grad_norm': 1.7674356698989868, 'learning_rate': 1.179318601809386e-06, 'epoch': 0.85} +{'loss': 0.8522, 'grad_norm': 1.5898926258087158, 'learning_rate': 1.176383662945866e-06, 'epoch': 0.85} +{'loss': 0.9583, 'grad_norm': 1.6146020889282227, 'learning_rate': 1.1734521524333087e-06, 'epoch': 0.85} +{'loss': 0.9823, 'grad_norm': 1.9493858814239502, 'learning_rate': 1.1705240714107301e-06, 'epoch': 0.85} +{'loss': 0.8396, 'grad_norm': 1.6889700889587402, 'learning_rate': 1.167599421015818e-06, 'epoch': 0.85} +{'loss': 0.8673, 'grad_norm': 1.672223448753357, 'learning_rate': 1.164678202384919e-06, 'epoch': 0.85} +{'loss': 0.9261, 'grad_norm': 1.8779722452163696, 'learning_rate': 
1.1617604166530527e-06, 'epoch': 0.85} +{'loss': 0.8713, 'grad_norm': 1.8510607481002808, 'learning_rate': 1.1588460649539036e-06, 'epoch': 0.85} +{'loss': 0.9058, 'grad_norm': 1.801008701324463, 'learning_rate': 1.1559351484198211e-06, 'epoch': 0.85} +{'loss': 1.021, 'grad_norm': 1.8866653442382812, 'learning_rate': 1.153027668181821e-06, 'epoch': 0.85} +{'loss': 0.9344, 'grad_norm': 1.7555983066558838, 'learning_rate': 1.1501236253695823e-06, 'epoch': 0.85} +{'loss': 0.8819, 'grad_norm': 1.8575620651245117, 'learning_rate': 1.1472230211114498e-06, 'epoch': 0.85} +{'loss': 0.8483, 'grad_norm': 1.605782389640808, 'learning_rate': 1.1443258565344329e-06, 'epoch': 0.85} +{'loss': 0.8272, 'grad_norm': 1.8324559926986694, 'learning_rate': 1.1414321327642019e-06, 'epoch': 0.85} +{'loss': 0.8631, 'grad_norm': 0.8643845915794373, 'learning_rate': 1.1385418509250933e-06, 'epoch': 0.85} +{'loss': 0.9059, 'grad_norm': 1.796787977218628, 'learning_rate': 1.1356550121401033e-06, 'epoch': 0.85} +{'loss': 0.8934, 'grad_norm': 1.7553845643997192, 'learning_rate': 1.1327716175308934e-06, 'epoch': 0.85} +{'loss': 0.9811, 'grad_norm': 1.5919550657272339, 'learning_rate': 1.129891668217783e-06, 'epoch': 0.85} +{'loss': 0.8313, 'grad_norm': 0.8372297883033752, 'learning_rate': 1.1270151653197569e-06, 'epoch': 0.85} +{'loss': 0.9459, 'grad_norm': 1.6414622068405151, 'learning_rate': 1.124142109954459e-06, 'epoch': 0.85} +{'loss': 0.9529, 'grad_norm': 1.5787591934204102, 'learning_rate': 1.1212725032381933e-06, 'epoch': 0.85} +{'loss': 1.0215, 'grad_norm': 1.7654955387115479, 'learning_rate': 1.1184063462859262e-06, 'epoch': 0.85} +{'loss': 0.8637, 'grad_norm': 1.5774303674697876, 'learning_rate': 1.1155436402112785e-06, 'epoch': 0.85} +{'loss': 0.9116, 'grad_norm': 1.9222971200942993, 'learning_rate': 1.1126843861265347e-06, 'epoch': 0.85} +{'loss': 0.9055, 'grad_norm': 1.5580732822418213, 'learning_rate': 1.1098285851426372e-06, 'epoch': 0.85} +{'loss': 0.8942, 'grad_norm': 1.5990151166915894, 'learning_rate': 1.1069762383691862e-06, 'epoch': 0.85} +{'loss': 0.8641, 'grad_norm': 2.0196633338928223, 'learning_rate': 1.1041273469144399e-06, 'epoch': 0.85} +{'loss': 0.9466, 'grad_norm': 1.6549125909805298, 'learning_rate': 1.1012819118853147e-06, 'epoch': 0.85} +{'loss': 0.9179, 'grad_norm': 2.0572917461395264, 'learning_rate': 1.098439934387383e-06, 'epoch': 0.85} +{'loss': 0.9259, 'grad_norm': 2.320844888687134, 'learning_rate': 1.095601415524874e-06, 'epoch': 0.85} +{'loss': 0.8888, 'grad_norm': 1.6620333194732666, 'learning_rate': 1.092766356400673e-06, 'epoch': 0.85} +{'loss': 0.9659, 'grad_norm': 1.808110237121582, 'learning_rate': 1.0899347581163222e-06, 'epoch': 0.85} +{'loss': 0.9873, 'grad_norm': 1.797581434249878, 'learning_rate': 1.0871066217720173e-06, 'epoch': 0.85} +{'loss': 0.9049, 'grad_norm': 1.5478016138076782, 'learning_rate': 1.08428194846661e-06, 'epoch': 0.85} +{'loss': 0.9378, 'grad_norm': 1.7699617147445679, 'learning_rate': 1.0814607392976062e-06, 'epoch': 0.86} +{'loss': 0.8988, 'grad_norm': 1.8112612962722778, 'learning_rate': 1.0786429953611665e-06, 'epoch': 0.86} +{'loss': 0.996, 'grad_norm': 1.6674292087554932, 'learning_rate': 1.0758287177521032e-06, 'epoch': 0.86} +{'loss': 0.963, 'grad_norm': 1.7372832298278809, 'learning_rate': 1.073017907563887e-06, 'epoch': 0.86} +{'loss': 0.9793, 'grad_norm': 1.8815265893936157, 'learning_rate': 1.0702105658886318e-06, 'epoch': 0.86} +{'loss': 0.939, 'grad_norm': 1.7649062871932983, 'learning_rate': 1.0674066938171123e-06, 'epoch': 0.86} 
+{'loss': 0.9759, 'grad_norm': 1.5869271755218506, 'learning_rate': 1.0646062924387513e-06, 'epoch': 0.86} +{'loss': 0.9141, 'grad_norm': 1.8112019300460815, 'learning_rate': 1.061809362841626e-06, 'epoch': 0.86} +{'loss': 0.9309, 'grad_norm': 0.8680244088172913, 'learning_rate': 1.0590159061124606e-06, 'epoch': 0.86} +{'loss': 0.9951, 'grad_norm': 1.7235769033432007, 'learning_rate': 1.0562259233366334e-06, 'epoch': 0.86} +{'loss': 0.9351, 'grad_norm': 1.8003859519958496, 'learning_rate': 1.053439415598171e-06, 'epoch': 0.86} +{'loss': 0.9457, 'grad_norm': 1.8437814712524414, 'learning_rate': 1.0506563839797501e-06, 'epoch': 0.86} +{'loss': 0.8406, 'grad_norm': 1.9507439136505127, 'learning_rate': 1.0478768295626985e-06, 'epoch': 0.86} +{'loss': 0.957, 'grad_norm': 1.8357292413711548, 'learning_rate': 1.0451007534269908e-06, 'epoch': 0.86} +{'loss': 0.9884, 'grad_norm': 1.682597041130066, 'learning_rate': 1.04232815665125e-06, 'epoch': 0.86} +{'loss': 0.9107, 'grad_norm': 1.6805672645568848, 'learning_rate': 1.0395590403127487e-06, 'epoch': 0.86} +{'loss': 0.9128, 'grad_norm': 1.8189362287521362, 'learning_rate': 1.0367934054874074e-06, 'epoch': 0.86} +{'loss': 0.8995, 'grad_norm': 1.6221890449523926, 'learning_rate': 1.034031253249792e-06, 'epoch': 0.86} +{'loss': 0.9388, 'grad_norm': 1.6082977056503296, 'learning_rate': 1.0312725846731174e-06, 'epoch': 0.86} +{'loss': 0.944, 'grad_norm': 1.572615623474121, 'learning_rate': 1.0285174008292431e-06, 'epoch': 0.86} +{'loss': 0.9001, 'grad_norm': 1.5556401014328003, 'learning_rate': 1.025765702788677e-06, 'epoch': 0.86} +{'loss': 0.8928, 'grad_norm': 1.798750638961792, 'learning_rate': 1.0230174916205681e-06, 'epoch': 0.86} +{'loss': 0.9273, 'grad_norm': 1.809114694595337, 'learning_rate': 1.0202727683927149e-06, 'epoch': 0.86} +{'loss': 0.9201, 'grad_norm': 0.899472177028656, 'learning_rate': 1.0175315341715598e-06, 'epoch': 0.86} +{'loss': 0.9495, 'grad_norm': 1.824232578277588, 'learning_rate': 1.0147937900221882e-06, 'epoch': 0.86} +{'loss': 0.9502, 'grad_norm': 1.6713426113128662, 'learning_rate': 1.012059537008332e-06, 'epoch': 0.86} +{'loss': 0.9036, 'grad_norm': 1.7595478296279907, 'learning_rate': 1.009328776192363e-06, 'epoch': 0.86} +{'loss': 0.8601, 'grad_norm': 1.6076364517211914, 'learning_rate': 1.0066015086353e-06, 'epoch': 0.86} +{'loss': 0.8929, 'grad_norm': 1.7671749591827393, 'learning_rate': 1.003877735396801e-06, 'epoch': 0.86} +{'loss': 0.8674, 'grad_norm': 1.878847360610962, 'learning_rate': 1.0011574575351702e-06, 'epoch': 0.86} +{'loss': 0.8963, 'grad_norm': 0.8943829536437988, 'learning_rate': 9.984406761073496e-07, 'epoch': 0.86} +{'loss': 0.8551, 'grad_norm': 1.6800854206085205, 'learning_rate': 9.957273921689237e-07, 'epoch': 0.86} +{'loss': 0.9517, 'grad_norm': 1.7575600147247314, 'learning_rate': 9.930176067741216e-07, 'epoch': 0.86} +{'loss': 0.9339, 'grad_norm': 1.6423978805541992, 'learning_rate': 9.903113209758098e-07, 'epoch': 0.86} +{'loss': 0.8767, 'grad_norm': 1.7209537029266357, 'learning_rate': 9.87608535825494e-07, 'epoch': 0.86} +{'loss': 0.9224, 'grad_norm': 0.9240423440933228, 'learning_rate': 9.849092523733217e-07, 'epoch': 0.86} +{'loss': 0.8218, 'grad_norm': 1.9497654438018799, 'learning_rate': 9.822134716680832e-07, 'epoch': 0.86} +{'loss': 0.9407, 'grad_norm': 1.6518983840942383, 'learning_rate': 9.795211947571993e-07, 'epoch': 0.86} +{'loss': 0.9336, 'grad_norm': 1.715362310409546, 'learning_rate': 9.768324226867353e-07, 'epoch': 0.86} +{'loss': 0.9364, 'grad_norm': 2.010856866836548, 
'learning_rate': 9.74147156501396e-07, 'epoch': 0.86} +{'loss': 0.8251, 'grad_norm': 1.780233383178711, 'learning_rate': 9.714653972445199e-07, 'epoch': 0.86} +{'loss': 0.9162, 'grad_norm': 1.7593193054199219, 'learning_rate': 9.687871459580845e-07, 'epoch': 0.86} +{'loss': 0.9424, 'grad_norm': 1.6943724155426025, 'learning_rate': 9.661124036827064e-07, 'epoch': 0.86} +{'loss': 0.8911, 'grad_norm': 1.59291410446167, 'learning_rate': 9.634411714576353e-07, 'epoch': 0.86} +{'loss': 0.9136, 'grad_norm': 1.8310132026672363, 'learning_rate': 9.60773450320759e-07, 'epoch': 0.86} +{'loss': 0.8781, 'grad_norm': 1.770247459411621, 'learning_rate': 9.58109241308599e-07, 'epoch': 0.86} +{'loss': 0.8687, 'grad_norm': 1.717069149017334, 'learning_rate': 9.55448545456319e-07, 'epoch': 0.86} +{'loss': 0.9767, 'grad_norm': 1.7650303840637207, 'learning_rate': 9.527913637977093e-07, 'epoch': 0.86} +{'loss': 0.9659, 'grad_norm': 2.1053173542022705, 'learning_rate': 9.501376973651999e-07, 'epoch': 0.86} +{'loss': 0.9003, 'grad_norm': 1.7617216110229492, 'learning_rate': 9.474875471898526e-07, 'epoch': 0.86} +{'loss': 1.0688, 'grad_norm': 1.8802812099456787, 'learning_rate': 9.448409143013626e-07, 'epoch': 0.86} +{'loss': 0.9637, 'grad_norm': 1.665010929107666, 'learning_rate': 9.421977997280596e-07, 'epoch': 0.86} +{'loss': 0.871, 'grad_norm': 1.965577483177185, 'learning_rate': 9.395582044969076e-07, 'epoch': 0.87} +{'loss': 0.809, 'grad_norm': 1.7655526399612427, 'learning_rate': 9.369221296335007e-07, 'epoch': 0.87} +{'loss': 0.9843, 'grad_norm': 0.9457436203956604, 'learning_rate': 9.342895761620663e-07, 'epoch': 0.87} +{'loss': 0.9675, 'grad_norm': 1.8821799755096436, 'learning_rate': 9.31660545105465e-07, 'epoch': 0.87} +{'loss': 0.8678, 'grad_norm': 1.6772260665893555, 'learning_rate': 9.290350374851865e-07, 'epoch': 0.87} +{'loss': 0.947, 'grad_norm': 1.7377560138702393, 'learning_rate': 9.264130543213512e-07, 'epoch': 0.87} +{'loss': 0.9132, 'grad_norm': 1.548792839050293, 'learning_rate': 9.237945966327133e-07, 'epoch': 0.87} +{'loss': 0.9811, 'grad_norm': 1.6864455938339233, 'learning_rate': 9.211796654366545e-07, 'epoch': 0.87} +{'loss': 0.8339, 'grad_norm': 1.7881170511245728, 'learning_rate': 9.185682617491865e-07, 'epoch': 0.87} +{'loss': 0.9063, 'grad_norm': 1.8653318881988525, 'learning_rate': 9.15960386584952e-07, 'epoch': 0.87} +{'loss': 0.9171, 'grad_norm': 0.9954100847244263, 'learning_rate': 9.133560409572195e-07, 'epoch': 0.87} +{'loss': 0.897, 'grad_norm': 1.6978204250335693, 'learning_rate': 9.107552258778907e-07, 'epoch': 0.87} +{'loss': 0.9211, 'grad_norm': 1.6766725778579712, 'learning_rate': 9.081579423574916e-07, 'epoch': 0.87} +{'loss': 0.908, 'grad_norm': 1.8919655084609985, 'learning_rate': 9.055641914051783e-07, 'epoch': 0.87} +{'loss': 0.9173, 'grad_norm': 1.9327479600906372, 'learning_rate': 9.029739740287358e-07, 'epoch': 0.87} +{'loss': 0.9597, 'grad_norm': 1.6222783327102661, 'learning_rate': 9.00387291234569e-07, 'epoch': 0.87} +{'loss': 1.0014, 'grad_norm': 1.6995222568511963, 'learning_rate': 8.978041440277163e-07, 'epoch': 0.87} +{'loss': 0.8324, 'grad_norm': 2.417156934738159, 'learning_rate': 8.952245334118415e-07, 'epoch': 0.87} +{'loss': 0.8735, 'grad_norm': 1.7888003587722778, 'learning_rate': 8.926484603892327e-07, 'epoch': 0.87} +{'loss': 0.9334, 'grad_norm': 1.9681522846221924, 'learning_rate': 8.900759259608038e-07, 'epoch': 0.87} +{'loss': 0.9593, 'grad_norm': 0.9685776829719543, 'learning_rate': 8.875069311260942e-07, 'epoch': 0.87} +{'loss': 0.8925, 
'grad_norm': 1.6864736080169678, 'learning_rate': 8.849414768832687e-07, 'epoch': 0.87} +{'loss': 0.9152, 'grad_norm': 1.6326510906219482, 'learning_rate': 8.823795642291145e-07, 'epoch': 0.87} +{'loss': 0.93, 'grad_norm': 1.6503970623016357, 'learning_rate': 8.79821194159044e-07, 'epoch': 0.87} +{'loss': 0.8682, 'grad_norm': 1.6019023656845093, 'learning_rate': 8.772663676670934e-07, 'epoch': 0.87} +{'loss': 0.9086, 'grad_norm': 1.6742379665374756, 'learning_rate': 8.747150857459219e-07, 'epoch': 0.87} +{'loss': 0.8774, 'grad_norm': 1.7541180849075317, 'learning_rate': 8.721673493868111e-07, 'epoch': 0.87} +{'loss': 0.946, 'grad_norm': 1.6651214361190796, 'learning_rate': 8.696231595796633e-07, 'epoch': 0.87} +{'loss': 0.907, 'grad_norm': 1.8287791013717651, 'learning_rate': 8.670825173130081e-07, 'epoch': 0.87} +{'loss': 0.929, 'grad_norm': 1.978616714477539, 'learning_rate': 8.645454235739903e-07, 'epoch': 0.87} +{'loss': 0.9137, 'grad_norm': 1.7962009906768799, 'learning_rate': 8.620118793483823e-07, 'epoch': 0.87} +{'loss': 0.9698, 'grad_norm': 1.768997311592102, 'learning_rate': 8.5948188562057e-07, 'epoch': 0.87} +{'loss': 0.8877, 'grad_norm': 1.7460027933120728, 'learning_rate': 8.56955443373565e-07, 'epoch': 0.87} +{'loss': 0.9021, 'grad_norm': 1.7546968460083008, 'learning_rate': 8.544325535889997e-07, 'epoch': 0.87} +{'loss': 0.8843, 'grad_norm': 1.7033549547195435, 'learning_rate': 8.519132172471212e-07, 'epoch': 0.87} +{'loss': 0.8902, 'grad_norm': 1.680238962173462, 'learning_rate': 8.493974353268019e-07, 'epoch': 0.87} +{'loss': 0.9099, 'grad_norm': 1.7112008333206177, 'learning_rate': 8.468852088055291e-07, 'epoch': 0.87} +{'loss': 0.985, 'grad_norm': 1.6180204153060913, 'learning_rate': 8.443765386594094e-07, 'epoch': 0.87} +{'loss': 0.8279, 'grad_norm': 1.7364057302474976, 'learning_rate': 8.418714258631689e-07, 'epoch': 0.87} +{'loss': 0.8845, 'grad_norm': 1.641308307647705, 'learning_rate': 8.393698713901499e-07, 'epoch': 0.87} +{'loss': 0.924, 'grad_norm': 0.8223841786384583, 'learning_rate': 8.36871876212314e-07, 'epoch': 0.87} +{'loss': 0.9319, 'grad_norm': 1.646851897239685, 'learning_rate': 8.343774413002382e-07, 'epoch': 0.87} +{'loss': 0.8897, 'grad_norm': 1.8828119039535522, 'learning_rate': 8.318865676231158e-07, 'epoch': 0.87} +{'loss': 0.9193, 'grad_norm': 1.880627989768982, 'learning_rate': 8.293992561487596e-07, 'epoch': 0.87} +{'loss': 0.904, 'grad_norm': 1.7291349172592163, 'learning_rate': 8.26915507843593e-07, 'epoch': 0.87} +{'loss': 0.9305, 'grad_norm': 2.4312212467193604, 'learning_rate': 8.24435323672661e-07, 'epoch': 0.87} +{'loss': 0.9386, 'grad_norm': 1.7694976329803467, 'learning_rate': 8.219587045996203e-07, 'epoch': 0.87} +{'loss': 0.8363, 'grad_norm': 1.9065922498703003, 'learning_rate': 8.194856515867433e-07, 'epoch': 0.87} +{'loss': 0.9275, 'grad_norm': 1.623293399810791, 'learning_rate': 8.170161655949139e-07, 'epoch': 0.87} +{'loss': 0.9967, 'grad_norm': 1.642863154411316, 'learning_rate': 8.145502475836331e-07, 'epoch': 0.87} +{'loss': 0.934, 'grad_norm': 1.745168924331665, 'learning_rate': 8.120878985110181e-07, 'epoch': 0.87} +{'loss': 0.9796, 'grad_norm': 1.8920714855194092, 'learning_rate': 8.096291193337935e-07, 'epoch': 0.87} +{'loss': 0.9633, 'grad_norm': 1.9377678632736206, 'learning_rate': 8.071739110073018e-07, 'epoch': 0.88} +{'loss': 0.9616, 'grad_norm': 1.8712385892868042, 'learning_rate': 8.047222744854943e-07, 'epoch': 0.88} +{'loss': 0.8705, 'grad_norm': 1.972001075744629, 'learning_rate': 8.022742107209369e-07, 'epoch': 
0.88} +{'loss': 0.9578, 'grad_norm': 1.7095370292663574, 'learning_rate': 7.998297206648075e-07, 'epoch': 0.88} +{'loss': 0.8513, 'grad_norm': 1.6083623170852661, 'learning_rate': 7.973888052668943e-07, 'epoch': 0.88} +{'loss': 0.8984, 'grad_norm': 1.7789794206619263, 'learning_rate': 7.949514654755963e-07, 'epoch': 0.88} +{'loss': 0.9401, 'grad_norm': 1.7920817136764526, 'learning_rate': 7.925177022379249e-07, 'epoch': 0.88} +{'loss': 0.8554, 'grad_norm': 1.7534286975860596, 'learning_rate': 7.900875164995003e-07, 'epoch': 0.88} +{'loss': 0.9698, 'grad_norm': 1.97467041015625, 'learning_rate': 7.876609092045528e-07, 'epoch': 0.88} +{'loss': 0.8215, 'grad_norm': 3.238448143005371, 'learning_rate': 7.852378812959227e-07, 'epoch': 0.88} +{'loss': 0.8811, 'grad_norm': 0.946961522102356, 'learning_rate': 7.828184337150613e-07, 'epoch': 0.88} +{'loss': 0.8475, 'grad_norm': 1.5218231678009033, 'learning_rate': 7.804025674020266e-07, 'epoch': 0.88} +{'loss': 0.921, 'grad_norm': 1.7492682933807373, 'learning_rate': 7.779902832954833e-07, 'epoch': 0.88} +{'loss': 0.967, 'grad_norm': 1.8103415966033936, 'learning_rate': 7.755815823327084e-07, 'epoch': 0.88} +{'loss': 0.9454, 'grad_norm': 1.7757612466812134, 'learning_rate': 7.731764654495832e-07, 'epoch': 0.88} +{'loss': 0.8882, 'grad_norm': 1.7613611221313477, 'learning_rate': 7.707749335806014e-07, 'epoch': 0.88} +{'loss': 0.8332, 'grad_norm': 1.8285560607910156, 'learning_rate': 7.683769876588576e-07, 'epoch': 0.88} +{'loss': 0.8837, 'grad_norm': 1.9224072694778442, 'learning_rate': 7.659826286160565e-07, 'epoch': 0.88} +{'loss': 0.9368, 'grad_norm': 1.6285111904144287, 'learning_rate': 7.635918573825096e-07, 'epoch': 0.88} +{'loss': 0.9192, 'grad_norm': 1.7547639608383179, 'learning_rate': 7.612046748871327e-07, 'epoch': 0.88} +{'loss': 0.9591, 'grad_norm': 1.7078752517700195, 'learning_rate': 7.588210820574482e-07, 'epoch': 0.88} +{'loss': 0.9591, 'grad_norm': 1.8804042339324951, 'learning_rate': 7.564410798195832e-07, 'epoch': 0.88} +{'loss': 0.8947, 'grad_norm': 1.9545427560806274, 'learning_rate': 7.540646690982711e-07, 'epoch': 0.88} +{'loss': 0.9425, 'grad_norm': 1.8150368928909302, 'learning_rate': 7.516918508168469e-07, 'epoch': 0.88} +{'loss': 0.9377, 'grad_norm': 1.649187684059143, 'learning_rate': 7.493226258972519e-07, 'epoch': 0.88} +{'loss': 0.9453, 'grad_norm': 1.760579228401184, 'learning_rate': 7.46956995260033e-07, 'epoch': 0.88} +{'loss': 0.9226, 'grad_norm': 1.9679381847381592, 'learning_rate': 7.445949598243362e-07, 'epoch': 0.88} +{'loss': 0.976, 'grad_norm': 1.7522602081298828, 'learning_rate': 7.422365205079141e-07, 'epoch': 0.88} +{'loss': 0.9291, 'grad_norm': 1.6734532117843628, 'learning_rate': 7.398816782271223e-07, 'epoch': 0.88} +{'loss': 0.9094, 'grad_norm': 1.6494511365890503, 'learning_rate': 7.375304338969135e-07, 'epoch': 0.88} +{'loss': 0.9279, 'grad_norm': 1.8996403217315674, 'learning_rate': 7.351827884308493e-07, 'epoch': 0.88} +{'loss': 0.9234, 'grad_norm': 1.6665183305740356, 'learning_rate': 7.328387427410877e-07, 'epoch': 0.88} +{'loss': 0.8557, 'grad_norm': 1.6807657480239868, 'learning_rate': 7.304982977383935e-07, 'epoch': 0.88} +{'loss': 0.8437, 'grad_norm': 2.0911595821380615, 'learning_rate': 7.281614543321269e-07, 'epoch': 0.88} +{'loss': 0.8678, 'grad_norm': 1.8205580711364746, 'learning_rate': 7.258282134302519e-07, 'epoch': 0.88} +{'loss': 0.9842, 'grad_norm': 1.7864078283309937, 'learning_rate': 7.234985759393321e-07, 'epoch': 0.88} +{'loss': 0.9134, 'grad_norm': 1.7867627143859863, 
'learning_rate': 7.211725427645311e-07, 'epoch': 0.88} +{'loss': 0.8677, 'grad_norm': 1.9057732820510864, 'learning_rate': 7.188501148096117e-07, 'epoch': 0.88} +{'loss': 0.9439, 'grad_norm': 1.5514980554580688, 'learning_rate': 7.165312929769352e-07, 'epoch': 0.88} +{'loss': 0.9219, 'grad_norm': 1.787164330482483, 'learning_rate': 7.142160781674645e-07, 'epoch': 0.88} +{'loss': 0.892, 'grad_norm': 1.799926519393921, 'learning_rate': 7.119044712807577e-07, 'epoch': 0.88} +{'loss': 0.9379, 'grad_norm': 1.9440863132476807, 'learning_rate': 7.095964732149741e-07, 'epoch': 0.88} +{'loss': 0.9031, 'grad_norm': 2.01857328414917, 'learning_rate': 7.072920848668674e-07, 'epoch': 0.88} +{'loss': 0.9475, 'grad_norm': 1.8090003728866577, 'learning_rate': 7.049913071317927e-07, 'epoch': 0.88} +{'loss': 0.8428, 'grad_norm': 1.8560371398925781, 'learning_rate': 7.026941409036991e-07, 'epoch': 0.88} +{'loss': 0.9201, 'grad_norm': 1.846830129623413, 'learning_rate': 7.004005870751341e-07, 'epoch': 0.88} +{'loss': 0.8708, 'grad_norm': 1.742576241493225, 'learning_rate': 6.981106465372389e-07, 'epoch': 0.88} +{'loss': 1.0001, 'grad_norm': 1.601747751235962, 'learning_rate': 6.958243201797554e-07, 'epoch': 0.88} +{'loss': 0.9818, 'grad_norm': 1.9314178228378296, 'learning_rate': 6.935416088910185e-07, 'epoch': 0.88} +{'loss': 0.9803, 'grad_norm': 0.9659202694892883, 'learning_rate': 6.912625135579587e-07, 'epoch': 0.88} +{'loss': 0.8706, 'grad_norm': 1.6690056324005127, 'learning_rate': 6.889870350661009e-07, 'epoch': 0.88} +{'loss': 0.8716, 'grad_norm': 1.8155843019485474, 'learning_rate': 6.867151742995659e-07, 'epoch': 0.88} +{'loss': 0.9118, 'grad_norm': 1.7352817058563232, 'learning_rate': 6.844469321410685e-07, 'epoch': 0.89} +{'loss': 0.906, 'grad_norm': 1.7169686555862427, 'learning_rate': 6.821823094719171e-07, 'epoch': 0.89} +{'loss': 0.9104, 'grad_norm': 1.771855354309082, 'learning_rate': 6.799213071720156e-07, 'epoch': 0.89} +{'loss': 0.9184, 'grad_norm': 1.7238482236862183, 'learning_rate': 6.776639261198581e-07, 'epoch': 0.89} +{'loss': 0.9395, 'grad_norm': 1.7848094701766968, 'learning_rate': 6.754101671925339e-07, 'epoch': 0.89} +{'loss': 0.9273, 'grad_norm': 0.9155448079109192, 'learning_rate': 6.731600312657238e-07, 'epoch': 0.89} +{'loss': 0.9158, 'grad_norm': 2.1516246795654297, 'learning_rate': 6.709135192137018e-07, 'epoch': 0.89} +{'loss': 0.7506, 'grad_norm': 1.7109227180480957, 'learning_rate': 6.68670631909335e-07, 'epoch': 0.89} +{'loss': 0.9931, 'grad_norm': 1.8024001121520996, 'learning_rate': 6.664313702240788e-07, 'epoch': 0.89} +{'loss': 0.8467, 'grad_norm': 1.559737205505371, 'learning_rate': 6.641957350279838e-07, 'epoch': 0.89} +{'loss': 0.9116, 'grad_norm': 1.9477784633636475, 'learning_rate': 6.619637271896873e-07, 'epoch': 0.89} +{'loss': 0.9372, 'grad_norm': 1.7992230653762817, 'learning_rate': 6.597353475764201e-07, 'epoch': 0.89} +{'loss': 0.8063, 'grad_norm': 2.300236940383911, 'learning_rate': 6.57510597054003e-07, 'epoch': 0.89} +{'loss': 0.9448, 'grad_norm': 0.9002736210823059, 'learning_rate': 6.552894764868456e-07, 'epoch': 0.89} +{'loss': 0.8491, 'grad_norm': 0.8136253356933594, 'learning_rate': 6.530719867379498e-07, 'epoch': 0.89} +{'loss': 0.9963, 'grad_norm': 1.9341810941696167, 'learning_rate': 6.508581286689031e-07, 'epoch': 0.89} +{'loss': 0.9761, 'grad_norm': 2.0188567638397217, 'learning_rate': 6.48647903139884e-07, 'epoch': 0.89} +{'loss': 0.9531, 'grad_norm': 1.5818471908569336, 'learning_rate': 6.464413110096601e-07, 'epoch': 0.89} +{'loss': 
0.8182, 'grad_norm': 1.8132492303848267, 'learning_rate': 6.442383531355856e-07, 'epoch': 0.89} +{'loss': 0.8305, 'grad_norm': 1.619665503501892, 'learning_rate': 6.420390303736034e-07, 'epoch': 0.89} +{'loss': 1.0031, 'grad_norm': 1.919082522392273, 'learning_rate': 6.398433435782459e-07, 'epoch': 0.89} +{'loss': 0.8564, 'grad_norm': 1.6701529026031494, 'learning_rate': 6.37651293602628e-07, 'epoch': 0.89} +{'loss': 0.9739, 'grad_norm': 1.4992094039916992, 'learning_rate': 6.354628812984576e-07, 'epoch': 0.89} +{'loss': 0.8659, 'grad_norm': 1.6888341903686523, 'learning_rate': 6.332781075160244e-07, 'epoch': 0.89} +{'loss': 0.8955, 'grad_norm': 1.8748725652694702, 'learning_rate': 6.31096973104206e-07, 'epoch': 0.89} +{'loss': 0.8884, 'grad_norm': 1.7143971920013428, 'learning_rate': 6.28919478910468e-07, 'epoch': 0.89} +{'loss': 0.9435, 'grad_norm': 1.6898267269134521, 'learning_rate': 6.26745625780858e-07, 'epoch': 0.89} +{'loss': 0.876, 'grad_norm': 1.694114327430725, 'learning_rate': 6.245754145600091e-07, 'epoch': 0.89} +{'loss': 0.9518, 'grad_norm': 1.8522770404815674, 'learning_rate': 6.224088460911437e-07, 'epoch': 0.89} +{'loss': 0.9113, 'grad_norm': 0.9747973084449768, 'learning_rate': 6.202459212160638e-07, 'epoch': 0.89} +{'loss': 0.8555, 'grad_norm': 1.6128125190734863, 'learning_rate': 6.180866407751595e-07, 'epoch': 0.89} +{'loss': 0.8427, 'grad_norm': 0.8761985301971436, 'learning_rate': 6.15931005607403e-07, 'epoch': 0.89} +{'loss': 0.9198, 'grad_norm': 1.7442528009414673, 'learning_rate': 6.137790165503499e-07, 'epoch': 0.89} +{'loss': 0.8894, 'grad_norm': 1.7197226285934448, 'learning_rate': 6.116306744401391e-07, 'epoch': 0.89} +{'loss': 0.9948, 'grad_norm': 1.819201111793518, 'learning_rate': 6.09485980111495e-07, 'epoch': 0.89} +{'loss': 0.8711, 'grad_norm': 1.7525112628936768, 'learning_rate': 6.073449343977211e-07, 'epoch': 0.89} +{'loss': 0.9227, 'grad_norm': 1.7218610048294067, 'learning_rate': 6.052075381307054e-07, 'epoch': 0.89} +{'loss': 0.8994, 'grad_norm': 1.7466583251953125, 'learning_rate': 6.030737921409169e-07, 'epoch': 0.89} +{'loss': 0.8556, 'grad_norm': 1.8194990158081055, 'learning_rate': 6.009436972574079e-07, 'epoch': 0.89} +{'loss': 0.8966, 'grad_norm': 1.8914660215377808, 'learning_rate': 5.988172543078097e-07, 'epoch': 0.89} +{'loss': 0.9769, 'grad_norm': 1.6951923370361328, 'learning_rate': 5.966944641183369e-07, 'epoch': 0.89} +{'loss': 0.8891, 'grad_norm': 1.659889578819275, 'learning_rate': 5.945753275137844e-07, 'epoch': 0.89} +{'loss': 0.9166, 'grad_norm': 1.6990644931793213, 'learning_rate': 5.924598453175278e-07, 'epoch': 0.89} +{'loss': 1.0127, 'grad_norm': 1.8403247594833374, 'learning_rate': 5.903480183515197e-07, 'epoch': 0.89} +{'loss': 0.8539, 'grad_norm': 0.8651690483093262, 'learning_rate': 5.882398474362949e-07, 'epoch': 0.89} +{'loss': 0.9161, 'grad_norm': 0.9350734949111938, 'learning_rate': 5.861353333909692e-07, 'epoch': 0.89} +{'loss': 0.9505, 'grad_norm': 1.7657092809677124, 'learning_rate': 5.840344770332351e-07, 'epoch': 0.89} +{'loss': 0.8645, 'grad_norm': 1.9987502098083496, 'learning_rate': 5.819372791793654e-07, 'epoch': 0.89} +{'loss': 0.9997, 'grad_norm': 1.664176344871521, 'learning_rate': 5.798437406442104e-07, 'epoch': 0.89} +{'loss': 0.8582, 'grad_norm': 1.6742953062057495, 'learning_rate': 5.777538622412005e-07, 'epoch': 0.89} +{'loss': 0.9446, 'grad_norm': 1.6213897466659546, 'learning_rate': 5.756676447823406e-07, 'epoch': 0.89} +{'loss': 0.8426, 'grad_norm': 1.7622512578964233, 'learning_rate': 
5.735850890782158e-07, 'epoch': 0.89} +{'loss': 0.8734, 'grad_norm': 1.7775484323501587, 'learning_rate': 5.715061959379875e-07, 'epoch': 0.9} +{'loss': 0.8961, 'grad_norm': 1.9662108421325684, 'learning_rate': 5.694309661693942e-07, 'epoch': 0.9} +{'loss': 0.8707, 'grad_norm': 1.971056342124939, 'learning_rate': 5.67359400578752e-07, 'epoch': 0.9} +{'loss': 0.8978, 'grad_norm': 1.9486395120620728, 'learning_rate': 5.652914999709535e-07, 'epoch': 0.9} +{'loss': 0.943, 'grad_norm': 1.6971734762191772, 'learning_rate': 5.632272651494631e-07, 'epoch': 0.9} +{'loss': 0.8609, 'grad_norm': 0.8335411548614502, 'learning_rate': 5.611666969163243e-07, 'epoch': 0.9} +{'loss': 0.7508, 'grad_norm': 1.8522565364837646, 'learning_rate': 5.591097960721581e-07, 'epoch': 0.9} +{'loss': 0.8509, 'grad_norm': 1.6077631711959839, 'learning_rate': 5.570565634161562e-07, 'epoch': 0.9} +{'loss': 0.899, 'grad_norm': 1.6479111909866333, 'learning_rate': 5.550069997460872e-07, 'epoch': 0.9} +{'loss': 0.9333, 'grad_norm': 1.6265314817428589, 'learning_rate': 5.529611058582951e-07, 'epoch': 0.9} +{'loss': 0.9123, 'grad_norm': 1.986045479774475, 'learning_rate': 5.509188825476964e-07, 'epoch': 0.9} +{'loss': 0.9118, 'grad_norm': 1.809822678565979, 'learning_rate': 5.488803306077806e-07, 'epoch': 0.9} +{'loss': 0.9169, 'grad_norm': 1.7588878870010376, 'learning_rate': 5.468454508306131e-07, 'epoch': 0.9} +{'loss': 0.9448, 'grad_norm': 1.7753154039382935, 'learning_rate': 5.448142440068316e-07, 'epoch': 0.9} +{'loss': 0.8604, 'grad_norm': 0.8628824353218079, 'learning_rate': 5.427867109256457e-07, 'epoch': 0.9} +{'loss': 0.8944, 'grad_norm': 1.7549244165420532, 'learning_rate': 5.407628523748398e-07, 'epoch': 0.9} +{'loss': 0.865, 'grad_norm': 1.7723881006240845, 'learning_rate': 5.387426691407682e-07, 'epoch': 0.9} +{'loss': 0.8609, 'grad_norm': 1.8569858074188232, 'learning_rate': 5.367261620083575e-07, 'epoch': 0.9} +{'loss': 0.8505, 'grad_norm': 1.6620985269546509, 'learning_rate': 5.347133317611075e-07, 'epoch': 0.9} +{'loss': 0.8625, 'grad_norm': 1.8167173862457275, 'learning_rate': 5.327041791810905e-07, 'epoch': 0.9} +{'loss': 0.8987, 'grad_norm': 1.7594622373580933, 'learning_rate': 5.306987050489442e-07, 'epoch': 0.9} +{'loss': 0.9248, 'grad_norm': 1.8659120798110962, 'learning_rate': 5.286969101438821e-07, 'epoch': 0.9} +{'loss': 0.8815, 'grad_norm': 1.7423425912857056, 'learning_rate': 5.266987952436875e-07, 'epoch': 0.9} +{'loss': 0.9493, 'grad_norm': 1.7986290454864502, 'learning_rate': 5.247043611247127e-07, 'epoch': 0.9} +{'loss': 0.9573, 'grad_norm': 2.055258274078369, 'learning_rate': 5.227136085618801e-07, 'epoch': 0.9} +{'loss': 0.8827, 'grad_norm': 1.7386811971664429, 'learning_rate': 5.207265383286831e-07, 'epoch': 0.9} +{'loss': 0.8693, 'grad_norm': 1.7026543617248535, 'learning_rate': 5.187431511971819e-07, 'epoch': 0.9} +{'loss': 0.8117, 'grad_norm': 1.9607104063034058, 'learning_rate': 5.167634479380068e-07, 'epoch': 0.9} +{'loss': 0.9175, 'grad_norm': 1.7401719093322754, 'learning_rate': 5.147874293203581e-07, 'epoch': 0.9} +{'loss': 0.8455, 'grad_norm': 1.5865559577941895, 'learning_rate': 5.128150961120026e-07, 'epoch': 0.9} +{'loss': 0.8708, 'grad_norm': 1.6492749452590942, 'learning_rate': 5.108464490792753e-07, 'epoch': 0.9} +{'loss': 0.8487, 'grad_norm': 1.856960415840149, 'learning_rate': 5.088814889870808e-07, 'epoch': 0.9} +{'loss': 0.8879, 'grad_norm': 1.7916319370269775, 'learning_rate': 5.069202165988874e-07, 'epoch': 0.9} +{'loss': 0.9788, 'grad_norm': 1.7260135412216187, 
'learning_rate': 5.049626326767366e-07, 'epoch': 0.9} +{'loss': 0.9616, 'grad_norm': 1.660810947418213, 'learning_rate': 5.030087379812299e-07, 'epoch': 0.9} +{'loss': 0.9084, 'grad_norm': 1.7168588638305664, 'learning_rate': 5.010585332715401e-07, 'epoch': 0.9} +{'loss': 0.9523, 'grad_norm': 1.7697783708572388, 'learning_rate': 4.991120193054056e-07, 'epoch': 0.9} +{'loss': 0.904, 'grad_norm': 1.956488847732544, 'learning_rate': 4.97169196839129e-07, 'epoch': 0.9} +{'loss': 0.9192, 'grad_norm': 1.6730067729949951, 'learning_rate': 4.952300666275789e-07, 'epoch': 0.9} +{'loss': 0.8938, 'grad_norm': 1.7866907119750977, 'learning_rate': 4.932946294241903e-07, 'epoch': 0.9} +{'loss': 0.9791, 'grad_norm': 2.0164899826049805, 'learning_rate': 4.913628859809638e-07, 'epoch': 0.9} +{'loss': 0.9902, 'grad_norm': 3.0027291774749756, 'learning_rate': 4.894348370484648e-07, 'epoch': 0.9} +{'loss': 0.8784, 'grad_norm': 1.6987488269805908, 'learning_rate': 4.875104833758204e-07, 'epoch': 0.9} +{'loss': 0.9944, 'grad_norm': 1.764320731163025, 'learning_rate': 4.855898257107261e-07, 'epoch': 0.9} +{'loss': 1.0265, 'grad_norm': 1.8861677646636963, 'learning_rate': 4.836728647994371e-07, 'epoch': 0.9} +{'loss': 0.8962, 'grad_norm': 0.8907985091209412, 'learning_rate': 4.817596013867765e-07, 'epoch': 0.9} +{'loss': 0.9199, 'grad_norm': 1.7811397314071655, 'learning_rate': 4.798500362161273e-07, 'epoch': 0.9} +{'loss': 0.9478, 'grad_norm': 1.6720751523971558, 'learning_rate': 4.779441700294362e-07, 'epoch': 0.9} +{'loss': 0.9882, 'grad_norm': 1.8561614751815796, 'learning_rate': 4.7604200356721644e-07, 'epoch': 0.9} +{'loss': 0.8724, 'grad_norm': 0.8970755934715271, 'learning_rate': 4.7414353756853773e-07, 'epoch': 0.9} +{'loss': 0.907, 'grad_norm': 1.7764832973480225, 'learning_rate': 4.7224877277103673e-07, 'epoch': 0.9} +{'loss': 0.9104, 'grad_norm': 1.8301312923431396, 'learning_rate': 4.703577099109091e-07, 'epoch': 0.9} +{'loss': 0.8552, 'grad_norm': 1.6448180675506592, 'learning_rate': 4.6847034972291595e-07, 'epoch': 0.91} +{'loss': 0.8198, 'grad_norm': 1.6822906732559204, 'learning_rate': 4.6658669294037393e-07, 'epoch': 0.91} +{'loss': 0.9267, 'grad_norm': 1.585579752922058, 'learning_rate': 4.647067402951655e-07, 'epoch': 0.91} +{'loss': 0.9761, 'grad_norm': 1.765357255935669, 'learning_rate': 4.628304925177318e-07, 'epoch': 0.91} +{'loss': 0.942, 'grad_norm': 1.7246525287628174, 'learning_rate': 4.609579503370754e-07, 'epoch': 0.91} +{'loss': 0.858, 'grad_norm': 0.9045140147209167, 'learning_rate': 4.5908911448075746e-07, 'epoch': 0.91} +{'loss': 0.8554, 'grad_norm': 1.7995293140411377, 'learning_rate': 4.572239856749028e-07, 'epoch': 0.91} +{'loss': 0.9198, 'grad_norm': 1.5997288227081299, 'learning_rate': 4.553625646441928e-07, 'epoch': 0.91} +{'loss': 0.9032, 'grad_norm': 1.780782699584961, 'learning_rate': 4.535048521118668e-07, 'epoch': 0.91} +{'loss': 0.9124, 'grad_norm': 1.7443914413452148, 'learning_rate': 4.5165084879972844e-07, 'epoch': 0.91} +{'loss': 0.9287, 'grad_norm': 1.697629451751709, 'learning_rate': 4.498005554281337e-07, 'epoch': 0.91} +{'loss': 0.9062, 'grad_norm': 1.9009188413619995, 'learning_rate': 4.4795397271600314e-07, 'epoch': 0.91} +{'loss': 0.8864, 'grad_norm': 1.6660504341125488, 'learning_rate': 4.461111013808117e-07, 'epoch': 0.91} +{'loss': 0.938, 'grad_norm': 1.9300801753997803, 'learning_rate': 4.4427194213859216e-07, 'epoch': 0.91} +{'loss': 0.9074, 'grad_norm': 1.6352487802505493, 'learning_rate': 4.424364957039395e-07, 'epoch': 0.91} +{'loss': 0.9059, 
'grad_norm': 1.9206503629684448, 'learning_rate': 4.4060476278999897e-07, 'epoch': 0.91} +{'loss': 0.9151, 'grad_norm': 1.6867069005966187, 'learning_rate': 4.3877674410848005e-07, 'epoch': 0.91} +{'loss': 0.9424, 'grad_norm': 1.7568961381912231, 'learning_rate': 4.3695244036964567e-07, 'epoch': 0.91} +{'loss': 0.9092, 'grad_norm': 1.8440759181976318, 'learning_rate': 4.351318522823134e-07, 'epoch': 0.91} +{'loss': 0.8817, 'grad_norm': 1.7675895690917969, 'learning_rate': 4.3331498055386057e-07, 'epoch': 0.91} +{'loss': 0.7584, 'grad_norm': 1.999017596244812, 'learning_rate': 4.3150182589021926e-07, 'epoch': 0.91} +{'loss': 0.944, 'grad_norm': 1.6449241638183594, 'learning_rate': 4.296923889958771e-07, 'epoch': 0.91} +{'loss': 0.8767, 'grad_norm': 1.7319000959396362, 'learning_rate': 4.278866705738782e-07, 'epoch': 0.91} +{'loss': 0.9122, 'grad_norm': 1.636411428451538, 'learning_rate': 4.2608467132581934e-07, 'epoch': 0.91} +{'loss': 0.8436, 'grad_norm': 1.721819281578064, 'learning_rate': 4.2428639195185585e-07, 'epoch': 0.91} +{'loss': 0.8742, 'grad_norm': 1.71918523311615, 'learning_rate': 4.224918331506955e-07, 'epoch': 0.91} +{'loss': 0.8101, 'grad_norm': 1.6895725727081299, 'learning_rate': 4.2070099561960064e-07, 'epoch': 0.91} +{'loss': 0.9361, 'grad_norm': 1.7954058647155762, 'learning_rate': 4.189138800543868e-07, 'epoch': 0.91} +{'loss': 0.8568, 'grad_norm': 1.7520251274108887, 'learning_rate': 4.171304871494264e-07, 'epoch': 0.91} +{'loss': 0.8498, 'grad_norm': 1.735666275024414, 'learning_rate': 4.1535081759764286e-07, 'epoch': 0.91} +{'loss': 0.9078, 'grad_norm': 1.7762409448623657, 'learning_rate': 4.1357487209051416e-07, 'epoch': 0.91} +{'loss': 0.8947, 'grad_norm': 1.682921290397644, 'learning_rate': 4.118026513180695e-07, 'epoch': 0.91} +{'loss': 0.9081, 'grad_norm': 0.9268428087234497, 'learning_rate': 4.1003415596889363e-07, 'epoch': 0.91} +{'loss': 0.9119, 'grad_norm': 2.248276472091675, 'learning_rate': 4.082693867301224e-07, 'epoch': 0.91} +{'loss': 0.9767, 'grad_norm': 1.8063682317733765, 'learning_rate': 4.065083442874418e-07, 'epoch': 0.91} +{'loss': 0.9265, 'grad_norm': 1.954755425453186, 'learning_rate': 4.0475102932509336e-07, 'epoch': 0.91} +{'loss': 0.8541, 'grad_norm': 1.611634373664856, 'learning_rate': 4.029974425258687e-07, 'epoch': 0.91} +{'loss': 0.9195, 'grad_norm': 2.083188772201538, 'learning_rate': 4.012475845711106e-07, 'epoch': 0.91} +{'loss': 0.9607, 'grad_norm': 1.8747501373291016, 'learning_rate': 3.99501456140714e-07, 'epoch': 0.91} +{'loss': 0.9504, 'grad_norm': 1.7273318767547607, 'learning_rate': 3.9775905791312407e-07, 'epoch': 0.91} +{'loss': 0.9061, 'grad_norm': 1.7551336288452148, 'learning_rate': 3.96020390565337e-07, 'epoch': 0.91} +{'loss': 0.8519, 'grad_norm': 2.0117383003234863, 'learning_rate': 3.9428545477289913e-07, 'epoch': 0.91} +{'loss': 0.9258, 'grad_norm': 1.9607936143875122, 'learning_rate': 3.9255425120990564e-07, 'epoch': 0.91} +{'loss': 0.8859, 'grad_norm': 1.6973448991775513, 'learning_rate': 3.908267805490051e-07, 'epoch': 0.91} +{'loss': 0.932, 'grad_norm': 1.661086916923523, 'learning_rate': 3.8910304346139185e-07, 'epoch': 0.91} +{'loss': 0.9402, 'grad_norm': 1.803715467453003, 'learning_rate': 3.8738304061681107e-07, 'epoch': 0.91} +{'loss': 0.9396, 'grad_norm': 1.7385083436965942, 'learning_rate': 3.8566677268355943e-07, 'epoch': 0.91} +{'loss': 0.8949, 'grad_norm': 1.8413794040679932, 'learning_rate': 3.8395424032847686e-07, 'epoch': 0.91} +{'loss': 0.8845, 'grad_norm': 1.7624759674072266, 'learning_rate': 
3.8224544421695766e-07, 'epoch': 0.91} +{'loss': 0.9811, 'grad_norm': 1.773362398147583, 'learning_rate': 3.8054038501294077e-07, 'epoch': 0.91} +{'loss': 1.0414, 'grad_norm': 1.960214376449585, 'learning_rate': 3.788390633789163e-07, 'epoch': 0.91} +{'loss': 0.9155, 'grad_norm': 1.7714349031448364, 'learning_rate': 3.7714147997591655e-07, 'epoch': 0.91} +{'loss': 0.9664, 'grad_norm': 1.9158926010131836, 'learning_rate': 3.7544763546352834e-07, 'epoch': 0.92} +{'loss': 0.8579, 'grad_norm': 1.8404676914215088, 'learning_rate': 3.7375753049987974e-07, 'epoch': 0.92} +{'loss': 0.8735, 'grad_norm': 1.6016980409622192, 'learning_rate': 3.7207116574165094e-07, 'epoch': 0.92} +{'loss': 0.8861, 'grad_norm': 1.7089896202087402, 'learning_rate': 3.703885418440656e-07, 'epoch': 0.92} +{'loss': 0.905, 'grad_norm': 1.720229148864746, 'learning_rate': 3.687096594608941e-07, 'epoch': 0.92} +{'loss': 0.9178, 'grad_norm': 1.8409990072250366, 'learning_rate': 3.6703451924445467e-07, 'epoch': 0.92} +{'loss': 0.8821, 'grad_norm': 1.7311615943908691, 'learning_rate': 3.6536312184560996e-07, 'epoch': 0.92} +{'loss': 0.8656, 'grad_norm': 1.8107564449310303, 'learning_rate': 3.6369546791377054e-07, 'epoch': 0.92} +{'loss': 0.9416, 'grad_norm': 1.8011131286621094, 'learning_rate': 3.620315580968892e-07, 'epoch': 0.92} +{'loss': 0.8679, 'grad_norm': 1.6764616966247559, 'learning_rate': 3.603713930414676e-07, 'epoch': 0.92} +{'loss': 0.9183, 'grad_norm': 0.978177547454834, 'learning_rate': 3.587149733925488e-07, 'epoch': 0.92} +{'loss': 0.9553, 'grad_norm': 1.8134093284606934, 'learning_rate': 3.570622997937234e-07, 'epoch': 0.92} +{'loss': 0.9439, 'grad_norm': 1.7197810411453247, 'learning_rate': 3.5541337288712564e-07, 'epoch': 0.92} +{'loss': 0.8802, 'grad_norm': 1.727913498878479, 'learning_rate': 3.5376819331343404e-07, 'epoch': 0.92} +{'loss': 0.8852, 'grad_norm': 2.5710127353668213, 'learning_rate': 3.5212676171187065e-07, 'epoch': 0.92} +{'loss': 0.9426, 'grad_norm': 1.669385552406311, 'learning_rate': 3.50489078720202e-07, 'epoch': 0.92} +{'loss': 0.9061, 'grad_norm': 1.6277227401733398, 'learning_rate': 3.4885514497473574e-07, 'epoch': 0.92} +{'loss': 0.9025, 'grad_norm': 3.0289626121520996, 'learning_rate': 3.472249611103273e-07, 'epoch': 0.92} +{'loss': 0.8961, 'grad_norm': 1.766283392906189, 'learning_rate': 3.455985277603713e-07, 'epoch': 0.92} +{'loss': 0.8814, 'grad_norm': 1.7640736103057861, 'learning_rate': 3.4397584555680653e-07, 'epoch': 0.92} +{'loss': 0.8357, 'grad_norm': 1.674479365348816, 'learning_rate': 3.4235691513011425e-07, 'epoch': 0.92} +{'loss': 0.9338, 'grad_norm': 0.9121842384338379, 'learning_rate': 3.4074173710931804e-07, 'epoch': 0.92} +{'loss': 0.8454, 'grad_norm': 1.7225884199142456, 'learning_rate': 3.3913031212198356e-07, 'epoch': 0.92} +{'loss': 0.9444, 'grad_norm': 1.7789677381515503, 'learning_rate': 3.375226407942178e-07, 'epoch': 0.92} +{'loss': 0.9745, 'grad_norm': 1.629672646522522, 'learning_rate': 3.359187237506689e-07, 'epoch': 0.92} +{'loss': 0.8802, 'grad_norm': 1.8693183660507202, 'learning_rate': 3.3431856161452835e-07, 'epoch': 0.92} +{'loss': 0.88, 'grad_norm': 1.7442121505737305, 'learning_rate': 3.327221550075266e-07, 'epoch': 0.92} +{'loss': 0.8031, 'grad_norm': 1.8631259202957153, 'learning_rate': 3.3112950454993633e-07, 'epoch': 0.92} +{'loss': 0.9087, 'grad_norm': 1.817678689956665, 'learning_rate': 3.2954061086056924e-07, 'epoch': 0.92} +{'loss': 0.9518, 'grad_norm': 4.617339611053467, 'learning_rate': 3.2795547455677813e-07, 'epoch': 0.92} +{'loss': 
0.8798, 'grad_norm': 1.8480446338653564, 'learning_rate': 3.26374096254457e-07, 'epoch': 0.92} +{'loss': 0.8393, 'grad_norm': 1.780389428138733, 'learning_rate': 3.247964765680389e-07, 'epoch': 0.92} +{'loss': 0.9066, 'grad_norm': 1.7349390983581543, 'learning_rate': 3.2322261611049457e-07, 'epoch': 0.92} +{'loss': 0.9196, 'grad_norm': 1.7600276470184326, 'learning_rate': 3.2165251549333585e-07, 'epoch': 0.92} +{'loss': 0.8915, 'grad_norm': 2.061828851699829, 'learning_rate': 3.20086175326616e-07, 'epoch': 0.92} +{'loss': 0.9053, 'grad_norm': 1.6906039714813232, 'learning_rate': 3.185235962189237e-07, 'epoch': 0.92} +{'loss': 0.9184, 'grad_norm': 1.89994215965271, 'learning_rate': 3.1696477877738664e-07, 'epoch': 0.92} +{'loss': 0.8956, 'grad_norm': 0.9526238441467285, 'learning_rate': 3.1540972360767254e-07, 'epoch': 0.92} +{'loss': 0.932, 'grad_norm': 1.7391424179077148, 'learning_rate': 3.1385843131398697e-07, 'epoch': 0.92} +{'loss': 0.8843, 'grad_norm': 1.61171293258667, 'learning_rate': 3.1231090249907336e-07, 'epoch': 0.92} +{'loss': 0.9201, 'grad_norm': 0.896507740020752, 'learning_rate': 3.107671377642119e-07, 'epoch': 0.92} +{'loss': 0.9763, 'grad_norm': 1.8281115293502808, 'learning_rate': 3.0922713770922155e-07, 'epoch': 0.92} +{'loss': 0.9072, 'grad_norm': 1.64071524143219, 'learning_rate': 3.076909029324571e-07, 'epoch': 0.92} +{'loss': 0.9643, 'grad_norm': 1.7085450887680054, 'learning_rate': 3.06158434030811e-07, 'epoch': 0.92} +{'loss': 0.855, 'grad_norm': 0.8372834920883179, 'learning_rate': 3.0462973159971375e-07, 'epoch': 0.92} +{'loss': 0.8612, 'grad_norm': 1.7778486013412476, 'learning_rate': 3.0310479623313125e-07, 'epoch': 0.92} +{'loss': 0.8685, 'grad_norm': 1.6668615341186523, 'learning_rate': 3.0158362852356627e-07, 'epoch': 0.92} +{'loss': 0.9033, 'grad_norm': 1.7300249338150024, 'learning_rate': 3.0006622906205707e-07, 'epoch': 0.92} +{'loss': 0.8803, 'grad_norm': 1.9477571249008179, 'learning_rate': 2.985525984381754e-07, 'epoch': 0.92} +{'loss': 0.9389, 'grad_norm': 1.9010552167892456, 'learning_rate': 2.970427372400353e-07, 'epoch': 0.92} +{'loss': 0.8489, 'grad_norm': 1.6804304122924805, 'learning_rate': 2.955366460542797e-07, 'epoch': 0.92} +{'loss': 0.9514, 'grad_norm': 1.746863603591919, 'learning_rate': 2.940343254660905e-07, 'epoch': 0.92} +{'loss': 0.9569, 'grad_norm': 1.7444335222244263, 'learning_rate': 2.925357760591818e-07, 'epoch': 0.93} +{'loss': 0.918, 'grad_norm': 1.6852445602416992, 'learning_rate': 2.910409984158058e-07, 'epoch': 0.93} +{'loss': 0.9549, 'grad_norm': 1.746703028678894, 'learning_rate': 2.8954999311674557e-07, 'epoch': 0.93} +{'loss': 0.8868, 'grad_norm': 1.695944905281067, 'learning_rate': 2.880627607413211e-07, 'epoch': 0.93} +{'loss': 0.8989, 'grad_norm': 1.7166118621826172, 'learning_rate': 2.865793018673857e-07, 'epoch': 0.93} +{'loss': 0.9602, 'grad_norm': 1.7035415172576904, 'learning_rate': 2.8509961707132496e-07, 'epoch': 0.93} +{'loss': 0.921, 'grad_norm': 1.8007614612579346, 'learning_rate': 2.8362370692806117e-07, 'epoch': 0.93} +{'loss': 1.0102, 'grad_norm': 1.7436736822128296, 'learning_rate': 2.821515720110479e-07, 'epoch': 0.93} +{'loss': 0.964, 'grad_norm': 1.7669144868850708, 'learning_rate': 2.806832128922732e-07, 'epoch': 0.93} +{'loss': 0.9104, 'grad_norm': 1.6212629079818726, 'learning_rate': 2.7921863014225504e-07, 'epoch': 0.93} +{'loss': 0.9321, 'grad_norm': 1.840636968612671, 'learning_rate': 2.7775782433004714e-07, 'epoch': 0.93} +{'loss': 0.8954, 'grad_norm': 1.9070498943328857, 'learning_rate': 
2.7630079602323447e-07, 'epoch': 0.93} +{'loss': 0.9066, 'grad_norm': 1.6181663274765015, 'learning_rate': 2.7484754578793513e-07, 'epoch': 0.93} +{'loss': 0.8834, 'grad_norm': 1.750315546989441, 'learning_rate': 2.733980741887987e-07, 'epoch': 0.93} +{'loss': 0.84, 'grad_norm': 1.7466723918914795, 'learning_rate': 2.7195238178900685e-07, 'epoch': 0.93} +{'loss': 0.9528, 'grad_norm': 1.867712140083313, 'learning_rate': 2.7051046915027244e-07, 'epoch': 0.93} +{'loss': 0.9354, 'grad_norm': 1.7667862176895142, 'learning_rate': 2.690723368328407e-07, 'epoch': 0.93} +{'loss': 0.9211, 'grad_norm': 1.8769786357879639, 'learning_rate': 2.676379853954858e-07, 'epoch': 0.93} +{'loss': 0.9026, 'grad_norm': 2.0662314891815186, 'learning_rate': 2.662074153955152e-07, 'epoch': 0.93} +{'loss': 0.905, 'grad_norm': 1.8869236707687378, 'learning_rate': 2.6478062738876654e-07, 'epoch': 0.93} +{'loss': 0.9462, 'grad_norm': 1.676169514656067, 'learning_rate': 2.6335762192960743e-07, 'epoch': 0.93} +{'loss': 0.8911, 'grad_norm': 1.7572128772735596, 'learning_rate': 2.6193839957093683e-07, 'epoch': 0.93} +{'loss': 0.9207, 'grad_norm': 1.6695942878723145, 'learning_rate': 2.605229608641824e-07, 'epoch': 0.93} +{'loss': 0.9339, 'grad_norm': 1.7740347385406494, 'learning_rate': 2.5911130635930203e-07, 'epoch': 0.93} +{'loss': 0.8779, 'grad_norm': 1.9497209787368774, 'learning_rate': 2.577034366047848e-07, 'epoch': 0.93} +{'loss': 0.8584, 'grad_norm': 1.641097068786621, 'learning_rate': 2.5629935214764866e-07, 'epoch': 0.93} +{'loss': 0.9112, 'grad_norm': 1.663203477859497, 'learning_rate': 2.5489905353343725e-07, 'epoch': 0.93} +{'loss': 0.9307, 'grad_norm': 2.0471978187561035, 'learning_rate': 2.5350254130622866e-07, 'epoch': 0.93} +{'loss': 0.8738, 'grad_norm': 1.7074921131134033, 'learning_rate': 2.5210981600862663e-07, 'epoch': 0.93} +{'loss': 0.9313, 'grad_norm': 1.8430519104003906, 'learning_rate': 2.507208781817638e-07, 'epoch': 0.93} +{'loss': 0.8517, 'grad_norm': 1.9072238206863403, 'learning_rate': 2.4933572836530307e-07, 'epoch': 0.93} +{'loss': 0.8882, 'grad_norm': 0.8866738080978394, 'learning_rate': 2.4795436709743384e-07, 'epoch': 0.93} +{'loss': 0.8945, 'grad_norm': 1.6284101009368896, 'learning_rate': 2.465767949148734e-07, 'epoch': 0.93} +{'loss': 0.924, 'grad_norm': 1.7144591808319092, 'learning_rate': 2.45203012352867e-07, 'epoch': 0.93} +{'loss': 0.9275, 'grad_norm': 1.811737060546875, 'learning_rate': 2.4383301994518773e-07, 'epoch': 0.93} +{'loss': 0.9204, 'grad_norm': 1.9508341550827026, 'learning_rate': 2.424668182241363e-07, 'epoch': 0.93} +{'loss': 0.8498, 'grad_norm': 1.7545024156570435, 'learning_rate': 2.4110440772054155e-07, 'epoch': 0.93} +{'loss': 0.9567, 'grad_norm': 1.7270668745040894, 'learning_rate': 2.3974578896375555e-07, 'epoch': 0.93} +{'loss': 0.9216, 'grad_norm': 1.9077086448669434, 'learning_rate': 2.383909624816616e-07, 'epoch': 0.93} +{'loss': 0.824, 'grad_norm': 1.998375654220581, 'learning_rate': 2.370399288006664e-07, 'epoch': 0.93} +{'loss': 0.9499, 'grad_norm': 1.7505697011947632, 'learning_rate': 2.3569268844570447e-07, 'epoch': 0.93} +{'loss': 0.9487, 'grad_norm': 1.5878850221633911, 'learning_rate': 2.3434924194023712e-07, 'epoch': 0.93} +{'loss': 0.8266, 'grad_norm': 2.0753414630889893, 'learning_rate': 2.3300958980624787e-07, 'epoch': 0.93} +{'loss': 0.8478, 'grad_norm': 0.8450593948364258, 'learning_rate': 2.3167373256425152e-07, 'epoch': 0.93} +{'loss': 0.982, 'grad_norm': 1.732397198677063, 'learning_rate': 2.3034167073328283e-07, 'epoch': 0.93} +{'loss': 
0.8597, 'grad_norm': 1.7365858554840088, 'learning_rate': 2.2901340483090785e-07, 'epoch': 0.93} +{'loss': 0.8229, 'grad_norm': 1.6468778848648071, 'learning_rate': 2.2768893537321145e-07, 'epoch': 0.93} +{'loss': 0.8662, 'grad_norm': 1.6674633026123047, 'learning_rate': 2.2636826287480872e-07, 'epoch': 0.93} +{'loss': 0.8584, 'grad_norm': 1.779270887374878, 'learning_rate': 2.2505138784883585e-07, 'epoch': 0.93} +{'loss': 0.871, 'grad_norm': 1.791017770767212, 'learning_rate': 2.2373831080695463e-07, 'epoch': 0.93} +{'loss': 0.8347, 'grad_norm': 1.7366632223129272, 'learning_rate': 2.2242903225935254e-07, 'epoch': 0.93} +{'loss': 0.9054, 'grad_norm': 0.9157231450080872, 'learning_rate': 2.211235527147404e-07, 'epoch': 0.93} +{'loss': 0.8027, 'grad_norm': 1.7156238555908203, 'learning_rate': 2.1982187268035138e-07, 'epoch': 0.94} +{'loss': 0.8525, 'grad_norm': 1.6320236921310425, 'learning_rate': 2.1852399266194312e-07, 'epoch': 0.94} +{'loss': 0.9613, 'grad_norm': 1.75459885597229, 'learning_rate': 2.1722991316380005e-07, 'epoch': 0.94} +{'loss': 0.8778, 'grad_norm': 1.5030946731567383, 'learning_rate': 2.1593963468872435e-07, 'epoch': 0.94} +{'loss': 0.8966, 'grad_norm': 1.933998465538025, 'learning_rate': 2.1465315773804617e-07, 'epoch': 0.94} +{'loss': 0.9089, 'grad_norm': 1.7168859243392944, 'learning_rate': 2.1337048281161565e-07, 'epoch': 0.94} +{'loss': 0.9192, 'grad_norm': 1.7307664155960083, 'learning_rate': 2.1209161040780746e-07, 'epoch': 0.94} +{'loss': 0.9066, 'grad_norm': 1.5924339294433594, 'learning_rate': 2.1081654102351634e-07, 'epoch': 0.94} +{'loss': 0.9248, 'grad_norm': 1.8014802932739258, 'learning_rate': 2.0954527515416156e-07, 'epoch': 0.94} +{'loss': 0.9554, 'grad_norm': 1.7934247255325317, 'learning_rate': 2.082778132936858e-07, 'epoch': 0.94} +{'loss': 0.9073, 'grad_norm': 1.684612512588501, 'learning_rate': 2.0701415593455065e-07, 'epoch': 0.94} +{'loss': 0.8814, 'grad_norm': 1.6802936792373657, 'learning_rate': 2.057543035677412e-07, 'epoch': 0.94} +{'loss': 0.894, 'grad_norm': 1.835364580154419, 'learning_rate': 2.0449825668276246e-07, 'epoch': 0.94} +{'loss': 0.9386, 'grad_norm': 1.716550588607788, 'learning_rate': 2.0324601576764525e-07, 'epoch': 0.94} +{'loss': 0.9254, 'grad_norm': 1.8702348470687866, 'learning_rate': 2.0199758130893478e-07, 'epoch': 0.94} +{'loss': 0.9369, 'grad_norm': 1.9168022871017456, 'learning_rate': 2.0075295379170413e-07, 'epoch': 0.94} +{'loss': 0.9361, 'grad_norm': 1.8598748445510864, 'learning_rate': 1.9951213369954204e-07, 'epoch': 0.94} +{'loss': 0.9263, 'grad_norm': 1.6831399202346802, 'learning_rate': 1.9827512151456175e-07, 'epoch': 0.94} +{'loss': 0.9464, 'grad_norm': 1.615675687789917, 'learning_rate': 1.9704191771739322e-07, 'epoch': 0.94} +{'loss': 0.9298, 'grad_norm': 1.8768787384033203, 'learning_rate': 1.95812522787191e-07, 'epoch': 0.94} +{'loss': 0.9496, 'grad_norm': 1.781435489654541, 'learning_rate': 1.9458693720162513e-07, 'epoch': 0.94} +{'loss': 0.9068, 'grad_norm': 1.5981781482696533, 'learning_rate': 1.933651614368892e-07, 'epoch': 0.94} +{'loss': 0.8415, 'grad_norm': 1.7023570537567139, 'learning_rate': 1.921471959676957e-07, 'epoch': 0.94} +{'loss': 1.0121, 'grad_norm': 1.7648065090179443, 'learning_rate': 1.9093304126727497e-07, 'epoch': 0.94} +{'loss': 0.9405, 'grad_norm': 1.8315552473068237, 'learning_rate': 1.897226978073774e-07, 'epoch': 0.94} +{'loss': 0.8857, 'grad_norm': 1.643708348274231, 'learning_rate': 1.885161660582746e-07, 'epoch': 0.94} +{'loss': 0.8516, 'grad_norm': 1.6086310148239136, 
'learning_rate': 1.8731344648875382e-07, 'epoch': 0.94} +{'loss': 0.9711, 'grad_norm': 1.6000818014144897, 'learning_rate': 1.8611453956612346e-07, 'epoch': 0.94} +{'loss': 0.9249, 'grad_norm': 1.7069274187088013, 'learning_rate': 1.8491944575621091e-07, 'epoch': 0.94} +{'loss': 0.8818, 'grad_norm': 0.8661252856254578, 'learning_rate': 1.8372816552336025e-07, 'epoch': 0.94} +{'loss': 0.993, 'grad_norm': 1.641238808631897, 'learning_rate': 1.8254069933043572e-07, 'epoch': 0.94} +{'loss': 0.9762, 'grad_norm': 1.6961842775344849, 'learning_rate': 1.81357047638816e-07, 'epoch': 0.94} +{'loss': 0.8638, 'grad_norm': 2.1548640727996826, 'learning_rate': 1.8017721090840324e-07, 'epoch': 0.94} +{'loss': 1.0076, 'grad_norm': 1.7392739057540894, 'learning_rate': 1.7900118959761181e-07, 'epoch': 0.94} +{'loss': 1.0342, 'grad_norm': 1.8730323314666748, 'learning_rate': 1.7782898416337735e-07, 'epoch': 0.94} +{'loss': 0.977, 'grad_norm': 1.7638781070709229, 'learning_rate': 1.7666059506115108e-07, 'epoch': 0.94} +{'loss': 0.9134, 'grad_norm': 1.772662878036499, 'learning_rate': 1.754960227449032e-07, 'epoch': 0.94} +{'loss': 0.8769, 'grad_norm': 1.7968339920043945, 'learning_rate': 1.7433526766711727e-07, 'epoch': 0.94} +{'loss': 0.8677, 'grad_norm': 1.8762168884277344, 'learning_rate': 1.7317833027879928e-07, 'epoch': 0.94} +{'loss': 0.8822, 'grad_norm': 1.6070098876953125, 'learning_rate': 1.7202521102946624e-07, 'epoch': 0.94} +{'loss': 0.9053, 'grad_norm': 1.8356244564056396, 'learning_rate': 1.7087591036715534e-07, 'epoch': 0.94} +{'loss': 0.9389, 'grad_norm': 1.9397823810577393, 'learning_rate': 1.6973042873841827e-07, 'epoch': 0.94} +{'loss': 0.9242, 'grad_norm': 1.8684617280960083, 'learning_rate': 1.6858876658832235e-07, 'epoch': 0.94} +{'loss': 0.9593, 'grad_norm': 1.7800335884094238, 'learning_rate': 1.6745092436045495e-07, 'epoch': 0.94} +{'loss': 1.0059, 'grad_norm': 1.5409257411956787, 'learning_rate': 1.6631690249691358e-07, 'epoch': 0.94} +{'loss': 0.9788, 'grad_norm': 1.845908284187317, 'learning_rate': 1.651867014383146e-07, 'epoch': 0.94} +{'loss': 0.8094, 'grad_norm': 0.8025622963905334, 'learning_rate': 1.6406032162379127e-07, 'epoch': 0.94} +{'loss': 1.0004, 'grad_norm': 1.6414554119110107, 'learning_rate': 1.629377634909868e-07, 'epoch': 0.94} +{'loss': 0.8644, 'grad_norm': 1.8747586011886597, 'learning_rate': 1.618190274760656e-07, 'epoch': 0.94} +{'loss': 0.8733, 'grad_norm': 1.6366591453552246, 'learning_rate': 1.6070411401370335e-07, 'epoch': 0.94} +{'loss': 0.9041, 'grad_norm': 1.7805119752883911, 'learning_rate': 1.5959302353709128e-07, 'epoch': 0.94} +{'loss': 0.9506, 'grad_norm': 1.8747243881225586, 'learning_rate': 1.584857564779363e-07, 'epoch': 0.94} +{'loss': 0.9043, 'grad_norm': 1.6787868738174438, 'learning_rate': 1.5738231326645758e-07, 'epoch': 0.95} +{'loss': 0.8782, 'grad_norm': 1.7745697498321533, 'learning_rate': 1.5628269433139e-07, 'epoch': 0.95} +{'loss': 0.8421, 'grad_norm': 1.7373838424682617, 'learning_rate': 1.5518690009998282e-07, 'epoch': 0.95} +{'loss': 0.9485, 'grad_norm': 1.6470859050750732, 'learning_rate': 1.5409493099799888e-07, 'epoch': 0.95} +{'loss': 0.9438, 'grad_norm': 1.6539729833602905, 'learning_rate': 1.530067874497143e-07, 'epoch': 0.95} +{'loss': 0.8901, 'grad_norm': 0.921311616897583, 'learning_rate': 1.519224698779198e-07, 'epoch': 0.95} +{'loss': 0.8797, 'grad_norm': 1.7064136266708374, 'learning_rate': 1.5084197870391836e-07, 'epoch': 0.95} +{'loss': 1.0208, 'grad_norm': 1.734123706817627, 'learning_rate': 1.497653143475275e-07, 
'epoch': 0.95} +{'loss': 0.9503, 'grad_norm': 1.6997294425964355, 'learning_rate': 1.486924772270759e-07, 'epoch': 0.95} +{'loss': 0.9684, 'grad_norm': 2.161675453186035, 'learning_rate': 1.4762346775940794e-07, 'epoch': 0.95} +{'loss': 0.894, 'grad_norm': 1.7576509714126587, 'learning_rate': 1.465582863598791e-07, 'epoch': 0.95} +{'loss': 0.8733, 'grad_norm': 1.9255119562149048, 'learning_rate': 1.4549693344235726e-07, 'epoch': 0.95} +{'loss': 0.8689, 'grad_norm': 1.8250627517700195, 'learning_rate': 1.444394094192225e-07, 'epoch': 0.95} +{'loss': 0.9645, 'grad_norm': 1.773016095161438, 'learning_rate': 1.4338571470137063e-07, 'epoch': 0.95} +{'loss': 0.9009, 'grad_norm': 1.8170216083526611, 'learning_rate': 1.4233584969820414e-07, 'epoch': 0.95} +{'loss': 0.8958, 'grad_norm': 1.9065470695495605, 'learning_rate': 1.4128981481764115e-07, 'epoch': 0.95} +{'loss': 1.0048, 'grad_norm': 1.835839867591858, 'learning_rate': 1.4024761046611213e-07, 'epoch': 0.95} +{'loss': 0.9357, 'grad_norm': 1.8083370923995972, 'learning_rate': 1.3920923704855648e-07, 'epoch': 0.95} +{'loss': 0.9153, 'grad_norm': 1.7144523859024048, 'learning_rate': 1.3817469496842596e-07, 'epoch': 0.95} +{'loss': 0.8949, 'grad_norm': 1.6624090671539307, 'learning_rate': 1.3714398462768563e-07, 'epoch': 0.95} +{'loss': 0.8851, 'grad_norm': 1.5764845609664917, 'learning_rate': 1.3611710642681076e-07, 'epoch': 0.95} +{'loss': 0.8913, 'grad_norm': 1.6650302410125732, 'learning_rate': 1.350940607647866e-07, 'epoch': 0.95} +{'loss': 0.9492, 'grad_norm': 1.7398484945297241, 'learning_rate': 1.3407484803910854e-07, 'epoch': 0.95} +{'loss': 0.8264, 'grad_norm': 1.8557682037353516, 'learning_rate': 1.3305946864578645e-07, 'epoch': 0.95} +{'loss': 0.9428, 'grad_norm': 1.7488837242126465, 'learning_rate': 1.320479229793359e-07, 'epoch': 0.95} +{'loss': 1.0401, 'grad_norm': 1.8158282041549683, 'learning_rate': 1.3104021143278911e-07, 'epoch': 0.95} +{'loss': 0.9299, 'grad_norm': 1.5517100095748901, 'learning_rate': 1.3003633439768182e-07, 'epoch': 0.95} +{'loss': 0.9427, 'grad_norm': 1.7988089323043823, 'learning_rate': 1.2903629226406423e-07, 'epoch': 0.95} +{'loss': 0.9063, 'grad_norm': 1.7607849836349487, 'learning_rate': 1.280400854204944e-07, 'epoch': 0.95} +{'loss': 0.8995, 'grad_norm': 1.8951563835144043, 'learning_rate': 1.2704771425404382e-07, 'epoch': 0.95} +{'loss': 0.9245, 'grad_norm': 2.0246710777282715, 'learning_rate': 1.2605917915028743e-07, 'epoch': 0.95} +{'loss': 0.9747, 'grad_norm': 1.6407110691070557, 'learning_rate': 1.2507448049331571e-07, 'epoch': 0.95} +{'loss': 0.862, 'grad_norm': 0.9085986018180847, 'learning_rate': 1.24093618665726e-07, 'epoch': 0.95} +{'loss': 0.9051, 'grad_norm': 1.6935354471206665, 'learning_rate': 1.231165940486234e-07, 'epoch': 0.95} +{'loss': 0.9055, 'grad_norm': 1.7065576314926147, 'learning_rate': 1.221434070216243e-07, 'epoch': 0.95} +{'loss': 0.9092, 'grad_norm': 0.9075039625167847, 'learning_rate': 1.2117405796285286e-07, 'epoch': 0.95} +{'loss': 0.8796, 'grad_norm': 1.8683946132659912, 'learning_rate': 1.2020854724894448e-07, 'epoch': 0.95} +{'loss': 0.9876, 'grad_norm': 2.3253719806671143, 'learning_rate': 1.192468752550402e-07, 'epoch': 0.95} +{'loss': 0.9116, 'grad_norm': 1.7958213090896606, 'learning_rate': 1.1828904235478888e-07, 'epoch': 0.95} +{'loss': 0.8728, 'grad_norm': 1.6983802318572998, 'learning_rate': 1.1733504892035064e-07, 'epoch': 0.95} +{'loss': 0.9025, 'grad_norm': 1.7242679595947266, 'learning_rate': 1.1638489532239339e-07, 'epoch': 0.95} +{'loss': 0.9257, 
'grad_norm': 1.5514590740203857, 'learning_rate': 1.1543858193009183e-07, 'epoch': 0.95} +{'loss': 0.9794, 'grad_norm': 1.8176358938217163, 'learning_rate': 1.1449610911112741e-07, 'epoch': 0.95} +{'loss': 0.9545, 'grad_norm': 1.9546122550964355, 'learning_rate': 1.1355747723169386e-07, 'epoch': 0.95} +{'loss': 0.9105, 'grad_norm': 1.679056167602539, 'learning_rate': 1.1262268665648724e-07, 'epoch': 0.95} +{'loss': 0.9096, 'grad_norm': 1.7261285781860352, 'learning_rate': 1.1169173774871478e-07, 'epoch': 0.95} +{'loss': 0.9135, 'grad_norm': 1.77592933177948, 'learning_rate': 1.1076463087008938e-07, 'epoch': 0.95} +{'loss': 0.8952, 'grad_norm': 1.841078519821167, 'learning_rate': 1.0984136638083176e-07, 'epoch': 0.95} +{'loss': 0.8959, 'grad_norm': 1.586127758026123, 'learning_rate': 1.0892194463966943e-07, 'epoch': 0.95} +{'loss': 0.9544, 'grad_norm': 1.582200050354004, 'learning_rate': 1.0800636600383662e-07, 'epoch': 0.95} +{'loss': 0.9265, 'grad_norm': 0.9542142152786255, 'learning_rate': 1.0709463082907545e-07, 'epoch': 0.95} +{'loss': 0.9044, 'grad_norm': 1.7990862131118774, 'learning_rate': 1.0618673946963365e-07, 'epoch': 0.95} +{'loss': 0.8602, 'grad_norm': 1.6617053747177124, 'learning_rate': 1.052826922782646e-07, 'epoch': 0.96} +{'loss': 0.9381, 'grad_norm': 2.0394582748413086, 'learning_rate': 1.0438248960623065e-07, 'epoch': 0.96} +{'loss': 0.9223, 'grad_norm': 1.9113985300064087, 'learning_rate': 1.0348613180329758e-07, 'epoch': 0.96} +{'loss': 0.9055, 'grad_norm': 1.6965852975845337, 'learning_rate': 1.0259361921774014e-07, 'epoch': 0.96} +{'loss': 0.8948, 'grad_norm': 1.6814028024673462, 'learning_rate': 1.0170495219633647e-07, 'epoch': 0.96} +{'loss': 0.8932, 'grad_norm': 1.6309480667114258, 'learning_rate': 1.0082013108437038e-07, 'epoch': 0.96} +{'loss': 0.9231, 'grad_norm': 1.8159620761871338, 'learning_rate': 9.993915622563578e-08, 'epoch': 0.96} +{'loss': 1.0031, 'grad_norm': 1.8377739191055298, 'learning_rate': 9.906202796242547e-08, 'epoch': 0.96} +{'loss': 0.875, 'grad_norm': 1.5393259525299072, 'learning_rate': 9.818874663554356e-08, 'epoch': 0.96} +{'loss': 0.9425, 'grad_norm': 1.7872649431228638, 'learning_rate': 9.731931258429638e-08, 'epoch': 0.96} +{'loss': 0.758, 'grad_norm': 1.7148319482803345, 'learning_rate': 9.645372614649595e-08, 'epoch': 0.96} +{'loss': 0.9475, 'grad_norm': 1.5794405937194824, 'learning_rate': 9.55919876584599e-08, 'epoch': 0.96} +{'loss': 0.8753, 'grad_norm': 1.7703711986541748, 'learning_rate': 9.473409745500928e-08, 'epoch': 0.96} +{'loss': 0.8689, 'grad_norm': 1.6674796342849731, 'learning_rate': 9.388005586947191e-08, 'epoch': 0.96} +{'loss': 0.8537, 'grad_norm': 1.8466814756393433, 'learning_rate': 9.302986323368012e-08, 'epoch': 0.96} +{'loss': 0.8179, 'grad_norm': 1.9694138765335083, 'learning_rate': 9.218351987796859e-08, 'epoch': 0.96} +{'loss': 0.8813, 'grad_norm': 0.923488438129425, 'learning_rate': 9.134102613117757e-08, 'epoch': 0.96} +{'loss': 0.8971, 'grad_norm': 1.8414655923843384, 'learning_rate': 9.0502382320653e-08, 'epoch': 0.96} +{'loss': 0.9105, 'grad_norm': 1.8651880025863647, 'learning_rate': 8.966758877224202e-08, 'epoch': 0.96} +{'loss': 0.8624, 'grad_norm': 0.8345015048980713, 'learning_rate': 8.88366458102996e-08, 'epoch': 0.96} +{'loss': 0.9038, 'grad_norm': 1.6514358520507812, 'learning_rate': 8.800955375768194e-08, 'epoch': 0.96} +{'loss': 0.9413, 'grad_norm': 1.947794795036316, 'learning_rate': 8.718631293574753e-08, 'epoch': 0.96} +{'loss': 0.9269, 'grad_norm': 1.7708135843276978, 'learning_rate': 
8.636692366436383e-08, 'epoch': 0.96} +{'loss': 0.9177, 'grad_norm': 1.8726534843444824, 'learning_rate': 8.555138626189619e-08, 'epoch': 0.96} +{'loss': 0.9702, 'grad_norm': 1.6105023622512817, 'learning_rate': 8.473970104521556e-08, 'epoch': 0.96} +{'loss': 0.9987, 'grad_norm': 1.703658103942871, 'learning_rate': 8.393186832969746e-08, 'epoch': 0.96} +{'loss': 0.9935, 'grad_norm': 1.826187014579773, 'learning_rate': 8.31278884292186e-08, 'epoch': 0.96} +{'loss': 0.9923, 'grad_norm': 1.819858193397522, 'learning_rate': 8.232776165616019e-08, 'epoch': 0.96} +{'loss': 0.9219, 'grad_norm': 1.6652424335479736, 'learning_rate': 8.153148832140467e-08, 'epoch': 0.96} +{'loss': 0.9817, 'grad_norm': 1.5833219289779663, 'learning_rate': 8.07390687343379e-08, 'epoch': 0.96} +{'loss': 0.9481, 'grad_norm': 1.7400418519973755, 'learning_rate': 7.995050320285025e-08, 'epoch': 0.96} +{'loss': 0.9097, 'grad_norm': 1.819797396659851, 'learning_rate': 7.916579203333107e-08, 'epoch': 0.96} +{'loss': 0.9142, 'grad_norm': 1.8352128267288208, 'learning_rate': 7.838493553067539e-08, 'epoch': 0.96} +{'loss': 0.7894, 'grad_norm': 2.016019105911255, 'learning_rate': 7.760793399827937e-08, 'epoch': 0.96} +{'loss': 0.9654, 'grad_norm': 1.7040338516235352, 'learning_rate': 7.683478773804043e-08, 'epoch': 0.96} +{'loss': 0.8655, 'grad_norm': 1.7433111667633057, 'learning_rate': 7.606549705035937e-08, 'epoch': 0.96} +{'loss': 0.8357, 'grad_norm': 2.044908285140991, 'learning_rate': 7.53000622341371e-08, 'epoch': 0.96} +{'loss': 0.9005, 'grad_norm': 1.8219791650772095, 'learning_rate': 7.453848358678018e-08, 'epoch': 0.96} +{'loss': 0.9849, 'grad_norm': 1.7857296466827393, 'learning_rate': 7.378076140419188e-08, 'epoch': 0.96} +{'loss': 0.8763, 'grad_norm': 1.8978886604309082, 'learning_rate': 7.302689598078117e-08, 'epoch': 0.96} +{'loss': 0.8665, 'grad_norm': 1.6000418663024902, 'learning_rate': 7.227688760945595e-08, 'epoch': 0.96} +{'loss': 0.8812, 'grad_norm': 1.8286945819854736, 'learning_rate': 7.153073658162646e-08, 'epoch': 0.96} +{'loss': 0.9945, 'grad_norm': 1.717595100402832, 'learning_rate': 7.078844318720523e-08, 'epoch': 0.96} +{'loss': 0.918, 'grad_norm': 1.8635538816452026, 'learning_rate': 7.00500077146038e-08, 'epoch': 0.96} +{'loss': 0.9759, 'grad_norm': 1.6393096446990967, 'learning_rate': 6.931543045073708e-08, 'epoch': 0.96} +{'loss': 0.8798, 'grad_norm': 1.9613218307495117, 'learning_rate': 6.858471168101788e-08, 'epoch': 0.96} +{'loss': 0.9082, 'grad_norm': 1.8709243535995483, 'learning_rate': 6.785785168936244e-08, 'epoch': 0.96} +{'loss': 0.8572, 'grad_norm': 1.729085087776184, 'learning_rate': 6.713485075818815e-08, 'epoch': 0.96} +{'loss': 0.8501, 'grad_norm': 0.9213813543319702, 'learning_rate': 6.641570916840923e-08, 'epoch': 0.96} +{'loss': 0.873, 'grad_norm': 1.8068469762802124, 'learning_rate': 6.570042719944436e-08, 'epoch': 0.96} +{'loss': 0.8734, 'grad_norm': 1.8859580755233765, 'learning_rate': 6.498900512921013e-08, 'epoch': 0.96} +{'loss': 0.792, 'grad_norm': 1.8662978410720825, 'learning_rate': 6.428144323412544e-08, 'epoch': 0.96} +{'loss': 0.8476, 'grad_norm': 1.7196509838104248, 'learning_rate': 6.357774178910703e-08, 'epoch': 0.97} +{'loss': 0.9007, 'grad_norm': 1.8772344589233398, 'learning_rate': 6.287790106757396e-08, 'epoch': 0.97} +{'loss': 0.9023, 'grad_norm': 1.8025811910629272, 'learning_rate': 6.21819213414443e-08, 'epoch': 0.97} +{'loss': 0.9478, 'grad_norm': 1.7576590776443481, 'learning_rate': 6.148980288113504e-08, 'epoch': 0.97} +{'loss': 1.0009, 'grad_norm': 
2.060868263244629, 'learning_rate': 6.080154595556331e-08, 'epoch': 0.97} +{'loss': 0.8271, 'grad_norm': 1.8334498405456543, 'learning_rate': 6.011715083214742e-08, 'epoch': 0.97} +{'loss': 0.9651, 'grad_norm': 1.694542407989502, 'learning_rate': 5.943661777680354e-08, 'epoch': 0.97} +{'loss': 1.0205, 'grad_norm': 1.645240068435669, 'learning_rate': 5.875994705394794e-08, 'epoch': 0.97} +{'loss': 0.9657, 'grad_norm': 1.8809397220611572, 'learning_rate': 5.808713892649587e-08, 'epoch': 0.97} +{'loss': 0.8707, 'grad_norm': 1.7664661407470703, 'learning_rate': 5.7418193655861545e-08, 'epoch': 0.97} +{'loss': 0.9022, 'grad_norm': 1.7392597198486328, 'learning_rate': 5.675311150195928e-08, 'epoch': 0.97} +{'loss': 0.9164, 'grad_norm': 1.5286766290664673, 'learning_rate': 5.609189272320237e-08, 'epoch': 0.97} +{'loss': 0.8173, 'grad_norm': 1.80027174949646, 'learning_rate': 5.5434537576500856e-08, 'epoch': 0.97} +{'loss': 0.9197, 'grad_norm': 1.6077117919921875, 'learning_rate': 5.4781046317267103e-08, 'epoch': 0.97} +{'loss': 0.926, 'grad_norm': 1.7430278062820435, 'learning_rate': 5.413141919940801e-08, 'epoch': 0.97} +{'loss': 0.9717, 'grad_norm': 2.166574239730835, 'learning_rate': 5.348565647533388e-08, 'epoch': 0.97} +{'loss': 0.8211, 'grad_norm': 1.7993981838226318, 'learning_rate': 5.284375839594958e-08, 'epoch': 0.97} +{'loss': 0.943, 'grad_norm': 1.594534993171692, 'learning_rate': 5.220572521066003e-08, 'epoch': 0.97} +{'loss': 0.9998, 'grad_norm': 1.8174077272415161, 'learning_rate': 5.1571557167369165e-08, 'epoch': 0.97} +{'loss': 0.9003, 'grad_norm': 1.9759341478347778, 'learning_rate': 5.094125451247656e-08, 'epoch': 0.97} +{'loss': 0.8997, 'grad_norm': 1.7663795948028564, 'learning_rate': 5.031481749088296e-08, 'epoch': 0.97} +{'loss': 0.8914, 'grad_norm': 1.8109006881713867, 'learning_rate': 4.9692246345985905e-08, 'epoch': 0.97} +{'loss': 0.902, 'grad_norm': 1.6059695482254028, 'learning_rate': 4.9073541319680785e-08, 'epoch': 0.97} +{'loss': 0.9476, 'grad_norm': 1.6451672315597534, 'learning_rate': 4.8458702652359744e-08, 'epoch': 0.97} +{'loss': 0.8941, 'grad_norm': 1.8206524848937988, 'learning_rate': 4.784773058291503e-08, 'epoch': 0.97} +{'loss': 0.9673, 'grad_norm': 1.7027432918548584, 'learning_rate': 4.7240625348735636e-08, 'epoch': 0.97} +{'loss': 0.9295, 'grad_norm': 1.7841529846191406, 'learning_rate': 4.663738718570621e-08, 'epoch': 0.97} +{'loss': 0.9286, 'grad_norm': 1.5803463459014893, 'learning_rate': 4.603801632821148e-08, 'epoch': 0.97} +{'loss': 0.9058, 'grad_norm': 1.7207813262939453, 'learning_rate': 4.544251300913405e-08, 'epoch': 0.97} +{'loss': 0.9106, 'grad_norm': 0.9200888276100159, 'learning_rate': 4.485087745984884e-08, 'epoch': 0.97} +{'loss': 0.9459, 'grad_norm': 1.556007742881775, 'learning_rate': 4.4263109910234194e-08, 'epoch': 0.97} +{'loss': 0.8614, 'grad_norm': 1.7532650232315063, 'learning_rate': 4.367921058866187e-08, 'epoch': 0.97} +{'loss': 0.8935, 'grad_norm': 1.857136607170105, 'learning_rate': 4.3099179722002614e-08, 'epoch': 0.97} +{'loss': 0.8426, 'grad_norm': 1.8851598501205444, 'learning_rate': 4.252301753562171e-08, 'epoch': 0.97} +{'loss': 0.9705, 'grad_norm': 1.6299688816070557, 'learning_rate': 4.195072425338342e-08, 'epoch': 0.97} +{'loss': 0.9204, 'grad_norm': 1.7968155145645142, 'learning_rate': 4.1382300097647655e-08, 'epoch': 0.97} +{'loss': 0.9112, 'grad_norm': 1.7433204650878906, 'learning_rate': 4.0817745289272184e-08, 'epoch': 0.97} +{'loss': 0.8816, 'grad_norm': 1.8956501483917236, 'learning_rate': 
4.025706004760932e-08, 'epoch': 0.97} +{'loss': 0.9273, 'grad_norm': 1.6582077741622925, 'learning_rate': 3.970024459051147e-08, 'epoch': 0.97} +{'loss': 0.9392, 'grad_norm': 1.8125413656234741, 'learning_rate': 3.914729913432336e-08, 'epoch': 0.97} +{'loss': 0.9367, 'grad_norm': 1.846343994140625, 'learning_rate': 3.859822389388757e-08, 'epoch': 0.97} +{'loss': 0.934, 'grad_norm': 2.26226806640625, 'learning_rate': 3.805301908254455e-08, 'epoch': 0.97} +{'loss': 0.9554, 'grad_norm': 1.7763423919677734, 'learning_rate': 3.751168491212931e-08, 'epoch': 0.97} +{'loss': 0.9019, 'grad_norm': 1.8089783191680908, 'learning_rate': 3.697422159297248e-08, 'epoch': 0.97} +{'loss': 0.8845, 'grad_norm': 1.8912146091461182, 'learning_rate': 3.6440629333901466e-08, 'epoch': 0.97} +{'loss': 0.9222, 'grad_norm': 1.8100903034210205, 'learning_rate': 3.591090834224153e-08, 'epoch': 0.97} +{'loss': 0.9472, 'grad_norm': 1.6927605867385864, 'learning_rate': 3.538505882380916e-08, 'epoch': 0.97} +{'loss': 0.8793, 'grad_norm': 1.8636614084243774, 'learning_rate': 3.4863080982919794e-08, 'epoch': 0.97} +{'loss': 0.9274, 'grad_norm': 1.6745274066925049, 'learning_rate': 3.4344975022385654e-08, 'epoch': 0.97} +{'loss': 0.9494, 'grad_norm': 0.8924670219421387, 'learning_rate': 3.383074114351237e-08, 'epoch': 0.97} +{'loss': 0.8801, 'grad_norm': 1.8694889545440674, 'learning_rate': 3.3320379546101234e-08, 'epoch': 0.97} +{'loss': 0.8548, 'grad_norm': 2.0981969833374023, 'learning_rate': 3.281389042844918e-08, 'epoch': 0.97} +{'loss': 0.9901, 'grad_norm': 1.6610794067382812, 'learning_rate': 3.23112739873499e-08, 'epoch': 0.98} +{'loss': 0.8648, 'grad_norm': 1.59697425365448, 'learning_rate': 3.181253041809052e-08, 'epoch': 0.98} +{'loss': 1.012, 'grad_norm': 1.9724822044372559, 'learning_rate': 3.13176599144549e-08, 'epoch': 0.98} +{'loss': 0.9334, 'grad_norm': 1.643310308456421, 'learning_rate': 3.082666266872036e-08, 'epoch': 0.98} +{'loss': 0.8964, 'grad_norm': 0.897040069103241, 'learning_rate': 3.033953887166097e-08, 'epoch': 0.98} +{'loss': 0.8671, 'grad_norm': 0.8986654281616211, 'learning_rate': 2.9856288712544204e-08, 'epoch': 0.98} +{'loss': 0.8355, 'grad_norm': 1.9486429691314697, 'learning_rate': 2.9376912379134314e-08, 'epoch': 0.98} +{'loss': 0.8478, 'grad_norm': 1.5579249858856201, 'learning_rate': 2.8901410057688982e-08, 'epoch': 0.98} +{'loss': 0.882, 'grad_norm': 1.6764822006225586, 'learning_rate': 2.842978193296153e-08, 'epoch': 0.98} +{'loss': 0.7981, 'grad_norm': 1.6065285205841064, 'learning_rate': 2.796202818819871e-08, 'epoch': 0.98} +{'loss': 0.9323, 'grad_norm': 1.5346144437789917, 'learning_rate': 2.7498149005144025e-08, 'epoch': 0.98} +{'loss': 0.9077, 'grad_norm': 1.7720375061035156, 'learning_rate': 2.7038144564033307e-08, 'epoch': 0.98} +{'loss': 0.9276, 'grad_norm': 0.939842939376831, 'learning_rate': 2.658201504359803e-08, 'epoch': 0.98} +{'loss': 0.9324, 'grad_norm': 2.007723331451416, 'learning_rate': 2.6129760621063095e-08, 'epoch': 0.98} +{'loss': 0.9325, 'grad_norm': 1.7180511951446533, 'learning_rate': 2.5681381472151268e-08, 'epoch': 0.98} +{'loss': 0.9496, 'grad_norm': 1.6762229204177856, 'learning_rate': 2.5236877771074308e-08, 'epoch': 0.98} +{'loss': 0.9748, 'grad_norm': 1.741676688194275, 'learning_rate': 2.479624969054295e-08, 'epoch': 0.98} +{'loss': 0.9211, 'grad_norm': 1.6695733070373535, 'learning_rate': 2.4359497401758026e-08, 'epoch': 0.98} +{'loss': 0.9191, 'grad_norm': 1.7717890739440918, 'learning_rate': 2.3926621074417123e-08, 'epoch': 0.98} +{'loss': 0.9368, 
'grad_norm': 1.7004142999649048, 'learning_rate': 2.349762087671126e-08, 'epoch': 0.98} +{'loss': 0.9327, 'grad_norm': 1.7372959852218628, 'learning_rate': 2.3072496975325986e-08, 'epoch': 0.98} +{'loss': 0.8907, 'grad_norm': 1.595108985900879, 'learning_rate': 2.265124953543918e-08, 'epoch': 0.98} +{'loss': 0.8629, 'grad_norm': 0.8623106479644775, 'learning_rate': 2.2233878720723245e-08, 'epoch': 0.98} +{'loss': 0.938, 'grad_norm': 1.7444875240325928, 'learning_rate': 2.182038469334513e-08, 'epoch': 0.98} +{'loss': 0.9541, 'grad_norm': 1.9056637287139893, 'learning_rate': 2.1410767613965212e-08, 'epoch': 0.98} +{'loss': 0.8599, 'grad_norm': 1.6589503288269043, 'learning_rate': 2.1005027641736176e-08, 'epoch': 0.98} +{'loss': 0.9083, 'grad_norm': 1.8093677759170532, 'learning_rate': 2.0603164934306364e-08, 'epoch': 0.98} +{'loss': 0.9028, 'grad_norm': 1.655653476715088, 'learning_rate': 2.020517964781532e-08, 'epoch': 0.98} +{'loss': 0.9751, 'grad_norm': 2.037905216217041, 'learning_rate': 1.9811071936897132e-08, 'epoch': 0.98} +{'loss': 0.8841, 'grad_norm': 1.7231525182724, 'learning_rate': 1.9420841954681525e-08, 'epoch': 0.98} +{'loss': 1.0628, 'grad_norm': 1.6695948839187622, 'learning_rate': 1.903448985278722e-08, 'epoch': 0.98} +{'loss': 0.9996, 'grad_norm': 1.8612843751907349, 'learning_rate': 1.86520157813308e-08, 'epoch': 0.98} +{'loss': 0.9909, 'grad_norm': 1.8157663345336914, 'learning_rate': 1.8273419888916733e-08, 'epoch': 0.98} +{'loss': 0.8884, 'grad_norm': 1.798865795135498, 'learning_rate': 1.7898702322648453e-08, 'epoch': 0.98} +{'loss': 0.8249, 'grad_norm': 1.6262913942337036, 'learning_rate': 1.7527863228118393e-08, 'epoch': 0.98} +{'loss': 0.8963, 'grad_norm': 1.842873454093933, 'learning_rate': 1.716090274941351e-08, 'epoch': 0.98} +{'loss': 0.8272, 'grad_norm': 1.8302541971206665, 'learning_rate': 1.6797821029113093e-08, 'epoch': 0.98} +{'loss': 0.8802, 'grad_norm': 1.6816071271896362, 'learning_rate': 1.6438618208290957e-08, 'epoch': 0.98} +{'loss': 0.8256, 'grad_norm': 1.722398281097412, 'learning_rate': 1.608329442651213e-08, 'epoch': 0.98} +{'loss': 0.9196, 'grad_norm': 1.8138577938079834, 'learning_rate': 1.5731849821833955e-08, 'epoch': 0.98} +{'loss': 0.9164, 'grad_norm': 1.8999924659729004, 'learning_rate': 1.5384284530810533e-08, 'epoch': 0.98} +{'loss': 0.8507, 'grad_norm': 1.6980832815170288, 'learning_rate': 1.5040598688482732e-08, 'epoch': 0.98} +{'loss': 0.9161, 'grad_norm': 1.6590511798858643, 'learning_rate': 1.4700792428388177e-08, 'epoch': 0.98} +{'loss': 0.9758, 'grad_norm': 1.815218210220337, 'learning_rate': 1.436486588255681e-08, 'epoch': 0.98} +{'loss': 0.9259, 'grad_norm': 1.7052456140518188, 'learning_rate': 1.4032819181509783e-08, 'epoch': 0.98} +{'loss': 0.9567, 'grad_norm': 1.9399404525756836, 'learning_rate': 1.370465245426167e-08, 'epoch': 0.98} +{'loss': 0.8789, 'grad_norm': 1.6699292659759521, 'learning_rate': 1.3380365828318254e-08, 'epoch': 0.98} +{'loss': 0.967, 'grad_norm': 1.7068564891815186, 'learning_rate': 1.3059959429679859e-08, 'epoch': 0.98} +{'loss': 0.8599, 'grad_norm': 1.55268132686615, 'learning_rate': 1.274343338283801e-08, 'epoch': 0.98} +{'loss': 0.8197, 'grad_norm': 0.7772056460380554, 'learning_rate': 1.2430787810776556e-08, 'epoch': 0.98} +{'loss': 0.8354, 'grad_norm': 1.6491152048110962, 'learning_rate': 1.2122022834970548e-08, 'epoch': 0.98} +{'loss': 0.9154, 'grad_norm': 1.7647578716278076, 'learning_rate': 1.1817138575389576e-08, 'epoch': 0.98} +{'loss': 0.9506, 'grad_norm': 1.7220553159713745, 'learning_rate': 
1.1516135150493323e-08, 'epoch': 0.99} +{'loss': 0.8491, 'grad_norm': 1.5396755933761597, 'learning_rate': 1.1219012677234908e-08, 'epoch': 0.99} +{'loss': 0.9324, 'grad_norm': 1.7235908508300781, 'learning_rate': 1.0925771271058649e-08, 'epoch': 0.99} +{'loss': 0.9982, 'grad_norm': 1.9007925987243652, 'learning_rate': 1.0636411045902296e-08, 'epoch': 0.99} +{'loss': 0.8682, 'grad_norm': 1.621556043624878, 'learning_rate': 1.035093211419369e-08, 'epoch': 0.99} +{'loss': 0.8207, 'grad_norm': 1.7614240646362305, 'learning_rate': 1.0069334586854106e-08, 'epoch': 0.99} +{'loss': 0.9804, 'grad_norm': 1.6785147190093994, 'learning_rate': 9.791618573294914e-09, 'epoch': 0.99} +{'loss': 0.9094, 'grad_norm': 1.948093056678772, 'learning_rate': 9.517784181422018e-09, 'epoch': 0.99} +{'loss': 0.9756, 'grad_norm': 1.8337852954864502, 'learning_rate': 9.247831517632533e-09, 'epoch': 0.99} +{'loss': 0.9226, 'grad_norm': 1.8649026155471802, 'learning_rate': 8.981760686811448e-09, 'epoch': 0.99} +{'loss': 0.9377, 'grad_norm': 1.7597006559371948, 'learning_rate': 8.719571792341618e-09, 'epoch': 0.99} +{'loss': 0.9125, 'grad_norm': 1.7643260955810547, 'learning_rate': 8.461264936093783e-09, 'epoch': 0.99} +{'loss': 0.8924, 'grad_norm': 1.7881656885147095, 'learning_rate': 8.20684021843099e-09, 'epoch': 0.99} +{'loss': 0.8397, 'grad_norm': 1.6305475234985352, 'learning_rate': 7.956297738207496e-09, 'epoch': 0.99} +{'loss': 0.8755, 'grad_norm': 1.8011974096298218, 'learning_rate': 7.70963759277099e-09, 'epoch': 0.99} +{'loss': 0.9248, 'grad_norm': 1.776333212852478, 'learning_rate': 7.466859877958143e-09, 'epoch': 0.99} +{'loss': 0.9339, 'grad_norm': 1.6891127824783325, 'learning_rate': 7.227964688099054e-09, 'epoch': 0.99} +{'loss': 0.9193, 'grad_norm': 2.063249349594116, 'learning_rate': 6.992952116013918e-09, 'epoch': 0.99} +{'loss': 0.9881, 'grad_norm': 1.6560297012329102, 'learning_rate': 6.76182225301747e-09, 'epoch': 0.99} +{'loss': 1.0011, 'grad_norm': 1.8064961433410645, 'learning_rate': 6.534575188911208e-09, 'epoch': 0.99} +{'loss': 0.9147, 'grad_norm': 0.9157009124755859, 'learning_rate': 6.3112110119900596e-09, 'epoch': 0.99} +{'loss': 0.9453, 'grad_norm': 1.8693196773529053, 'learning_rate': 6.091729809042379e-09, 'epoch': 0.99} +{'loss': 0.7474, 'grad_norm': 1.9743990898132324, 'learning_rate': 5.876131665345508e-09, 'epoch': 0.99} +{'loss': 0.9497, 'grad_norm': 1.8233686685562134, 'learning_rate': 5.664416664666883e-09, 'epoch': 0.99} +{'loss': 0.8928, 'grad_norm': 1.6983250379562378, 'learning_rate': 5.4565848892695894e-09, 'epoch': 0.99} +{'loss': 0.8895, 'grad_norm': 1.665627360343933, 'learning_rate': 5.252636419902368e-09, 'epoch': 0.99} +{'loss': 0.9384, 'grad_norm': 1.867610216140747, 'learning_rate': 5.052571335808499e-09, 'epoch': 0.99} +{'loss': 0.8893, 'grad_norm': 1.780145287513733, 'learning_rate': 4.856389714723575e-09, 'epoch': 0.99} +{'loss': 0.9878, 'grad_norm': 1.6347787380218506, 'learning_rate': 4.6640916328710705e-09, 'epoch': 0.99} +{'loss': 0.9392, 'grad_norm': 1.6712716817855835, 'learning_rate': 4.475677164966774e-09, 'epoch': 0.99} +{'loss': 0.9233, 'grad_norm': 1.660496473312378, 'learning_rate': 4.2911463842176815e-09, 'epoch': 0.99} +{'loss': 1.0314, 'grad_norm': 1.7979397773742676, 'learning_rate': 4.110499362323106e-09, 'epoch': 0.99} +{'loss': 0.8756, 'grad_norm': 1.743967056274414, 'learning_rate': 3.933736169471347e-09, 'epoch': 0.99} +{'loss': 0.8715, 'grad_norm': 3.4190900325775146, 'learning_rate': 3.760856874341912e-09, 'epoch': 0.99} +{'loss': 0.815, 
'grad_norm': 1.6083959341049194, 'learning_rate': 3.591861544106623e-09, 'epoch': 0.99} +{'loss': 0.9403, 'grad_norm': 1.6205967664718628, 'learning_rate': 3.4267502444274013e-09, 'epoch': 0.99} +{'loss': 0.8826, 'grad_norm': 1.5808223485946655, 'learning_rate': 3.2655230394562644e-09, 'epoch': 0.99} +{'loss': 0.8622, 'grad_norm': 1.7393923997879028, 'learning_rate': 3.1081799918375454e-09, 'epoch': 0.99} +{'loss': 0.8532, 'grad_norm': 1.721386194229126, 'learning_rate': 2.954721162704566e-09, 'epoch': 0.99} +{'loss': 0.9182, 'grad_norm': 1.7016346454620361, 'learning_rate': 2.805146611684073e-09, 'epoch': 0.99} +{'loss': 0.8765, 'grad_norm': 1.7341364622116089, 'learning_rate': 2.6594563968918018e-09, 'epoch': 0.99} +{'loss': 0.9515, 'grad_norm': 1.726991891860962, 'learning_rate': 2.5176505749346937e-09, 'epoch': 0.99} +{'loss': 0.9262, 'grad_norm': 1.6547901630401611, 'learning_rate': 2.379729200908676e-09, 'epoch': 0.99} +{'loss': 0.8586, 'grad_norm': 1.6144102811813354, 'learning_rate': 2.245692328404214e-09, 'epoch': 0.99} +{'loss': 0.9884, 'grad_norm': 2.197000741958618, 'learning_rate': 2.11554000949965e-09, 'epoch': 0.99} +{'loss': 0.9442, 'grad_norm': 1.6308985948562622, 'learning_rate': 1.9892722947645328e-09, 'epoch': 0.99} +{'loss': 0.8242, 'grad_norm': 1.7169991731643677, 'learning_rate': 1.8668892332596165e-09, 'epoch': 0.99} +{'loss': 0.8967, 'grad_norm': 1.7209423780441284, 'learning_rate': 1.7483908725357546e-09, 'epoch': 0.99} +{'loss': 0.9457, 'grad_norm': 1.963261365890503, 'learning_rate': 1.6337772586338952e-09, 'epoch': 0.99} +{'loss': 1.0188, 'grad_norm': 1.6533063650131226, 'learning_rate': 1.5230484360873043e-09, 'epoch': 0.99} +{'loss': 0.9114, 'grad_norm': 1.6715199947357178, 'learning_rate': 1.4162044479182347e-09, 'epoch': 0.99} +{'loss': 0.9162, 'grad_norm': 1.8019225597381592, 'learning_rate': 1.3132453356412556e-09, 'epoch': 0.99} +{'loss': 0.9682, 'grad_norm': 1.6687161922454834, 'learning_rate': 1.214171139258813e-09, 'epoch': 1.0} +{'loss': 0.9001, 'grad_norm': 1.6832813024520874, 'learning_rate': 1.1189818972656697e-09, 'epoch': 1.0} +{'loss': 0.9117, 'grad_norm': 1.723149061203003, 'learning_rate': 1.0276776466477955e-09, 'epoch': 1.0} +{'loss': 0.8788, 'grad_norm': 1.8294788599014282, 'learning_rate': 9.40258422880147e-10, 'epoch': 1.0} +{'loss': 0.9134, 'grad_norm': 1.888719916343689, 'learning_rate': 8.567242599299974e-10, 'epoch': 1.0} +{'loss': 0.8983, 'grad_norm': 1.6505712270736694, 'learning_rate': 7.770751902513862e-10, 'epoch': 1.0} +{'loss': 0.8625, 'grad_norm': 0.8461357355117798, 'learning_rate': 7.013112447940007e-10, 'epoch': 1.0} +{'loss': 0.8957, 'grad_norm': 1.8349761962890625, 'learning_rate': 6.294324529942942e-10, 'epoch': 1.0} +{'loss': 0.8202, 'grad_norm': 1.849012017250061, 'learning_rate': 5.614388427799266e-10, 'epoch': 1.0} +{'loss': 0.9345, 'grad_norm': 1.773707628250122, 'learning_rate': 4.973304405697654e-10, 'epoch': 1.0} +{'loss': 0.8964, 'grad_norm': 1.8741662502288818, 'learning_rate': 4.3710727127277417e-10, 'epoch': 1.0} +{'loss': 0.9381, 'grad_norm': 1.6532068252563477, 'learning_rate': 3.807693582869032e-10, 'epoch': 1.0} +{'loss': 0.8844, 'grad_norm': 1.7500007152557373, 'learning_rate': 3.283167235035301e-10, 'epoch': 1.0} +{'loss': 0.9269, 'grad_norm': 1.7772070169448853, 'learning_rate': 2.797493873019086e-10, 'epoch': 1.0} +{'loss': 0.969, 'grad_norm': 1.9817670583724976, 'learning_rate': 2.350673685536098e-10, 'epoch': 1.0} +{'loss': 0.9937, 'grad_norm': 1.5418437719345093, 'learning_rate': 
1.9427068461808086e-10, 'epoch': 1.0}
+{'loss': 0.9953, 'grad_norm': 1.738728404045105, 'learning_rate': 1.5735935134708613e-10, 'epoch': 1.0}
+{'loss': 0.916, 'grad_norm': 1.6960806846618652, 'learning_rate': 1.2433338308137645e-10, 'epoch': 1.0}
+{'loss': 0.9096, 'grad_norm': 1.7969945669174194, 'learning_rate': 9.519279265512993e-11, 'epoch': 1.0}
+{'loss': 0.9867, 'grad_norm': 1.590344786643982, 'learning_rate': 6.99375913892908e-11, 'epoch': 1.0}
+{'loss': 0.9045, 'grad_norm': 1.7562882900238037, 'learning_rate': 4.856778909601012e-11, 'epoch': 1.0}
+{'loss': 0.9464, 'grad_norm': 1.7541764974594116, 'learning_rate': 3.108339407975613e-11, 'epoch': 1.0}
+{'loss': 0.9319, 'grad_norm': 1.7172061204910278, 'learning_rate': 1.74844131328733e-11, 'epoch': 1.0}
+{'loss': 0.8512, 'grad_norm': 1.7965987920761108, 'learning_rate': 7.770851538912993e-12, 'epoch': 1.0}
+{'loss': 0.8708, 'grad_norm': 1.824587106704712, 'learning_rate': 1.9427130737437183e-12, 'epoch': 1.0}
+{'loss': 0.9225, 'grad_norm': 1.07023024559021, 'learning_rate': 0.0, 'epoch': 1.0}
+{'train_runtime': 11353.9896, 'train_samples_per_second': 58.575, 'train_steps_per_second': 0.458, 'train_loss': 0.9749902330562644, 'epoch': 1.0}
+/checkpoint_save/
+============================================================
+reading files ....
+params: dict_keys([])
+transfer config.json
+transfer generation_config.json
+saving vision_tower
+/checkpoint_save/
+/checkpoint_save/
+/checkpoint_save/
+/checkpoint_save/
+/checkpoint_save/
+/checkpoint_save/
+VM-0-21-tencentos:108:5233 [3] NCCL INFO [Service thread] Connection closed by localRank 3
+VM-0-21-tencentos:107:5232 [2] NCCL INFO [Service thread] Connection closed by localRank 2
+/checkpoint_save/
+============================================================
+reading files ....
+params: dict_keys(['connector._connector.0.bias', 'connector._connector.0.weight', 'connector._connector.2.bias', 'connector._connector.2.weight', 'language_model.lm_head.weight', 'language_model.model.embed_tokens.weight', 'language_model.model.layers.0.input_layernorm.weight', 'language_model.model.layers.0.mlp.down_proj.weight', 'language_model.model.layers.0.mlp.gate_proj.weight', 'language_model.model.layers.0.mlp.up_proj.weight', 'language_model.model.layers.0.post_attention_layernorm.weight', 'language_model.model.layers.0.self_attn.k_proj.bias', 'language_model.model.layers.0.self_attn.k_proj.weight', 'language_model.model.layers.0.self_attn.o_proj.weight', 'language_model.model.layers.0.self_attn.q_proj.bias', 'language_model.model.layers.0.self_attn.q_proj.weight', 'language_model.model.layers.0.self_attn.v_proj.bias', 'language_model.model.layers.0.self_attn.v_proj.weight', 'language_model.model.layers.1.input_layernorm.weight', 'language_model.model.layers.1.mlp.down_proj.weight', 'language_model.model.layers.1.mlp.gate_proj.weight', 'language_model.model.layers.1.mlp.up_proj.weight', 'language_model.model.layers.1.post_attention_layernorm.weight', 'language_model.model.layers.1.self_attn.k_proj.bias', 'language_model.model.layers.1.self_attn.k_proj.weight', 'language_model.model.layers.1.self_attn.o_proj.weight', 'language_model.model.layers.1.self_attn.q_proj.bias', 'language_model.model.layers.1.self_attn.q_proj.weight', 'language_model.model.layers.1.self_attn.v_proj.bias', 'language_model.model.layers.1.self_attn.v_proj.weight', 'language_model.model.layers.10.input_layernorm.weight', 'language_model.model.layers.10.mlp.down_proj.weight', 'language_model.model.layers.10.mlp.gate_proj.weight', 'language_model.model.layers.10.mlp.up_proj.weight', 'language_model.model.layers.10.post_attention_layernorm.weight', 'language_model.model.layers.10.self_attn.k_proj.bias', 'language_model.model.layers.10.self_attn.k_proj.weight', 'language_model.model.layers.10.self_attn.o_proj.weight', 'language_model.model.layers.10.self_attn.q_proj.bias', 'language_model.model.layers.10.self_attn.q_proj.weight', 'language_model.model.layers.10.self_attn.v_proj.bias', 'language_model.model.layers.10.self_attn.v_proj.weight', 'language_model.model.layers.11.input_layernorm.weight', 'language_model.model.layers.11.mlp.down_proj.weight', 'language_model.model.layers.11.mlp.gate_proj.weight', 'language_model.model.layers.11.mlp.up_proj.weight', 'language_model.model.layers.11.post_attention_layernorm.weight', 'language_model.model.layers.11.self_attn.k_proj.bias', 'language_model.model.layers.11.self_attn.k_proj.weight', 'language_model.model.layers.11.self_attn.o_proj.weight', 'language_model.model.layers.11.self_attn.q_proj.bias', 'language_model.model.layers.11.self_attn.q_proj.weight', 'language_model.model.layers.11.self_attn.v_proj.bias', 'language_model.model.layers.11.self_attn.v_proj.weight', 'language_model.model.layers.12.input_layernorm.weight', 'language_model.model.layers.12.mlp.down_proj.weight', 'language_model.model.layers.12.mlp.gate_proj.weight', 'language_model.model.layers.12.mlp.up_proj.weight', 'language_model.model.layers.12.post_attention_layernorm.weight', 'language_model.model.layers.12.self_attn.k_proj.bias', 'language_model.model.layers.12.self_attn.k_proj.weight', 'language_model.model.layers.12.self_attn.o_proj.weight', 'language_model.model.layers.12.self_attn.q_proj.bias', 'language_model.model.layers.12.self_attn.q_proj.weight', 
'language_model.model.layers.12.self_attn.v_proj.bias', 'language_model.model.layers.12.self_attn.v_proj.weight', 'language_model.model.layers.13.input_layernorm.weight', 'language_model.model.layers.13.mlp.down_proj.weight', 'language_model.model.layers.13.mlp.gate_proj.weight', 'language_model.model.layers.13.mlp.up_proj.weight', 'language_model.model.layers.13.post_attention_layernorm.weight', 'language_model.model.layers.13.self_attn.k_proj.bias', 'language_model.model.layers.13.self_attn.k_proj.weight', 'language_model.model.layers.13.self_attn.o_proj.weight', 'language_model.model.layers.13.self_attn.q_proj.bias', 'language_model.model.layers.13.self_attn.q_proj.weight', 'language_model.model.layers.13.self_attn.v_proj.bias', 'language_model.model.layers.13.self_attn.v_proj.weight', 'language_model.model.layers.14.input_layernorm.weight', 'language_model.model.layers.14.mlp.down_proj.weight', 'language_model.model.layers.14.mlp.gate_proj.weight', 'language_model.model.layers.14.mlp.up_proj.weight', 'language_model.model.layers.14.post_attention_layernorm.weight', 'language_model.model.layers.14.self_attn.k_proj.bias', 'language_model.model.layers.14.self_attn.k_proj.weight', 'language_model.model.layers.14.self_attn.o_proj.weight', 'language_model.model.layers.14.self_attn.q_proj.bias', 'language_model.model.layers.14.self_attn.q_proj.weight', 'language_model.model.layers.14.self_attn.v_proj.bias', 'language_model.model.layers.14.self_attn.v_proj.weight', 'language_model.model.layers.15.input_layernorm.weight', 'language_model.model.layers.15.mlp.down_proj.weight', 'language_model.model.layers.15.mlp.gate_proj.weight', 'language_model.model.layers.15.mlp.up_proj.weight', 'language_model.model.layers.15.post_attention_layernorm.weight', 'language_model.model.layers.15.self_attn.k_proj.bias', 'language_model.model.layers.15.self_attn.k_proj.weight', 'language_model.model.layers.15.self_attn.o_proj.weight', 'language_model.model.layers.15.self_attn.q_proj.bias', 'language_model.model.layers.15.self_attn.q_proj.weight', 'language_model.model.layers.15.self_attn.v_proj.bias', 'language_model.model.layers.15.self_attn.v_proj.weight', 'language_model.model.layers.16.input_layernorm.weight', 'language_model.model.layers.16.mlp.down_proj.weight', 'language_model.model.layers.16.mlp.gate_proj.weight', 'language_model.model.layers.16.mlp.up_proj.weight', 'language_model.model.layers.16.post_attention_layernorm.weight', 'language_model.model.layers.16.self_attn.k_proj.bias', 'language_model.model.layers.16.self_attn.k_proj.weight', 'language_model.model.layers.16.self_attn.o_proj.weight', 'language_model.model.layers.16.self_attn.q_proj.bias', 'language_model.model.layers.16.self_attn.q_proj.weight', 'language_model.model.layers.16.self_attn.v_proj.bias', 'language_model.model.layers.16.self_attn.v_proj.weight', 'language_model.model.layers.17.input_layernorm.weight', 'language_model.model.layers.17.mlp.down_proj.weight', 'language_model.model.layers.17.mlp.gate_proj.weight', 'language_model.model.layers.17.mlp.up_proj.weight', 'language_model.model.layers.17.post_attention_layernorm.weight', 'language_model.model.layers.17.self_attn.k_proj.bias', 'language_model.model.layers.17.self_attn.k_proj.weight', 'language_model.model.layers.17.self_attn.o_proj.weight', 'language_model.model.layers.17.self_attn.q_proj.bias', 'language_model.model.layers.17.self_attn.q_proj.weight', 'language_model.model.layers.17.self_attn.v_proj.bias', 'language_model.model.layers.17.self_attn.v_proj.weight', 
'language_model.model.layers.18.input_layernorm.weight', 'language_model.model.layers.18.mlp.down_proj.weight', 'language_model.model.layers.18.mlp.gate_proj.weight', 'language_model.model.layers.18.mlp.up_proj.weight', 'language_model.model.layers.18.post_attention_layernorm.weight', 'language_model.model.layers.18.self_attn.k_proj.bias', 'language_model.model.layers.18.self_attn.k_proj.weight', 'language_model.model.layers.18.self_attn.o_proj.weight', 'language_model.model.layers.18.self_attn.q_proj.bias', 'language_model.model.layers.18.self_attn.q_proj.weight', 'language_model.model.layers.18.self_attn.v_proj.bias', 'language_model.model.layers.18.self_attn.v_proj.weight', 'language_model.model.layers.19.input_layernorm.weight', 'language_model.model.layers.19.mlp.down_proj.weight', 'language_model.model.layers.19.mlp.gate_proj.weight', 'language_model.model.layers.19.mlp.up_proj.weight', 'language_model.model.layers.19.post_attention_layernorm.weight', 'language_model.model.layers.19.self_attn.k_proj.bias', 'language_model.model.layers.19.self_attn.k_proj.weight', 'language_model.model.layers.19.self_attn.o_proj.weight', 'language_model.model.layers.19.self_attn.q_proj.bias', 'language_model.model.layers.19.self_attn.q_proj.weight', 'language_model.model.layers.19.self_attn.v_proj.bias', 'language_model.model.layers.19.self_attn.v_proj.weight', 'language_model.model.layers.2.input_layernorm.weight', 'language_model.model.layers.2.mlp.down_proj.weight', 'language_model.model.layers.2.mlp.gate_proj.weight', 'language_model.model.layers.2.mlp.up_proj.weight', 'language_model.model.layers.2.post_attention_layernorm.weight', 'language_model.model.layers.2.self_attn.k_proj.bias', 'language_model.model.layers.2.self_attn.k_proj.weight', 'language_model.model.layers.2.self_attn.o_proj.weight', 'language_model.model.layers.2.self_attn.q_proj.bias', 'language_model.model.layers.2.self_attn.q_proj.weight', 'language_model.model.layers.2.self_attn.v_proj.bias', 'language_model.model.layers.2.self_attn.v_proj.weight', 'language_model.model.layers.20.input_layernorm.weight', 'language_model.model.layers.20.mlp.down_proj.weight', 'language_model.model.layers.20.mlp.gate_proj.weight', 'language_model.model.layers.20.mlp.up_proj.weight', 'language_model.model.layers.20.post_attention_layernorm.weight', 'language_model.model.layers.20.self_attn.k_proj.bias', 'language_model.model.layers.20.self_attn.k_proj.weight', 'language_model.model.layers.20.self_attn.o_proj.weight', 'language_model.model.layers.20.self_attn.q_proj.bias', 'language_model.model.layers.20.self_attn.q_proj.weight', 'language_model.model.layers.20.self_attn.v_proj.bias', 'language_model.model.layers.20.self_attn.v_proj.weight', 'language_model.model.layers.21.input_layernorm.weight', 'language_model.model.layers.21.mlp.down_proj.weight', 'language_model.model.layers.21.mlp.gate_proj.weight', 'language_model.model.layers.21.mlp.up_proj.weight', 'language_model.model.layers.21.post_attention_layernorm.weight', 'language_model.model.layers.21.self_attn.k_proj.bias', 'language_model.model.layers.21.self_attn.k_proj.weight', 'language_model.model.layers.21.self_attn.o_proj.weight', 'language_model.model.layers.21.self_attn.q_proj.bias', 'language_model.model.layers.21.self_attn.q_proj.weight', 'language_model.model.layers.21.self_attn.v_proj.bias', 'language_model.model.layers.21.self_attn.v_proj.weight', 'language_model.model.layers.22.input_layernorm.weight', 'language_model.model.layers.22.mlp.down_proj.weight', 
'language_model.model.layers.22.mlp.gate_proj.weight', 'language_model.model.layers.22.mlp.up_proj.weight', 'language_model.model.layers.22.post_attention_layernorm.weight', 'language_model.model.layers.22.self_attn.k_proj.bias', 'language_model.model.layers.22.self_attn.k_proj.weight', 'language_model.model.layers.22.self_attn.o_proj.weight', 'language_model.model.layers.22.self_attn.q_proj.bias', 'language_model.model.layers.22.self_attn.q_proj.weight', 'language_model.model.layers.22.self_attn.v_proj.bias', 'language_model.model.layers.22.self_attn.v_proj.weight', 'language_model.model.layers.23.input_layernorm.weight', 'language_model.model.layers.23.mlp.down_proj.weight', 'language_model.model.layers.23.mlp.gate_proj.weight', 'language_model.model.layers.23.mlp.up_proj.weight', 'language_model.model.layers.23.post_attention_layernorm.weight', 'language_model.model.layers.23.self_attn.k_proj.bias', 'language_model.model.layers.23.self_attn.k_proj.weight', 'language_model.model.layers.23.self_attn.o_proj.weight', 'language_model.model.layers.23.self_attn.q_proj.bias', 'language_model.model.layers.23.self_attn.q_proj.weight', 'language_model.model.layers.23.self_attn.v_proj.bias', 'language_model.model.layers.23.self_attn.v_proj.weight', 'language_model.model.layers.24.input_layernorm.weight', 'language_model.model.layers.24.mlp.down_proj.weight', 'language_model.model.layers.24.mlp.gate_proj.weight', 'language_model.model.layers.24.mlp.up_proj.weight', 'language_model.model.layers.24.post_attention_layernorm.weight', 'language_model.model.layers.24.self_attn.k_proj.bias', 'language_model.model.layers.24.self_attn.k_proj.weight', 'language_model.model.layers.24.self_attn.o_proj.weight', 'language_model.model.layers.24.self_attn.q_proj.bias', 'language_model.model.layers.24.self_attn.q_proj.weight', 'language_model.model.layers.24.self_attn.v_proj.bias', 'language_model.model.layers.24.self_attn.v_proj.weight', 'language_model.model.layers.25.input_layernorm.weight', 'language_model.model.layers.25.mlp.down_proj.weight', 'language_model.model.layers.25.mlp.gate_proj.weight', 'language_model.model.layers.25.mlp.up_proj.weight', 'language_model.model.layers.25.post_attention_layernorm.weight', 'language_model.model.layers.25.self_attn.k_proj.bias', 'language_model.model.layers.25.self_attn.k_proj.weight', 'language_model.model.layers.25.self_attn.o_proj.weight', 'language_model.model.layers.25.self_attn.q_proj.bias', 'language_model.model.layers.25.self_attn.q_proj.weight', 'language_model.model.layers.25.self_attn.v_proj.bias', 'language_model.model.layers.25.self_attn.v_proj.weight', 'language_model.model.layers.26.input_layernorm.weight', 'language_model.model.layers.26.mlp.down_proj.weight', 'language_model.model.layers.26.mlp.gate_proj.weight', 'language_model.model.layers.26.mlp.up_proj.weight', 'language_model.model.layers.26.post_attention_layernorm.weight', 'language_model.model.layers.26.self_attn.k_proj.bias', 'language_model.model.layers.26.self_attn.k_proj.weight', 'language_model.model.layers.26.self_attn.o_proj.weight', 'language_model.model.layers.26.self_attn.q_proj.bias', 'language_model.model.layers.26.self_attn.q_proj.weight', 'language_model.model.layers.26.self_attn.v_proj.bias', 'language_model.model.layers.26.self_attn.v_proj.weight', 'language_model.model.layers.27.input_layernorm.weight', 'language_model.model.layers.27.mlp.down_proj.weight', 'language_model.model.layers.27.mlp.gate_proj.weight', 'language_model.model.layers.27.mlp.up_proj.weight', 
'language_model.model.layers.27.post_attention_layernorm.weight', 'language_model.model.layers.27.self_attn.k_proj.bias', 'language_model.model.layers.27.self_attn.k_proj.weight', 'language_model.model.layers.27.self_attn.o_proj.weight', 'language_model.model.layers.27.self_attn.q_proj.bias', 'language_model.model.layers.27.self_attn.q_proj.weight', 'language_model.model.layers.27.self_attn.v_proj.bias', 'language_model.model.layers.27.self_attn.v_proj.weight', 'language_model.model.layers.3.input_layernorm.weight', 'language_model.model.layers.3.mlp.down_proj.weight', 'language_model.model.layers.3.mlp.gate_proj.weight', 'language_model.model.layers.3.mlp.up_proj.weight', 'language_model.model.layers.3.post_attention_layernorm.weight', 'language_model.model.layers.3.self_attn.k_proj.bias', 'language_model.model.layers.3.self_attn.k_proj.weight', 'language_model.model.layers.3.self_attn.o_proj.weight', 'language_model.model.layers.3.self_attn.q_proj.bias', 'language_model.model.layers.3.self_attn.q_proj.weight', 'language_model.model.layers.3.self_attn.v_proj.bias', 'language_model.model.layers.3.self_attn.v_proj.weight', 'language_model.model.layers.4.input_layernorm.weight', 'language_model.model.layers.4.mlp.down_proj.weight', 'language_model.model.layers.4.mlp.gate_proj.weight', 'language_model.model.layers.4.mlp.up_proj.weight', 'language_model.model.layers.4.post_attention_layernorm.weight', 'language_model.model.layers.4.self_attn.k_proj.bias', 'language_model.model.layers.4.self_attn.k_proj.weight', 'language_model.model.layers.4.self_attn.o_proj.weight', 'language_model.model.layers.4.self_attn.q_proj.bias', 'language_model.model.layers.4.self_attn.q_proj.weight', 'language_model.model.layers.4.self_attn.v_proj.bias', 'language_model.model.layers.4.self_attn.v_proj.weight', 'language_model.model.layers.5.input_layernorm.weight', 'language_model.model.layers.5.mlp.down_proj.weight', 'language_model.model.layers.5.mlp.gate_proj.weight', 'language_model.model.layers.5.mlp.up_proj.weight', 'language_model.model.layers.5.post_attention_layernorm.weight', 'language_model.model.layers.5.self_attn.k_proj.bias', 'language_model.model.layers.5.self_attn.k_proj.weight', 'language_model.model.layers.5.self_attn.o_proj.weight', 'language_model.model.layers.5.self_attn.q_proj.bias', 'language_model.model.layers.5.self_attn.q_proj.weight', 'language_model.model.layers.5.self_attn.v_proj.bias', 'language_model.model.layers.5.self_attn.v_proj.weight', 'language_model.model.layers.6.input_layernorm.weight', 'language_model.model.layers.6.mlp.down_proj.weight', 'language_model.model.layers.6.mlp.gate_proj.weight', 'language_model.model.layers.6.mlp.up_proj.weight', 'language_model.model.layers.6.post_attention_layernorm.weight', 'language_model.model.layers.6.self_attn.k_proj.bias', 'language_model.model.layers.6.self_attn.k_proj.weight', 'language_model.model.layers.6.self_attn.o_proj.weight', 'language_model.model.layers.6.self_attn.q_proj.bias', 'language_model.model.layers.6.self_attn.q_proj.weight', 'language_model.model.layers.6.self_attn.v_proj.bias', 'language_model.model.layers.6.self_attn.v_proj.weight', 'language_model.model.layers.7.input_layernorm.weight', 'language_model.model.layers.7.mlp.down_proj.weight', 'language_model.model.layers.7.mlp.gate_proj.weight', 'language_model.model.layers.7.mlp.up_proj.weight', 'language_model.model.layers.7.post_attention_layernorm.weight', 'language_model.model.layers.7.self_attn.k_proj.bias', 'language_model.model.layers.7.self_attn.k_proj.weight', 
'language_model.model.layers.7.self_attn.o_proj.weight', 'language_model.model.layers.7.self_attn.q_proj.bias', 'language_model.model.layers.7.self_attn.q_proj.weight', 'language_model.model.layers.7.self_attn.v_proj.bias', 'language_model.model.layers.7.self_attn.v_proj.weight', 'language_model.model.layers.8.input_layernorm.weight', 'language_model.model.layers.8.mlp.down_proj.weight', 'language_model.model.layers.8.mlp.gate_proj.weight', 'language_model.model.layers.8.mlp.up_proj.weight', 'language_model.model.layers.8.post_attention_layernorm.weight', 'language_model.model.layers.8.self_attn.k_proj.bias', 'language_model.model.layers.8.self_attn.k_proj.weight', 'language_model.model.layers.8.self_attn.o_proj.weight', 'language_model.model.layers.8.self_attn.q_proj.bias', 'language_model.model.layers.8.self_attn.q_proj.weight', 'language_model.model.layers.8.self_attn.v_proj.bias', 'language_model.model.layers.8.self_attn.v_proj.weight', 'language_model.model.layers.9.input_layernorm.weight', 'language_model.model.layers.9.mlp.down_proj.weight', 'language_model.model.layers.9.mlp.gate_proj.weight', 'language_model.model.layers.9.mlp.up_proj.weight', 'language_model.model.layers.9.post_attention_layernorm.weight', 'language_model.model.layers.9.self_attn.k_proj.bias', 'language_model.model.layers.9.self_attn.k_proj.weight', 'language_model.model.layers.9.self_attn.o_proj.weight', 'language_model.model.layers.9.self_attn.q_proj.bias', 'language_model.model.layers.9.self_attn.q_proj.weight', 'language_model.model.layers.9.self_attn.v_proj.bias', 'language_model.model.layers.9.self_attn.v_proj.weight', 'language_model.model.norm.weight', 'vision_tower._vision_tower.vision_model.embeddings.patch_embedding.bias', 'vision_tower._vision_tower.vision_model.embeddings.patch_embedding.weight', 'vision_tower._vision_tower.vision_model.embeddings.position_embedding.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.0.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.0.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.0.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.0.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.1.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.1.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.1.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.1.layer_norm2.weight', 
'vision_tower._vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.10.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.10.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.10.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.10.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.11.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.11.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.11.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.11.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight', 
'vision_tower._vision_tower.vision_model.encoder.layers.12.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.12.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.12.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.12.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.13.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.13.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.13.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.13.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.14.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.14.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.14.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.14.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight', 
'vision_tower._vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.15.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.15.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.15.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.15.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.16.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.16.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.16.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.16.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.17.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.17.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.17.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.17.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight', 
'vision_tower._vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.18.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.18.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.18.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.18.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.19.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.19.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.19.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.19.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.2.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.2.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.2.layer_norm2.bias', 
'vision_tower._vision_tower.vision_model.encoder.layers.2.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.20.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.20.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.20.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.20.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.21.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.21.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.21.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.21.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias', 
'vision_tower._vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.22.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.22.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.22.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.22.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.23.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.23.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.23.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.23.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.24.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.24.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.24.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.24.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias', 
'vision_tower._vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.25.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.25.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.25.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.25.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.26.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.26.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.26.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.26.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.3.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.3.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.3.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.3.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias', 
'vision_tower._vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.4.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.4.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.4.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.4.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.5.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.5.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.5.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.5.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.6.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.6.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.6.layer_norm2.bias', 
'vision_tower._vision_tower.vision_model.encoder.layers.6.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.7.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.7.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.7.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.7.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.8.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.8.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.8.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.8.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias', 
'vision_tower._vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.9.layer_norm1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.9.layer_norm1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.9.layer_norm2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.9.layer_norm2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight', 'vision_tower._vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias', 'vision_tower._vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight', 'vision_tower._vision_tower.vision_model.head.attention.in_proj_bias', 'vision_tower._vision_tower.vision_model.head.attention.in_proj_weight', 'vision_tower._vision_tower.vision_model.head.attention.out_proj.bias', 'vision_tower._vision_tower.vision_model.head.attention.out_proj.weight', 'vision_tower._vision_tower.vision_model.head.layernorm.bias', 'vision_tower._vision_tower.vision_model.head.layernorm.weight', 'vision_tower._vision_tower.vision_model.head.mlp.fc1.bias', 'vision_tower._vision_tower.vision_model.head.mlp.fc1.weight', 'vision_tower._vision_tower.vision_model.head.mlp.fc2.bias', 'vision_tower._vision_tower.vision_model.head.mlp.fc2.weight', 'vision_tower._vision_tower.vision_model.head.probe', 'vision_tower._vision_tower.vision_model.post_layernorm.bias', 'vision_tower._vision_tower.vision_model.post_layernorm.weight'])
+
+
+
+
+VM-0-21-tencentos:109:5236 [4] NCCL INFO [Service thread] Connection closed by localRank 4
+VM-0-21-tencentos:111:5242 [6] NCCL INFO [Service thread] Connection closed by localRank 6
+VM-0-21-tencentos:108:5298 [3] NCCL INFO comm 0x525e1ea0 rank 3 nranks 16 cudaDev 3 busId 44000 - Abort COMPLETE
+VM-0-21-tencentos:110:5235 [5] NCCL INFO [Service thread] Connection closed by localRank 5
+VM-0-21-tencentos:107:5299 [2] NCCL INFO comm 0x3ee19520 rank 2 nranks 16 cudaDev 2 busId 43000 - Abort COMPLETE
+VM-0-21-tencentos:109:5301 [4] NCCL INFO comm 0x38719160 rank 4 nranks 16 cudaDev 4 busId 83000 - Abort COMPLETE
+VM-0-21-tencentos:111:5302 [6] NCCL INFO comm 0x279a5e30 rank 6 nranks 16 cudaDev 6 busId c3000 - Abort COMPLETE
+VM-0-21-tencentos:110:5303 [5] NCCL INFO comm 0x5a546450 rank 5 nranks 16 cudaDev 5 busId 84000 - Abort COMPLETE
+transfer added_tokens.json
+transfer config.json
+transfer vocab.json
+transfer training_args.bin
+transfer merges.txt
+transfer tokenizer_config.json
+transfer generation_config.json
+transfer special_tokens_map.json
+saving vision_tower
+VM-0-21-tencentos:106:5237 [1] NCCL INFO [Service thread] Connection closed by localRank 1
+VM-0-21-tencentos:112:5231 [7] NCCL INFO [Service thread] Connection closed by localRank 7
+VM-0-21-tencentos:106:5304 [1] NCCL INFO comm 0x6be95090 rank 1 nranks 16 cudaDev 1 busId 24000 - Abort COMPLETE
+saving connector
+saving language_model
+VM-0-21-tencentos:112:5305 [7] NCCL INFO comm 0x373318c0 rank 7 nranks 16 cudaDev 7 busId c4000 - Abort COMPLETE
+all modules are saved !
+VM-0-21-tencentos:105:5238 [0] NCCL INFO [Service thread] Connection closed by localRank 0
+VM-0-21-tencentos:105:5310 [0] NCCL INFO comm 0x13b36580 rank 0 nranks 16 cudaDev 0 busId 23000 - Abort COMPLETE
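Editor's note: the "saving vision_tower / saving connector / saving language_model ... all modules are saved !" lines above indicate the run writes each component of the vision-language model to its own sub-directory, alongside the tokenizer files listed in the "transfer ..." lines; this matches the per-module paths (".../language_model", ".../vision_tower") used as load sources earlier in the log. The snippet below is only a minimal sketch of how such a per-module save could be implemented; the function name save_modules and the exact file layout are assumptions, not the project's actual code.

# Hypothetical sketch (assumption) of the per-module save that would
# produce the "saving <module>" and "all modules are saved !" log lines.
import os
import torch

def save_modules(model, tokenizer, output_dir):
    # `model` is assumed to expose vision_tower, connector and
    # language_model attributes; the attribute names are taken from the log.
    for name in ("vision_tower", "connector", "language_model"):
        print(f"saving {name}")
        sub_dir = os.path.join(output_dir, name)
        os.makedirs(sub_dir, exist_ok=True)
        module = getattr(model, name)
        if hasattr(module, "save_pretrained"):
            # Hugging Face-style modules write their own config + weights.
            module.save_pretrained(sub_dir)
        else:
            # Plain nn.Module (e.g. the mlp2x_gelu connector): dump a state_dict.
            torch.save(module.state_dict(), os.path.join(sub_dir, "pytorch_model.bin"))
    # Tokenizer artifacts (vocab.json, merges.txt, tokenizer_config.json, ...)
    # correspond to the "transfer ..." lines in the log.
    tokenizer.save_pretrained(output_dir)
    print("all modules are saved !")

Saving the sub-modules separately in this way allows a later stage to point model_name_or_path at an individual component directory, as the earlier "llm source" / "vision_tower source" entries in this log do.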