lukahh committed (verified)
Commit 0628a82 · 1 Parent(s): c5ecef1

Upload merged LoRA model

README.md CHANGED
@@ -1,56 +1,58 @@
- ---
- library_name: peft
- base_model: openai/clip-vit-base-patch32
- tags:
- - generated_from_trainer
- model-index:
- - name: cultureclip_lora_100k_0317_32_07_03
- results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # cultureclip_lora_100k_0317_32_07_03
-
- This model is a fine-tuned version of [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 3e-06
- - train_batch_size: 128
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 2048
- - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 10.0
-
- ### Training results
-
-
-
- ### Framework versions
-
- - PEFT 0.14.1.dev0
- - Transformers 4.49.0
- - Pytorch 2.5.1+cu124
- - Datasets 3.2.0
- - Tokenizers 0.21.0
+
+ # CultureCLIP Model (LoRA fine-tuned)
+
+ This is a CLIP model fine-tuned with LoRA; the LoRA weights have been merged into the base model.
+
+ ## Model details
+
+ - **Base model**: openai/clip-vit-base-patch32
+ - **Task**: contrastive image-text matching
+ - **Training parameters** (see the loss sketch after this README diff):
+   - Batch size: 128
+   - Learning rate: 3e-06
+   - Epochs: 10.0
+   - Gradient accumulation steps: 16
+   - Loss function: cultureclip
+   - Caption loss weight: 0.7
+   - Context loss weight: 0.3
+
+ ## LoRA configuration (see the PEFT sketch after this README diff)
+ - LoRA rank (r): 4
+ - LoRA alpha: 16
+ - LoRA dropout: 0.1
+ - Applied to the vision model: True
+ - Applied to the text model: True
+ - Target locations: all
+ - Target parameters: qv
+ - Backbone: ViT-B/32
+
+ ## Freezing settings
+ - Freeze vision model: False
+ - Freeze text model: False
+
+ ## Dataset information
+ - Training file: /data/yuchen/CultureCLIP_data/pos_neg_crope/train_100k.jsonl
+ - Validation file: /data/yuchen/CultureCLIP_data/pos_neg_crope/val_100k.jsonl
+ - Maximum sequence length: 77
+ - Multi-input contrastive learning: True
+
+ ## Usage
+
+ ```python
+ from PIL import Image
+ from transformers import CLIPModel, CLIPProcessor
+
+ # Load the model and processor (point these at this repository's id or a local path)
+ model = CLIPModel.from_pretrained("path/to/merged-model")
+ processor = CLIPProcessor.from_pretrained("path/to/merged-model")
+
+ # Process text and an image (replace example.jpg with your own image)
+ image = Image.open("example.jpg")
+ inputs = processor(
+     text=["a photo of a cat", "a photo of a dog"],
+     images=image,
+     return_tensors="pt",
+     padding=True
+ )
+
+ # Get the outputs
+ outputs = model(**inputs)
+ ```
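
The training parameters above report a `cultureclip` loss with a caption weight of 0.7 and a context weight of 0.3, but the card does not spell out how the terms are combined. Below is a minimal sketch of one plausible reading, a weighted sum of two symmetric CLIP-style contrastive terms; the function names, the definition of the caption/context terms, and the embedding shapes are assumptions for illustration, not this repository's actual implementation.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds, text_embeds, logit_scale):
    """Symmetric InfoNCE loss over matched image/text embedding batches."""
    logits = logit_scale * image_embeds @ text_embeds.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def cultureclip_loss(image_embeds, caption_embeds, context_embeds, logit_scale,
                     caption_weight=0.7, context_weight=0.3):
    # Assumed structure: a weighted sum of a caption-matching term and a
    # context-matching term, using the 0.7 / 0.3 weights from the model card.
    return (caption_weight * clip_contrastive_loss(image_embeds, caption_embeds, logit_scale)
            + context_weight * clip_contrastive_loss(image_embeds, context_embeds, logit_scale))
```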
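
This commit uploads the model with the LoRA weights already merged, so the checkpoint is a plain `CLIPModel`. For reference, here is a minimal PEFT sketch of a LoRA setup matching the configuration listed above (r=4, alpha=16, dropout 0.1, q/v projections on both towers) and of merging an adapter back into the base weights; the target module names and the adapter path are assumptions, not values taken from this repository.

```python
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import CLIPModel

# LoRA configuration mirroring the model card: rank 4, alpha 16, dropout 0.1,
# applied to the query/value projections of both the vision and text towers.
lora_config = LoraConfig(
    r=4,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # assumed mapping of "target parameters: qv"
)

# Wrap the base CLIP model with the adapter for training.
model = get_peft_model(CLIPModel.from_pretrained("openai/clip-vit-base-patch32"), lora_config)
model.print_trainable_parameters()

# ... training ...

# Merging a saved adapter back into a fresh base model and exporting a plain
# CLIPModel (what this commit uploads). The adapter path is a hypothetical placeholder.
base = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
merged = PeftModel.from_pretrained(base, "path/to/lora_adapter").merge_and_unload()
merged.save_pretrained("merged_model")
```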
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "openai/clip-vit-base-patch32",
+   "architectures": [
+     "CLIPModel"
+   ],
+   "initializer_factor": 1.0,
+   "logit_scale_init_value": 2.6592,
+   "model_type": "clip",
+   "projection_dim": 512,
+   "text_config": {
+     "bos_token_id": 0,
+     "dropout": 0.0,
+     "eos_token_id": 2,
+     "model_type": "clip_text_model",
+     "torch_dtype": "float32"
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.49.0",
+   "vision_config": {
+     "dropout": 0.0,
+     "model_type": "clip_vision_model",
+     "torch_dtype": "float32"
+   }
+ }
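
One detail worth noting in this config: `logit_scale_init_value` is 2.6592 = ln(1/0.07), the standard CLIP temperature initialization, and `projection_dim` is 512. A minimal sketch of how that scale enters the image-text similarity logits; the random embeddings are purely illustrative.

```python
import torch

# logit_scale_init_value from config.json: ln(1 / 0.07) ≈ 2.6592
logit_scale = torch.tensor(2.6592).exp()  # ≈ 14.29, i.e. temperature 0.07

# Illustrative embeddings (batch of 2, projection_dim = 512 as in config.json)
image_embeds = torch.nn.functional.normalize(torch.randn(2, 512), dim=-1)
text_embeds = torch.nn.functional.normalize(torch.randn(2, 512), dim=-1)

# CLIP-style similarity logits: scaled cosine similarity
logits_per_image = logit_scale * image_embeds @ text_embeds.t()
print(logits_per_image.shape)  # torch.Size([2, 2])
```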
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dde8e98a3e5d2f32d1c954e9dc1e88a5e37a0b95b3418f159339dbd2064e469a
+ size 605156676
preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_center_crop": true,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_processor_type": "CLIPImageProcessor",
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "processor_class": "CLIPProcessor",
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 224
+   }
+ }
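
These are the standard CLIP preprocessing settings: resize the shortest edge to 224, center-crop to 224×224, rescale by 1/255, and normalize with the listed mean/std. A minimal sketch that builds a `CLIPImageProcessor` directly from these values; the random test image is a stand-in for real input.

```python
import numpy as np
from PIL import Image
from transformers import CLIPImageProcessor

# Build the processor from the values in preprocessor_config.json.
processor = CLIPImageProcessor(
    size={"shortest_edge": 224},
    crop_size={"height": 224, "width": 224},
    image_mean=[0.48145466, 0.4578275, 0.40821073],
    image_std=[0.26862954, 0.26130258, 0.27577711],
)

# Illustrative input image (random RGB pixels).
image = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))

pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
print(pixel_values.shape)  # torch.Size([1, 3, 224, 224])
```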
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "49406": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49407": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": false,
+   "do_lower_case": true,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 77,
+   "pad_token": "<|endoftext|>",
+   "processor_class": "CLIPProcessor",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
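
The tokenizer keeps CLIP's 77-token context (`model_max_length`) and reuses `<|endoftext|>` as the eos, pad, and unk token. A minimal sketch of tokenizing a batch to the fixed length of 77; loading the tokenizer from the base checkpoint is an assumption standing in for this repository's tokenizer files.

```python
from transformers import CLIPTokenizer

# Base checkpoint used as a stand-in for this repo's tokenizer files.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

batch = tokenizer(
    ["a photo of a cat", "a photo of a dog"],
    padding="max_length",   # pads with <|endoftext|> (id 49407)
    max_length=77,          # model_max_length from tokenizer_config.json
    truncation=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # torch.Size([2, 77])
```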
vocab.json ADDED
The diff for this file is too large to render. See raw diff