wargoninnovation committed
Commit 972befb · verified · 1 Parent(s): 53c286d

Initial upload of Wargon Clothing Classifier v1.0

Files changed (5)
  1. README.md +229 -0
  2. class_mappings.json +122 -0
  3. config.json +83 -0
  4. model.safetensors +3 -0
  5. preprocessor_config.json +31 -0
README.md ADDED
@@ -0,0 +1,229 @@
+ ---
+ license: apache-2.0
+ base_model: google/vit-base-patch16-224
+ tags:
+ - image-classification
+ - vision
+ - clothing
+ - fashion
+ - vit
+ - pytorch
+ datasets:
+ - wargoninnovation/clothingdatasetsecondhand
+ metrics:
+ - accuracy
+ - f1
+ pipeline_tag: image-classification
+ widget:
+ - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
+   example_title: Tiger
+ ---
+
+ # Wargon Clothing Classifier
+
+ A Vision Transformer (ViT)-based model for clothing classification, trained on secondhand clothing images. The model classifies 27 types of clothing items with 73% validation accuracy.
+
+ ## Model Details
+
+ ### Model Description
+
+ This is a Vision Transformer fine-tuned for clothing classification, developed to solve real-world clothing categorization challenges in secondhand fashion applications.
+
+ - **Developed by:** Wargon Innovation
+ - **Model type:** Image Classification
+ - **Language(s):** N/A (vision model)
+ - **License:** Apache 2.0
+ - **Finetuned from model:** google/vit-base-patch16-224
+
+ ### Model Sources
+
+ - **Training Dataset:** [Wargon Innovation Clothing Dataset](https://huggingface.co/datasets/wargoninnovation/clothingdatasetsecondhand)
+ - **Base Model:** [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)
+
+ ## Uses
+
+ ### Direct Use
+
+ This model can be used for (see the `pipeline` sketch after this list):
+ - Automatic clothing categorization in e-commerce
+ - Fashion inventory management
+ - Secondhand clothing marketplaces
+ - Fashion recommendation systems
+
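+ The quickest route for these use cases is the Transformers `pipeline` API. A minimal sketch (note: without remapping, labels come back as the generic `LABEL_N` names from `config.json`; `class_mappings.json` holds the human-readable names):
+
+ ```python
+ from transformers import pipeline
+
+ # Build an image-classification pipeline from this repo's weights.
+ classifier = pipeline(
+     "image-classification",
+     model="wargoninnovation/wargon-clothing-classifier",
+ )
+
+ # Returns a list of {"label": ..., "score": ...} dicts, highest score first.
+ print(classifier("path_to_clothing_image.jpg")[:3])
+ ```
+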
53
+ ### Downstream Use
54
+
55
+ The model can be fine-tuned for:
56
+ - Specific clothing brand recognition
57
+ - Size estimation from images
58
+ - Style classification
59
+ - Multi-label clothing attribute detection
60
+
61
+ ## How to Get Started with the Model
62
+
63
+ ```python
64
+ from transformers import AutoImageProcessor, AutoModelForImageClassification
65
+ from PIL import Image
66
+ import torch
67
+
68
+ # Load model and processor
69
+ processor = AutoImageProcessor.from_pretrained("wargoninnovation/wargon-clothing-classifier")
70
+ model = AutoModelForImageClassification.from_pretrained("wargoninnovation/wargon-clothing-classifier")
71
+
72
+ # Load and preprocess image
73
+ image = Image.open("path_to_clothing_image.jpg")
74
+ inputs = processor(image, return_tensors="pt")
75
+
76
+ # Make prediction
77
+ with torch.no_grad():
78
+ outputs = model(**inputs)
79
+ predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
80
+
81
+ # Get top prediction
82
+ predicted_class_id = predictions.argmax().item()
83
+ ```
84
+
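+ The predicted ids map to generic `LABEL_N` names in `config.json`. A sketch for resolving them to human-readable class names via the `class_mappings.json` shipped in this repo:
+
+ ```python
+ import json
+
+ from huggingface_hub import hf_hub_download
+
+ # Fetch class_mappings.json from this repo (cached locally after the first call).
+ mappings_path = hf_hub_download(
+     "wargoninnovation/wargon-clothing-classifier", "class_mappings.json"
+ )
+ with open(mappings_path) as f:
+     id_to_class = json.load(f)["id_to_class"]
+
+ # Keys in id_to_class are strings, so convert the predicted id.
+ print(id_to_class[str(predicted_class_id)])  # e.g. "T-shirt" for id 17
+ ```
+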
+ ## Training Details
+
+ ### Training Data
+
+ The model was trained on the [wargoninnovation/clothingdatasetsecondhand](https://huggingface.co/datasets/wargoninnovation/clothingdatasetsecondhand) dataset, which contains over 30,000 images of secondhand clothing items across 34+ categories.
+
+ **Data Preprocessing** (see the split sketch after this list):
+ - Filtered out classes with fewer than 10 samples to ensure robust train/validation splits
+ - Final dataset contains 27 clothing categories
+ - Images resized to 224x224 pixels
+ - Stratified train/validation split (80/20)
+
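+ A minimal sketch of the filtering and stratified split described above, using the `datasets` library. The label column name (`label`) and the seed are assumptions; neither is stated in this card:
+
+ ```python
+ from collections import Counter
+
+ from datasets import load_dataset
+
+ ds = load_dataset("wargoninnovation/clothingdatasetsecondhand", split="train")
+
+ # Drop classes with fewer than 10 samples.
+ counts = Counter(ds["label"])
+ keep = {label for label, n in counts.items() if n >= 10}
+ ds = ds.filter(lambda ex: ex["label"] in keep)
+
+ # Stratified 80/20 split (stratify_by_column requires a ClassLabel feature).
+ splits = ds.train_test_split(test_size=0.2, stratify_by_column="label", seed=42)
+ train_ds, val_ds = splits["train"], splits["test"]
+ ```
+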
+ ### Training Procedure
+
+ #### Preprocessing
+
+ - **Image Size:** 224x224 pixels
+ - **Normalization:** mean 0.5, std 0.5 per channel (per `preprocessor_config.json`, matching the ViT base preprocessor)
+ - **Data Augmentation:** Standard transformations applied
+
+ #### Training Hyperparameters
+
+ The fine-tuning run used the following settings (see the `TrainingArguments` sketch after this list):
+
+ - **Training regime:** Mixed precision (fp16)
+ - **Learning Rate:** 2e-5
+ - **Batch Size:** 16
+ - **Epochs:** 6
+ - **Optimizer:** AdamW
+ - **Weight Decay:** 0.01
+ - **Warmup Steps:** 500
+ - **Label Smoothing:** 0.1
+
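+ A hedged sketch of how these settings map onto the Transformers `Trainer` API; `output_dir` and the W&B reporting flag are illustrative assumptions, not confirmed details of the original run:
+
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="wargon-clothing-classifier",  # assumed
+     learning_rate=2e-5,
+     per_device_train_batch_size=16,
+     num_train_epochs=6,
+     weight_decay=0.01,            # AdamW is the Trainer's default optimizer
+     warmup_steps=500,
+     label_smoothing_factor=0.1,
+     fp16=True,                    # mixed-precision training regime
+     report_to="wandb",            # W&B logging, per the Software section
+ )
+ ```
+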
+ #### Hardware
+
+ - **GPU:** NVIDIA RTX 3060 (12GB VRAM)
+ - **Training Time:** ~1.5 hours
+
+ ## Evaluation
+
+ ### Testing Data, Factors & Metrics
+
+ The model was evaluated on a stratified validation set (20% of the filtered dataset); a sketch for recomputing the metrics follows the list below.
+
+ #### Metrics
+
+ - **Validation Accuracy:** 73.0%
+ - **F1 Score:** 72.7%
+ - **Precision:** 72.8%
+ - **Recall:** 73.0%
+
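+ A sketch of recomputing these metrics with the `evaluate` library. The averaging mode is not stated in this card; `"weighted"` is assumed here (consistent with recall matching overall accuracy):
+
+ ```python
+ import evaluate
+
+ # preds / labels: predicted and true class ids over the validation set.
+ acc = evaluate.load("accuracy").compute(predictions=preds, references=labels)
+ f1 = evaluate.load("f1").compute(predictions=preds, references=labels, average="weighted")
+ prec = evaluate.load("precision").compute(predictions=preds, references=labels, average="weighted")
+ rec = evaluate.load("recall").compute(predictions=preds, references=labels, average="weighted")
+
+ print(acc["accuracy"], f1["f1"], prec["precision"], rec["recall"])
+ ```
+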
+ ### Results
+
+ The model achieves balanced performance across major clothing categories, with particular strength in:
+ - Common items (T-shirts, Jeans, Dresses)
+ - Well-represented categories in the training data
+ - Clean product photography (as in the training dataset)
+
+ ## Clothing Categories
+
+ The model can classify the following 27 clothing types:
+
+ 1. Blazer
+ 2. Blouse
+ 3. Cardigan
+ 4. Dress
+ 5. Hoodie
+ 6. Jacket
+ 7. Jeans
+ 8. Nightgown
+ 9. Outerwear
+ 10. Pajamas
+ 11. Rain jacket
+ 12. Rain trousers
+ 13. Robe
+ 14. Shirt
+ 15. Shorts
+ 16. Skirt
+ 17. Sweater
+ 18. T-shirt
+ 19. Tank top
+ 20. Tights
+ 21. Top
+ 22. Training top
+ 23. Trousers
+ 24. Tunic
+ 25. Vest
+ 26. Winter jacket
+ 27. Winter trousers
+
+ ## Limitations and Bias
+
+ ### Limitations
+
+ - **Image Quality:** Best performance on clean, well-lit product photos similar to the training data
+ - **Background:** Optimized for images with minimal background distractions
+ - **Viewpoint:** Trained primarily on front-facing clothing images
+ - **Categories:** Limited to the 27 categories present in the training data
+
+ ### Bias
+
+ - **Data Source:** Trained on secondhand clothing, so it may not generalize well to new or luxury items
+ - **Cultural Bias:** The dataset may reflect specific regional fashion preferences
+ - **Class Imbalance:** Some categories had limited representation even after filtering
+
+ ## Environmental Impact
+
+ - **Hardware Type:** NVIDIA RTX 3060
+ - **Hours Used:** ~1.5 hours training time
+ - **Cloud Provider:** N/A (local training)
+ - **Compute Region:** Local
+
+ ## Technical Specifications
+
+ ### Model Architecture
+
+ - **Base:** Vision Transformer (ViT-Base/16)
+ - **Parameters:** ~86M (see the check after this list)
+ - **Input Size:** 224x224x3
+ - **Patch Size:** 16x16
+ - **Number of Classes:** 27
+
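+ A quick sketch for verifying the parameter and class counts from the published checkpoint:
+
+ ```python
+ from transformers import AutoModelForImageClassification
+
+ model = AutoModelForImageClassification.from_pretrained(
+     "wargoninnovation/wargon-clothing-classifier"
+ )
+
+ # Sum element counts across all weight tensors.
+ n_params = sum(p.numel() for p in model.parameters())
+ print(f"{n_params / 1e6:.1f}M parameters, {model.config.num_labels} classes")
+ ```
+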
+ ### Software
+
+ - **Framework:** PyTorch
+ - **Libraries:** Hugging Face Transformers, Datasets
+ - **Experiment Tracking:** Weights & Biases (W&B)
+
+ ## Citation
+
+ ```bibtex
+ @misc{wargon_clothing_classifier_2024,
+   title={Wargon Clothing Classifier: A Vision Transformer for Secondhand Fashion Classification},
+   author={Wargon Innovation},
+   year={2024},
+   publisher={Hugging Face},
+   howpublished={\url{https://huggingface.co/wargoninnovation/wargon-clothing-classifier}},
+ }
+ ```
+
+ ## Model Card Authors
+
+ Wargon Innovation Team
+
+ ## Model Card Contact
+
+ For questions and feedback, please open an issue in the model repository or contact the Wargon Innovation team.
class_mappings.json ADDED
@@ -0,0 +1,122 @@
+ {
+   "class_to_id": {
+     "Blazer": 0,
+     "Blouse": 1,
+     "Cardigan": 2,
+     "Dress": 3,
+     "Hoodie": 4,
+     "Jacket": 5,
+     "Jeans": 6,
+     "Nightgown": 7,
+     "Outerwear": 8,
+     "Pajamas": 9,
+     "Rain jacket": 10,
+     "Rain trousers": 11,
+     "Robe": 12,
+     "Shirt": 13,
+     "Shorts": 14,
+     "Skirt": 15,
+     "Sweater": 16,
+     "T-shirt": 17,
+     "Tank top": 18,
+     "Tights": 19,
+     "Top": 20,
+     "Training top": 21,
+     "Trousers": 22,
+     "Tunic": 23,
+     "Vest": 24,
+     "Winter jacket": 25,
+     "Winter trousers": 26
+   },
+   "id_to_class": {
+     "0": "Blazer",
+     "1": "Blouse",
+     "2": "Cardigan",
+     "3": "Dress",
+     "4": "Hoodie",
+     "5": "Jacket",
+     "6": "Jeans",
+     "7": "Nightgown",
+     "8": "Outerwear",
+     "9": "Pajamas",
+     "10": "Rain jacket",
+     "11": "Rain trousers",
+     "12": "Robe",
+     "13": "Shirt",
+     "14": "Shorts",
+     "15": "Skirt",
+     "16": "Sweater",
+     "17": "T-shirt",
+     "18": "Tank top",
+     "19": "Tights",
+     "20": "Top",
+     "21": "Training top",
+     "22": "Trousers",
+     "23": "Tunic",
+     "24": "Vest",
+     "25": "Winter jacket",
+     "26": "Winter trousers"
+   },
+   "num_classes": 27,
+   "valid_classes": [
+     0,
+     1,
+     2,
+     3,
+     4,
+     5,
+     6,
+     7,
+     8,
+     9,
+     10,
+     11,
+     12,
+     13,
+     14,
+     15,
+     16,
+     17,
+     18,
+     19,
+     20,
+     21,
+     22,
+     23,
+     25,
+     26,
+     27,
+     30,
+     31,
+     32
+   ],
+   "class_weights": [
+     3.2049648761749268,
+     0.7775523066520691,
+     0.9295064210891724,
+     0.4611579179763794,
+     1.5798324346542358,
+     1.1890760660171509,
+     0.7341421842575073,
+     5.0,
+     3.072801351547241,
+     5.0,
+     5.0,
+     5.0,
+     5.0,
+     0.5360822677612305,
+     0.9422394037246704,
+     1.1290216445922852,
+     0.4077451825141907,
+     0.29895859956741333,
+     0.8248940706253052,
+     1.6935325860977173,
+     0.2645518183708191,
+     5.0,
+     0.3766576051712036,
+     4.585565090179443,
+     4.609201908111572,
+     5.0,
+     5.0
+   ]
+ }
config.json ADDED
@@ -0,0 +1,83 @@
+ {
+   "architectures": [
+     "ViTForImageClassification"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "encoder_stride": 16,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2",
+     "3": "LABEL_3",
+     "4": "LABEL_4",
+     "5": "LABEL_5",
+     "6": "LABEL_6",
+     "7": "LABEL_7",
+     "8": "LABEL_8",
+     "9": "LABEL_9",
+     "10": "LABEL_10",
+     "11": "LABEL_11",
+     "12": "LABEL_12",
+     "13": "LABEL_13",
+     "14": "LABEL_14",
+     "15": "LABEL_15",
+     "16": "LABEL_16",
+     "17": "LABEL_17",
+     "18": "LABEL_18",
+     "19": "LABEL_19",
+     "20": "LABEL_20",
+     "21": "LABEL_21",
+     "22": "LABEL_22",
+     "23": "LABEL_23",
+     "24": "LABEL_24",
+     "25": "LABEL_25",
+     "26": "LABEL_26"
+   },
+   "image_size": 224,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_10": 10,
+     "LABEL_11": 11,
+     "LABEL_12": 12,
+     "LABEL_13": 13,
+     "LABEL_14": 14,
+     "LABEL_15": 15,
+     "LABEL_16": 16,
+     "LABEL_17": 17,
+     "LABEL_18": 18,
+     "LABEL_19": 19,
+     "LABEL_2": 2,
+     "LABEL_20": 20,
+     "LABEL_21": 21,
+     "LABEL_22": 22,
+     "LABEL_23": 23,
+     "LABEL_24": 24,
+     "LABEL_25": 25,
+     "LABEL_26": 26,
+     "LABEL_3": 3,
+     "LABEL_4": 4,
+     "LABEL_5": 5,
+     "LABEL_6": 6,
+     "LABEL_7": 7,
+     "LABEL_8": 8,
+     "LABEL_9": 9
+   },
+   "layer_norm_eps": 1e-12,
+   "model_type": "vit",
+   "num_attention_heads": 12,
+   "num_channels": 3,
+   "num_hidden_layers": 12,
+   "patch_size": 16,
+   "pooler_act": "tanh",
+   "pooler_output_size": 768,
+   "problem_type": "single_label_classification",
+   "qkv_bias": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.55.3"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e24861295646225ec60be0f8e195e01bc20144fc20cf637fcd92acde23d5bea
+ size 343300876
preprocessor_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "crop_size": null,
+   "data_format": "channels_first",
+   "default_to_square": true,
+   "device": null,
+   "disable_grouping": null,
+   "do_center_crop": null,
+   "do_convert_rgb": null,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "image_processor_type": "ViTImageProcessorFast",
+   "image_std": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "input_data_format": null,
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "return_tensors": null,
+   "size": {
+     "height": 224,
+     "width": 224
+   }
+ }