quandao92 committed
Commit 4cddd05 · verified · 1 Parent(s): b35336d

Update README.md

Files changed (1): README.md (+54, -0)
README.md CHANGED
@@ -186,8 +186,62 @@

### Model execution steps:

### ✅ Prompt generation
```
training_lib/prompt_ensemble.py
```
👍 **Prompts Built in the Code**
1. Normal prompt template: `["{ }"]`
   → Example: "object"
2. Anomaly prompt template: `["damaged { }"]`
   → Example: "damaged object"

👍 **Construction Process**
1. `prompts_pos` (normal): combines the class name with the normal template.
2. `prompts_neg` (anomaly): combines the class name with the anomaly template.

A minimal sketch of this construction is shown below.

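(A hypothetical sketch of the template filling described above; the actual template lists and names live in `training_lib/prompt_ensemble.py` and may differ.)

```python
# Hypothetical sketch of the prompt construction described above.
normal_templates = ["{}"]           # normal template, shown as "{ }" in the README
anomaly_templates = ["damaged {}"]  # anomaly template, shown as "damaged { }"

def build_prompts(class_name: str):
    prompts_pos = [t.format(class_name) for t in normal_templates]   # normal prompts
    prompts_neg = [t.format(class_name) for t in anomaly_templates]  # anomaly prompts
    return prompts_pos, prompts_neg

print(build_prompts("object"))  # (['object'], ['damaged object'])
```
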
### ✅ Initial settings for training

- Define the path to the training dataset and where model checkpoints are saved:
```python
parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
parser.add_argument("--dataset", type=str, default='smoke_cloud', help="train dataset name")
parser.add_argument("--save_path", type=str, default='./checkpoint/', help='path to save results')
```
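The `parser.add_argument(...)` calls in this section assume an `argparse` parser has already been created; a minimal sketch of that context (the description string is illustrative, not taken from the repo):

```python
import argparse

# Illustrative context for the add_argument calls shown in this section
parser = argparse.ArgumentParser(description="training settings")
# ... the parser.add_argument(...) calls from this README go here ...
args = parser.parse_args()
```
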
### ✅ Hyperparameter settings

- Set the depth parameter: the depth of the embedding learned during prompt training. This affects the model's ability to learn complex features from the data.
```python
parser.add_argument("--depth", type=int, default=9, help="prompt embedding depth")
```
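The README does not show how `--depth` is consumed; one common pattern in prompt-learning code is a stack of learnable prompt embeddings whose size the depth controls. Purely as a hypothetical illustration:

```python
import torch

# Hypothetical illustration only: --depth read as the number of learnable
# prompt vectors (the embedding dimension of 768 is likewise an assumption)
depth, embed_dim = 9, 768
prompt_embeddings = torch.nn.Parameter(torch.randn(depth, embed_dim) * 0.02)
print(prompt_embeddings.shape)  # torch.Size([9, 768])
```
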
- Define the size of the input images used for training, in pixels:
```python
parser.add_argument("--image_size", type=int, default=518, help="image size")
```
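For instance, `--image_size` would typically drive a resize step in the preprocessing pipeline (a hypothetical torchvision-style sketch; the repo's actual transforms may differ):

```python
from torchvision import transforms

image_size = 518  # value of --image_size
preprocess = transforms.Compose([
    transforms.Resize((image_size, image_size)),  # resize inputs to 518x518
    transforms.ToTensor(),
])
```
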
- Set the main training parameters:
```python
parser.add_argument("--epoch", type=int, default=500, help="epochs")
parser.add_argument("--learning_rate", type=float, default=0.0001, help="learning rate")
parser.add_argument("--batch_size", type=int, default=8, help="batch size")
```
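These three values plug into a standard training loop roughly as follows (a hypothetical PyTorch skeleton with stand-in model and data, not the repo's actual loop):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins so the skeleton runs; the real model and dataset differ
model = nn.Linear(8, 2)
dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))
criterion = nn.CrossEntropyLoss()

epochs, learning_rate, batch_size = 500, 1e-4, 8  # --epoch, --learning_rate, --batch_size
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

for _ in range(epochs):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```
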
- Set the size/depth parameter for the DPAM (Deep Prompt Attention Mechanism):
```python
parser.add_argument("--dpam", type=int, default=20, help="dpam size")
```
Recommended ranges by backbone:
1. ViT-B/32 and ViT-B/16: `--dpam` should be around 10-13
2. ViT-L/14 and ViT-L/14@336px: `--dpam` should be around 20-24

→ DPAM refines and enhances specific layers of a model, particularly in Vision Transformers (ViT).
→ It helps the model focus on important features within each layer through an attention mechanism.
→ DPAM is applied across multiple layers, allowing deeper and more detailed feature extraction.
→ The number of layers DPAM influences is adjustable via `--dpam`, controlling how much of the model is fine-tuned (see the sketch below).
→ To refine the entire model, set `--dpam` to the number of layers in the model (e.g., 12 for ViT-B, 24 for ViT-L).
→ To focus only on the final layers (where the model usually learns complex features), choose fewer DPAM layers.
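As a rough illustration of how `--dpam` could pick the final layers to refine (a hypothetical sketch; the repo's actual layer-selection logic may differ):

```python
# Hypothetical: apply DPAM to the last `dpam` of `total_layers` transformer layers
total_layers = 24  # e.g., ViT-L/14
dpam = 20          # value of --dpam
dpam_layer_ids = list(range(total_layers - dpam, total_layers))
print(dpam_layer_ids)  # [4, 5, ..., 23] -> the final 20 layers are refined
```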
247