Update README.md
README.md
CHANGED
### Model execution steps:

### ✅ Prompt generating

```
training_lib/prompt_ensemble.py
```

👍 **Prompts Built in the Code**

1. Normal Prompt: `["{ }"]`

   → Normal Prompt Example: "object"

2. Anomaly Prompt: `["damaged { }"]`

   → Anomaly Prompt Example: "damaged object"

👍 **Construction Process** (see the sketch below)

1. `prompts_pos` (Normal): Combines the class name with the normal template
2. `prompts_neg` (Anomaly): Combines the class name with the anomaly template
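
A minimal sketch of that construction, assuming simple `str.format`-style templates (the exact template lists live in `training_lib/prompt_ensemble.py` and may differ):

```python
# Illustrative sketch of the construction above; the real templates are
# defined in training_lib/prompt_ensemble.py, so these lists are assumptions.
normal_templates = ["{}"]            # Normal Prompt template
anomaly_templates = ["damaged {}"]   # Anomaly Prompt template

class_name = "object"
prompts_pos = [t.format(class_name) for t in normal_templates]   # -> ["object"]
prompts_neg = [t.format(class_name) for t in anomaly_templates]  # -> ["damaged object"]
```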

### ✅ Initial setting for training

- Define the path to the training dataset and the directory where model checkpoints are saved:

```python
parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
parser.add_argument("--dataset", type=str, default='smoke_cloud', help="train dataset name")
parser.add_argument("--save_path", type=str, default='./checkpoint/', help='path to save results')
```
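
These `add_argument` calls assume a standard `argparse` parser has already been created; a minimal, self-contained skeleton (the description string is an assumption, not the repository's actual text) looks like:

```python
import argparse

# Minimal skeleton that the add_argument lines in this README attach to.
parser = argparse.ArgumentParser(description="training settings")
parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
args = parser.parse_args()

print(args.train_data_path)  # -> ./data/ unless overridden on the command line
```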

### ✅ Hyperparameter settings

- Set the depth parameter: the depth of the embedding learned during prompt training. This affects the model's ability to learn complex features from the data.

```python
parser.add_argument("--depth", type=int, default=9, help="prompt embedding depth")
```

- Define the size of the input images used for training (in pixels):

```python
parser.add_argument("--image_size", type=int, default=518, help="image size")
```

- Set the core training parameters:

```python
parser.add_argument("--epoch", type=int, default=500, help="epochs")
parser.add_argument("--learning_rate", type=float, default=0.0001, help="learning rate")
parser.add_argument("--batch_size", type=int, default=8, help="batch size")
```
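
For orientation, these values would typically be wired into a standard PyTorch-style optimizer and training loop; a hedged sketch follows (the repository's actual model, optimizer, and loop may differ):

```python
import torch

# Illustrative only: feed the parsed hyperparameters into an optimizer.
epochs, learning_rate, batch_size = 500, 0.0001, 8

model = torch.nn.Linear(8, 2)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(epochs):
    batch = torch.randn(batch_size, 8)   # stand-in for a real data loader
    loss = model(batch).pow(2).mean()    # stand-in loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```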

- Set the size/depth parameter for DPAM (Deep Prompt Attention Mechanism):

```python
parser.add_argument("--dpam", type=int, default=20, help="dpam size")
```

Recommended ranges:

1. ViT-B/32 and ViT-B/16: `--dpam` should be around 10-13
2. ViT-L/14 and ViT-L/14@336px: `--dpam` should be around 20-24

→ DPAM is used to refine and enhance specific layers of a model, particularly in Vision Transformers (ViT).

→ It helps the model focus on important features within each layer through an attention mechanism.

→ Layers: DPAM is applied across multiple layers, allowing deeper and more detailed feature extraction.

→ The number of layers DPAM influences is adjustable via `--dpam`, controlling how much of the model is fine-tuned.

→ If you want to refine the entire model, set `--dpam` to the number of layers in the model (e.g., 12 for ViT-B and 24 for ViT-L).

→ If you want to focus only on the final layers (where the model usually learns complex features), choose fewer DPAM layers; the sketch below illustrates this.
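
A hypothetical sketch of what the layer count means (variable names are illustrative; the actual DPAM wiring lives in the model code, and this assumes DPAM is applied to the final layers, as the notes above suggest):

```python
# Hypothetical illustration: --dpam selects how many of the final
# transformer blocks receive DPAM refinement.
total_layers = 24   # e.g., ViT-L/14 has 24 transformer blocks
dpam = 20           # value passed as --dpam

# Refine only the last `dpam` blocks; earlier blocks are left untouched.
dpam_layer_ids = list(range(total_layers - dpam, total_layers))
print(dpam_layer_ids)  # [4, 5, ..., 23] -> the final 20 blocks get DPAM
```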