Update README.md
README.md
CHANGED
### Model execution steps:

### ✅ Prompt generating

```
training_lib/prompt_ensemble.py
```

👍 **Prompts Built in the Code**

1. Normal Prompt: `["{ }"]`

   → Normal Prompt Example: "object"

2. Anomaly Prompt: `["damaged { }"]`

   → Anomaly Prompt Example: "damaged object"

👍 **Construction Process** (see the sketch below)

1. `prompts_pos` (Normal): Combines the class name with the normal template
2. `prompts_neg` (Anomaly): Combines the class name with the anomaly template
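
A minimal sketch of that construction, assuming simple `str.format`-style templates (the exact template lists live in `training_lib/prompt_ensemble.py` and may differ):

```python
# Illustrative sketch of the construction above; the real templates are
# defined in training_lib/prompt_ensemble.py, so these lists are assumptions.
normal_templates = ["{}"]            # Normal Prompt template
anomaly_templates = ["damaged {}"]   # Anomaly Prompt template

class_name = "object"
prompts_pos = [t.format(class_name) for t in normal_templates]   # -> ["object"]
prompts_neg = [t.format(class_name) for t in anomaly_templates]  # -> ["damaged object"]
```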

### ✅ Initial setting for training

- Define the path to the training dataset and the directory where model checkpoints are saved:

```python
parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
parser.add_argument("--dataset", type=str, default='smoke_cloud', help="train dataset name")
parser.add_argument("--save_path", type=str, default='./checkpoint/', help='path to save results')
```
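
These `add_argument` calls assume a standard `argparse` parser has already been created; a minimal, self-contained skeleton (the description string is an assumption, not the repository's actual text) looks like:

```python
import argparse

# Minimal skeleton that the add_argument lines in this README attach to.
parser = argparse.ArgumentParser(description="training settings")
parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
args = parser.parse_args()

print(args.train_data_path)  # -> ./data/ unless overridden on the command line
```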

### ✅ Hyperparameter settings

- Set the depth parameter: the depth of the embedding learned during prompt training. This affects the model's ability to learn complex features from the data.

```python
parser.add_argument("--depth", type=int, default=9, help="prompt embedding depth")
```

- Define the size of the input images used for training (in pixels):

```python
parser.add_argument("--image_size", type=int, default=518, help="image size")
```

- Set the core training parameters:

```python
parser.add_argument("--epoch", type=int, default=500, help="epochs")
parser.add_argument("--learning_rate", type=float, default=0.0001, help="learning rate")
parser.add_argument("--batch_size", type=int, default=8, help="batch size")
```
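
For orientation, these values would typically be wired into a standard PyTorch-style optimizer and training loop; a hedged sketch follows (the repository's actual model, optimizer, and loop may differ):

```python
import torch

# Illustrative only: feed the parsed hyperparameters into an optimizer.
epochs, learning_rate, batch_size = 500, 0.0001, 8

model = torch.nn.Linear(8, 2)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(epochs):
    batch = torch.randn(batch_size, 8)   # stand-in for a real data loader
    loss = model(batch).pow(2).mean()    # stand-in loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```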

- Set the size/depth parameter for DPAM (Deep Prompt Attention Mechanism):

```python
parser.add_argument("--dpam", type=int, default=20, help="dpam size")
```

Recommended ranges:

1. ViT-B/32 and ViT-B/16: `--dpam` should be around 10-13
2. ViT-L/14 and ViT-L/14@336px: `--dpam` should be around 20-24

→ DPAM is used to refine and enhance specific layers of a model, particularly in Vision Transformers (ViT).

→ It helps the model focus on important features within each layer through an attention mechanism.

→ Layers: DPAM is applied across multiple layers, allowing deeper and more detailed feature extraction.

→ The number of layers DPAM influences is adjustable via `--dpam`, controlling how much of the model is fine-tuned.

→ If you want to refine the entire model, set `--dpam` to the number of layers in the model (e.g., 12 for ViT-B and 24 for ViT-L).

→ If you want to focus only on the final layers (where the model usually learns complex features), choose fewer DPAM layers; the sketch below illustrates this.
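
A hypothetical sketch of what the layer count means (variable names are illustrative; the actual DPAM wiring lives in the model code, and this assumes DPAM is applied to the final layers, as the notes above suggest):

```python
# Hypothetical illustration: --dpam selects how many of the final
# transformer blocks receive DPAM refinement.
total_layers = 24   # e.g., ViT-L/14 has 24 transformer blocks
dpam = 20           # value passed as --dpam

# Refine only the last `dpam` blocks; earlier blocks are left untouched.
dpam_layer_ids = list(range(total_layers - dpam, total_layers))
print(dpam_layer_ids)  # [4, 5, ..., 23] -> the final 20 blocks get DPAM
```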