quandao92
/

clip-based-anomaly-detection

quandao92 committed on Dec 3, 2024

Commit

b35336d

verified ·

1 Parent(s): 71d05bb

Update README.md

Files changed (1)

README.md +182 -245

README.md CHANGED

@@ -1,276 +1,213 @@
-# CLIP based ANOMALY DETECTION
-<div align="center">
-[![Status](https://img.shields.io/badge/status-active-success.svg)]()
-[![GitHub Issues](https://img.shields.io/github/issues/kylelobo/The-Documentation-Compendium.svg)](https://github.com/kylelobo/The-Documentation-Compendium/issues)
-[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/kylelobo/The-Documentation-Compendium.svg)](https://github.com/kylelobo/The-Documentation-Compendium/pulls)
-[![License](https://img.shields.io/badge/license-MIT-blue.svg)](/LICENSE)
-</div>
----
-<p align="center"> Anomaly detection (AD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This allows our model to focus on the abnormal image regions rather than the object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. All experiments are conducted in PyTorch-2.0.0 with a single NVIDIA RTX 4090 24GB.
-    <br>
-</p>
-# 📝 Table of Contents
-- [Update](#update)
-- [Install & Dependence](#install--dependence)
-- [Dataset Preparation](#dataset-preparation)
-- [Pre-trained CLIP model](#pre-trained-clip-model)
-- [Usage](#usage)
-- [Code Details](#code-details)
-- [References](#references)
-# Update
-- 08.08.2024: Code has been released !!!
-# Install & Dependence
-### ⭕ Tested Platform
-- Software Information
-  ```
-  OS: Windows 11 64 bit
-  Python: 3.9.18 (anaconda)
-  PyTorch: 2.0.0
-  Cuda Toolkit: 11.8
-  cuDNN: 9.3.0.75 for cuda11
-  ```
-  ![analysis](./docs/CUDA_info.png)
-- Hardware
-  ```
-  CPU: Intel(R) Core(TM) i7-14700K   3.40 GHz
-  RAM: 64GB
-  GPU: Nvidia RTX4090 (24GB)
-  ```
-- Install Python libraries
-  ```
-  pip install -r requirements.txt
-  ```
-# Dataset Preparation
-Download the dataset below:
-* Industrial Domain:
-| Dataset | Download |
-| ---     | ---   |
-| MVTec | [download](https://www.mvtec.com/company/research/datasets/mvtec-ad) |
-| VisA | [download](https://github.com/amazon-science/spot-diff) |
-| MPDD | [download](https://github.com/stepanje/MPDD) |
-| BTAD | [download](http://avires.dimi.uniud.it/papers/btad/btad.zip) |
-| SDD | [download](https://www.vicos.si/resources/kolektorsdd/) |
-| DAGM | [download](https://www.kaggle.com/datasets/mhskjelvareid/dagm-2007-competition-dataset-optical-inspection) |
-| DTD-Synthetic | [download](https://drive.google.com/drive/folders/10OyPzvI3H6llCZBxKxFlKWt1Pw1tkMK1) |
 # Pre-trained CLIP model
 | Model | Download |
-| ---     | ---   |
-| ViT-B/32 | [download](https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt) |
-| ViT-B/16 | [download](https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt) |
 | ViT-L/14 | [download](https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt) |
 | ViT-L/14@336px | [download](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt) |
-# Usage
-- for train (Fine-tuning)
-  ```ruby
-  python train.py
-  ```
-- for test with dataset (many data)
-  ```ruby
-  python test.py
-  ```
-- for simple test (single-image test)
-  ```ruby
-  python Simple_test_code.py
-  ```
-- for UI app test (simple app developed)
-  ```ruby
-  python monitor_check.py
-  ```
-- for real-time detection test (webcam and video tracking)
-  ```ruby
-  python real_time_CLIP.py
-  ```
-# Code Details
-### ✅Dataset configuration
-- Dataset configuration as example below
-```
-├── data/
-│   ├── COMP_1/
-│   │   ├── product_1/
-│   │   │   ├──grouth_truth
-│   │   │   │   ├──anomaly_1
-│   │   │   │   ├──anomaly_2
-│   │   │   │
-│   │   │   ├──test/
-│   │   │   │   ├──good
-│   │   │   │   ├──anomaly_1
-│   │   │   │   ├──anomaly_2
-│   │   │   │
-│   │   │   ├──train/
-│   │   │   │   ├──good
-│   │   │   │   ├──anomaly_1
-│   │   │   │   ├──anomaly_2
-│   │   │   │
-│   │   ├── product_2/
-│   │   │   │
-│   │
-│   ├── COMP_2/
-│   │
-```
-- Generate JSON file storing all the above information of dataset ( -> meta_train.json, meta_test.json)
-```ruby
-cd dataset_config
-python dataset_get_json.py
-```
-- Making all grouth_truth (only anomaly mask) by hand
-```ruby
-cd dataset_config
-python image_ground_truth.py
-```
-- Dataset configuration for train and test
-```ruby
-cd training_libs
-python dataset.py
-```
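- →  The _ _init_ _ method takes the dataset root directory, transform function, dataset name, and mode as inputs
- →  Reads the metadata JSON file (meta_train.json) and stores the list of class names and all data items in lists
- →  Calls the generate_class_info function to build class information and map class names to class IDs
- →  The _ _len_ _ method returns the number of samples in the dataset
- →  The _ _getitem_ _ method returns the sample data at the given index
- →  Reads the image from its path and creates a mask image depending on whether the sample is anomalous
- →  Applies the transform functions to the image and mask when needed
- →  Returns a dictionary containing the image, mask, class name, anomaly flag, image path, and class ID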
-### ✅ Image pre-processing (transformation) for train and test
-```ruby
-  training_lib/utils.py
-```
-```ruby
-  AnomalyCLIP_lib/transform.py
-```
-⭐ **Data Processing Techniques**
-1. Normalization
- → Standardize image pixel values using mean and standard deviation
- → Utilized via *'Normalize'* from *'torchvision.transforms'*
-2. Max Resize
- → Resize the image to a maximum dimension while maintaining aspect ratio, with padding
- → Custom *'ResizeMaxSize'* class
-3. RandomResizedCrop
- → Randomly crop and resize images during training to create variability
- → Implemented via *'RandomResizedCrop'* from *'torchvision.transforms'*
-4. Resize
- → Resize images to a fixed size for model input
- → Done using *'Resize'* with BICUBIC interpolation
-5. Center Crop
- → Crop the central region of the image to the desired size
- → Applied using *'CenterCrop'*
-6. ToTensor
- → Convert images to PyTorch tensors
- → Done with *'ToTensor'*
-7. Augmentation (Optional)
- → Apply various random transformations for data augmentation, configurable via *'AugmentationCfg' *
- → Uses *'timm'* library if specified
-⭐ **Libraries Used**
-1. *'torch'*: Core deep learning library for tensor operations and model building
-2. *'torchvision'*: Provides image processing utilities like Resize, CenterCrop, Normalize, etc
-3. *'timm'*: Optional, for advanced augmentation and transformations
-4. *'AnomalyCLIP_lib'*: Custom library for dataset-specific constants and transformations
 ### ✅ Prompt generating
 ```ruby
-  training_lib/prompt_ensemble.py
-```
-👍 **Prompts Built in the Code**
-1. Normal Prompt: *'["{  }"]'*
- → Normal Prompt Example: "object"
-2. Anomaly Prompt: *'["damaged { }"]'*
- → Anomaly Prompt Example: "damaged object"
-👍 **Construction Process**
-1. *'prompts_pos (Normal)'*: Combines the class name with the normal template
-2. *'prompts_neg (Anomaly)'*: Combines the class name with the anomaly template
-### ✅ Initial setting for training
-- Define the path to the training dataset and model checkpoint saving
-```ruby
-parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
-parser.add_argument("--dataset", type=str, default='smoke_cloud', help="train dataset name")
-parser.add_argument("--save_path", type=str, default='./checkpoint/', help='path to save results')
-```
-### ✅ Hyper parameters setting
-- Set the depth parameter: depth of the embedding learned during prompt training. This affects the model's ability to learn complex features from the data
-```ruby
-parser.add_argument("--depth", type=int, default=9, help="image size")
-```
-- Define the size of input images used for training (pixel)
-```ruby
-parser.add_argument("--image_size", type=int, default=518, help="image size")
-```
-- Setting parameters for training
-```ruby
-parser.add_argument("--epoch", type=int, default=500, help="epochs")
-parser.add_argument("--learning_rate", type=float, default=0.0001, help="learning rate")
-parser.add_argument("--batch_size", type=int, default=8, help="batch size")
 ```
-- Size/depth parameter for the DPAM (Deep Prompt Attention Mechanism)
-```ruby
-parser.add_argument("--dpam", type=int, default=20, help="dpam size")
-1. ViT-B/32 and ViT-B/16: --dpam should be around 10-13
-2. ViT-L/14 and ViT-L/14@336px: --dpam should be around 20-24
 ```
-```ruby
-→ DPAM is used to refine and enhance specific layers of a model, particularly in Vision Transformers (ViT).
-→ Helps the model focus on important features within each layer through an attention mechanism
-→ Layers: DPAM is applied across multiple layers, allowing deeper and more detailed feature extraction
-→ Number of layers DPAM influences is adjustable (--dpam), controlling how much of the model is fine-tuned.
-→ If you want to refine the entire model, you can set --dpam to the number of layers in the model (e.g., 12 for ViT-B and 24 for ViT-L).
-→  If you want to focus only on the final layers (where the model usually learns complex features), you can choose fewer DPAM layers.
 ```
 # References
 - AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection [[github](https://github.com/zqhang/AnomalyCLIP.git)]

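+# CLIP-Based Product Defect Detection Model Card
+## Model Details
+### Model Description
+This model detects product defects using a CLIP-based anomaly detection method.
+It fine-tunes a pre-trained CLIP model to identify defects in product images, automating quality control and defect detection on the production line.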
+- **Developed by:** 오석
+- **Funded by:** 4INLAB INC.
+- **Shared by:** zhou2023anomalyclip
+- **Model type:** CLIP based Anomaly Detection
+- **Language(s):** Python, PyTorch
+- **License:** Apache 2.0, MIT, GPL-3.0
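+### Technical Limitations
+- The model needs sufficient and diverse training data for defect detection. If the training dataset is small or imbalanced, model performance may degrade.
+- Real-time defect detection performance depends on hardware specifications, and detection accuracy may drop at high resolutions.
+- When defects are very fine or products are highly similar to one another, the model may fail to detect defects accurately.
+## Training Details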
+### Hardware
+  - **CPU:** Intel Core i9-13900K (24 Cores, 32 Threads)
+  - **RAM:** 64GB DDR5
+  - **GPU:** NVIDIA RTX 4090 24GB
+  - **Storage:** 1TB NVMe SSD + 2TB HDD
+  - **Operating System:** Windows 11 Pro
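+### Dataset Information
+This model is trained on time-series inventory data. The data includes information on inventory levels, dates, and other related features.
+The data is preprocessed and normalized with MinMax scaling so that it suits the Conv1D and BiLSTM layers.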
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/E8pMyLfUnlIQFCLbTiLba.png)
+- **Data sources:**    https://huggingface.co/datasets/quandao92/vision-inventory-prediction-data
+- **Training size:**
+   - Round 1: Few-shot learning with anomaly (10ea), good (4ea)
+   - Round 2: Few-shot learning with anomaly (10ea), good (10ea)
+   - Round 3: Few-shot learning with anomaly (10ea), good (110ea)
+- **Time-step:** within 5 seconds
+- **Data Processing Techniques:**
+  - normalization:
+      description: "Standardize image pixel values using the mean and standard deviation"
+      method: "'Normalize' from 'torchvision.transforms'"
+  - max_resize:
+      description: "Resize the image to a maximum dimension while maintaining the aspect ratio, with padding"
+      method: "Custom 'ResizeMaxSize' class"
+  - random_resized_crop:
+      description: "Randomly crop and resize images during training to create variability"
+      method: "'RandomResizedCrop' from 'torchvision.transforms'"
+  - resize:
+      description: "Resize images to a fixed size for model input"
+      method: "'Resize' with BICUBIC interpolation"
+  - center_crop:
+      description: "Crop the central region of the image to the desired size"
+      method: "'CenterCrop'"
+  - to_tensor:
+      description: "Convert images to PyTorch tensors"
+      method: "'ToTensor'"
+  - augmentation (optional):
+      description: "Apply various random transformations for data augmentation, configurable via 'AugmentationCfg'"
+      method: "Uses 'timm' library if specified"
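The `max_resize` step above is described only in prose. As a rough sketch of the geometry it implies, assuming the custom `ResizeMaxSize` class scales the longer side to a target size and pads the shorter side evenly (the function name `max_resize_geometry` and the centered padding are illustrative assumptions, not taken from the repository):

```python
def max_resize_geometry(width, height, max_size):
    """Compute the resized dimensions and padding for an aspect-preserving resize.

    Hypothetical helper: scales the longer side to `max_size`, keeps the
    aspect ratio, and splits the leftover space into padding on both sides.
    Returns (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    """
    longer = max(width, height)
    # Scale both sides by the same factor so the aspect ratio is preserved.
    new_w = max(1, round(width * max_size / longer))
    new_h = max(1, round(height * max_size / longer))
    # Whatever is left over on the shorter side becomes padding.
    pad_w = max_size - new_w
    pad_h = max_size - new_h
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top


geometry = max_resize_geometry(1000, 500, 518)
print(geometry)
```

For a 1000x500 image and a target of 518, this yields a 518x259 resize with 129 and 130 pixels of padding above and below, so the padded result is 518x518.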
+# AD-CLIP Model Architecture
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/1wFBzBCgF4sOefROGE7RO.png)
+- **model:**
+  - input_layer:
+      - image_size: [640, 640, 3]  # standard input image size
+  - backbone:
+      - name: CLIP (ViT-B-32)  # uses CLIP's vision transformer as the backbone
+      - filters: [32, 64, 128, 256, 512]  # filter size of each vision transformer layer
+  - neck:
+      - name: Anomaly Detection Module  # additional module for defect detection
+      - method: Contrastive Learning  # contrastive learning on CLIP model features
+  - head:
+      - name: Anomaly Detection Head  # final output layer for defect detection
+      - outputs:
+          - anomaly_score: 1  # anomaly detection score (abnormal vs. normal)
+          - class_probabilities: N  # probability for each class (defect or not)
+# Optimizer and Loss Function
+- **training:**
+  - optimizer:
+      - name: AdamW  # AdamW optimizer (with weight decay)
+      - lr: 0.0001  # learning rate
+  - loss:
+      - classification_loss: 1.0  # classification loss (cross-entropy)
+      - anomaly_loss: 1.0  # defect detection loss (loss for the anomaly detection model)
+      - contrastive_loss: 1.0  # contrastive learning loss (similarity-based loss)
+# Metrics
+- **metrics:**
+  - Precision
+  - Recall
+  - mAP  # Mean Average Precision
+  - F1-Score  # balanced evaluation metric
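To make the listed metrics concrete, here is a minimal sketch of how precision, recall, and F1-score can be computed for binary anomaly labels. It assumes 1 means "anomaly" and 0 means "good"; the function name `binary_metrics` is illustrative and not part of the repository, and mAP is not covered.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels where 1 = anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed anomalies
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example there are two true positives, one false positive, and one false negative, so all three values come out to 2/3.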
+# Training Parameters
+ **Hyperparameter Settings**
+- Learning Rate: 0.001
+- Batch Size: 8
+- Epochs: 200
 # Pre-trained CLIP model
 | Model | Download |
 | ViT-L/14 | [download](https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt) |
 | ViT-L/14@336px | [download](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt) |
+# Evaluation Parameters
+- F1-score: 95% or higher.
+# Training Performance and Test Results
+- **Training performance results and graphs**:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/RduhNlkWiyPXj-vbAkJga.png)
+<div style="display: flex; justify-content: space-between;">
+  <div style="text-align: center; margin-right: 20px;">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/_lUD77x-yueXycuIn7jya.png" height="80%" width="100%" style="margin-right:5px;">
+    <p>Round 1 training performance</p>
+  </div>
+  <div style="text-align: center; margin-right: 20px;">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/NHDH9N94cI-KqP8k-ASUN.png" height="80%" width="100%" style="margin-right:5px;">
+    <p>Round 2 training performance</p>
+  </div>
+  <div style="text-align: center; margin-right: 20px;">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/6n0DnnQjXD8Ql-p3Owxan.png" height="80%" width="100%" style="margin-right:5px;">
+    <p>Round 3 training performance</p>
+  </div>
+</div>
+- **Training results table**:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/kDxl9q6X2dxCRJm5nc7jR.png)
+- **Test results**:
+<div style="display: flex; justify-content: space-between;">
+  <div style="text-align: center; margin-right: 20px;">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/A91V0GdrcUcX01cC-biG9.png" height="600" width="1000" style="margin-right:5px;">
+    <p>Anomaly Product</p>
+  </div>
+  <div style="text-align: center; margin-right: 20px;">
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/PxleIhphzViTGCubVhWn7.png" height="600" width="1000" style="margin-right:5px;">
+    <p>Normal Product</p>
+  </div>
+</div>
+# Installation and Execution Guidelines
+Running this model requires Python and the following libraries:
+- **ftfy==6.2.0**: Library for fixing text normalization and encoding problems.
+- **matplotlib==3.9.0**: Library for data visualization and plotting.
+- **numpy==1.24.3**: Core library for numerical computation.
+- **opencv_python==4.9.0.80**: Library for image and video processing.
+- **pandas==2.2.2**: Library for data analysis and manipulation.
+- **Pillow==10.3.0**: Library for handling and converting image files.
+- **PyQt5==5.15.10**: Framework for developing GUI applications.
+- **PyQt5_sip==12.13.0**: Library providing the interface between PyQt5 and Python.
+- **regex==2024.5.15**: Library for regular expression processing.
+- **scikit_learn==1.2.2**: Library for machine learning and data analysis.
+- **scipy==1.9.1**: Library for scientific and technical computing.
+- **setuptools==59.5.0**: Library for distributing and installing Python packages.
+- **scikit-image**: Library for image processing and analysis.
+- **tabulate==0.9.0**: Library for printing data in table form.
+- **thop==0.1.1.post2209072238**: Tool for counting the FLOPs of PyTorch models.
+- **timm==0.6.13**: Library providing a variety of recent image classification models.
+- **torch==2.0.0**: PyTorch deep learning framework.
+- **torchvision==0.15.1**: PyTorch extension library for computer vision tasks.
+- **tqdm==4.65.0**: Library for displaying progress visually.
+- **pyautogui**: Library for GUI automation.
+### Model execution steps:
 ### ✅ Prompt generating
 ```ruby
+→  If you want to focus only on the final layers (where the model usually learns complex features), you can choose fewer DPAM layers.
 ```
+### ✅ Test process
+👍 **Load pre-trained and fine-tuned (checkpoint) models**
+1. Pre-trained model (./pre-trained model/):
+ ```ruby
+→ Contains the pre-trained model (ViT-B, ViT-L,....)
+→ Used as the starting point for training the CLIP model
+→ Pre-trained model helps speed up and improve training by leveraging previously learned features
 ```
+2. Fine-tuned models (./checkpoint/):
+ ```ruby
+→ "epoch_N.pth" files in this folder store the model's states during the fine-tuning process.
+→ Each ".pth" file represents a version of the model fine-tuned from the pre-trained model
+→ These checkpoints can be used to resume fine-tuning, evaluate the model at different stages, or select the best-performing version
 ```
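The checkpoint notes above mention selecting among the "epoch_N.pth" files. As a small sketch of one way to pick the most recent checkpoint by epoch number, assuming the files follow exactly the `epoch_N.pth` naming pattern (the helper name `latest_checkpoint` is hypothetical, and "latest" is not necessarily the best-performing version):

```python
import re

# Matches names such as "epoch_120.pth" and captures the epoch number.
CHECKPOINT_PATTERN = re.compile(r"^epoch_(\d+)\.pth$")


def latest_checkpoint(filenames):
    """Return the checkpoint name with the highest epoch number, or None."""
    best_epoch, best_name = -1, None
    for name in filenames:
        match = CHECKPOINT_PATTERN.match(name)
        if match is None:
            continue  # skip files that are not epoch checkpoints
        epoch = int(match.group(1))
        # Compare numerically so "epoch_100" sorts after "epoch_20".
        if epoch > best_epoch:
            best_epoch, best_name = epoch, name
    return best_name


names = ["epoch_5.pth", "epoch_100.pth", "epoch_20.pth", "notes.txt"]
print(latest_checkpoint(names))
```

In practice the list of names could come from listing the `./checkpoint/` folder. Comparing the parsed integers rather than the raw strings avoids the lexicographic ordering problem where "epoch_20.pth" would sort after "epoch_100.pth".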
 # References
 - AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection [[github](https://github.com/zqhang/AnomalyCLIP.git)]