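	# Model Card: CLIP-Based Product Defect Detection

	## Model Details

	### Model Description

	This model detects product defects using a CLIP-based anomaly detection method.
	A pre-trained CLIP model is fine-tuned to identify defects in product images, automating quality control and defect inspection on production lines.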

	- Developed by: 오석
	- Funded by: 4INLAB INC.
	- Shared by: zhou2023anomalyclip
	- Model type: CLIP-based anomaly detection
	- Language(s) / framework: Python, PyTorch
	- License: Apache 2.0, MIT, GPL-3.0

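	### Technical Limitations

	- The model needs sufficient and diverse training data for defect detection. If the training dataset is too small or imbalanced, performance may degrade.
	- Real-time detection performance varies with the hardware, and detection accuracy may drop on high-resolution images.
	- The model may fail to detect defects reliably when defects are very subtle or when products look extremely similar to one another.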

	## Training Details

	### Hardware
	- CPU: Intel Core i9-13900K (24 Cores, 32 Threads)
	- RAM: 64GB DDR5
	- GPU: NVIDIA RTX 4090Ti 24GB
	- Storage: 1TB NVMe SSD + 2TB HDD
	- Operating System: Windows 11 Pro

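	### Dataset Information

	This model is trained on time-series inventory data, which contains inventory levels, dates, and other related features.
	The data is preprocessed and normalized with MinMax scaling so that it suits the Conv1D and BiLSTM layers.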


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/E8pMyLfUnlIQFCLbTiLba.png)

	- Data sources: https://huggingface.co/datasets/quandao92/vision-inventory-prediction-data
	- Training size:
	  - Round 1: few-shot learning with 10 anomalous and 4 good samples
	  - Round 2: few-shot learning with 10 anomalous and 10 good samples
	  - Round 3: few-shot learning with 10 anomalous and 110 good samples

	- Time-step: within 5 seconds

	- Data Processing Techniques:
	  - normalization:
	    description: "Standardize image pixel values using the mean and standard deviation"
	    method: "'Normalize' from 'torchvision.transforms'"
	  - max_resize:
	    description: "Resize the image to the maximum size while preserving its aspect ratio, adding padding as needed"
	    method: "Custom 'ResizeMaxSize' class"
	  - random_resized_crop:
	    description: "During training, randomly crop and resize images to add variation"
	    method: "'RandomResizedCrop' from 'torchvision.transforms'"
	  - resize:
	    description: "Resize images to a fixed size that matches the model input"
	    method: "'Resize' with BICUBIC interpolation"
	  - center_crop:
	    description: "Crop the central region of the image to the specified size"
	    method: "'CenterCrop'"
	  - to_tensor:
	    description: "Convert the image to a PyTorch tensor"
	    method: "'ToTensor'"
	  - augmentation (optional):
	    description: "Apply various random transforms for data augmentation; configurable via 'AugmentationCfg'"
	    method: "Uses the 'timm' library if specified"
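The `max_resize` and `normalization` steps above can be sketched in plain Python. This is an approximation of the described behaviour, not the card's custom `ResizeMaxSize` class: it assumes the longest side is scaled to `max_size`, the shorter side is padded symmetrically, and pixels are standardized with the standard OpenAI CLIP mean/std (the card does not state which constants it uses). The helper names are hypothetical.

```python
from typing import Sequence, Tuple

# Standard OpenAI CLIP normalization constants (RGB, pixel values in [0, 1]).
# Assumption: the card does not say which mean/std this model actually uses.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def max_resize_geometry(
    width: int, height: int, max_size: int
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Hypothetical helper: return the resized (width, height) and the
    (left, top, right, bottom) padding that fits the image into a
    max_size x max_size square without changing its aspect ratio."""
    scale = max_size / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_w = max_size - new_w
    pad_h = max_size - new_h
    # Split the padding as evenly as possible between opposite sides.
    padding = (pad_w // 2, pad_h // 2, pad_w - pad_w // 2, pad_h - pad_h // 2)
    return (new_w, new_h), padding


def normalize_pixel(
    rgb: Sequence[float],
    mean: Sequence[float] = CLIP_MEAN,
    std: Sequence[float] = CLIP_STD,
) -> Tuple[float, ...]:
    """Standardize one RGB pixel (values in [0, 1]) per channel: (x - mean) / std."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    # A 1280x720 camera frame fitted into a 640x640 canvas.
    print(max_resize_geometry(1280, 720, 640))
    print(normalize_pixel((0.5, 0.5, 0.5)))
```

For the 1280x720 frame this gives a 640x360 image with 140 pixels of padding above and below. In a torchvision pipeline the same standardization is typically written as `transforms.Normalize(mean=CLIP_MEAN, std=CLIP_STD)` placed after `ToTensor()`.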


	# AD-CLIP Model Architecture


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/1wFBzBCgF4sOefROGE7RO.png)

	- model:
	  - input_layer:
	    - image_size: [640, 640, 3] # standard input image size
	  - backbone:
	    - name: CLIP (ViT-B-32) # uses the CLIP vision transformer as the backbone
	    - filters: [32, 64, 128, 256, 512] # filter sizes for each vision transformer layer
	  - neck:
	    - name: Anomaly Detection Module # additional module for defect detection
	    - method: Contrastive Learning # contrastive learning on CLIP features
	  - head:
	    - name: Anomaly Detection Head # final output layer for defect detection
	    - outputs:
	      - anomaly_score: 1 # anomaly score (separates abnormal from normal)
	      - class_probabilities: N # probability for each class (defective or not)
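As a rough illustration of how an `anomaly_score` can be derived from CLIP features, the sketch below compares an image embedding with a "normal" and an "abnormal" text-prompt embedding by cosine similarity, then turns the two similarities into a probability with a scaled softmax. This follows the general idea of prompt-based CLIP anomaly detection (as in AnomalyCLIP), but the embeddings are toy vectors and the scale of 100 (CLIP's usual logit scale) is an assumption; it is not this model's actual head.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_score(
    image_emb: Sequence[float],
    normal_text_emb: Sequence[float],
    abnormal_text_emb: Sequence[float],
    logit_scale: float = 100.0,  # assumed temperature, not taken from the card
) -> float:
    """Probability that the image matches the 'abnormal' prompt rather than the 'normal' one."""
    logits = [
        logit_scale * cosine_similarity(image_emb, normal_text_emb),
        logit_scale * cosine_similarity(image_emb, abnormal_text_emb),
    ]
    # Numerically stable two-way softmax.
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    return exps[1] / sum(exps)


if __name__ == "__main__":
    normal = [1.0, 0.0, 0.0]    # toy embedding of a prompt such as "a photo of a normal object"
    abnormal = [0.0, 1.0, 0.0]  # toy embedding of a prompt such as "a photo of a damaged object"
    print(anomaly_score([0.9, 0.1, 0.0], normal, abnormal))  # close to the normal prompt
    print(anomaly_score([0.1, 0.9, 0.0], normal, abnormal))  # close to the abnormal prompt
```

An image embedding equally similar to both prompts scores exactly 0.5, so a decision threshold around that value separates the two classes in this toy setup.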

	# Optimizer and Loss Function
	- training:
	  - optimizer:
	    - name: AdamW # AdamW optimizer (with decoupled weight decay)
	    - lr: 0.0001 # learning rate
	  - loss:
	    - classification_loss: 1.0 # classification loss (cross-entropy)
	    - anomaly_loss: 1.0 # defect detection loss (loss for the anomaly detection model)
	    - contrastive_loss: 1.0 # contrastive learning loss (similarity-based)
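The three values of 1.0 read as loss weights. A minimal sketch, assuming the training objective is the weighted sum of the three component losses (the card lists the weights but not how each component is computed):

```python
from typing import Dict

# Loss weights as listed in the training configuration above.
LOSS_WEIGHTS: Dict[str, float] = {
    "classification_loss": 1.0,
    "anomaly_loss": 1.0,
    "contrastive_loss": 1.0,
}


def total_loss(components: Dict[str, float], weights: Dict[str, float] = LOSS_WEIGHTS) -> float:
    """Weighted sum of the individual loss terms; raises KeyError if a weighted term is missing."""
    return sum(weight * components[name] for name, weight in weights.items())


if __name__ == "__main__":
    print(total_loss({"classification_loss": 0.4, "anomaly_loss": 0.25, "contrastive_loss": 0.1}))
```

With all weights at 1.0 every term contributes equally; changing a weight rebalances the objective without touching the individual loss implementations.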

	# Metrics
	- metrics:
	  - Precision
	  - Recall
	  - mAP # mean average precision
	  - F1-Score # harmonic mean of precision and recall
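These metrics can be computed from per-image labels and anomaly scores. The plain-Python sketch below treats a defect as the positive class; the 0.5 decision threshold in the usage example is an assumption. AP uses the step-wise definition (the sum of recall gains times precision over descending score thresholds, the same form scikit-learn's `average_precision_score` uses), and mAP is taken as the mean of per-class AP values; the card does not say which classes are averaged.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 with label 1 (defect) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: sum over thresholds of (recall gain) x (precision at that threshold)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with identical scores share a single threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP: the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]          # 1 = defect, 0 = good
    scores = [0.9, 0.8, 0.7, 0.1]  # anomaly scores from the model
    preds = [int(s >= 0.5) for s in scores]
    print(precision_recall_f1(labels, preds))
    print(average_precision(labels, scores))
```

The F1 value from `precision_recall_f1` is the quantity compared against the 95% target listed under Evaluation Parameters.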

	# Training Parameters
	Hyperparameter settings:
	- Learning Rate: 0.001.
	- Batch Size: 8.
	- Epochs: 200.
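To relate these hyperparameters to the few-shot training sets listed under Dataset Information, the sketch below counts optimizer steps per round. It assumes every training image is used once per epoch and the final partial batch is kept; the card states neither.

```python
import math
from typing import Tuple

BATCH_SIZE = 8
EPOCHS = 200

# (anomaly, good) sample counts per training round, from the dataset section.
ROUNDS = {"round 1": (10, 4), "round 2": (10, 10), "round 3": (10, 110)}


def steps(num_images: int, batch_size: int = BATCH_SIZE, epochs: int = EPOCHS) -> Tuple[int, int]:
    """Return (steps per epoch, total steps), keeping the last partial batch."""
    per_epoch = math.ceil(num_images / batch_size)
    return per_epoch, per_epoch * epochs


if __name__ == "__main__":
    for name, (anomaly, good) in ROUNDS.items():
        per_epoch, total = steps(anomaly + good)
        print(f"{name}: {anomaly + good} images -> {per_epoch} steps/epoch, {total} steps total")
```

Under these assumptions, round 1 (14 images) runs 2 steps per epoch (400 in total), round 2 (20 images) runs 3 (600), and round 3 (120 images) runs 15 (3,000).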

	# Pre-trained CLIP model
	| Model | Download |
	| --- | --- |
	| ViT-L/14 | [download](https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt) |
	| ViT-L/14@336px | [download](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt) |
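Each download URL embeds the file's SHA-256 digest as the second-to-last path segment; the official `clip` package (`clip.load("ViT-L/14@336px")`) checks this automatically when it downloads a model. If the checkpoints are fetched manually, a stdlib-only check like the sketch below can confirm the file is intact. The function names and the local file path are hypothetical.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def expected_sha256(url: str) -> str:
    """Extract the digest from a URL of the form .../clip/models/<sha256>/<file>.pt."""
    return urlparse(url).path.rstrip("/").split("/")[-2]


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: Path, url: str) -> bool:
    return file_sha256(path) == expected_sha256(url)


if __name__ == "__main__":
    url = ("https://openaipublic.azureedge.net/clip/models/"
           "b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt")
    print(expected_sha256(url))
    ckpt = Path("ViT-L-14.pt")  # assumed local download location
    if ckpt.exists():
        print("checksum OK" if verify_checkpoint(ckpt, url) else "checksum mismatch")
```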

	# Evaluation Parameters
	- F1-score: 95% or higher.



	# Training Performance and Test Results

	- Training performance results and graphs:
	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/RduhNlkWiyPXj-vbAkJga.png)

	<div style="display: flex; justify-content: space-between;">
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/_lUD77x-yueXycuIn7jya.png" height="80%" width="100%" style="margin-right:5px;">
	<p>Round 1 training performance</p>
	</div>
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/NHDH9N94cI-KqP8k-ASUN.png" height="80%" width="100%" style="margin-right:5px;">
	<p>Round 2 training performance</p>
	</div>
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/6n0DnnQjXD8Ql-p3Owxan.png" height="80%" width="100%" style="margin-right:5px;">
	<p>Round 3 training performance</p>
	</div>
	</div>

	- Training results table:
	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/kDxl9q6X2dxCRJm5nc7jR.png)

	- Test results:
	<div style="display: flex; justify-content: space-between;">
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/A91V0GdrcUcX01cC-biG9.png" height="600" width="1000" style="margin-right:5px;">
	<p>Anomaly Product</p>
	</div>
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/PxleIhphzViTGCubVhWn7.png" height="600" width="1000" style="margin-right:5px;">
	<p>Normal Product</p>
	</div>
	</div>



	# Installation and Usage Guidelines

	Running this model requires Python and the following libraries:

	- ftfy==6.2.0: fixes text normalization and encoding problems.
	- matplotlib==3.9.0: data visualization and plotting.
	- numpy==1.24.3: core library for numerical computation.
	- opencv_python==4.9.0.80: image and video processing.
	- pandas==2.2.2: data analysis and manipulation.
	- Pillow==10.3.0: image file handling and conversion.
	- PyQt5==5.15.10: framework for building GUI applications.
	- PyQt5_sip==12.13.0: provides the interface between PyQt5 and Python.
	- regex==2024.5.15: regular expression processing.
	- scikit_learn==1.2.2: machine learning and data analysis.
	- scipy==1.9.1: scientific and technical computing.
	- setuptools==59.5.0: building, distributing, and installing Python packages.
	- scikit-image: image processing and analysis.
	- tabulate==0.9.0: prints data as formatted tables.
	- thop==0.1.1.post2209072238: estimates the operation count (MACs/FLOPs) of PyTorch models.
	- timm==0.6.13: collection of modern image classification models.
	- torch==2.0.0: the PyTorch deep learning framework.
	- torchvision==0.15.1: PyTorch extension library for computer vision.
	- tqdm==4.65.0: displays progress bars.
	- pyautogui: GUI automation.
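Because most of these libraries are pinned to exact versions, it can help to check an environment before running the model. The stdlib-only sketch below uses `importlib.metadata.version` to compare installed versions against the pins, with package names spelled as in the list above (the underscore form used in installed metadata). Local build suffixes such as `+cu118` on CUDA builds of `torch` are ignored in the comparison; unpinned packages (scikit-image, pyautogui) are left out. The helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Exact pins from the dependency list above.
PINNED: Dict[str, str] = {
    "ftfy": "6.2.0",
    "matplotlib": "3.9.0",
    "numpy": "1.24.3",
    "opencv_python": "4.9.0.80",
    "pandas": "2.2.2",
    "Pillow": "10.3.0",
    "PyQt5": "5.15.10",
    "PyQt5_sip": "12.13.0",
    "regex": "2024.5.15",
    "scikit_learn": "1.2.2",
    "scipy": "1.9.1",
    "setuptools": "59.5.0",
    "tabulate": "0.9.0",
    "thop": "0.1.1.post2209072238",
    "timm": "0.6.13",
    "torch": "2.0.0",
    "torchvision": "0.15.1",
    "tqdm": "4.65.0",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that does not match
    its pin; installed is None when the package is missing."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu118" before comparing.
        base = installed.split("+")[0] if installed else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```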

	### Steps to Run the Model

	### ✅ Prompt generation
	```text
	→ To focus only on the final layers (where the model usually learns complex features), choose fewer DPAM (Diagonally Prominent Attention Map) layers.
	```
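In AnomalyCLIP, DPAM is applied to a configurable number of visual transformer blocks. Assuming that setting counts blocks from the end of the encoder, the hypothetical helper below shows which blocks are affected for each backbone mentioned in this card (ViT-B/32 has 12 visual blocks, ViT-L/14 has 24). The helper name and the counting convention are illustrative, not the repository's API.

```python
from typing import List

# Number of visual transformer blocks in the OpenAI CLIP encoders named in this card.
VISUAL_DEPTH = {"ViT-B/32": 12, "ViT-L/14": 24, "ViT-L/14@336px": 24}


def dpam_layer_indices(backbone: str, num_dpam_layers: int) -> List[int]:
    """Hypothetical: 0-based indices of the final `num_dpam_layers` visual blocks."""
    depth = VISUAL_DEPTH[backbone]
    if not 0 <= num_dpam_layers <= depth:
        raise ValueError(f"num_dpam_layers must be between 0 and {depth} for {backbone}")
    return list(range(depth - num_dpam_layers, depth))


if __name__ == "__main__":
    # Fewer DPAM layers -> only the deepest blocks are modified.
    print(dpam_layer_indices("ViT-L/14", 4))
    print(dpam_layer_indices("ViT-L/14", 20))
```

Lowering the count shrinks the set toward the deepest blocks only, which is what the note above means by focusing on the final layers.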

	### ✅ Test process

	👍 Load pre-trained and fine-tuned (checkpoint) models
	1. Pre-trained models (./pre-trained model/):
	```text
	→ Contains the pre-trained models (ViT-B, ViT-L, ...)
	→ Used as the starting point for fine-tuning the CLIP model
	→ Speeds up and improves training by reusing previously learned features
	```
	2. Fine-tuned models (./checkpoint/):
	```text
	→ The "epoch_N.pth" files in this folder store the model state saved during fine-tuning
	→ Each ".pth" file is a version of the model fine-tuned from the pre-trained model
	→ These checkpoints can be used to resume fine-tuning, evaluate the model at different stages, or select the best-performing version
	```
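When resuming or evaluating, the most recent `epoch_N.pth` is usually the one to pick. The sketch below selects it by parsing the epoch number rather than sorting file names as strings (which would rank `epoch_9.pth` above `epoch_10.pth`). The helper name is hypothetical; the selected file can then be opened with `torch.load(path, map_location="cpu")`, but the keys stored inside these checkpoints are not documented in this card, so loading them into the model is not shown.

```python
import re
from pathlib import Path
from typing import Optional, Union

# Matches the "epoch_N.pth" naming used in ./checkpoint/.
CHECKPOINT_NAME = re.compile(r"epoch_(\d+)\.pth")


def latest_checkpoint(checkpoint_dir: Union[str, Path]) -> Optional[Path]:
    """Return the epoch_N.pth file with the largest N, or None if there is none."""
    best_path, best_epoch = None, -1
    for path in Path(checkpoint_dir).iterdir():
        match = CHECKPOINT_NAME.fullmatch(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path


if __name__ == "__main__":
    ckpt_dir = Path("./checkpoint")
    if ckpt_dir.is_dir():
        print(latest_checkpoint(ckpt_dir))
```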


	# References
	- AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection [[github](https://github.com/zqhang/AnomalyCLIP.git)]