	---
	datasets:
	- quandao92/ad-clip-dataset
	metrics:
	- f1
	base_model:
	- openai/clip-vit-base-patch32
	---
<div style='text-align: center; font-size: 28px; font-weight: bold'>CLIP-Based Product Defect Detection Model Card</div>

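## Model Details

### Model Description

AnomalyCLIP learns object-agnostic text prompts so that it captures generic normal and abnormal patterns regardless of the foreground object in an image.
This model applies that CLIP-based anomaly detection technique to product defect detection.
A pre-trained CLIP model is fine-tuned to identify defects in product images, which makes it possible to automate quality control and defect inspection on a production line.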

	- Developed by: 윤석민
	- Funded by: SOLUWINS Co., Ltd. (솔루윈스)
	- Referenced by: zhou2023 anomalyclip [[github](https://github.com/zqhang/AnomalyCLIP.git)]
	- Model type: CLIP (Contrastive Language-Image Pretraining) - Domain-Agnostic Prompt Learning Model
	- Language(s): Python
	- License: Apache 2.0, MIT, OpenAI

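### Technical Limitations

- The model needs sufficient and varied training data for defect detection. Performance can degrade when the training set is small or imbalanced.
- Real-time detection performance depends on the hardware, and detection accuracy may drop on high-resolution inputs.
- The model may fail to detect defects accurately when they are very subtle or when the products look nearly identical to one another.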

## Training Details

	### Hardware
	- CPU: Intel Core i9-13900K (24 Cores, 32 Threads)
	- RAM: 64GB DDR5
	- GPU: NVIDIA RTX 4090Ti 24GB
	- Storage: 1TB NVMe SSD + 2TB HDD

	### Software
- OS: Windows 11 64-bit / Ubuntu 20.04 LTS
- Python: 3.8 (Anaconda)
- PyTorch: 1.9.0
- OpenCV: 4.5.3
- CUDA Toolkit: 11.8
- cuDNN: 9.3.0.75 for CUDA 11

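### Dataset Information

The model is trained on images of normal and defective products.
The data includes product images, ground truth for the defect regions, and other related attributes.
Images are preprocessed to match the CLIP input format, and ground-truth markings are included so that defect regions can be evaluated.

- Data source: https://huggingface.co/datasets/quandao92/ad-clip-dataset
- Data collection equipment:
  - Collection hardware: Jetson Orin Nano 8GB
  - Camera: BFS-U3-89S6C Color Camera
  - Lens: 8mm Fixed Focal Length Lens
  - Lighting: LIDLA-120070
- Data formats: .bmp, .jpg
- Data version management:
  - Round 1: 20240910_V0_간이 환경 데이터 수집 (data collection in a simple test setup)

    Data versions and usage history:
    - V01: raw data before preprocessing -> collected originals: 7 images
    - V02: data classification -> normal/defective: 4/3 images
    - V03: data classification and rotation -> augmented by rotating 45/90/135 degrees: 28 images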
	<div style="text-align: center;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/6kvzgbH81jJrHJECaEspY.png" height="500" width="100%">
	<p>Ground Truth Marking</p>
	</div>

	<div style="display: flex; justify-content: space-between;">
	<div style="text-align: center; margin-right: 5px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/_fkcI52_BTcqvQyrJ4EXl.png" height="80%" width="90%" style="margin-right:5px;">
<p>PCA distribution visualization</p>
	</div>
	<div style="text-align: center; margin-right: 5px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/biaWPJtbm6iwNf7ZqnW5O.png" height="80%" width="90%" style="margin-right:5px;">
<p>Outliers identified with Isolation Forest</p>
	</div>
	</div>

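  - Round 2: 20240920_V1_하우징 내 이미지 수집 (image collection inside the housing)

    Data versions and usage history:
    - V01: raw data before preprocessing -> collected originals: 16 images
    - V02: data classification -> normal/defective: 14/2 images
    - V03: data classification and rotation -> image augmentation: 64 images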
	<div style="text-align: center;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/YsP7UwejFabUFp2Im0xWj.png" height="500" width="100%">
	<p>Ground Truth Marking</p>
	</div>

	<div style="display: flex; justify-content: space-between;">
	<div style="text-align: center; margin-right: 5px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/CNFdse5mHQY1KkMb5BYpb.png" height="80%" width="90%" style="margin-right:5px;">
<p>PCA distribution visualization</p>
	</div>
	<div style="text-align: center; margin-right: 5px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/nRO00DJFT0-B1EJYf8lzK.png" height="80%" width="90%" style="margin-right:5px;">
<p>Outliers identified with Isolation Forest</p>
	</div>
	</div>

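  - Round 3: 20241002_V2_설비 내 데이터 수집 (data collection inside the equipment)

    Data versions and usage history:
    - V01: raw data before preprocessing -> 49 images collected
    - V02: data classification -> images classified as error/normal
    - V03: data classification and rotation -> augmented by rotation, bringing the image count to 102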
	<div style="text-align: center;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/MFyVWaqr4GDNs8W2mWzGZ.png" height="500" width="100%">
	<p>Ground Truth Marking</p>
	</div>

	<div style="display: flex; justify-content: space-between;">
	<div style="text-align: center; margin-right: 5px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/Kc3EMbY05frUFQh5HbVHn.png" height="80%" width="90%" style="margin-right:5px;">
<p>PCA distribution visualization</p>
	</div>
	<div style="text-align: center; margin-right: 5px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/SP4R5LjGo2M1Zvby1Bar_.png" height="80%" width="90%" style="margin-right:5px;">
<p>Outliers identified with Isolation Forest</p>
	</div>
	</div>

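- Data configuration:
  - Image resizing and normalization:
    - Images are resized to a fixed size (e.g., 518x518) and processed to suit the CLIP model input.
    - Normalization maps pixel values to the [0, 1] range.
  - Ground-truth marking:
    - For defective images, the defect region is marked as a bounding box or a binary mask.
    - The markings are saved in JSON or CSV format and used during model evaluation.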
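
The ground-truth marking step above can be sketched as follows. This is a minimal illustration, not the project's annotation tool: the `(x_min, y_min, x_max, y_max)` box convention with exclusive maxima, the helper names, and the JSON field names are all assumptions made for the example.

```python
import json

def bbox_to_mask(width, height, bbox):
    """Return a height x width binary mask (nested lists) with 1 inside the box.

    bbox is (x_min, y_min, x_max, y_max) in pixels; the max edges are
    exclusive (an assumption made for this sketch).
    """
    x_min, y_min, x_max, y_max = bbox
    return [
        [1 if (x_min <= x < x_max and y_min <= y < y_max) else 0
         for x in range(width)]
        for y in range(height)
    ]

def make_annotation(image_id, width, height, bbox):
    """Build a JSON-serializable record; the field names are hypothetical."""
    return {"image_id": image_id, "width": width,
            "height": height, "bbox": list(bbox)}

# A 6x4 image with a defect box covering columns 1..3 and rows 1..2.
mask = bbox_to_mask(6, 4, (1, 1, 4, 3))
record = make_annotation("sample_001.jpg", 6, 4, (1, 1, 4, 3))
annotation_json = json.dumps(record)
```

Storing the box in JSON and deriving the mask on demand keeps the annotation file small; a real pipeline would more likely save the mask as an image file.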

	<div style="text-align: center;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/k8GQgaTK7JfQExNpCYpzz.png" height="500" width="100%" style="margin-right:5px;">
	<p>Ground Truth Marking</p>
	</div>

- Data classes:
  - Normal: images of defect-free products.
  - Error: images of defective products, including the defect location and related information.
<div style="display: flex;justify-content: space-between;">
<div style="text-align: center;margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/5pGwZ-sptjWjf7WpHifyJ.jpeg" height="400" width="450">
</div>
<div style="text-align: center;justify-content: space-between; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/3iihck7VfkXKw9VcIl06x.jpeg" height="400" width="450">
</div>
<div style="text-align: center;justify-content: space-between;margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/tjsmiXq9pp0K6KSuS1iOS.jpeg" height="400" width="450">
</div>
</div>
	<p style="text-align: center;">Normal Product Images</p>

<div style="display: flex;justify-content: space-between;">
<div style="text-align: center;margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/Qv01zDzEM5u8cQYdALrSU.jpeg" height="400" width="450">
</div>
<div style="text-align: center;justify-content: space-between; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/B5q_FKiTVXkuElTSlUc4s.jpeg" height="400" width="450">
</div>
<div style="text-align: center;justify-content: space-between;margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/3pro8oEqMTiEwiwFKcACn.jpeg" height="400" width="450">
</div>
</div>
	<p style="text-align: center;">Error Product Images</p>

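### Data Labeling Guide
This guide defines the criteria and process for labeling the data collected to train the anomaly detection model.
The data consists mainly of normal samples and includes a minimal amount of anomaly samples.
The guide aims to maintain data quality and to optimize model training and testing.
- Labeling scope

  1. Normal data:
     - Accounts for about 95% or more of all data.
     - Includes data collected under varied conditions (lighting, angle, background, etc.).
     - Shows metal surfaces in normal condition, with precise structure and uniform gloss.
  2. Anomaly data:
     - Limited to about 5% or less of all data.
     - Defect types:
       - Scratch: scratches.
       - Contamination: stains or foreign matter.
       - Crack: surface cracks.
- Defect image examples
- Labeling criteria

  1. File naming rules

     - File names differ from one data version to another.
     - See the data management document for each version.
     - Data folders are named `<collection date YYYYMMDD>_<V version>_<short description>`.
     - Example: 20240910_V0_간이 환경 데이터 수집

  2. Label metadata

     Label metadata is stored in CSV format and contains the label and a description for each data item.

     - Required fields:
       - `image_id`: image file name.
       - `label`: normal (`normal`) or anomalous (`anomaly`).
       - `description`: detailed description (e.g., the defect type).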

- Example:
```json
[
  {
    "image_id": "normal_20241111_001.jpg",
    "label": "normal",
    "description": "Normal metal part with a smooth surface and uniform gloss."
  },
  {
    "image_id": "abnormal_20241111_002.jpg",
    "label": "anomaly",
    "description": "Linear scratch found on the surface."
  }
]
```
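
Since the guide states that label metadata is stored as CSV, the records above could be written and validated along these lines. This is only a sketch: the helper name `write_metadata` is hypothetical, and the allowed label values are taken from the `label` field definition (`normal` / `anomaly`).

```python
import csv
import io

REQUIRED_FIELDS = ["image_id", "label", "description"]
ALLOWED_LABELS = {"normal", "anomaly"}  # from the `label` field definition

def write_metadata(rows):
    """Validate label records and serialize them to CSV text."""
    for row in rows:
        missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
        if missing:
            raise ValueError(f"missing fields {missing} in {row}")
        if row["label"] not in ALLOWED_LABELS:
            raise ValueError(f"unknown label {row['label']!r}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = [
    {"image_id": "normal_20241111_001.jpg", "label": "normal",
     "description": "Smooth metal part, uniform gloss."},
    {"image_id": "abnormal_20241111_002.jpg", "label": "anomaly",
     "description": "Linear scratch on the surface."},
]
csv_text = write_metadata(rows)

# Round-trip check: read the CSV text back into dictionaries.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Rejecting unknown labels at write time keeps variants such as `error` or `abnormal` from silently entering the metadata.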


	# AD-CLIP Model Architecture
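The AD-CLIP model uses CLIP (ViT-B-32) as its backbone to extract image features and detects anomalies through contrastive learning.
The final output is an anomaly score indicating whether the image is abnormal or normal, together with the probability of each class.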
	<div style="display: flex; justify-content: center; align-items: center; flex-direction: column;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/62sYcSncxxzqGjQAa0MgQ.png" height="500" width="70%">
	<p>CLIP-based Anomaly Detection Model Architecture</p>
	</div>

- model:
  - Input layer:
    - Input image: the model takes an image of size [640, 640, 3], where 640x640 is the width and height and 3 is the number of RGB color channels.
    - Function: this layer processes the input image and prepares the data in the format expected by the rest of the model.
  - backbone:
    - CLIP (ViT-B-32): the model uses CLIP's Vision Transformer (ViT-B-32) architecture to extract features from the image. ViT-B-32 can extract the high-level features needed to understand the image.
    - Filters: filter sizes [32, 64, 128, 256, 512] are used in the ViT layers to extract important information at each level of the image and learn features.
  - neck:
    - Anomaly detection module: analyzes the image based on the features extracted by CLIP and judges whether it is anomalous. This stage performs the key processing that separates normal from abnormal data within the image.
    - Contrastive learning: learns the differences between normal and abnormal images so that anomalies can be distinguished more clearly.
  - head:
    - Anomaly detection head: the final part of the model, which decides whether the image is abnormal or normal.
  - outputs:
    - Anomaly score: the model outputs a score indicating whether the image is anomalous (e.g., 1 for anomalous, 0 for normal).
    - Class probabilities: the model outputs a probability for each class, which is used to decide whether a defect is present.
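
One common way for a CLIP-style model to turn features into the two outputs above is a softmax over image-text cosine similarities. The sketch below assumes that formulation; the three-dimensional feature vectors and the temperature of 0.07 are illustrative values, not taken from this model's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_probabilities(image_feat, normal_text_feat, anomaly_text_feat,
                        temperature=0.07):
    """Return [p_normal, p_anomaly] from similarities to the two prompts."""
    sims = [cosine(image_feat, normal_text_feat),
            cosine(image_feat, anomaly_text_feat)]
    logits = [s / temperature for s in sims]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy features: the image embedding lies close to the "normal" text embedding.
probs = class_probabilities([0.9, 0.1, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
anomaly_score = probs[1]  # probability assigned to the anomaly prompt
```

Here the anomaly score is simply the probability of the anomaly prompt, so thresholding it at 0.5 is equivalent to picking the more similar prompt.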

	# Optimizer and Loss Function
- training:
  - optimizer:
    - name: AdamW # AdamW optimizer (with weight decay)
    - lr: 0.0001 # learning rate
  - loss:
    - classification_loss: 1.0 # classification loss (cross-entropy)
    - anomaly_loss: 1.0 # defect detection loss (loss for the anomaly detection module)
    - contrastive_loss: 1.0 # contrastive learning loss (similarity-based)

	# Metrics
- metrics:
  - Precision
  - Recall
  - mAP # mean Average Precision
  - F1-Score # balances precision and recall
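
For reference, precision, recall, and F1 can be computed from binary predictions as sketched below (mAP is omitted because it needs ranked scores rather than hard labels). The label convention, with 1 meaning anomaly, and the sample data are assumptions for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 1 = anomaly, 0 = normal (made-up labels for the example).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

With three true positives, one false positive, and one false negative, all three metrics equal 0.75 in this example.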

	# Training Parameters
Hyperparameter settings:
- Learning rate: 0.001
- Batch size: 8
- Epochs: 200

	# Pre-trained CLIP model
| Model | Download |
| --- | --- |
| ViT-B/32 | [download](https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt) |
| ViT-B/16 | [download](https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt) |
| ViT-L/14 | [download](https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt) |
| ViT-L/14@336px | [download](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt) |

	# Evaluation Parameters
- F1-score: 90% or higher.



# Training Performance and Test Results

- Training performance results and graphs:
	<div style="display: flex; justify-content: space-between; margin-bottom: 10px;">
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/7Q1RzKyia-WNSCJHnk2-d.png" height="80%" width="100%" style="margin-right:5px;">
	</div>
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/9PyBtPZMACgN1lJOqlVbG.png" height="80%" width="100%" style="margin-right:5px;">
	</div>
	</div>
<p style="text-align: center;">Example of the training process</p>

	<div style="display: flex; justify-content: space-between;">
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/_lUD77x-yueXycuIn7jya.png" height="80%" width="100%" style="margin-right:5px;">
<p>Round 1 training performance</p>
	</div>
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/NHDH9N94cI-KqP8k-ASUN.png" height="80%" width="100%" style="margin-right:5px;">
<p>Round 2 training performance</p>
	</div>
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/6n0DnnQjXD8Ql-p3Owxan.png" height="80%" width="100%" style="margin-right:5px;">
<p>Round 3 training performance</p>
	</div>
	</div>

- Test result tables:
	<div style="display: flex; justify-content: space-between;">
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/u1DQHjXM41DMq1JIUOGlp.png" height="100%" width="100%" style="margin-right:5px;">
	</div>
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/ndQ60TKlheW8hmOrMBELU.png" height="100%" width="100%" style="margin-right:5px;">
	</div>
	</div>

- Test results:
	<div style="display: flex; justify-content: space-between;">
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/A91V0GdrcUcX01cC-biG9.png" height="600" width="1000" style="margin-right:5px;">
	<p>Anomaly Product</p>
	</div>
	<div style="text-align: center; margin-right: 20px;">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/PxleIhphzViTGCubVhWn7.png" height="600" width="1000" style="margin-right:5px;">
	<p>Normal Product</p>
	</div>
	</div>



# Installation and Usage Guidelines

Running this model requires Python and the following libraries:

- ftfy==6.2.0: fixes text normalization and encoding problems.
- matplotlib==3.9.0: data visualization and plotting.
- numpy==1.24.3: core library for numerical computation.
- opencv_python==4.9.0.80: image and video processing.
- pandas==2.2.2: data analysis and manipulation.
- Pillow==10.3.0: image file handling and conversion.
- PyQt5==5.15.10: framework for building GUI applications.
- PyQt5_sip==12.13.0: provides the binding layer between PyQt5 and Python.
- regex==2024.5.15: regular expression processing.
- scikit_learn==1.2.2: machine learning and data analysis.
- scipy==1.9.1: scientific and technical computing.
- setuptools==59.5.0: Python package distribution and installation.
- scikit-image: image processing and analysis.
- tabulate==0.9.0: prints data in tabular form.
- thop==0.1.1.post2209072238: counts the FLOPs of PyTorch models.
- timm==0.6.13: provides a wide range of recent image classification models.
- torch==2.0.0: PyTorch deep learning framework.
- torchvision==0.15.1: PyTorch extension library for computer vision.
- tqdm==4.65.0: displays progress bars.
- pyautogui: GUI automation.

- Install the Python libraries:
	```
	pip install -r requirements.txt
	```


## Steps to Run the Model

### ✅ Dataset configuration

- Organize the dataset as in the example below:
	```
	├── data/
	│ ├── COMP_1/
	│ │ ├── product_1/
	│ │ │ ├──grouth_truth
	│ │ │ │ ├──anomaly_1
	│ │ │ │ ├──anomaly_2
	│ │ │ │
	│ │ │ ├──test/
	│ │ │ │ ├──good
	│ │ │ │ ├──anomaly_1
	│ │ │ │ ├──anomaly_2
	│ │ │ │
	│ │ │ ├──train/
	│ │ │ │ ├──good
	│ │ │ │ ├──anomaly_1
	│ │ │ │ ├──anomaly_2
	│ │ │ │
	│ │ ├── product_2/
	│ │ │ │
	│ │ ├── meta.json
	│ │ │
	│ ├── COMP_2/
	│ │
	```

- Generate the JSON files that store the dataset information above (-> meta_train.json, meta_test.json):
```bash
cd dataset_config
python dataset_get_json.py
```
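
To make the purpose of the metadata files concrete, the sketch below walks one product folder and emits one record per image. It does not reproduce `dataset_get_json.py`: the function name `build_meta` and the record fields (`img_path`, `defect_type`, `anomaly`) are hypothetical, and the only assumption taken from the layout above is that `good` holds normal images.

```python
import json
import os
import tempfile

def build_meta(product_root, split):
    """Collect one record per file under <product_root>/<split>/<defect_type>/."""
    records = []
    split_dir = os.path.join(product_root, split)
    for defect_type in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, defect_type)
        if not os.path.isdir(class_dir):
            continue
        for name in sorted(os.listdir(class_dir)):
            records.append({
                "img_path": os.path.join(split, defect_type, name),
                "defect_type": defect_type,
                # "good" is the normal class; every other folder is an anomaly.
                "anomaly": 0 if defect_type == "good" else 1,
            })
    return records

# Demo on a throwaway directory tree with two empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for defect_type, name in [("good", "a.jpg"), ("anomaly_1", "b.jpg")]:
        folder = os.path.join(root, "train", defect_type)
        os.makedirs(folder)
        open(os.path.join(folder, name), "w").close()
    meta_train = build_meta(root, "train")
    meta_json = json.dumps({"train": meta_train}, indent=2)
```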

- Create the ground-truth masks (anomaly masks only) by hand:
```bash
cd dataset_config
python image_ground_truth.py
```

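- Dataset configuration for training and testing:
```bash
cd training_libs
python dataset.py
```

The dataset class works as follows:
- The `__init__` method takes the dataset root directory, the transform functions, the dataset name, and the mode as input.
- It reads the JSON metadata file (meta_train.json) and stores the class names and all data items in lists.
- It calls `generate_class_info` to build the class information and map class names to class IDs.
- The `__len__` method returns the number of samples in the dataset.
- The `__getitem__` method returns the sample at the given index:
  - It reads the image from its path and creates a mask image depending on whether the sample is anomalous.
  - It applies the transform functions to the image and mask when they are provided.
  - It returns a dictionary containing the image, mask, class name, anomaly flag, image path, and class ID.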
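
The flow described above can be sketched as a small pure-Python class. It is a skeleton under stated assumptions, not the project's `dataset.py`: image and mask loading are left out so the example runs without PyTorch or Pillow, and the class name `MetaDataset` and the metadata layout (`{mode: {class_name: [items]}}`) are hypothetical.

```python
import json
import os
import tempfile

class MetaDataset:
    """Stand-in for the dataset class; image and mask decoding are omitted."""

    def __init__(self, root, transform=None, dataset_name="demo", mode="train"):
        self.root = root
        self.transform = transform
        self.dataset_name = dataset_name
        meta_path = os.path.join(root, f"meta_{mode}.json")
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)[mode]
        self.cls_names = sorted(meta.keys())
        # Map each class name to an integer class ID.
        self.class_ids = {name: i for i, name in enumerate(self.cls_names)}
        self.data_all = [item for name in self.cls_names for item in meta[name]]

    def __len__(self):
        return len(self.data_all)

    def __getitem__(self, index):
        item = self.data_all[index]
        sample = {
            "img_path": os.path.join(self.root, item["img_path"]),
            "cls_name": item["cls_name"],
            "anomaly": item["anomaly"],
            "cls_id": self.class_ids[item["cls_name"]],
        }
        return self.transform(sample) if self.transform else sample

# Demo with a temporary metadata file describing two samples.
with tempfile.TemporaryDirectory() as root:
    meta = {"train": {"product_1": [
        {"img_path": "product_1/train/good/000.jpg",
         "cls_name": "product_1", "anomaly": 0},
        {"img_path": "product_1/train/anomaly_1/000.jpg",
         "cls_name": "product_1", "anomaly": 1},
    ]}}
    with open(os.path.join(root, "meta_train.json"), "w", encoding="utf-8") as f:
        json.dump(meta, f)
    dataset = MetaDataset(root, mode="train")
    first, second = dataset[0], dataset[1]
```

A full implementation would subclass `torch.utils.data.Dataset`, decode the image at `img_path`, and build an all-zero mask for normal samples.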


	### ✅ Image pre-processing (transformation) for train and test
```
training_libs/utils.py
```
```
AnomalyCLIP_lib/transform.py
```
- Data processing techniques:
  - normalization:
    - description: "Standardize image pixel values using the mean and standard deviation"
    - method: "'Normalize' from 'torchvision.transforms'"
  - max_resize:
    - description: "Resize so the longest side matches the maximum size, preserving the aspect ratio and adding padding"
    - method: "Custom 'ResizeMaxSize' class"
  - random_resized_crop:
    - description: "Randomly crop and resize images during training to add variation"
    - method: "'RandomResizedCrop' from 'torchvision.transforms'"
  - resize:
    - description: "Resize images to a fixed size that matches the model input"
    - method: "'Resize' with BICUBIC interpolation"
  - center_crop:
    - description: "Crop the central region of the image to the specified size"
    - method: "'CenterCrop'"
  - to_tensor:
    - description: "Convert the image to a PyTorch tensor"
    - method: "'ToTensor'"
  - augmentation (optional):
    - description: "Apply a variety of random transforms for data augmentation, configurable through 'AugmentationCfg'"
    - method: "Uses 'timm' library if specified"

	### ✅ Prompt generating
```
training_libs/prompt_ensemble.py
```
👍 Prompts built in the code
1. Normal prompt template: '["{}"]'
→ Normal prompt example: "object"
2. Anomaly prompt template: '["damaged {}"]'
→ Anomaly prompt example: "damaged object"

👍 Construction process
1. 'prompts_pos (Normal)': combines the class name with the normal template
2. 'prompts_neg (Anomaly)': combines the class name with the anomaly template
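
The construction process above amounts to string formatting. In this minimal sketch the two templates come from this section, while the constant names and the helper `build_prompts` are hypothetical.

```python
NORMAL_TEMPLATES = ["{}"]
ANOMALY_TEMPLATES = ["damaged {}"]

def build_prompts(class_name):
    """Fill the class name into the normal and anomaly templates."""
    prompts_pos = [t.format(class_name) for t in NORMAL_TEMPLATES]
    prompts_neg = [t.format(class_name) for t in ANOMALY_TEMPLATES]
    return prompts_pos, prompts_neg

prompts_pos, prompts_neg = build_prompts("object")
```

Using the generic word "object" as the class name is what keeps the prompts independent of any particular product.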

	### ✅ Initial setting for training

- Define the paths to the training dataset and the directory where model checkpoints are saved:
```python
parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
parser.add_argument("--dataset", type=str, default='smoke_cloud', help="train dataset name")
parser.add_argument("--save_path", type=str, default='./checkpoint/', help='path to save results')
```

	### ✅ Hyper parameters setting

- Set `--depth`, the depth of the embeddings learned during prompt training. It affects how well the model can learn complex features from the data:
```python
parser.add_argument("--depth", type=int, default=9, help="prompt embedding depth")
```

- Define the size of the input images used for training (in pixels):
```python
parser.add_argument("--image_size", type=int, default=518, help="image size")
```

- Set the training parameters:
```python
parser.add_argument("--epoch", type=int, default=500, help="epochs")
parser.add_argument("--learning_rate", type=float, default=0.0001, help="learning rate")
parser.add_argument("--batch_size", type=int, default=8, help="batch size")
```
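
For orientation, the arguments listed so far can be collected into one parser as sketched below. The argument names and defaults are the ones shown in this README; the wrapper function `build_parser` and the parser description are assumptions, and the real training script may define further options.

```python
import argparse

def build_parser():
    """Assemble the training arguments documented in this README."""
    parser = argparse.ArgumentParser(description="AD-CLIP training options (sketch)")
    parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
    parser.add_argument("--dataset", type=str, default="smoke_cloud", help="train dataset name")
    parser.add_argument("--save_path", type=str, default="./checkpoint/", help="path to save results")
    parser.add_argument("--depth", type=int, default=9, help="prompt embedding depth")
    parser.add_argument("--image_size", type=int, default=518, help="image size")
    parser.add_argument("--epoch", type=int, default=500, help="epochs")
    parser.add_argument("--learning_rate", type=float, default=0.0001, help="learning rate")
    parser.add_argument("--batch_size", type=int, default=8, help="batch size")
    return parser

# Override two defaults the same way they would be passed on the command line.
args = build_parser().parse_args(["--epoch", "200", "--learning_rate", "0.001"])
```

Passing a list to `parse_args` makes the parser easy to exercise from a notebook or a test without touching `sys.argv`.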

- Number of layers refined by DPAM (Diagonally Prominent Attention Map, the attention refinement introduced in the AnomalyCLIP paper):
```python
parser.add_argument("--dpam", type=int, default=20, help="dpam size")

# 1. ViT-B/32 and ViT-B/16: --dpam should be around 10-13
# 2. ViT-L/14 and ViT-L/14@336px: --dpam should be around 20-24
```
- DPAM refines specific layers of the model, in particular in Vision Transformers (ViT).
- It helps the model focus on important features within each layer through an attention mechanism.
- DPAM is applied across multiple layers, which allows deeper and more detailed feature extraction.
- The number of layers that DPAM affects is adjustable with `--dpam`, which controls how much of the model is refined.
- To refine the entire model, set `--dpam` to the number of layers in the model (e.g., 12 for ViT-B and 24 for ViT-L).
- To focus only on the final layers (where the model usually learns complex features), choose a smaller `--dpam` value.

	### ✅ Test process

👍 Load the pre-trained and fine-tuned (checkpoint) models
1. Pre-trained model (./pre-trained model/):
   - Contains the pre-trained models (ViT-B, ViT-L, ...).
   - Used as the starting point for training the CLIP model.
   - Speeds up and improves training by leveraging previously learned features.
2. Fine-tuned models (./checkpoint/):
   - The "epoch_N.pth" files in this folder store the model state during fine-tuning.
   - Each ".pth" file is a version of the model fine-tuned from the pre-trained model.
   - These checkpoints can be used to resume fine-tuning, evaluate the model at different stages, or select the best-performing version.


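# Model Attack Vulnerability Analysis
This section analyzes the vulnerabilities of the AnomalyCLIP model and lays out defenses against adversarial attacks.
To secure the model's reliability and stability and to preserve data integrity, it covers data-level and model-level defense strategies and the results of evaluating them.
## 1. Vulnerability Analysis
- ### Adversarial Attack Scenarios
1. Adversarial examples:
   - Description: small noise is added to the input data to distort the model's predictions.
   - Example: inducing the model to predict a normal image as defective.
2. Data poisoning:
   - Description: malicious data is inserted into the training set to distort training.
   - Example: training the model on abnormal data labeled as normal.
3. Evasion attacks:
   - Description: the model's classification result is manipulated at inference time.
   - Example: inducing the model to predict defective data as normal.

- ### Impact on the Model and Dataset
- Performance degradation: model accuracy drops when adversarial samples are fed in.
- Loss of integrity: a model trained on tampered data loses reliability in real environments.
- Potential for malicious use: faulty model decisions raise the risk of quality control failures in production.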

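## 2. Countermeasures

- ### Data-Level Defenses
1. Data cleaning:
   - Remove blurry or cropped images.
   - Remove data noise and repair defects in the data.
   - Result: higher data quality reduces the effect of adversarial noise.
2. Data augmentation:
   - Random rotation, resizing, and brightness and contrast adjustment.
   - Addition of Gaussian noise and salt-and-pepper noise.
   - Result: greater data diversity and better model generalization.
3. Data integrity verification:
   - Store a hash (MD5) of each data item and check for tampering.
   - Result: the reliability and integrity of the dataset are ensured.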
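
The integrity check in item 3 can be sketched with the standard library alone. The function names and the in-memory manifest format (file name mapped to MD5 hex digest) are assumptions for illustration; the source does not describe how the hashes are actually stored.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder):
    """Map every regular file in the folder to its MD5 hex digest."""
    return {
        name: md5_of_file(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
        if os.path.isfile(os.path.join(folder, name))
    }

def find_tampered(folder, manifest):
    """Return names whose current hash differs from the manifest (or is missing)."""
    current = build_manifest(folder)
    return sorted(name for name in manifest if current.get(name) != manifest[name])

# Demo: record hashes, modify one file, then detect the change.
with tempfile.TemporaryDirectory() as folder:
    for name, payload in [("a.jpg", b"image-a"), ("b.jpg", b"image-b")]:
        with open(os.path.join(folder, name), "wb") as f:
            f.write(payload)
    manifest = build_manifest(folder)
    with open(os.path.join(folder, "b.jpg"), "wb") as f:
        f.write(b"tampered")
    tampered = find_tampered(folder, manifest)
```

MD5 is adequate for spotting accidental corruption; since it is not collision-resistant, `hashlib.sha256` is a drop-in replacement when deliberate tampering is the concern.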

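- ### Model-Level Defenses
1. Adversarial training:
   - Include FGSM-based adversarial samples in the training data.
   - Result: average accuracy on adversarial samples improved by 5%.
2. Gradient masking:
   - Hide the gradients so that the model is less exposed to adversarial attacks.
3. Temperature scaling:
   - Adjust the model's predicted probabilities to reduce sensitivity to adversarial samples.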
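
Temperature scaling, the third item above, divides the logits by a temperature before the softmax. The sketch below shows only that mechanism; the logit values and the temperature of 2.0 are illustrative and not taken from this model.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature; higher temperatures flatten the output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0]  # made-up scores for the classes [normal, anomaly]
sharp = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=2.0)
```

The predicted class does not change, but the confidence becomes less extreme at the higher temperature, which is the softening effect this defense relies on.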

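- ### System-Level Defenses
1. Real-time detection and response:
   - Build a system that detects abnormal patterns in the input data in real time.
   - Result: immediate alerts and response are possible when an adversarial attack occurs.
2. Automated defense tooling:
   - Automate the generation of adversarial examples and the testing of defenses.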

## 3. Experimental Results

- ### Evaluation Data

- Dataset composition:
  - Normal data: 110 samples
  - Defect data: 10 samples
  - Adversarial data (FGSM attack): 100 samples
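
FGSM perturbs an input by a step of size epsilon in the direction of the sign of the loss gradient: x_adv = x + epsilon * sign(dL/dx). The sketch below applies that rule to a toy logistic-regression classifier so the gradient can be written by hand; the weights, the input, and epsilon = 0.1 are made-up values, and this is not how the evaluation samples above were actually generated.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(value):
    return (value > 0) - (value < 0)

def fgsm(weights, bias, x, y, epsilon):
    """Return x + epsilon * sign(dL/dx), clipped to the [0, 1] pixel range.

    For binary cross-entropy on a logistic model, dL/dx = (p - y) * w.
    """
    p = predict(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]

weights, bias = [2.0, -3.0, 1.0], 0.0  # toy model parameters
x, y = [0.8, 0.2, 0.5], 1              # toy input with true label 1
clean_p = predict(weights, bias, x)
x_adv = fgsm(weights, bias, x, y, epsilon=0.1)
adv_p = predict(weights, bias, x_adv)
```

Each feature moves by at most epsilon, yet the probability of the true class drops, which is the effect adversarial training is meant to counter.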

- ### Key Performance Metrics

| Metric | Baseline data | Adversarial data | Change |
| --- | --- | --- | --- |
| Accuracy | 98% | 92% | -6% |
| F1 Score | 0.935 | 0.91 | -2.5% |
| False Positive | 2% | 5% | +3% |
| False Negative | 3% | 7% | +4% |

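## 4. Future Plans
1. Test a wider range of attack techniques:
   - Apply and evaluate additional attacks such as PGD and DeepFool.
2. Improve the model:
   - Strengthen robustness through contrastive learning and ensemble learning.
3. Build a real-time defense system:
   - Analyze the model's real-time prediction data to detect and block adversarial inputs.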


	# References
	- AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection [[github](https://github.com/zqhang/AnomalyCLIP.git)]