---
license: mit
datasets:
- detection-datasets/coco
tags:
- orientation
- detection
- rotate
- rotation
- images
---
# Image Orientation Detector

This project implements a deep learning model that detects the orientation of an image and determines the rotation needed to correct it. It uses a pre-trained EfficientNetV2 model from torchvision (PyTorch's vision library), fine-tuned to classify images into four orientation categories: 0°, 90°, 180°, and 270°.

The model achieves **98.12% accuracy** on the validation set.
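As a minimal sketch of how such a four-class head can be attached to torchvision's `EfficientNetV2-S` (the learning rate and optimizer here are illustrative assumptions; the GitHub repository contains the actual training setup):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load EfficientNetV2-S with ImageNet weights as the starting point.
model = models.efficientnet_v2_s(weights=models.EfficientNet_V2_S_Weights.DEFAULT)

# torchvision's classifier is Sequential(Dropout, Linear); swap the final
# layer for a 4-way head (0°, 90°, 180°, 270°).
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, 4)

# Standard cross-entropy fine-tuning; hyperparameters are assumptions.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```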

## Training Performance and Model History

This model was trained on a single NVIDIA RTX 4080 GPU; training took **4 hours and 56.4 minutes** to complete.

The final model uses `EfficientNetV2-S`, but the project evolved through several iterations:

- **ResNet18:** Achieved ~90% accuracy with a model size of around 30MB.
- **ResNet50:** Improved accuracy to 95.26% with a model size of ~100MB.
- **EfficientNetV2-S:** Reached the "final" (for now) accuracy of **98.12%** with a model size of ~78MB.

## How It Works

The model is trained on a dataset in which every image appears in four rotations (0°, 90°, 180°, and 270°). The model learns to predict which rotation has been applied, and that prediction determines the correction needed to bring the image back to its upright orientation.

The four classes correspond to the following rotations (a prediction-and-correction sketch follows the list):

- **Class 0:** Image is correctly oriented (0°).
- **Class 1:** Image needs to be rotated 90° Counter-Clockwise to be correct.
- **Class 2:** Image needs to be rotated 180° to be correct.
- **Class 3:** Image needs to be rotated 90° Clockwise to be correct.
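This mapping can be applied directly with PIL, whose `Image.rotate` treats positive angles as counter-clockwise. The preprocessing transform below is an assumption; the repository's `predict.py` defines the real pipeline:

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumed preprocessing; the repository's predict.py is authoritative.
preprocess = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Class index -> counter-clockwise degrees needed to correct the image.
CORRECTION_CCW = {0: 0, 1: 90, 2: 180, 3: 270}

def correct_orientation(model: torch.nn.Module, path: str) -> Image.Image:
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    pred = int(logits.argmax(dim=1))
    # PIL rotates counter-clockwise for positive angles.
    return img.rotate(CORRECTION_CCW[pred], expand=True)
```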

## Dataset

The model was trained on several datasets:

- **Microsoft COCO Dataset:** A large-scale object detection, segmentation, and captioning dataset ([link](https://cocodataset.org/)).
- **AI-Generated vs. Real Images:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/cashbowman/ai-generated-images-vs-real-images)) was included to expose the model to the typical orientations of compositions found in art and illustrations.
- **TextOCR - Text Extraction from Images Dataset:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/robikscube/textocr-text-extraction-from-images-dataset?resource=download)) was included to improve the model's ability to detect the orientation of images containing text. (Note that over 1,300 images, such as `0007a5a18213563f.jpg`, needed their orientation corrected manually.)
- **Personal Images:** A small, curated collection of personal photographs to include unique examples and edge cases.

The combined dataset consists of **70,732** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **282,928** samples. This augmented dataset was then split into **226,342 samples for training** and **56,586 samples for validation**.
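A minimal sketch of how this fourfold expansion could be implemented as a PyTorch `Dataset`; the class name and label convention are assumptions inferred from the class descriptions above:

```python
from PIL import Image
from torch.utils.data import Dataset

class RotationDataset(Dataset):
    """Expands each source image into four samples, one per orientation class.

    Assumed convention, matching the class list above: label k means the image
    was rotated 90*k degrees clockwise, so it needs a 90*k degree
    counter-clockwise rotation to be corrected.
    """

    def __init__(self, paths, transform=None):
        self.paths = paths
        self.transform = transform

    def __len__(self):
        return len(self.paths) * 4  # e.g. 70,732 images -> 282,928 samples

    def __getitem__(self, idx):
        path, label = self.paths[idx // 4], idx % 4
        img = Image.open(path).convert("RGB")
        # PIL's positive angle is counter-clockwise, so -90 * label applies
        # the clockwise rotation that label k encodes.
        img = img.rotate(-90 * label, expand=True)
        if self.transform:
            img = self.transform(img)
        return img, label
```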

## Usage

For detailed usage instructions, including how to run predictions, export to ONNX, and train the model, please refer to the [GitHub repository](https://github.com/duartebarbosadev/deep-image-orientation-detection).

## Performance Comparison (PyTorch vs. ONNX)

For a dataset of 5,055 uncompressed images, performance on an RTX 4080 running **single-threaded** was (an export-and-inference sketch follows the timings):

- **PyTorch (`predict.py`):** 135.71 seconds
- **ONNX (`predict_onnx.py`):** 60.83 seconds
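The speedup comes from running the exported graph in ONNX Runtime. A minimal sketch of the export and inference steps, assuming a 384×384 input and the file name `orientation.onnx` (both illustrative; the repository's export script is authoritative):

```python
import numpy as np
import torch
import onnxruntime as ort

# Export the fine-tuned PyTorch model to ONNX (input shape is an assumption).
model.eval()
dummy = torch.randn(1, 3, 384, 384)
torch.onnx.export(
    model, dummy, "orientation.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)

# Run inference with ONNX Runtime, preferring the GPU provider when available.
session = ort.InferenceSession(
    "orientation.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
batch = np.random.rand(1, 3, 384, 384).astype(np.float32)  # placeholder input
logits = session.run(None, {"input": batch})[0]
print(logits.argmax(axis=1))  # predicted orientation class
```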

---

For more in-depth information about the project, including the full source code, training scripts, and detailed documentation, please visit the [GitHub repository](https://github.com/duartebarbosadev/deep-image-orientation-detection).