Tingquan committed on
Commit 6c44ddd · verified · 1 Parent(s): 5152641

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +150 -0
README.md ADDED
@@ -0,0 +1,150 @@
+ ---
+ license: apache-2.0
+ ---
+
+ # PP-LCNet_x1_0_doc_ori
+
+ ## Introduction
+
+ The Document Image Orientation Classification Module identifies the orientation of a document image so that it can be corrected in post-processing. During document scanning or ID photo capture, the device is sometimes rotated to get a clearer image, so the resulting images can have different orientations. Standard OCR pipelines may not handle such images well. Using image classification, the orientation of documents or IDs containing text regions can be determined in advance and adjusted, which improves OCR accuracy. The key accuracy metrics are as follows:
+
+ <table>
+ <tr>
+ <th>Model</th>
+ <th>Recognition Avg Accuracy (%)</th>
+ <th>GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+ <th>CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode]</th>
+ <th>Model Storage Size (MB)</th>
+ <th>Introduction</th>
+ </tr>
+ <tr>
+ <td>PP-LCNet_x1_0_doc_ori</td>
+ <td>99.06</td>
+ <td>2.31 / 0.43</td>
+ <td>3.37 / 1.27</td>
+ <td>7</td>
+ <td>A document image classification model based on PP-LCNet_x1_0, with four categories: 0°, 90°, 180°, and 270°.</td>
+ </tr>
+ </table>
+
+
+
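To put the latency columns in perspective, the short sketch below computes how much faster High-Performance Mode is than Normal Mode, using only the figures from the table above. The helper name `speedup` is introduced here for illustration and is not part of PaddleOCR.

```python
# Latencies in milliseconds copied from the table above:
# (normal mode, high-performance mode).
latency_ms = {
    "gpu": (2.31, 0.43),
    "cpu": (3.37, 1.27),
}


def speedup(normal_ms: float, high_perf_ms: float) -> float:
    """Return how many times faster high-performance mode is than normal mode."""
    return normal_ms / high_perf_ms


for device, (normal, fast) in latency_ms.items():
    print(f"{device}: {speedup(normal, fast):.2f}x faster in high-performance mode")
```

With the published numbers this works out to roughly 5.4x on GPU and 2.7x on CPU.
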
+ **Note**: If any character (including punctuation) in a line is incorrect, the entire line is marked as wrong. This ensures higher accuracy in practical applications.
+
+ ## Quick Start
+
+ ### Installation
+
+ 1. PaddlePaddle
+
+ Please refer to the following commands to install PaddlePaddle using pip:
+
+ ```bash
+ # for CUDA11.8
+ python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+
+ # for CUDA12.6
+ python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+
+ # for CPU
+ python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+ ```
+
+ For details about PaddlePaddle installation, please refer to the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/en/install/quick).
+
+ 2. PaddleOCR
+
+ Install the latest version of the PaddleOCR inference package from PyPI:
+
+ ```bash
+ python -m pip install paddleocr
+ ```
+
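After both installs, a quick sanity check can confirm that the packages are visible to the current interpreter. This is a minimal sketch using only the standard library; `is_installed` is a helper name made up for this example, and it only checks that the modules can be located, not that a GPU build works.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# `paddle` is the import name provided by the paddlepaddle / paddlepaddle-gpu wheels.
for name in ("paddle", "paddleocr"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```
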
+ ### Model Usage
+
+ You can quickly experience the functionality with a single command:
+
+ ```bash
+ paddleocr doc_img_orientation_classification \
+     --model_name PP-LCNet_x1_0_doc_ori \
+     -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/4ifXaBJmFByG_mAnF86Vv.png
+ ```
+
+ You can also integrate inference with the document image orientation classification module into your project. Before running the following code, download the sample image to your local machine.
+
+ ```python
+ from paddleocr import DocImgOrientationClassification
+
+ model = DocImgOrientationClassification(model_name="PP-LCNet_x1_0_doc_ori")
+ output = model.predict(input="4ifXaBJmFByG_mAnF86Vv.png", batch_size=1)
+ for res in output:
+     res.print()  # print the prediction to the terminal
+     res.save_to_img(save_path="./output/")  # save the visualized result
+     res.save_to_json(save_path="./output/res.json")  # save the prediction as JSON
+ ```
+
+ After running, the obtained result is as follows:
+
+ ```json
+ {'res': {'input_path': '/root/.paddlex/predict_input/4ifXaBJmFByG_mAnF86Vv.png', 'page_index': None, 'class_ids': array([2], dtype=int32), 'scores': array([0.90971], dtype=float32), 'label_names': ['180']}}
+ ```
+
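The predicted label can then drive the correction step. The following is a minimal sketch, not PaddleOCR code: `correction_angle` and `rotate_ccw` are hypothetical helpers, the result is a plain-Python stand-in for the printed output above, and a nested list plays the role of the image. Whether a label such as `90` denotes a clockwise or counter-clockwise rotation is not stated here, so the counter-clockwise convention is an assumption; for the `180` label in the result above both conventions coincide.

```python
def correction_angle(label: str) -> int:
    """Return the further rotation (degrees, same direction) that brings the page upright."""
    predicted = int(label)
    if predicted not in (0, 90, 180, 270):
        raise ValueError(f"unexpected orientation label: {label!r}")
    return (360 - predicted) % 360


def rotate_ccw(grid, angle):
    """Rotate a 2-D list counter-clockwise by a multiple of 90 degrees."""
    for _ in range((angle // 90) % 4):
        # One counter-clockwise quarter turn: transpose, then reverse the row order.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


# Stand-in for the printed result above (NumPy arrays replaced by lists).
result = {"class_ids": [2], "scores": [0.90971], "label_names": ["180"]}

angle = correction_angle(result["label_names"][0])
page = [[1, 2],
        [3, 4]]  # toy 2x2 "image"
fixed = rotate_ccw(page, angle)
print(angle, fixed)
```

In a real project the nested list would be replaced by an image array and the rotation by the image library's own rotate call.
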
+ The visualized image is as follows:
+
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/DU_k30fxijLXFdXl179-0.png)
+
+ For details about the usage commands and parameter descriptions, please refer to the [documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/text_recognition.html#iii-quick-start).
+
+ ### Pipeline Usage
+
+ A single model has limited capability, but a pipeline that combines several models can handle harder problems in real-world scenarios.
+
+ #### doc_preprocessor
+
+ The Document Image Preprocessing Pipeline integrates two key functions: document orientation classification and geometric distortion correction. The document orientation classification module automatically identifies the four possible orientations of a document (0°, 90°, 180°, 270°), ensuring that the document is processed in the correct direction. The text image unwarping model corrects geometric distortions that occur during document photography or scanning, restoring the document's original shape and proportions. This pipeline is suitable for digital document management, preprocessing tasks for OCR, and any scenario requiring improved document image quality. By automating orientation correction and geometric distortion correction, the pipeline significantly improves the accuracy and efficiency of document processing, providing a more reliable foundation for image analysis. The pipeline also offers flexible service-oriented deployment options, supporting calls from various programming languages on multiple hardware platforms. Additionally, the pipeline supports secondary development, allowing you to fine-tune the models on your own datasets and seamlessly integrate the trained models. The pipeline contains two modules:
+ * Document Image Orientation Classification Module (Optional)
+ * Text Image Unwarping Module (Optional)
+
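Because both modules are optional, the pipeline can be pictured as a chain of stages that are each either run or skipped. The sketch below illustrates that control flow only; `run_pipeline`, `fake_orient`, and `fake_unwarp` are hypothetical stand-ins rather than PaddleOCR internals, and running orientation correction before unwarping is an assumption based on the order in which the modules are listed.

```python
def run_pipeline(image, orient=None, unwarp=None):
    """Apply each enabled stage in turn; a stage passed as None is skipped."""
    angle = None
    if orient is not None:
        # The stage returns the corrected image and the angle it detected.
        image, angle = orient(image)
    if unwarp is not None:
        image = unwarp(image)
    return image, angle


# Hypothetical stand-ins so the sketch runs without any model.
def fake_orient(image):
    # Toy behaviour: reverse the row order and report 180 degrees.
    return image[::-1], 180


def fake_unwarp(image):
    # Toy behaviour: return an unchanged copy, as if no distortion was found.
    return [row[:] for row in image]


page = [[1, 2], [3, 4]]
print(run_pipeline(page, orient=fake_orient, unwarp=fake_unwarp))
print(run_pipeline(page))  # both stages disabled: the image passes through
```
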
+ Run a single command to try the document preprocessing pipeline:
+
+ ```bash
+ paddleocr doc_preprocessor -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/pY6sY6wLDuoHF1-cGUvDr.png \
+     --use_doc_orientation_classify True \
+     --use_doc_unwarping True \
+     --doc_orientation_classify_model_name PP-LCNet_x1_0_doc_ori \
+     --save_path ./output \
+     --device gpu:0
+ ```
+
+ Results are printed to the terminal:
+
+ ```json
+ {'res': {'input_path': '/root/.paddlex/predict_input/pY6sY6wLDuoHF1-cGUvDr.png', 'page_index': None, 'model_settings': {'use_doc_orientation_classify': True, 'use_doc_unwarping': True}, 'angle': 180}}
+ ```
+
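When many documents are processed, the `angle` field makes it easy to summarize how often correction was needed. The sketch below assumes each result has been converted to a plain dictionary shaped like the printed output above; the file names are made up, and `rotated_inputs` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical batch of results, reduced to the fields used here.
results = [
    {"res": {"input_path": "a.png", "angle": 180}},
    {"res": {"input_path": "b.png", "angle": 0}},
    {"res": {"input_path": "c.png", "angle": 180}},
]


def rotated_inputs(batch):
    """Return the input paths whose detected angle is non-zero."""
    return [r["res"]["input_path"] for r in batch if r["res"]["angle"] != 0]


angle_counts = Counter(r["res"]["angle"] for r in results)
print(dict(angle_counts))
print(rotated_inputs(results))
```
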
+ If `save_path` is specified, the visualization results are saved under `save_path`. The visualization output is shown below:
+
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/HM8xQKtyBHx-CNVGk2ZJd.png)
+
+ The command-line method is meant for a quick trial. Integrating the pipeline into a project also takes only a few lines of code:
+
+ ```python
+ from paddleocr import DocPreprocessor
+
+ ocr = DocPreprocessor(
+     doc_orientation_classify_model_name="PP-LCNet_x1_0_doc_ori",
+     use_doc_orientation_classify=True,  # enable/disable the document orientation classification model
+     use_doc_unwarping=True,  # enable/disable the document unwarping module
+     device="gpu:0",  # device used for model inference
+ )
+ result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/pY6sY6wLDuoHF1-cGUvDr.png")
+ for res in result:
+     res.print()
+     res.save_to_img("output")
+     res.save_to_json("output")
+ ```
+
+ ## Links
+
+ [PaddleOCR Repo](https://github.com/paddlepaddle/paddleocr)
+
+ [PaddleOCR Documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)