Update README.md
README.md
CHANGED
@@ -21,6 +21,22 @@ The Swedish National Archives presents an end-to-end Handwritten Text Recognitio
|
|
21 |
|
22 |
The models are designed to provide a generic pipeline for handwritten text recognition, offering robust performance for documents from the 16th to the 19th century.
|
23 |
|
24 |
## Intended Use
|
25 |
The Swedish National Archives HTR pipeline is intended to be used for the following purposes:
|
26 |
|
@@ -53,15 +69,12 @@ The training data was annotated to provide ground truth for text region and line
|
|
53 |
|
54 |
The data can be found here: (WIP will be added soon)
|
55 |
|
56 |
-
|
57 |
## Caveats and Future Work
|
58 |
Although the Swedish National Archives HTR pipeline has been trained and optimized for running-text documents from the specified time period, there are a few caveats and considerations to keep in mind:
|
59 |
|
60 |
-
|
61 |
-
|
62 |
-
- **Continuous Improvement**: The pipeline can benefit from continuous updates and improvements as new training data becomes available and advancements in OCR technology occur. Regular evaluations and updates are recommended to enhance its performance and adaptability.
|
63 |
|
64 |
-
|
65 |
|
66 |
## References
|
67 |
If you would like to learn more about the Swedish National Archives HTR pipeline or access the training data, please refer to the following resources:
|
|
|
21 |
|
22 |
The models are designed to provide a generic pipeline for handwritten text recognition, offering robust performance for documents from the 16th to the 19th century.
|
23 |
|
24 |
+
## Evaluation
|
25 |
+
|
26 |
+
The Swedish National Archives HTR pipeline has been evaluated using two standard metrics for Handwritten Text Recognition: Word Error Rate (WER) and Character Error Rate (CER).
|
27 |
+
|
28 |
+
The reported performance metrics are obtained on a test dataset that represents a diverse range of historical running-text documents from the 16th to the 19th century. It is important to note that the actual performance may vary depending on the specific documents and handwriting styles encountered in practical usage.
|
29 |
+
|
30 |
+
| Metric | Performance |
|
31 |
+
|--------|-------------|
|
32 |
+
| WER | XX% |
|
33 |
+
| CER | XX% |
|
34 |
+
|
35 |
+
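The WER is the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the ground truth, divided by the number of words in the ground truth. The CER is the same ratio computed over characters. Lower is better for both, and because insertions are counted, either value can exceed 100%.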
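
As an illustration of how these two metrics are commonly computed, here is a minimal sketch based on Levenshtein edit distance. It is not the pipeline's evaluation code: the function names (`edit_distance`, `error_rate`, `wer`, `cer`) and the whitespace tokenization are assumptions made for this example, and real evaluations often normalize text (case, punctuation) first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance divided by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word Error Rate, assuming whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character Error Rate, counting spaces as characters."""
    return error_rate(list(reference), list(hypothesis))


# Hypothetical ground truth and recognized line, for illustration only.
ground_truth = "kongl majestät nådigst"
recognized = "kongl majestet nådigst"
print(f"WER: {wer(ground_truth, recognized):.1%}")
print(f"CER: {cer(ground_truth, recognized):.1%}")
```

In this made-up line, one of three words and one of 22 characters is wrong, which shows why the WER of a transcription is typically much higher than its CER.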
|
36 |
+
|
37 |
+
Regular evaluations are conducted to monitor and improve the performance of the pipeline. As new evaluation results become available, this table will be updated to reflect the most recent performance metrics.
|
38 |
+
|
39 |
+
|
40 |
## Intended Use
|
41 |
The Swedish National Archives HTR pipeline is intended to be used for the following purposes:
|
42 |
|
|
|
69 |
|
70 |
The data can be found here: (WIP will be added soon)
|
71 |
|
|
|
72 |
## Caveats and Future Work
|
73 |
Although the Swedish National Archives HTR pipeline has been trained and optimized for running-text documents from the specified time period, there are a few caveats and considerations to keep in mind:
|
74 |
|
75 |
+
Continuous Improvement: The pipeline is continuously being updated and improved as new training data becomes available and advancements in OCR technology occur. With access to more training data, the models will be updated to further enhance their performance and adaptability.
|
76 |
|
77 |
+
User Feedback: Users are encouraged to provide feedback on the pipeline's performance, identify issues, and report any potential biases or limitations. This feedback is highly valuable in refining the pipeline, addressing concerns, and informing future updates.
|
78 |
|
79 |
## References
|
80 |
If you would like to learn more about the Swedish National Archives HTR pipeline or access the training data, please refer to the following resources:
|