license: mit
language:
  - sv
pipeline_tag: image-to-text
tags:
  - HTR

Model Card: Swedish National Archives HTR Pipeline

Demo

You can try out a demo of the Swedish National Archives HTR pipeline at Riksarkivet HTR Demo.

Model Description

The Swedish National Archives presents an end-to-end Handwritten Text Recognition (HTR) pipeline for running-text documents ranging from the 16th to the 19th century. The pipeline consists of the following components:

  1. RTMDet Instance Segmentation Models: The pipeline utilizes two RTMDet instance segmentation models, trained using MMDetection. The first model is designed to segment text regions within the documents, while the second model focuses on segmenting text lines within these regions. These models enable the identification and localization of text areas, which is a crucial step in the HTR pipeline.

  2. SATRN HTR Model: The pipeline incorporates a SATRN (Self-Attention Text Recognition Network) model, trained using MMOCR (OpenMMLab's OCR toolbox). SATRN is a transformer-based text recognition architecture; the model used here is trained specifically to handle the characteristics and challenges of the handwritten text in the Swedish National Archives' documents.

The models are designed to provide a generic pipeline for handwritten text recognition, offering robust performance for documents from the 16th to the 19th century.
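The data flow described above (regions first, then lines within each region, then recognition of each line) can be sketched as follows. This is a minimal illustration, not the project's actual code: `run_pipeline`, `crop`, and the three callables are hypothetical names, the image is a plain list of pixel rows, and the instance segmentation masks are simplified to bounding boxes. Sorting boxes top-to-bottom is also an assumption about reading order.

```python
from typing import Callable, List, Tuple

# Assumption: detections are simplified to axis-aligned boxes
# (x_min, y_min, x_max, y_max); the real models output instance masks.
Box = Tuple[int, int, int, int]
# Assumption: an image is a list of pixel rows (stand-in for an array type).
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def reading_order(boxes: List[Box]) -> List[Box]:
    """Sort boxes top-to-bottom, then left-to-right (a simplifying assumption)."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def run_pipeline(
    image: Image,
    segment_regions: Callable[[Image], List[Box]],  # hypothetical wrapper: region model
    segment_lines: Callable[[Image], List[Box]],    # hypothetical wrapper: line model
    recognize_line: Callable[[Image], str],         # hypothetical wrapper: HTR model
) -> List[str]:
    """Transcribe a page: regions -> lines within each region -> text per line."""
    transcription: List[str] = []
    for region_box in reading_order(segment_regions(image)):
        region = crop(image, region_box)
        # Line boxes are expressed relative to the cropped region.
        for line_box in reading_order(segment_lines(region)):
            transcription.append(recognize_line(crop(region, line_box)))
    return transcription
```

In practice the three callables would wrap the two RTMDet models and the SATRN model; passing them in as functions keeps the sketch independent of any particular inference API.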

Intended Use

The Swedish National Archives HTR pipeline is intended to be used for the following purposes:

  • Handwritten Text Recognition: The pipeline enables the automatic recognition of handwritten text in running-text documents from the 16th to the 19th century. It can be utilized by researchers, historians, and archivists to efficiently transcribe and analyze historical texts.

  • Document Digitization: The pipeline aids in the process of digitizing archival documents by automating the extraction and transcription of handwritten text. This facilitates broader accessibility and preservation of historical materials.

Note that the pipeline is optimized for running-text documents from the specified time period and may not perform optimally for other types of documents or handwriting styles. It is also currently better suited to book pages than to complex layouts such as tables or newspapers.

Performance and Limitations

The performance of the Swedish National Archives HTR pipeline is influenced by several factors:

  • Accuracy: The pipeline achieves high accuracy both in segmenting text regions and lines and in recognizing the text content. However, recognition accuracy may vary depending on the quality of the original document, the handwriting style, and legibility.

  • Speed: The pipeline aims to provide real-time or near real-time performance for efficient processing of handwritten text documents. The speed may vary depending on the hardware used for inference.

  • Document Specificity: The pipeline is specifically trained for running-text documents from the 16th to the 19th century. It may not perform optimally for documents outside this time range or for documents with unique characteristics or handwriting styles not covered by the training data.

  • Language Limitations: The pipeline is tailored for Swedish text recognition. While it may handle other languages to some extent, its performance may not be as accurate as for Swedish.

  • Handwriting Style: The pipeline is optimized for the cursive handwriting style prevalent in the historical documents of the Swedish National Archives. It may not perform as well for other handwriting styles, such as block letters or highly stylized scripts.
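Because recognition accuracy varies with document quality and handwriting style, it can be useful to measure it on a small sample of your own material. The sketch below computes the character error rate (CER), a common metric for HTR output, as the edit distance between a reference transcription and the model output divided by the reference length. This model card does not state which metric was used to evaluate the pipeline, so CER and the function names here are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                       # delete char_a
                    current[j - 1] + 1,                    # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, `character_error_rate("Riksarkivet", "Riksarkiuet")` counts one substituted character out of eleven. Averaging the metric over a few manually transcribed lines gives a rough indication of how well the pipeline fits a given collection.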

Training Data

The Swedish National Archives HTR pipeline was trained using a diverse dataset of running-text documents from the 16th to the 19th century. The training data includes various types of historical texts, such as letters, manuscripts, and official records.

The dataset comprises both high-quality and challenging examples to ensure the models' robustness. It covers a wide range of handwriting styles, legibility levels, and document conditions.

The training data was annotated to provide ground truth for text region and line segmentation, as well as text transcription. Expert archivists and historians contributed to the annotation process to ensure accurate labeling.

The data can be found here: (WIP, will be added soon)

Caveats and Future Work

Although the Swedish National Archives HTR pipeline has been trained and optimized for running-text documents from the specified time period, there are a few caveats and considerations to keep in mind:

  • Out-of-Scope Documents: The pipeline may encounter difficulties when processing documents that deviate significantly from the expected document characteristics or handwriting styles present in the training data.

  • Continuous Improvement: The pipeline can benefit from continuous updates and improvements as new training data becomes available and advancements in OCR technology occur. Regular evaluations and updates are recommended to enhance its performance and adaptability.

  • User Feedback: Users are encouraged to provide feedback on the pipeline's performance, identify issues, and report any potential biases or limitations. This feedback can contribute to refining the pipeline and addressing concerns.

References

If you would like to learn more about the Swedish National Archives HTR pipeline or access the training data, please refer to the following resources: