	---
	license: apache-2.0
	language: en
	pipeline_tag: image-to-text
	---

	# TotalText-STDR: End-to-End Scene Text Detection and Recognition

	This repository contains the official models and inference pipeline for the TotalText Scene Text Detection and Recognition (STDR) project. The pipeline locates and transcribes text in images, including curved text.

	It runs in two stages: a fine-tuned Differentiable Binarization (DBNet) model detects text regions, and a pre-trained attention-based model (TPS-ResNet-BiLSTM-Attn) recognizes the text in each region.

	## Models

	### Text Detection
	- Architecture: Differentiable Binarization (DBNet) with a ResNet-50 backbone.
	- Pretraining: Pre-trained on the SynthText dataset.
	- Fine-tuning: Fine-tuned on the Total-Text dataset, targeting high precision on curved and multi-oriented text.
	- Framework: PyTorch

	### Text Recognition
	- Architecture: TPS-ResNet-BiLSTM-Attention.
	- Training: Pre-trained on a large-scale dataset of real and synthetic word images.
	- Framework: PyTorch

	## How to Use

	The end-to-end inference logic is encapsulated in the `OCR_Pipeline` class in `pipeline.py`.

	### 1. Installation

	First, clone the repository and install the required dependencies:

	```bash
	git clone https://huggingface.co/sakshamhooda/TotalText-STDR
	cd TotalText-STDR

	# Install dependencies (use of a virtual environment is recommended)
	# Note: Ensure you have the correct PyTorch version for your CUDA setup.
	pip install -r requirements.txt
	```
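
After cloning, it can be worth confirming that the weight files were actually downloaded. The sketch below is not part of the repository: `check_files` and `REQUIRED_FILES` are hypothetical names, and the paths are simply the ones used in the inference script in this README. It also assumes the weights are stored through Git LFS, as large files on Hugging Face typically are, in which case a clone without LFS support leaves a small text pointer in place of the real binary.

```python
from pathlib import Path

# Paths copied from the inference script in this README (assumed to be
# relative to the repository root).
REQUIRED_FILES = [
    "runs/dbnet_detector/dbnet_best_tt_1.pth",
    "recognition-ptr-weights/TPS-ResNet-BiLSTM-Attn-case-sensitive.pth",
    "config/charset_totaltext.txt",
]

# First bytes of a Git LFS pointer file, the text stub left behind when
# the real binary was not fetched.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_files(paths, root="."):
    """Return a dict mapping each path to 'ok', 'missing', or 'lfs-pointer'."""
    status = {}
    for rel in paths:
        path = Path(root) / rel
        if not path.is_file():
            status[rel] = "missing"
            continue
        # Read only as many bytes as needed to recognize a pointer file.
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        status[rel] = "lfs-pointer" if head == LFS_POINTER_PREFIX else "ok"
    return status


if __name__ == "__main__":
    for rel, state in check_files(REQUIRED_FILES).items():
        print(f"{state:12s} {rel}")
```

If a file is reported as `lfs-pointer`, installing Git LFS and running `git lfs pull` inside the clone should fetch the actual weights.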

	### 2. Inference

	You can run the pipeline on an image using the following Python script. Make sure the model weights and the charset file exist at the paths set in the configuration block.

	```python
	import cv2
	import numpy as np
	from pathlib import Path
	from pipeline import OCR_Pipeline

	# --- Configuration ---
	DETECTOR_CKPT = "runs/dbnet_detector/dbnet_best_tt_1.pth"
	RECOGNIZER_CKPT = "recognition-ptr-weights/TPS-ResNet-BiLSTM-Attn-case-sensitive.pth"
	CHARSET_PATH = "config/charset_totaltext.txt"
	IMAGE_PATH = "Total-Text-Dataset/test/img/img4.jpg"  # Example image

	# --- Initialization ---
	pipeline = OCR_Pipeline(
	    det_model_path=DETECTOR_CKPT,
	    rec_model_path=RECOGNIZER_CKPT,
	    charset_path=CHARSET_PATH,
	)

	# --- Run Inference ---
	print(f"Running inference on: {IMAGE_PATH}")
	input_image = cv2.imread(IMAGE_PATH)
	if input_image is None:  # cv2.imread returns None instead of raising on failure
	    raise FileNotFoundError(f"Could not read image: {IMAGE_PATH}")

	results, heatmap = pipeline.run(input_image)

	# --- Visualize and Print Results ---
	print(f"Found {len(results)} text instances.")

	output_image = input_image.copy()
	for res in results:
	    poly = np.array(res['polygon']).astype(np.int32)
	    text = res['text']

	    # Draw the detected polygon, then label it at its first vertex.
	    cv2.polylines(output_image, [poly], isClosed=True, color=(0, 255, 0), thickness=2)
	    origin = tuple(int(v) for v in poly[0])  # plain Python ints for OpenCV
	    cv2.putText(output_image, text, origin, cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 0, 0), 2)

	# Save the output
	output_path = Path("./pipeline_output.jpg")
	cv2.imwrite(str(output_path), output_image)
	print(f"Output image with results saved to: {output_path}")

	```
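
The script above reads each result as a dict with `polygon` and `text` keys. For downstream use it may be handy to save those results as JSON instead of only drawing them. The following is a minimal sketch under that assumption: `results_to_json` is a hypothetical helper, not part of `pipeline.py`, and the sample data is made up for illustration.

```python
import json


def results_to_json(results, indent=2):
    """Serialize pipeline results to a JSON string.

    Assumes each result is a dict with a 'polygon' (sequence of (x, y)
    points) and a 'text' string, as used in the script above.
    """
    records = []
    for res in results:
        # Convert every coordinate to a plain int so NumPy scalars or
        # arrays do not break json.dumps.
        polygon = [[int(x), int(y)] for x, y in res["polygon"]]
        records.append({"text": str(res["text"]), "polygon": polygon})
    return json.dumps(records, indent=indent)


# Hypothetical results, made up for illustration (not real model output).
example_results = [
    {"polygon": [(10, 12), (90, 10), (92, 40), (11, 42)], "text": "CAFE"},
    {"polygon": [(15.0, 60.0), (70.0, 58.0), (71.0, 80.0)], "text": "OPEN"},
]

print(results_to_json(example_results))
```

With the real pipeline, the same helper could be called as `results_to_json(results)` after `pipeline.run(...)`, provided the returned structure matches the assumption above.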

	## Project Information

	This project was developed to provide a high-precision OCR solution for the Total-Text dataset. Experiments were tracked with Weights & Biases (W&B), and models were versioned with MLflow. For more details on the training process, see the original project source.