---
license: apache-2.0
base_model: DotsOCR
tags:
- vision
- ocr
- document-understanding
- text-extraction
datasets:
- custom
language:
- en
pipeline_tag: image-to-text
---

# dots_table

dots_table is a fine-tuned version of DotsOCR, optimized for document OCR and text extraction.

## Model Details

- Base Model: DotsOCR (1.7B parameters)
- Training: LoRA fine-tuning with rank 48
- Task: Document text extraction and OCR
- Input: Document images
- Output: Extracted text in structured format
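To give a feel for what "LoRA fine-tuning with rank 48" means in parameter terms, here is a minimal arithmetic sketch. LoRA replaces a full update of a `d_out x d_in` weight matrix with two low-rank factors, so it trains `rank * (d_in + d_out)` parameters per adapted matrix. The rank comes from this card; the hidden size of 1536 is a hypothetical value chosen for illustration only, not a documented property of DotsOCR, and the card does not say which layers were adapted.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns an update B @ A, where A has shape (rank, d_in) and
    B has shape (d_out, rank), giving rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


RANK = 48      # stated in the model card
HIDDEN = 1536  # hypothetical hidden size, for illustration only

added = lora_param_count(HIDDEN, HIDDEN, RANK)
full = HIDDEN * HIDDEN  # parameters in the frozen square projection

print(f"LoRA parameters for one projection: {added:,}")
print(f"Full matrix parameters:             {full:,}")
print(f"Trainable fraction:                 {added / full:.4f}")
```

The actual trainable parameter count depends on the real layer shapes and on which modules the adapter targets.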

## Usage

The example needs the `qwen-vl-utils` package for `process_vision_info` and the `flash-attn` package for the `flash_attention_2` attention backend.

```python
import torch
from PIL import Image
from qwen_vl_utils import process_vision_info  # provided by the qwen-vl-utils package
from transformers import AutoModelForCausalLM, AutoProcessor

# Load model and processor
model = AutoModelForCausalLM.from_pretrained(
    "NirajRajai/dots_table",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
processor = AutoProcessor.from_pretrained(
    "NirajRajai/dots_table",
    trust_remote_code=True,
)

# Build a chat message containing the image and the instruction
image = Image.open("document.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract the text content from this image."},
        ],
    }
]

# Render the prompt and collect the vision inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate, then drop the prompt tokens so only the new text is decoded
generated_ids = model.generate(**inputs, max_new_tokens=2048)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]

print(output_text)
```
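The usage example requests `attn_implementation="flash_attention_2"`, which fails to load if the `flash_attn` package is not installed. A minimal sketch of choosing the backend at runtime follows. The helper name `pick_attn_implementation` is hypothetical, and whether this model's remote code supports the `"sdpa"` fallback is an assumption that should be checked before relying on it.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash_attn is importable, else "sdpa".

    Assumption: the model's remote code accepts "sdpa" as a fallback.
    If it does not, "eager" is the other standard transformers value.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


attn = pick_attn_implementation()
print(f"Using attention implementation: {attn}")
```

The returned string can be passed as the `attn_implementation` argument of `from_pretrained` in place of the hard-coded value.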

## Training Details

- Hardware: NVIDIA H100 80GB
- Training Duration: 3 epochs
- Batch Size: 2 (with gradient accumulation)
- Learning Rate: 5e-5
- Optimizer: AdamW 8-bit
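Because the batch size of 2 is combined with gradient accumulation, the effective batch size per optimizer step is larger than 2. The sketch below shows the arithmetic. The batch size, learning rate, and epoch count come from the list above; the accumulation step count, the single-GPU setup, and the dataset size of 1000 are hypothetical placeholders, since the card does not state them.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainSetup:
    per_device_batch_size: int = 2  # from the model card
    learning_rate: float = 5e-5     # from the model card
    num_epochs: int = 3             # from the model card
    grad_accum_steps: int = 8       # hypothetical: not stated in the card
    num_gpus: int = 1               # assumption: a single H100

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update
        return self.per_device_batch_size * self.grad_accum_steps * self.num_gpus

    def optimizer_steps(self, num_examples: int) -> int:
        # Optimizer updates over the whole run, rounding up partial batches
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.num_epochs


setup = TrainSetup()
print("Effective batch size:", setup.effective_batch_size)
print("Optimizer steps for 1000 examples:", setup.optimizer_steps(1000))
```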

## License

Apache 2.0

## Citation

If you use this model, please cite the original DotsOCR paper and this fine-tuned version.