---
license: mit
base_model: naver-clova-ix/donut-base-finetuned-rvlcdip
library_name: transformers
tags: ['donut','classification','irs','tax','document AI']
---

# Donut model fine-tuned for US IRS tax document classification

This Donut model has been fine-tuned to classify US IRS tax documents. It can classify up to 28 different types of IRS documents, targeting a common set of documents used for tax returns.

1. 1040 U.S. Individual Income Tax Return
2. 1040-NR U.S. Nonresident Alien Income Tax Return
3. 1040-NR SCHEDULE OI Other Information
4. 1040 SCHEDULE 1 Additional Income and Adjustments to Income
5. 1040 SCHEDULE 2 Additional Taxes
6. 1040 SCHEDULE 3 Additional Credits and Payments
7. 1040 SCHEDULE 8812 Credits for Qualifying Children and Other Dependents
8. 1040 SCHEDULE A Itemized Deductions
9. 1040 SCHEDULE B Interest and Ordinary Dividends
10. 1040 SCHEDULE C Profit or Loss From Business
11. 1040 SCHEDULE D Capital Gains and Losses
12. 1040 SCHEDULE E Supplemental Income and Loss
13. 1040 SCHEDULE SE Self-Employment Tax
14. Form 1125-A Cost of Goods Sold
15. Form 8949 Sales and Other Dispositions of Capital Assets
16. Form 8959 Additional Medicare Tax
17. Form 8960 Net Investment Income Tax - Individuals, Estates, and Trusts
18. Form 8995 Qualified Business Income Deduction Simplified Computation
19. Form 8995-A SCHEDULE A Specified Service Trades or Businesses
20. Form W-2 Wage and Tax Statement

## Model Details & Description

The base model is [naver-clova-ix/donut-base-finetuned-rvlcdip][base]. It was fine-tuned on a training set of more than 3,000 documents.

The `config.json` file has its `label2id` mapping updated to reflect all labels that the model can classify.
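
As a minimal sketch of how the label mapping relates to the class index the model predicts, the snippet below inverts a `label2id` mapping into an `id2label` lookup. The JSON excerpt and its label strings are hypothetical placeholders. The real labels and ids are defined in the model repository's `config.json` and may differ. The helper name `build_id2label` is also an assumption made for illustration.

```python
import json

# Hypothetical excerpt of a config.json; the real label strings and ids
# live in the model repository and may differ.
config_text = """
{
  "label2id": {
    "1040": 0,
    "1040-NR": 1,
    "W-2": 2
  }
}
"""


def build_id2label(label2id):
    """Invert a label -> id mapping into id -> label, rejecting duplicate ids."""
    id2label = {}
    for label, idx in label2id.items():
        idx = int(idx)
        if idx in id2label:
            raise ValueError(f"id {idx} is used by both {id2label[idx]!r} and {label!r}")
        id2label[idx] = label
    return id2label


label2id = json.loads(config_text)["label2id"]
id2label = build_id2label(label2id)

# Look up the label for a predicted class index
print(id2label[2])
```

When the model is loaded with `transformers`, the same lookup is available as `model.config.id2label`, keyed by integer class index.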

For inference, use images that are 1920 px wide and 2560 px high.

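
The target size has a 3:4 aspect ratio, so pages with a different shape are stretched slightly by a direct resize. The sketch below is a hedged illustration, not part of the model's documented preprocessing. It measures that distortion and computes the largest size with the source aspect ratio that fits inside 1920 x 2560. The helper names `aspect_distortion` and `fit_within_target` are hypothetical. This card does not state whether the training images were stretched or padded, so treat any padded variant as an experiment to validate against your own documents.

```python
TARGET_W, TARGET_H = 1920, 2560  # input size stated in this model card


def aspect_distortion(width, height):
    """Ratio of the source aspect (w/h) to the target aspect; 1.0 means none."""
    return (width / height) / (TARGET_W / TARGET_H)


def fit_within_target(width, height):
    """Largest size with the source aspect ratio that fits inside the target."""
    scale = min(TARGET_W / width, TARGET_H / height)
    return round(width * scale), round(height * scale)


# A US Letter page scanned at 300 dpi is 2550 x 3300 px
page_w, page_h = 2550, 3300
print(f"distortion factor: {aspect_distortion(page_w, page_h):.4f}")
print("aspect-preserving size:", fit_within_target(page_w, page_h))
```

A distortion factor close to 1.0 means a direct resize to 1920 x 2560 changes the page shape very little.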
## Sample Code for Document Inference

```python
# load dependencies
import torch
from torch import nn
from PIL import Image
from transformers import DonutSwinModel, DonutSwinPreTrainedModel, DonutProcessor


# Donut Swin encoder with a linear classification head on top
class DonutForImageClassification(DonutSwinPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.swin = DonutSwinModel(config)
        self.dropout = nn.Dropout(0.5)
        self.classifier = nn.Linear(self.swin.num_features, config.num_labels)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        outputs = self.swin(pixel_values)
        pooled_output = outputs[1]  # pooled encoder output
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits


sModelName = 'hsarfraz/donut-irs-tax-docs-classifier'
processor = DonutProcessor.from_pretrained(sModelName)
model = DonutForImageClassification.from_pretrained(sModelName)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

# switch to evaluation mode so dropout is disabled
model.eval()

# load test image
sTestImagePath = 'replace this with document image path'
# open image
img = Image.open(sTestImagePath)
# resize image to width 1920 and height 2560; the fine-tuned model was trained at this size
img_new = img.resize((1920, 2560), Image.Resampling.LANCZOS)

# perform inference
predicted_label = ''
with torch.no_grad():
    pixel_values = processor(img_new.convert("RGB"), return_tensors="pt").pixel_values
    print(pixel_values.shape)
    pixel_values = pixel_values.to(device)
    outputs = model(pixel_values)
    # highest logit and the index of the predicted class
    max_logit, predicted = torch.max(outputs, 1)
    pval = predicted.item()
    predicted_label = model.config.id2label[pval]

print('---------------------------------- ')
print('Document Image Classification: ', predicted_label)
```
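
The sample above reports only the single best class. As a hedged sketch of how the raw logits could be turned into ranked probabilities, the snippet below applies a softmax and returns the top `k` labels. It is written in plain Python so it runs without downloading the model. The logits, the label names, and the helper names `softmax` and `top_k` are hypothetical values chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and labels, for illustration only
example_logits = [2.0, 0.5, -1.0]
example_id2label = {0: "1040", 1: "W-2", 2: "8949"}

for label, prob in top_k(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model from the sample code, the inputs would be `outputs[0].tolist()` for the logits and `model.config.id2label` for the mapping. A low top probability, or a small gap between the first two entries, can be used to flag a page for manual review.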

[base]: https://huggingface.co/naver-clova-ix/donut-base-finetuned-rvlcdip