Japanese Receipt VL lfm2-450M

Model Description

Japanese-Receipt-VL-lfm2-450M is a specialized vision-language model fine-tuned for understanding and processing Japanese receipts. Built on LiquidAI's LFM2-VL-450M foundation model, this model can analyze receipt images and extract structured information, answer questions about receipt contents, and provide detailed descriptions in both Japanese and English.

Model Details

  • Base Model: liquidai/lfm2-vl-450m
  • Model Size: 450M parameters
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Languages: Japanese (primary), English (secondary)
  • Architecture: Vision-Language Transformer
  • Training: Fine-tuned on Japanese receipt datasets

Intended Use

Primary Use Cases

  • Comprehensive Receipt Parsing: Convert any Japanese receipt to structured JSON with exact text preservation
  • Retail Analytics: Extract detailed product information, pricing, and tax data from store receipts
  • Multi-tax Rate Processing: Handle complex Japanese tax scenarios (8%, 10%, tax-exempt items)
  • Financial Document Digitization: Process banking, credit card, and payment system receipts
  • E-commerce Integration: Extract product catalogs and pricing from retail receipts
  • Accounting Automation: Comprehensive expense categorization with tax breakdown details
  • Compliance Documentation: Maintain exact formatting for audit and regulatory requirements
  • Payment Processing Analysis: Extract credit card transaction details and approval codes

Target Users

  • Financial technology companies
  • Accounting software developers
  • Expense management platforms
  • Retail analytics companies
  • Japanese businesses and consumers

Usage

Installation

pip install transformers torch pillow

Basic Usage

from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import torch

# Load model and processor
model = AutoModelForVision2Seq.from_pretrained("sabaridsnfuji/Japanese-Receipt-VL-lfm2-450M")
processor = AutoProcessor.from_pretrained("sabaridsnfuji/Japanese-Receipt-VL-lfm2-450M")

# Load receipt image
image = Image.open("japanese_receipt.jpg")

# System prompt for structured extraction
system_prompt = """You are an intelligent document parser. Read the following Japanese receipt and extract every piece of information exactly as it appears, and present it in a well-structured JSON format using Japanese keys and values.

Please strictly follow these rules:

Only extract information that is actually present on the receipt. Do not include any missing, blank, or inferred fields.

Do not summarize, omit, translate, or modify any part of the receipt. Every character, number, symbol, and line must be retained exactly as printed.

Extract all available content including but not limited to: store details, receipt number, date, time, cashier name, product list, prices, tax breakdowns, payment details, receipt bags, barcodes, notices, and any footer messages.

Preserve original formatting such as line breaks, symbols, and full-width characters (hiragana, katakana, kanji, numbers, etc.).

Do not perform any translation, correction, interpretation, or reformatting of content. Use only what is present.

Output the result in JSON format, using Japanese field names as keys."""

# Prepare conversation format
messages = [
    {
        'role': 'system',
        'content': [{'type': 'text', 'text': system_prompt}]
    },
    {
        'role': 'user', 
        'content': [
            {'type': 'text', 'text': 'Please parse this Japanese receipt.'},
            {'type': 'image', 'image': image}
        ]
    }
]

# Process and generate
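inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,  # end the prompt with the assistant turn marker
    tokenize=True,               # return token IDs rather than a formatted string
    return_dict=True,            # return a dict that can be unpacked into generate()
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode response
response = processor.decode(outputs[0], skip_special_tokens=True)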
print(response)
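
The decoded response is a plain string, and it may carry text around the JSON object. A minimal sketch of pulling the first parseable JSON object out of such a string follows. The helper name `extract_first_json` and the sample string are illustrative assumptions, not part of the model's API.

```python
import json

def extract_first_json(text):
    """Return the first JSON object that parses in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next opening brace.
            start = text.find("{", start + 1)
    return None

# Hypothetical decoded output with leading and trailing text around the JSON.
sample = 'assistant\n{"店舗名": "ダイソー", "合計金額": "¥876"}\n'
parsed = extract_first_json(sample)
print(parsed)
```

Returning `None` instead of raising lets a caller decide whether to retry generation or log the raw text.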

Example Output

Example 1: Seven Bank PayPay Transaction Receipt

{
  "ご利用明細票": {
    "セブン銀行": "QR",
    "取引金額": "¥10,000*",
    "日付": "2025年03月26日",
    "時間": "15:46",
    "店舗番号": "0034",
    "店番": "BranchNo0100",
    "口座番号": "************9384",
    "金額票": "114703045-8277103",
    "照合コード": "0000",
    "お取引会社からのご連絡": "PayPayのお取引です"
  },
  "お知らせ": [
    "PayPayスクラッチくじ!すべての対象のお店で200円以上の支払いで1等最大全額戻ってくる(付与上限・条件あり)",
    "詳しくはPayPayアプリで♪"
  ],
  "注意事項": [
    "暗証番号は他人に知られないようにしてください。銀行員が直接あるいは電話で暗証番号をお尋ねすることはありません。",
    "上記ご取引内容についてご不明の点は、お取引会社にお問合せください。"
  ],
  "セブン銀行": "セブン銀行"
}

Example 2: DAISO Retail Store Receipt

{
  "店舗名": "ダイソー青葉台東急スクエア店",
  "電話番号": "TEL:082-420-0100",
  "公式通販サイトURL": "「DAISOオンラインショップ」『ダイソーオンライン』で検索!",
  "令状:校訂証正日付": "2025年6月22日(日)",
  "レジ日時": "19:24",
  "レジ番号": "0006",
  "責任者名": "99999992",
  "商品列表": [
    {
      "商品コード": "ドウシシャ",
      "商品名": "ナタデココ入",
      "価格": "¥100※"
    },
    {
      "商品コード": "ドウシシャ",
      "商品名": "チアシードド",
      "価格": "¥100※"
    },
    {
      "商品名": "消臭ポリ袋(おむつ用)",
      "価格": "¥100外"
    },
    {
      "商品名": "化粧ブラシセット(5本)",
      "価格": "¥300外"
    },
    {
      "商品名": "シャワー線棒 1 1 0本入",
      "価格": "¥100外"
    },
    {
      "商品名": "抗菌線棒(バガスパルブ配",
      "価格": "¥100外"
    }
  ],
  "小計点数": "6点",
  "小計金額": "¥800",
  "税込ポイント": "",
  "各税別": {
    "10%税抜対象額": "¥600",
    "10%税率額": "¥60",
    "8%税抜対象額": "¥200",
    "8%税率額": "¥16"
  },
  "合計金額": "¥876",
  "ビザ/マスター金額": "¥876",
  "お釣り金額": "¥0",
  "注意事項": "※印は軽減税率適用商品です。",
  "登録番号": "T7240001022681",
  "QRコード1": "",
  "QRコード2": "",
  "QRコード3": "",
  "クレジット売上票情報": "",
  "カード会社": "カイツ",
  "会員番号": "104",
  "ビザ/マスター": "",
  "有効期限": "429769XXXXXXXX5489-NFC",
  "取扱い日": "2025年06月22日",
  "承認番号": "0705755",
  "伝票番号": "05755",
  "取引内容": "売上(オンライン)",
  "支払区分": "一括",
  "取引金額": "¥876",
  "端末番号": "4971162449343",
  "ATC": "011C",
  "カードシークス番号": "00",
  "AID": "A00000000031010",
  "APL名": "VISACREDIT",
  "店舗番号": "008943",
  "レジット番号": "1841"
}
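
Because amounts are preserved as printed strings, any arithmetic check has to convert them first. The sketch below checks that the entries under 各税別 in Example 2 add up to 合計金額. The helper names are assumptions made for illustration, and the check assumes non-negative amounts and this particular key layout, which other receipts may not share.

```python
import re
import unicodedata

def yen_to_int(text):
    """Convert a printed amount such as '¥1,200' or '¥100※' to an integer number of yen."""
    # NFKC folds full-width digits into ASCII digits before stripping everything else.
    normalized = unicodedata.normalize("NFKC", text)
    digits = re.sub(r"[^0-9]", "", normalized)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)

def tax_breakdown_matches_total(receipt):
    """Return True when the amounts under 各税別 sum to 合計金額."""
    component_sum = sum(yen_to_int(value) for value in receipt["各税別"].values())
    return component_sum == yen_to_int(receipt["合計金額"])

# Values copied from Example 2 above.
receipt = {
    "各税別": {
        "10%税抜対象額": "¥600",
        "10%税率額": "¥60",
        "8%税抜対象額": "¥200",
        "8%税率額": "¥16",
    },
    "合計金額": "¥876",
}
print(tax_breakdown_matches_total(receipt))  # True: 600 + 60 + 200 + 16 = 876
```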

Advanced Usage - Custom Extraction

# Custom extraction with specific requirements
custom_prompt = """Parse this Japanese receipt and extract only the following information in JSON format:
- Transaction amount (取引金額)  
- Date and time (日付・時間)
- Store information (店舗情報)
- Payment method details (支払い方法)

Use Japanese keys and preserve exact formatting."""

messages = [
    {
        'role': 'system',
        'content': [{'type': 'text', 'text': custom_prompt}]
    },
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Extract the requested information from this receipt.'},
            {'type': 'image', 'image': image}
        ]
    }
]

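inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)
response = processor.decode(outputs[0], skip_special_tokens=True)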
print(response)
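
When the set of requested fields changes often, the prompt can be assembled from a list rather than edited by hand. This is a small sketch under that assumption. `build_custom_prompt` is a hypothetical helper, and it reuses the wording of the custom prompt above.

```python
def build_custom_prompt(fields):
    """Build an extraction prompt from (English label, Japanese key) pairs."""
    lines = [
        "Parse this Japanese receipt and extract only the following "
        "information in JSON format:"
    ]
    lines += [f"- {label} ({key})" for label, key in fields]
    lines += ["", "Use Japanese keys and preserve exact formatting."]
    return "\n".join(lines)

fields = [
    ("Transaction amount", "取引金額"),
    ("Date and time", "日付・時間"),
]
custom_prompt = build_custom_prompt(fields)
print(custom_prompt)
```

The resulting string can be passed as the system message text in the same way as `custom_prompt` above.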

Batch Processing

import json
from pathlib import Path

def process_receipt_batch(image_folder, output_file):
    """Process every .jpg receipt in a folder and save the raw model output as JSON."""
    results = []

    for image_path in sorted(Path(image_folder).glob("*.jpg")):
        image = Image.open(image_path)

        # Use the standard system prompt for full extraction
        messages = [
            {'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]},
            {'role': 'user', 'content': [
                {'type': 'text', 'text': 'Parse this receipt.'},
                {'type': 'image', 'image': image}
            ]}
        ]

        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt",
        ).to(model.device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=1024)
        response = processor.decode(outputs[0], skip_special_tokens=True)

        results.append({
            "filename": image_path.name,
            "extracted_data": response
        })

    # Save results (ensure_ascii=False keeps Japanese text readable in the file)
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

# Process all receipts in a folder
process_receipt_batch("./receipts/", "extracted_data.json")
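
The batch function above only picks up files ending in `.jpg`. If a folder also holds `.jpeg` or `.png` scans, or upper-case extensions, a case-insensitive suffix check is one option. The suffix set and helper names below are assumptions for illustration.

```python
from pathlib import Path

# Assumed set of accepted extensions; adjust to the formats actually in use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def is_receipt_image(path):
    """Return True when the file extension, ignoring case, is in IMAGE_SUFFIXES."""
    return Path(path).suffix.lower() in IMAGE_SUFFIXES

def find_receipt_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and is_receipt_image(p)
    )

print(is_receipt_image("scan_001.JPG"))   # True
print(is_receipt_image("notes.txt"))      # False
```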

Training Details

Training Data

  • Primary Dataset: Japanese-Mobile-Receipt-OCR-1.3K dataset
  • Data Size: 1,300+ receipt images
  • Data Sources: Various Japanese retailers, restaurants, and service providers
  • Annotation: Manual annotation of key information fields and structured extraction

Training Process

  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Base Model: liquidai/lfm2-vl-450m
  • Training Framework: PyTorch + Transformers
  • Optimization: AdamW optimizer
  • Training Time: Approximately 48 hours on V100 GPUs
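
For intuition about why LoRA keeps fine-tuning cheap: for a frozen weight matrix of shape d_out x d_in, LoRA trains two low-rank factors, B (d_out x r) and A (r x d_in), which adds r * (d_out + d_in) trainable parameters. The sketch below works through that count. The dimensions and rank are hypothetical and are not the values used to train this model.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # B has d_out * rank entries and A has rank * d_in entries; the base matrix is frozen.
    return rank * (d_out + d_in)

# Hypothetical layer size and rank, chosen only to make the arithmetic easy to follow.
d_out, d_in, rank = 1024, 1024, 8
full = d_out * d_in
added = lora_trainable_params(d_out, d_in, rank)
print(full)   # 1048576
print(added)  # 16384
```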

Key Features Learned

  • Structured JSON extraction with Japanese field names and hierarchical organization
  • Exact text preservation including full-width characters, symbols, and formatting
  • Multi-type receipt support: Banking transactions, retail stores, payment systems
  • Comprehensive product parsing: Item lists with codes, names, and individual pricing
  • Advanced tax calculation extraction: Multiple tax rates (8%, 10%), tax-exempt items, reduced tax rate indicators
  • Payment method details: Credit card information, transaction codes, terminal data
  • Store and business information: Contact details, registration numbers, URLs
  • Transaction metadata: Receipt numbers, cashier info, timestamps, approval codes
  • Promotional content extraction: Notices, QR codes, loyalty program information
  • Privacy-aware data handling: Proper masking of sensitive account information
  • Japanese retail format understanding: DAISO, convenience stores, department stores

Evaluation

Benchmarks

The model has been evaluated on a held-out test set of Japanese receipts across various categories including:

  • Banking receipts (銀行レシート) - Seven Bank, Japan Post Bank, ATM transactions
  • Payment system receipts (決済システム) - PayPay, LINE Pay, Rakuten Pay
  • Retail store receipts (小売店レシート) - DAISO, convenience stores (7-Eleven, Lawson), supermarkets
  • Department store receipts (デパートレシート) - Complex itemized purchases with multiple tax rates
  • Restaurant receipts (レストランレシート) - Food service with reduced tax rates
  • Transportation receipts (交通レシート) - Train tickets, bus passes, parking
  • Credit card receipts (クレジットカードレシート) - Detailed payment processing information
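
This card does not state which metric was used on the held-out set. As one possible way to score structured output, the sketch below flattens predicted and reference JSON into path/value pairs and reports exact-match precision and recall. It is an illustrative assumption, not the evaluation protocol behind this model, and it ignores empty objects and lists.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into a {path: leaf value} mapping."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}/{key}"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}/{index}"))
    else:
        items[prefix] = obj
    return items

def field_scores(predicted, reference):
    """Return (precision, recall) of exact matches over flattened fields."""
    pred, ref = flatten(predicted), flatten(reference)
    correct = sum(1 for path, value in pred.items() if path in ref and ref[path] == value)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall

# Hypothetical prediction with one wrong value and one extra field.
reference = {"合計金額": "¥876", "各税別": {"8%税率額": "¥16"}}
predicted = {"合計金額": "¥876", "各税別": {"8%税率額": "¥18"}, "店舗名": "不明"}
precision, recall = field_scores(predicted, reference)
print(precision, recall)
```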

Limitations

Known Limitations

  • Image Quality: Performance degrades with blurry, damaged, or low-resolution images
  • Handwritten Receipts: Limited accuracy on handwritten receipts
  • Regional Variations: Optimized for standard Japanese receipt formats
  • Language Mixing: May struggle with receipts that mix Japanese with other languages or scripts
  • Old Receipt Formats: Older or non-standard receipt layouts may reduce accuracy
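
Since accuracy drops on low-resolution input, one option is to screen images before inference and route small ones to manual review. The sketch below works on `(width, height)` tuples, which is the shape of PIL's `Image.size`. The 600-pixel threshold is an arbitrary assumption and not a documented requirement of the model.

```python
def is_low_resolution(size, min_short_side=600):
    """Return True when the shorter side of a (width, height) tuple is below the threshold."""
    width, height = size
    return min(width, height) < min_short_side

def split_by_resolution(sizes_by_name, min_short_side=600):
    """Split {filename: (width, height)} into (usable, needs_review) filename lists."""
    usable, needs_review = [], []
    for name, size in sizes_by_name.items():
        target = needs_review if is_low_resolution(size, min_short_side) else usable
        target.append(name)
    return usable, needs_review

# Hypothetical image sizes; in practice these would come from Image.open(path).size.
sizes = {"a.jpg": (1080, 1920), "b.jpg": (320, 480)}
usable, needs_review = split_by_resolution(sizes)
print(usable, needs_review)
```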

Bias Considerations

  • Training Data Bias: Model performance may vary across different Japanese regions
  • Retailer Bias: Better performance on common retail chains represented in training data
  • Format Bias: Optimized for modern thermal printer receipts

Ethical Considerations

Privacy

  • Personal Information: Model may extract personal information from receipts
  • Data Handling: Users should implement appropriate privacy safeguards
  • Compliance: Ensure compliance with local data protection regulations
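
As one example of a safeguard, long digit runs in extracted values can be masked before the JSON is stored or logged. This is a blunt sketch: the 12 to 19 digit pattern is an assumption, it also hides non-sensitive identifiers of that length, and it does not replace a proper data protection review.

```python
import re

# Assumed pattern: any run of 12 to 19 digits is treated as potentially sensitive.
LONG_DIGIT_RUN = re.compile(r"\d{12,19}")

def mask_digits(text, keep_last=4):
    """Replace all but the last `keep_last` digits of each long digit run with '*'."""
    def _mask(match):
        digits = match.group(0)
        cut = len(digits) - keep_last
        return "*" * cut + digits[cut:]
    return LONG_DIGIT_RUN.sub(_mask, text)

def mask_values(obj):
    """Apply mask_digits to every string value in nested dicts and lists."""
    if isinstance(obj, dict):
        return {key: mask_values(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [mask_values(value) for value in obj]
    if isinstance(obj, str):
        return mask_digits(obj)
    return obj

# Hypothetical extracted record; the account number is made up.
record = {"口座番号": "1234567890123456", "店舗番号": "0034"}
print(mask_values(record))
```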

Security

  • Sensitive Data: Receipts may contain sensitive financial information
  • Access Control: Implement proper access controls in production environments

Citation

If you use this model in your research or applications, please cite:

@misc{japanese-receipt-vl-lfm2-450m,
  title={Japanese Receipt VL lfm2-450M: A Specialized Vision-Language Model for Japanese Receipt Understanding},
  author={sabaridsnfuji},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/sabaridsnfuji/Japanese-Receipt-VL-lfm2-450M}
}

Dataset Reference

If you use this model or its underlying dataset, please also cite the original dataset paper:

@article{japanese-mobile-receipt-ocr-2024,
  title={Japanese-Mobile-Receipt-OCR-1.3K: A Comprehensive Dataset Analysis and Fine-tuned Vision-Language Model for Structured Receipt Data Extraction},
  author={Sabari Nathan},
  year={2024},
  doi={10.21203/rs.3.rs-7357197/v1},
  url={https://doi.org/10.21203/rs.3.rs-7357197/v1},
  note={Preprint}
}

Base Model Reference

Please also cite the base LFM2-VL model:

@article{lfm2-vl-2024,
  title={LFM2-VL: Large Foundation Model for Vision-Language Tasks},
  author={LiquidAI},
  year={2024},
  publisher={LiquidAI},
  url={https://huggingface.co/liquidai/lfm2-vl-450m}
}

License

This model is released under the Apache 2.0 License. Please ensure compliance with the license terms when using this model.

Acknowledgments

  • Base Model: LiquidAI LFM2-VL team
  • Training Infrastructure: [Your organization/platform]
  • Dataset Contributors: Japanese receipt data annotators
  • Community: Hugging Face community for tools and support

Contact

For questions, issues, or collaboration opportunities, please reach out through:

  • GitHub Issues: [Your GitHub repository]
  • Hugging Face Discussions: [Model discussion page]
  • Email: [Your contact email]

Model Card Authors

  • sabaridsnfuji

Model Card Contact

For questions about this model card, please contact the model authors.
