Japanese Receipt VL lfm2-450M

Model Description

Japanese-Receipt-VL-lfm2-450M is a specialized vision-language model fine-tuned for understanding and processing Japanese receipts. Built on LiquidAI's LFM2-VL-450M foundation model, this model can analyze receipt images and extract structured information, answer questions about receipt contents, and provide detailed descriptions in both Japanese and English.

Model Details

Base Model: liquidai/lfm2-vl-450m
Model Size: 450M parameters
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Languages: Japanese (primary), English (secondary)
Architecture: Vision-Language Transformer
Training: Fine-tuned on Japanese receipt datasets

Intended Use

Primary Use Cases

Comprehensive Receipt Parsing: Convert Japanese receipts to structured JSON while preserving the printed text exactly
Retail Analytics: Extract detailed product information, pricing, and tax data from store receipts
Multi-tax Rate Processing: Handle complex Japanese tax scenarios (8%, 10%, tax-exempt items)
Financial Document Digitization: Process banking, credit card, and payment system receipts
E-commerce Integration: Extract product catalogs and pricing from retail receipts
Accounting Automation: Comprehensive expense categorization with tax breakdown details
Compliance Documentation: Maintain exact formatting for audit and regulatory requirements
Payment Processing Analysis: Extract credit card transaction details and approval codes

Target Users

Financial technology companies
Accounting software developers
Expense management platforms
Retail analytics companies
Japanese businesses and consumers

Usage

Installation

pip install transformers torch pillow

Basic Usage

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

# Load model and processor
# (AutoModelForVision2Seq is deprecated in recent transformers releases)
model_id = "sabaridsnfuji/Japanese-Receipt-VL-lfm2-450M"
model = AutoModelForImageTextToText.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Load receipt image
image = Image.open("japanese_receipt.jpg")

# System prompt for structured extraction
system_prompt = """You are an intelligent document parser. Read the following Japanese receipt and extract every piece of information exactly as it appears, and present it in a well-structured JSON format using Japanese keys and values.

Please strictly follow these rules:

Only extract information that is actually present on the receipt. Do not include any missing, blank, or inferred fields.

Do not summarize, omit, translate, or modify any part of the receipt. Every character, number, symbol, and line must be retained exactly as printed.

Extract all available content including but not limited to: store details, receipt number, date, time, cashier name, product list, prices, tax breakdowns, payment details, receipt bags, barcodes, notices, and any footer messages.

Preserve original formatting such as line breaks, symbols, and full-width characters (hiragana, katakana, kanji, numbers, etc.).

Do not perform any translation, correction, interpretation, or reformatting of content. Use only what is present.

Output the result in JSON format, using Japanese field names as keys."""

# Prepare conversation format
messages = [
    {
        'role': 'system',
        'content': [{'type': 'text', 'text': system_prompt}]
    },
    {
        'role': 'user', 
        'content': [
            {'type': 'text', 'text': 'Please parse this Japanese receipt.'},
            {'type': 'image', 'image': image}
        ]
    }
]

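# Process and generate
# tokenize=True and return_dict=True are needed so the result can be unpacked into generate()
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, not the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = processor.decode(new_tokens, skip_special_tokens=True)
print(response)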
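The decoded response is a plain string, so downstream code usually needs to turn it into a Python object first. The following is a minimal sketch of one way to do that, assuming the response may carry a role prefix or a Markdown code fence around the JSON. The helper name `extract_json` and the sample string are illustrative and not part of the model or the transformers API.

```python
import json
import re


def extract_json(response: str):
    """Return the first JSON object found in a model response, or None."""
    # Drop Markdown code-fence markers (three backticks, optionally followed by "json")
    cleaned = re.sub(r"`{3}(?:json)?", "", response)
    start = cleaned.find("{")
    if start == -1:
        return None
    decoder = json.JSONDecoder()
    try:
        # raw_decode parses one JSON value and ignores any trailing text
        obj, _ = decoder.raw_decode(cleaned[start:])
    except json.JSONDecodeError:
        return None
    return obj


# Hypothetical response text, used only to demonstrate the helper
sample = 'assistant\n{"店舗名": "ダイソー", "合計金額": "¥876"}'
parsed = extract_json(sample)
print(parsed["合計金額"])
```

Returning `None` instead of raising keeps truncated generations (for example when `max_new_tokens` is too small) easy to detect and retry.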

Example Output

Example 1: Seven Bank PayPay Transaction Receipt

{
  "ご利用明細票": {
    "セブン銀行": "QR",
    "取引金額": "￥10,000*",
    "日付": "2025年03月26日",
    "時間": "15:46",
    "店舗番号": "0034",
    "店番": "BranchNo0100",
    "口座番号": "************9384",
    "金額票": "114703045-8277103",
    "照合コード": "0000",
    "お取引会社からのご連絡": "PayPayのお取引です"
  },
  "お知らせ": [
    "PayPayスクラッチくじ！すべての対象のお店で200円以上の支払いで1等最大全額戻ってくる（付与上限・条件あり）",
    "詳しくはPayPayアプリで♪"
  ],
  "注意事項": [
    "暗証番号は他人に知られないようにしてください。銀行員が直接あるいは電話で暗証番号をお尋ねすることはありません。",
    "上記ご取引内容についてご不明の点は、お取引会社にお問合せください。"
  ],
  "セブン銀行": "セブン銀行"
}

Example 2: DAISO Retail Store Receipt

{
  "店舗名": "ダイソー青葉台東急スクエア店",
  "電話番号": "TEL:082-420-0100",
  "公式通販サイトURL": "「DAISOオンラインショップ」『ダイソーオンライン』で検索！",
  "令状：校訂証正日付": "2025年6月22日(日)",
  "レジ日時": "19:24",
  "レジ番号": "0006",
  "責任者名": "99999992",
  "商品列表": [
    {
      "商品コード": "ドウシシャ",
      "商品名": "ナタデココ入",
      "価格": "¥100※"
    },
    {
      "商品コード": "ドウシシャ",
      "商品名": "チアシードド",
      "価格": "¥100※"
    },
    {
      "商品名": "消臭ポリ袋（おむつ用）",
      "価格": "¥100外"
    },
    {
      "商品名": "化粧ブラシセット（5本）",
      "価格": "¥300外"
    },
    {
      "商品名": "シャワー線棒 1 1 0本入",
      "価格": "¥100外"
    },
    {
      "商品名": "抗菌線棒（バガスパルブ配",
      "価格": "¥100外"
    }
  ],
  "小計点数": "6点",
  "小計金額": "¥800",
  "税込ポイント": "",
  "各税別": {
    "10%税抜対象額": "¥600",
    "10%税率額": "¥60",
    "8%税抜対象額": "¥200",
    "8%税率額": "¥16"
  },
  "合計金額": "¥876",
  "ビザ/マスター金額": "¥876",
  "お釣り金額": "￥0",
  "注意事項": "※印は軽減税率適用商品です。",
  "登録番号": "T7240001022681",
  "QRコード1": "",
  "QRコード2": "",
  "QRコード3": "",
  "クレジット売上票情報": "",
  "カード会社": "カイツ",
  "会員番号": "104",
  "ビザ/マスター": "",
  "有効期限": "429769XXXXXXXX5489-NFC",
  "取扱い日": "2025年06月22日",
  "承認番号": "0705755",
  "伝票番号": "05755",
  "取引内容": "売上（オンライン）",
  "支払区分": "一括",
  "取引金額": "¥876",
  "端末番号": "4971162449343",
  "ATC": "011C",
  "カードシークス番号": "00",
  "AID": "A00000000031010",
  "APL名": "VISACREDIT",
  "店舗番号": "008943",
  "レジット番号": "1841"
}
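Because the model returns amounts as printed strings, a simple arithmetic check can catch misread digits. The sketch below cross-checks the tax breakdown from Example 2 against the stated total. The key names (`各税別`, `合計金額`) are taken from that example output and may differ on other receipts, and the helper names are hypothetical.

```python
import re


def yen_to_int(text: str) -> int:
    """Convert a price string such as '¥1,234※' or '￥0' to an integer."""
    digits = re.sub(r"[^\d]", "", text)
    if not digits:
        raise ValueError(f"no digits in {text!r}")
    return int(digits)


def check_totals(receipt: dict) -> bool:
    """Check that the taxable amounts plus tax add up to the stated total."""
    breakdown = receipt["各税別"]
    computed = sum(yen_to_int(value) for value in breakdown.values())
    return computed == yen_to_int(receipt["合計金額"])


# Values copied from Example 2 above
receipt = {
    "小計金額": "¥800",
    "各税別": {
        "10%税抜対象額": "¥600",
        "10%税率額": "¥60",
        "8%税抜対象額": "¥200",
        "8%税率額": "¥16",
    },
    "合計金額": "¥876",
}
print(check_totals(receipt))  # True: 600 + 60 + 200 + 16 = 876
```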

Advanced Usage - Custom Extraction

# Custom extraction with specific requirements
custom_prompt = """Parse this Japanese receipt and extract only the following information in JSON format:
- Transaction amount (取引金額)  
- Date and time (日付・時間)
- Store information (店舗情報)
- Payment method details (支払い方法)

Use Japanese keys and preserve exact formatting."""

messages = [
    {
        'role': 'system',
        'content': [{'type': 'text', 'text': custom_prompt}]
    },
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Extract the requested information from this receipt.'},
            {'type': 'image', 'image': image}
        ]
    }
]

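# tokenize=True and return_dict=True are needed so the result can be unpacked into generate()
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = processor.decode(new_tokens, skip_special_tokens=True)
print(response)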
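When the set of requested fields changes often, the custom prompt can be generated instead of hand-written. This is a small sketch under that assumption. `build_extraction_prompt` is a hypothetical helper that reproduces the wording of the custom prompt above from a mapping of English labels to Japanese keys.

```python
def build_extraction_prompt(fields: dict) -> str:
    """Build a custom extraction prompt from {English label: Japanese key} pairs."""
    lines = [
        "Parse this Japanese receipt and extract only the following "
        "information in JSON format:"
    ]
    for english, japanese in fields.items():
        lines.append(f"- {english} ({japanese})")
    lines.append("")
    lines.append("Use Japanese keys and preserve exact formatting.")
    return "\n".join(lines)


prompt = build_extraction_prompt({
    "Transaction amount": "取引金額",
    "Date and time": "日付・時間",
})
print(prompt)
```

The resulting string can be passed as the system message text in place of `custom_prompt`.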

Batch Processing

import json
from pathlib import Path

def process_receipt_batch(image_folder, output_file):
    """Process every .jpg receipt in a folder and save the raw model responses as JSON."""
    results = []

    # sorted() makes the output order deterministic
    for image_path in sorted(Path(image_folder).glob("*.jpg")):
        image = Image.open(image_path).convert("RGB")

        # Use the standard system prompt for full extraction
        messages = [
            {'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]},
            {'role': 'user', 'content': [
                {'type': 'text', 'text': 'Parse this receipt.'},
                {'type': 'image', 'image': image}
            ]}
        ]

        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt",
        ).to(model.device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=1024)

        # Decode only the newly generated tokens, not the prompt
        new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
        response = processor.decode(new_tokens, skip_special_tokens=True)

        results.append({
            "filename": image_path.name,
            "extracted_data": response
        })

    # Save results (ensure_ascii=False keeps the Japanese text readable)
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

# Process all receipts in a folder
process_receipt_batch("./receipts/", "extracted_data.json")
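The batch function stores each response as a raw string under `extracted_data`, so a follow-up pass is needed to find out which receipts produced valid JSON. The sketch below assumes the result records have the `filename` and `extracted_data` keys written above. `split_results` is a hypothetical helper and the sample records are made up for illustration.

```python
import json


def split_results(results):
    """Separate batch results into parsed JSON objects and files that failed to parse."""
    parsed, failed = {}, []
    for item in results:
        try:
            parsed[item["filename"]] = json.loads(item["extracted_data"])
        except json.JSONDecodeError:
            failed.append(item["filename"])
    return parsed, failed


# Illustrative records in the same shape as the batch output file
results = [
    {"filename": "a.jpg", "extracted_data": '{"合計金額": "¥876"}'},
    {"filename": "b.jpg", "extracted_data": '{"合計金額": "¥8'},  # truncated output
]
parsed, failed = split_results(results)
print(sorted(parsed), failed)
```

Files listed in `failed` are candidates for a retry with a larger `max_new_tokens` or for manual review.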

Training Details

Training Data

Primary Dataset: Japanese-Mobile-Receipt-OCR-1.3K dataset
Data Size: 1,300+ receipt images
Data Sources: Various Japanese retailers, restaurants, and service providers
Annotation: Manual annotation of key information fields and structured extraction

Training Process

Fine-tuning Method: LoRA (Low-Rank Adaptation)
Base Model: liquidai/lfm2-vl-450m
Training Framework: PyTorch + Transformers
Optimization: AdamW optimizer
Training Time: Approximately 48 hours on V100 GPUs

Key Features Learned

Structured JSON extraction with Japanese field names and hierarchical organization
Exact text preservation including full-width characters, symbols, and formatting
Multi-type receipt support: Banking transactions, retail stores, payment systems
Comprehensive product parsing: Item lists with codes, names, and individual pricing
Advanced tax calculation extraction: Multiple tax rates (8%, 10%), tax-exempt items, reduced tax rate indicators
Payment method details: Credit card information, transaction codes, terminal data
Store and business information: Contact details, registration numbers, URLs
Transaction metadata: Receipt numbers, cashier info, timestamps, approval codes
Promotional content extraction: Notices, QR codes, loyalty program information
Privacy-aware data handling: Proper masking of sensitive account information
Japanese retail format understanding: DAISO, convenience stores, department stores

Evaluation

Benchmarks

The model has been evaluated on a held-out test set of Japanese receipts across various categories including:

Banking receipts (銀行レシート) - Seven Bank, Japan Post Bank, ATM transactions
Payment system receipts (決済システム) - PayPay, LINE Pay, Rakuten Pay
Retail store receipts (小売店レシート) - DAISO, convenience stores (7-Eleven, Lawson), supermarkets
Department store receipts (デパートレシート) - Complex itemized purchases with multiple tax rates
Restaurant receipts (レストランレシート) - Food service with reduced tax rates
Transportation receipts (交通レシート) - Train tickets, bus passes, parking
Credit card receipts (クレジットカードレシート) - Detailed payment processing information
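This card does not state which metric was used on the held-out set. As one possible way to score structured output, the sketch below computes field-level exact-match accuracy between a predicted and a reference JSON object. Both function names and the sample data are assumptions for illustration, not the author's evaluation code.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into a {path: leaf value} mapping."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}/{key}"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}/{index}"))
    else:
        items[prefix] = obj
    return items


def field_accuracy(predicted, reference) -> float:
    """Fraction of reference leaf fields reproduced exactly in the prediction."""
    ref = flatten(reference)
    pred = flatten(predicted)
    if not ref:
        return 0.0
    correct = sum(1 for path, value in ref.items()
                  if path in pred and pred[path] == value)
    return correct / len(ref)


# Made-up reference and prediction: one of three fields differs
reference = {"合計金額": "¥876", "商品": [{"商品名": "A", "価格": "¥100"}]}
predicted = {"合計金額": "¥876", "商品": [{"商品名": "A", "価格": "¥110"}]}
print(round(field_accuracy(predicted, reference), 3))
```

Exact match is strict by design here, since the model is prompted to preserve every character as printed.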

Limitations

Known Limitations

Image Quality: Performance degrades with blurry, damaged, or low-resolution images
Handwritten Receipts: Limited accuracy on handwritten receipts
Regional Variations: Optimized for standard Japanese receipt formats
Language Mixing: May struggle with receipts that mix Japanese with text in other languages or scripts
Old Receipt Formats: Older or non-standard receipt layouts may reduce accuracy
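Given the image-quality limitation above, it can help to flag questionable inputs before sending them to the model. The following sketch checks only the pixel dimensions, which can be read with `Image.open(path).size`. The function name and both thresholds are assumptions chosen for illustration and have not been tuned against this model.

```python
def preflight(size, min_short_side=400, max_aspect_ratio=6.0):
    """Return a list of warnings for an image given as (width, height)."""
    width, height = size
    warnings = []
    short_side = min(width, height)
    long_side = max(width, height)
    if short_side < min_short_side:
        warnings.append("low resolution")
    # Very long receipts may lose detail when resized by the processor
    if long_side / max(1, short_side) > max_aspect_ratio:
        warnings.append("extreme aspect ratio")
    return warnings


print(preflight((1080, 2400)))  # []
print(preflight((240, 2000)))   # ['low resolution', 'extreme aspect ratio']
```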

Bias Considerations

Training Data Bias: Model performance may vary across different Japanese regions
Retailer Bias: Better performance on common retail chains represented in training data
Format Bias: Optimized for modern thermal printer receipts

Ethical Considerations

Privacy

Personal Information: Model may extract personal information from receipts
Data Handling: Users should implement appropriate privacy safeguards
Compliance: Ensure compliance with local data protection regulations
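One simple safeguard is to redact long digit runs from the extracted JSON before storing or logging it. The sketch below is a hypothetical post-processing step, not a feature of the model. It masks every run of eight or more digits except the last four, which also hides non-sensitive numbers such as barcodes, so the threshold is an assumption to adjust per use case.

```python
import re

# Runs of 8+ digits are treated as potentially sensitive (assumed threshold)
_DIGIT_RUN = re.compile(r"\d{8,}")


def _mask(match):
    digits = match.group(0)
    return "*" * (len(digits) - 4) + digits[-4:]


def redact(value):
    """Recursively mask long digit runs in all string values."""
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return _DIGIT_RUN.sub(_mask, value)
    return value


# Field values copied from Example 2 above
record = {"端末番号": "4971162449343", "承認番号": "0705755", "取引金額": "¥876"}
print(redact(record))
```

Shorter values such as the approval code are left untouched by this rule, so field-specific redaction may still be needed.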

Security

Sensitive Data: Receipts may contain sensitive financial information
Access Control: Implement proper access controls in production environments

Citation

If you use this model in your research or applications, please cite:

@misc{japanese-receipt-vl-lfm2-450m,
  title={Japanese Receipt VL lfm2-450M: A Specialized Vision-Language Model for Japanese Receipt Understanding},
  author={sabaridsnfuji},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/sabaridsnfuji/Japanese-Receipt-VL-lfm2-450M}
}

Dataset Reference

If you use this model or its underlying dataset, please also cite the original dataset paper:

@article{japanese-mobile-receipt-ocr-2024,
  title={Japanese-Mobile-Receipt-OCR-1.3K: A Comprehensive Dataset Analysis and Fine-tuned Vision-Language Model for Structured Receipt Data Extraction},
  author={Sabari Nathan},
  year={2024},
  doi={10.21203/rs.3.rs-7357197/v1},
  url={https://doi.org/10.21203/rs.3.rs-7357197/v1},
  note={Preprint}
}

Base Model Reference

Please also cite the base LFM2-VL model:

@article{lfm2-vl-2024,
  title={LFM2-VL: Large Foundation Model for Vision-Language Tasks},
  author={LiquidAI},
  year={2024},
  publisher={LiquidAI},
  url={https://huggingface.co/liquidai/lfm2-vl-450m}
}

License

This model is released under the Apache 2.0 License. Please ensure compliance with the license terms when using this model.

Acknowledgments

Base Model: LiquidAI LFM2-VL team
Training Infrastructure: [Your organization/platform]
Dataset Contributors: Japanese receipt data annotators
Community: Hugging Face community for tools and support

Contact

For questions, issues, or collaboration opportunities, please reach out through:

GitHub Issues: [Your GitHub repository]
Hugging Face Discussions: [Model discussion page]
Email: [Your contact email]

Model Card Authors

sabaridsnfuji

Model Card Contact

For questions about this model card, please contact the model authors.
