Luna OCR Backend

An OCR backend that uses Google Gemini to extract text from images and PDFs and format the result.

🚀 Quick Start

1. Install Dependencies

cd server
npm install

2. Start the Server

npm start
# or for development with auto-reload:
npm run dev

3. Test the API

curl http://localhost:3001/api/health

📡 API Endpoints

Health Check

GET /api/health

OCR Processing

POST /api/ocr
Content-Type: multipart/form-data

Parameters:
- file: Image file (PNG, JPG, WebP) or PDF
- apiKey: Google Gemini API key
- mode: "standard" or "structured"
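
As a rough sketch of how a client might assemble this request in Node.js 18 or later (which provides global FormData, Blob, and fetch): the field names file, apiKey, and mode come from the parameter list above, while the helper names buildOcrForm and submitOcr are hypothetical and not part of this project.

```javascript
// Hypothetical helper: builds the multipart body for POST /api/ocr.
// Only the field names (file, apiKey, mode) are taken from this README.
function buildOcrForm(fileBytes, fileName, mimeType, apiKey, mode = "standard") {
  if (mode !== "standard" && mode !== "structured") {
    throw new Error(`Unsupported mode: ${mode}`);
  }
  const form = new FormData();
  // Passing a file name makes this part a file upload rather than a text field.
  form.append("file", new Blob([fileBytes], { type: mimeType }), fileName);
  form.append("apiKey", apiKey);
  form.append("mode", mode);
  return form;
}

// Hypothetical sender. fetch sets the multipart Content-Type header and
// boundary on its own, so no explicit header is passed here.
async function submitOcr(form, baseUrl = "http://localhost:3001") {
  const response = await fetch(`${baseUrl}/api/ocr`, { method: "POST", body: form });
  return response.json();
}
```

Keeping form construction separate from sending makes it possible to check the fields without a running server.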

🔧 Configuration

Environment Variables

Create a .env file (optional):

PORT=3001
MAX_FILE_SIZE=10485760
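
A minimal sketch of how these two variables could be read with the defaults shown above. The function name loadConfig is hypothetical, and the actual server.js may read its settings differently. The sketch takes an environment object as an argument and makes no assumption about how the .env file gets loaded.

```javascript
// Hypothetical config loader; not taken from server.js.
// Defaults mirror the values shown in this README.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "3001", 10);
  const maxFileSize = Number.parseInt(env.MAX_FILE_SIZE ?? "10485760", 10);

  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  if (!Number.isInteger(maxFileSize) || maxFileSize <= 0) {
    throw new Error(`Invalid MAX_FILE_SIZE: ${env.MAX_FILE_SIZE}`);
  }
  return { port, maxFileSize };
}
```

Failing fast on a malformed value makes a typo in .env show up at startup instead of as a confusing upload error later.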

Supported File Types

Images: PNG, JPG, JPEG, WebP
Documents: PDF (converted to images)
Max Size: 10 MB (10485760 bytes) per file
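
The limits above can be expressed as a small validation step. This is only a sketch: validateUpload and its return shape are hypothetical, and the MIME type strings are the standard ones for the listed formats, not values confirmed from server.js.

```javascript
// Standard MIME types for the formats listed above (JPG and JPEG share one).
const ALLOWED_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
  "application/pdf",
]);
const DEFAULT_MAX_FILE_SIZE = 10485760; // 10 * 1024 * 1024 bytes

// Hypothetical check; returns a reason string instead of throwing so the
// caller can decide which HTTP status to send.
function validateUpload({ mimeType, size }, maxFileSize = DEFAULT_MAX_FILE_SIZE) {
  if (!ALLOWED_MIME_TYPES.has(mimeType)) {
    return { ok: false, reason: "Unsupported file type" };
  }
  if (size > maxFileSize) {
    return { ok: false, reason: "File too large" };
  }
  return { ok: true };
}
```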

🎯 Processing Modes

Standard Mode

Uses Gemini 1.5 Flash (faster)
Returns clean plain text
Good for simple text extraction

Structured Mode

Uses Gemini 1.5 Pro (more capable, slower)
Returns formatted Markdown
Creates tables, headers, and lists automatically
Suited to complex documents
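
One way to picture the two modes is as a lookup from mode name to model and output type. The sketch below is an assumption about how such a mapping could look, not the code in server.js. The model identifier strings and the names MODE_SETTINGS and resolveMode are illustrative.

```javascript
// Assumed mapping based on the mode descriptions above.
// The identifier strings are illustrative; check the Gemini docs for current names.
const MODE_SETTINGS = {
  standard: { model: "gemini-1.5-flash", outputFormat: "text" },
  structured: { model: "gemini-1.5-pro", outputFormat: "markdown" },
};

// Hypothetical resolver; rejects anything other than the two documented modes.
function resolveMode(mode = "standard") {
  if (!Object.hasOwn(MODE_SETTINGS, mode)) {
    throw new Error(`Unknown processing mode: ${mode}`);
  }
  return MODE_SETTINGS[mode];
}
```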

📊 Response Format

{
  "success": true,
  "data": {
    "fileName": "document.png",
    "fileSize": 1234567,
    "processingMode": "structured",
    "extractedText": "# Document Title\n\n...",
    "formats": {
      "txt": "plain text version",
      "md": "markdown version", 
      "json": { "metadata": {...}, "content": {...} }
    },
    "metadata": {
      "characterCount": 1500,
      "wordCount": 250,
      "lineCount": 45,
      "processedAt": "2024-01-01T12:00:00.000Z"
    }
  }
}
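
The metadata fields can be derived from the extracted text alone. The sketch below shows one plausible definition of each count. The function name computeMetadata is hypothetical, and the backend may count words or lines differently (for example, around trailing newlines).

```javascript
// Hypothetical metadata calculation; counting rules are assumptions.
function computeMetadata(text, now = new Date()) {
  const trimmed = text.trim();
  return {
    characterCount: text.length,
    // Words are runs of non-whitespace characters.
    wordCount: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    // Lines are newline-separated, so blank lines are counted too.
    lineCount: text === "" ? 0 : text.split("\n").length,
    processedAt: now.toISOString(),
  };
}
```

Passing the timestamp as a parameter keeps the function deterministic when testing.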

🛠️ Development

Project Structure

server/
├── server.js          # Main server file
├── package.json       # Dependencies
├── uploads/           # Temporary file storage
└── README.md          # This file

Key Features

Image Enhancement: Preprocesses images automatically before OCR to improve accuracy
Smart Formatting: Gemini produces formatted Markdown output
Multiple Formats: Returns TXT, MD, and JSON formats
Error Handling: Handles processing errors and cleans up afterwards
File Cleanup: Removes temporary files automatically
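
The cleanup behavior can be illustrated with a try/finally pattern that removes the temporary file whether processing succeeds or fails. This is a sketch only: withTempFile is a hypothetical helper, and it writes to the OS temp directory instead of the uploads/ folder used by this project.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: writes bytes to a temp file, runs the handler,
// and always removes the file afterwards.
function withTempFile(bytes, fileName, handler) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "luna-ocr-"));
  // basename guards against path segments in a client-supplied name.
  const filePath = path.join(dir, path.basename(fileName));
  try {
    fs.writeFileSync(filePath, bytes);
    return handler(filePath);
  } finally {
    // Runs on both success and error, so no temp files are left behind.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```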

🔑 Getting a Gemini API Key

Go to Google AI Studio
Create a new API key
Copy the key and use it in the frontend

🚨 Troubleshooting

Common Issues

"Cannot connect to OCR backend"

Make sure the server is running: npm start
Check that port 3001 is not already in use by another process
Verify that no firewall is blocking the connection

"Invalid API key"

Check that your Gemini API key is correct
Ensure the API key has the required permissions
Try creating a new API key

"File too large"

Maximum file size is 10MB
Compress images before uploading
For PDFs, try splitting into smaller files

"Processing failed"

Check the image quality (not too blurry)
Ensure the text is clearly visible
Try the other processing mode

Debug Mode

Set NODE_ENV=development for detailed logging:

NODE_ENV=development npm start
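
A minimal sketch of how logging could be gated on NODE_ENV. The createLogger function and its log prefixes are hypothetical, and the actual logging in server.js may look different.

```javascript
// Hypothetical logger: debug lines are emitted only in development.
function createLogger(env = process.env, sink = console.log) {
  const verbose = env.NODE_ENV === "development";
  return {
    info: (message) => sink(`[info] ${message}`),
    debug: (message) => {
      if (verbose) sink(`[debug] ${message}`);
    },
  };
}
```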

📝 Notes

Server runs on port 3001 by default
Temporary files are automatically cleaned up
CORS is enabled for frontend integration
Image enhancement improves OCR accuracy
Gemini AI provides intelligent text formatting

🔗 Integration

The backend is designed to work with the Luna OCR React frontend. Make sure both are running:

Backend: cd server && npm start (port 3001)
Frontend: npm start (port 3000)

The frontend then calls the backend API for OCR processing.