
Luna OCR Backend

OCR backend that uses Google Gemini to extract text from images and PDFs and to format the result.

🚀 Quick Start

1. Install Dependencies

cd server
npm install

2. Start the Server

npm start
# or for development with auto-reload:
npm run dev

3. Test the API

curl http://localhost:3001/api/health

📡 API Endpoints

Health Check

GET /api/health

OCR Processing

POST /api/ocr
Content-Type: multipart/form-data

Parameters:
- file: Image file (PNG, JPG, WebP) or PDF
- apiKey: Google Gemini API key
- mode: "standard" or "structured"
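
As an illustration of the parameters above, here is a minimal client sketch for Node 18+ (which provides global fetch, FormData, and Blob). The field names file, apiKey, and mode come from the parameter list; the helper names buildOcrForm and requestOcr are hypothetical, and the shape of the server's error responses is not documented here, so the caller is left to inspect the parsed body.

```javascript
// Sketch only: builds the multipart body for POST /api/ocr.
// buildOcrForm and requestOcr are hypothetical helper names.
function buildOcrForm(fileBytes, fileName, mimeType, apiKey, mode = "standard") {
  if (mode !== "standard" && mode !== "structured") {
    throw new Error(`Unsupported mode: ${mode}`);
  }
  const form = new FormData();
  // The third argument sets the file name sent in the multipart part.
  form.append("file", new Blob([fileBytes], { type: mimeType }), fileName);
  form.append("apiKey", apiKey);
  form.append("mode", mode);
  return form;
}

async function requestOcr(form, baseUrl = "http://localhost:3001") {
  // fetch sets the multipart Content-Type header (with boundary) itself.
  const res = await fetch(`${baseUrl}/api/ocr`, { method: "POST", body: form });
  // Returns the parsed JSON body; check its "success" field before use.
  return res.json();
}
```

A call might look like requestOcr(buildOcrForm(bytes, "scan.png", "image/png", key, "structured")), where bytes is a Buffer or Uint8Array holding the file contents.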

🔧 Configuration

Environment Variables

Create a .env file (optional):

PORT=3001
MAX_FILE_SIZE=10485760
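
A server could read these two variables roughly as follows. This is a sketch under assumptions, not the actual server.js code: loadConfig is a hypothetical name, and how the .env file gets loaded into the environment (for example with a package such as dotenv) is not stated in this README.

```javascript
// Sketch only: read PORT and MAX_FILE_SIZE, falling back to the
// defaults shown above when a value is missing or not a positive integer.
function loadConfig(env = process.env) {
  const toPositiveInt = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };
  return {
    port: toPositiveInt(env.PORT, 3001),
    // 10 * 1024 * 1024 = 10485760 bytes
    maxFileSize: toPositiveInt(env.MAX_FILE_SIZE, 10 * 1024 * 1024),
  };
}
```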

Supported File Types

  • Images: PNG, JPG, JPEG, WebP
  • Documents: PDF (converted to images)
  • Max Size: 10 MB per file (10485760 bytes, the MAX_FILE_SIZE value shown above)
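
The limits above can be expressed as a small validation check. This is a hedged sketch: the MIME type strings are the standard ones for the listed formats, but validateUpload and its input shape ({ mimetype, size }) are assumptions for illustration, and the real server may check files differently (for example by extension).

```javascript
// Sketch only: accept the file types and size limit listed above.
const ALLOWED_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg", // covers both .jpg and .jpeg
  "image/webp",
  "application/pdf",
]);
const MAX_FILE_SIZE = 10485760; // 10 MB, matching the default above

function validateUpload({ mimetype, size }) {
  if (!ALLOWED_MIME_TYPES.has(mimetype)) {
    return { ok: false, reason: `Unsupported file type: ${mimetype}` };
  }
  // Assumption: a file exactly at the limit is still accepted.
  if (size > MAX_FILE_SIZE) {
    return { ok: false, reason: "File too large" };
  }
  return { ok: true };
}
```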

🎯 Processing Modes

Standard Mode

  • Uses Gemini 1.5 Flash (faster)
  • Returns clean plain text
  • Good for simple text extraction

Structured Mode

  • Uses Gemini 1.5 Pro (more capable)
  • Returns formatted Markdown
  • Creates tables, headers, and lists automatically
  • Best suited to complex documents
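
One way to picture the two modes is as a lookup from the mode parameter to a model and an output style. The sketch below is an assumption about how such a mapping could look, not the server's actual code: settingsForMode is a hypothetical name, and the identifiers gemini-1.5-flash and gemini-1.5-pro are the public Gemini API names for the models mentioned above.

```javascript
// Sketch only: map the documented modes to a model and output style.
const MODE_SETTINGS = new Map([
  ["standard", { model: "gemini-1.5-flash", output: "text" }],
  ["structured", { model: "gemini-1.5-pro", output: "markdown" }],
]);

function settingsForMode(mode) {
  const settings = MODE_SETTINGS.get(mode);
  if (!settings) {
    throw new Error(`Unknown processing mode: ${mode}`);
  }
  return settings;
}
```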

📊 Response Format

{
  "success": true,
  "data": {
    "fileName": "document.png",
    "fileSize": 1234567,
    "processingMode": "structured",
    "extractedText": "# Document Title\n\n...",
    "formats": {
      "txt": "plain text version",
      "md": "markdown version", 
      "json": { "metadata": {...}, "content": {...} }
    },
    "metadata": {
      "characterCount": 1500,
      "wordCount": 250,
      "lineCount": 45,
      "processedAt": "2024-01-01T12:00:00.000Z"
    }
  }
}
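
The metadata fields in the response could be derived from the extracted text along these lines. This is a sketch with assumed counting rules (whitespace-separated words, newline-separated lines); buildMetadata is a hypothetical name, and the server's exact rules are not documented here.

```javascript
// Sketch only: compute the metadata fields shown in the response above.
function buildMetadata(text, now = new Date()) {
  const trimmed = text.trim();
  return {
    characterCount: text.length,
    // Assumption: words are runs of non-whitespace characters.
    wordCount: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    // Assumption: blank lines count as lines; empty text has zero lines.
    lineCount: text === "" ? 0 : text.split(/\r?\n/).length,
    processedAt: now.toISOString(),
  };
}
```

For the text "# Title\n\nHello world" this yields characterCount 20, wordCount 4, and lineCount 3.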

🛠️ Development

Project Structure

server/
├── server.js          # Main server file
├── package.json       # Dependencies
├── uploads/           # Temporary file storage
└── README.md          # This file

Key Features

  • Image Enhancement: Automatic image preprocessing for better OCR
  • Smart Formatting: Gemini produces formatted Markdown output in structured mode
  • Multiple Formats: Returns TXT, MD, and JSON formats
  • Error Handling: Comprehensive error handling and cleanup
  • File Cleanup: Automatic temporary file cleanup

🔑 Getting a Gemini API Key

  1. Go to Google AI Studio
  2. Create a new API key
  3. Copy the key and use it in the frontend

🚨 Troubleshooting

Common Issues

"Cannot connect to OCR backend"

  • Make sure the server is running: npm start
  • Check that port 3001 is not already in use by another process
  • Verify that no firewall is blocking the connection

"Invalid API key"

  • Check that your Gemini API key is correct
  • Ensure the API key has the proper permissions
  • Try creating a new API key

"File too large"

  • Maximum file size is 10MB
  • Compress images before uploading
  • For PDFs, try splitting into smaller files

"Processing failed"

  • Check the image quality (not too blurry)
  • Ensure the text is clearly visible
  • Try the other processing mode

Debug Mode

Set NODE_ENV=development for detailed logging:

NODE_ENV=development npm start
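
How the server switches on detailed logging is not shown in this README. A minimal sketch of the idea, assuming a hypothetical createLogger helper that emits debug lines only when NODE_ENV is development:

```javascript
// Sketch only: debug output is gated on NODE_ENV=development.
// The sink parameter exists so the output target can be swapped out.
function createLogger(env = process.env, sink = console.log) {
  const verbose = env.NODE_ENV === "development";
  return {
    info: (message) => sink(`[info] ${message}`),
    debug: (message) => {
      if (verbose) sink(`[debug] ${message}`);
    },
  };
}
```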

📝 Notes

  • Server runs on port 3001 by default
  • Temporary files are automatically cleaned up
  • CORS is enabled for frontend integration
  • Image enhancement improves OCR accuracy
  • Gemini AI provides intelligent text formatting

🔗 Integration

The backend is built to work with the Luna OCR React frontend. Make sure both are running:

  1. Backend: cd server && npm start (port 3001)
  2. Frontend: npm start (port 3000)

The frontend then calls the backend API automatically for OCR processing.