Luna OCR Backend
An OCR processing backend that uses Google Gemini to extract and format text from images and PDFs.
🚀 Quick Start
1. Install Dependencies
cd server
npm install
2. Start the Server
npm start
# or for development with auto-reload:
npm run dev
3. Test the API
curl http://localhost:3001/api/health
📡 API Endpoints
Health Check
GET /api/health
OCR Processing
POST /api/ocr
Content-Type: multipart/form-data
Parameters:
- file: Image file (PNG, JPG, WebP) or PDF
- apiKey: Google Gemini API key
- mode: "standard" or "structured"
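As a rough sketch of how a client might assemble this request, the snippet below builds the multipart body with the three parameters listed above. The helper names `buildOcrForm` and `submitOcr` are hypothetical and not part of this project. It assumes Node 18+ (or a browser), where `FormData`, `Blob`, and `fetch` are globals, and it assumes the server answers failures with a non-2xx status.

```javascript
// Illustrative client helpers; names are assumptions, not from server.js.
const VALID_MODES = ["standard", "structured"];

// Build the multipart/form-data body for POST /api/ocr.
function buildOcrForm(fileBlob, fileName, apiKey, mode) {
  if (!VALID_MODES.includes(mode)) {
    throw new Error(`mode must be one of: ${VALID_MODES.join(", ")}`);
  }
  const form = new FormData();
  form.append("file", fileBlob, fileName); // image or PDF bytes
  form.append("apiKey", apiKey);           // Google Gemini API key
  form.append("mode", mode);               // "standard" or "structured"
  return form;
}

// Send the form to the backend. The default base URL assumes the
// server is running locally on port 3001.
async function submitOcr(form, baseUrl = "http://localhost:3001") {
  // Do not set Content-Type manually: fetch adds the multipart boundary.
  const response = await fetch(`${baseUrl}/api/ocr`, {
    method: "POST",
    body: form,
  });
  if (!response.ok) {
    throw new Error(`OCR request failed with status ${response.status}`);
  }
  return response.json();
}
```

A caller would read the file into a `Blob`, pass it to `buildOcrForm`, and await `submitOcr(form)` to get the parsed JSON response.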
🔧 Configuration
Environment Variables
Create a .env file (optional):
PORT=3001
MAX_FILE_SIZE=10485760
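One way a server could read these two variables with the documented defaults is sketched below. `loadConfig` and `parsePositiveInt` are hypothetical helpers, not taken from server.js, and the fallback behavior for missing or malformed values is an assumption.

```javascript
// Sketch only: these helpers are assumptions, not the project's actual code.
const DEFAULT_PORT = 3001;
const DEFAULT_MAX_FILE_SIZE = 10 * 1024 * 1024; // 10485760 bytes (10 MB)

// Parse a positive integer, falling back when the value is missing or invalid.
function parsePositiveInt(value, fallback) {
  const parsed = Number.parseInt(value, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Read PORT and MAX_FILE_SIZE from an environment-like object.
function loadConfig(env = process.env) {
  return {
    port: parsePositiveInt(env.PORT, DEFAULT_PORT),
    maxFileSize: parsePositiveInt(env.MAX_FILE_SIZE, DEFAULT_MAX_FILE_SIZE),
  };
}
```

Passing the environment as an argument keeps the function easy to test without touching the real `process.env`.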
Supported File Types
- Images: PNG, JPG, JPEG, WebP
- Documents: PDF (converted to images)
- Max Size: 10MB per file
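The limits above could be checked before processing with something like the following. This is a minimal sketch: `validateUpload` is a hypothetical name, and checking only the file extension is a simplification (a real server might also inspect the MIME type or file contents).

```javascript
// Hypothetical pre-processing check based on the limits listed above.
const ALLOWED_EXTENSIONS = new Set(["png", "jpg", "jpeg", "webp", "pdf"]);
const MAX_FILE_SIZE = 10485760; // 10 MB, matching the MAX_FILE_SIZE default

function validateUpload(fileName, sizeInBytes, maxSize = MAX_FILE_SIZE) {
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();

  if (!ALLOWED_EXTENSIONS.has(extension)) {
    return { ok: false, reason: "Unsupported file type" };
  }
  if (sizeInBytes > maxSize) {
    return { ok: false, reason: "File too large" };
  }
  return { ok: true, reason: null };
}
```

Returning a result object rather than throwing lets the caller decide how to report the problem to the client.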
🎯 Processing Modes
Standard Mode
- Uses Gemini 1.5 Flash (faster)
- Returns clean plain text
- Good for simple text extraction
Structured Mode
- Uses Gemini 1.5 Pro (more intelligent)
- Returns formatted Markdown
- Creates tables, headers, lists automatically
- Well suited to complex documents
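The two modes can be thought of as a lookup from mode name to model and output type. The sketch below illustrates that idea; `resolveMode` is a hypothetical helper, and the model identifier strings are assumptions based on the model names above, not values confirmed from server.js.

```javascript
// Illustrative mapping; identifiers are assumed from the model names above.
const MODE_SETTINGS = {
  standard: { model: "gemini-1.5-flash", output: "text" },
  structured: { model: "gemini-1.5-pro", output: "markdown" },
};

// Look up the settings for a mode, rejecting anything unknown.
function resolveMode(mode) {
  if (!Object.prototype.hasOwnProperty.call(MODE_SETTINGS, mode)) {
    throw new Error(`Unknown processing mode: ${mode}`);
  }
  // Return a copy so callers cannot mutate the shared table.
  return { ...MODE_SETTINGS[mode] };
}
```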
📊 Response Format
{
  "success": true,
  "data": {
    "fileName": "document.png",
    "fileSize": 1234567,
    "processingMode": "structured",
    "extractedText": "# Document Title\n\n...",
    "formats": {
      "txt": "plain text version",
      "md": "markdown version",
      "json": { "metadata": {...}, "content": {...} }
    },
    "metadata": {
      "characterCount": 1500,
      "wordCount": 250,
      "lineCount": 45,
      "processedAt": "2024-01-01T12:00:00.000Z"
    }
  }
}
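To make the metadata fields concrete, here is one plausible way to derive them from the extracted text. `buildMetadata` is a hypothetical helper, and the counting rules (whitespace-separated words, newline-separated lines) are assumptions; the backend may count differently.

```javascript
// Sketch of deriving the metadata block; counting rules are assumptions.
function buildMetadata(extractedText, now = new Date()) {
  const trimmed = extractedText.trim();
  return {
    // Length in UTF-16 code units, as reported by String.prototype.length.
    characterCount: extractedText.length,
    // Words are runs of non-whitespace characters.
    wordCount: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    // Lines are separated by \n or \r\n; empty text has zero lines.
    lineCount: extractedText === "" ? 0 : extractedText.split(/\r?\n/).length,
    // ISO 8601 timestamp in UTC, matching the format shown above.
    processedAt: now.toISOString(),
  };
}
```

The `now` parameter exists only so the timestamp can be fixed in tests.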
🛠️ Development
Project Structure
server/
├── server.js # Main server file
├── package.json # Dependencies
├── uploads/ # Temporary file storage
└── README.md # This file
Key Features
- Image Enhancement: Automatic image preprocessing for better OCR
- Smart Formatting: Gemini AI produces formatted Markdown output (tables, headers, lists)
- Multiple Formats: Returns TXT, MD, and JSON formats
- Error Handling: Comprehensive error handling and cleanup
- File Cleanup: Automatic temporary file cleanup
🔑 Getting Gemini API Key
- Go to Google AI Studio
- Create a new API key
- Copy the key and use it in the frontend
🚨 Troubleshooting
Common Issues
"Cannot connect to OCR backend"
- Make sure the server is running: npm start
- Check that port 3001 is not already in use by another process
- Verify that no firewall is blocking the port
"Invalid API key"
- Check that your Gemini API key is correct
- Ensure the API key has the proper permissions
- Try creating a new API key
"File too large"
- Maximum file size is 10MB
- Compress images before uploading
- For PDFs, try splitting into smaller files
"Processing failed"
- Check image quality (not too blurry)
- Ensure text is clearly visible
- Try a different processing mode
Debug Mode
Set NODE_ENV=development for detailed logging:
NODE_ENV=development npm start
📝 Notes
- Server runs on port 3001 by default
- Temporary files are automatically cleaned up
- CORS is enabled for frontend integration
- Image enhancement improves OCR accuracy
- Gemini AI provides intelligent text formatting
🔗 Integration
The backend integrates seamlessly with the Luna OCR React frontend. Make sure both are running:
- Backend: cd server && npm start (port 3001)
- Frontend: npm start (port 3000)
The frontend will automatically call the backend API for real OCR processing!