---
tags:
- multimodal
- NPU
- On-device
- Snapdragon PC
- Android
license: other
license_name: nexa-research
license_link: LICENSE
---


# **OmniNeural** — World’s First NPU-aware Multimodal Model

## **Overview**

**OmniNeural** is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, automotive, IoT, and robotics.

## Demos

### 📱 Mobile Phone NPU - Demo on Samsung S25 Ultra

The first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running **natively on Snapdragon NPU** for long battery life and low latency.

---

## ✨ PC NPU - Capabilities Highlights

🖼️ Multi-Image Reasoning
Spot the difference across two images in multi-round dialogue.

🤖 Image + Text → Function Call
Snap a poster, add a text instruction, and the AI agent creates a calendar event (an illustrative sketch follows below).

🎶 Multi-Audio Comparison
Tell the difference between two music clips locally.
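
To make the "Image + Text → Function Call" capability more concrete, here is a minimal, hypothetical sketch of the kind of structured call an on-device agent might emit. The `create_calendar_event` name, its fields, and the example values are placeholders for illustration only, not OmniNeural's actual tool-calling schema.

```python
# Hypothetical illustration only: the tool name, fields, and values below are
# placeholders, not OmniNeural's actual function-calling schema.
import json

# Inputs: a poster image plus a short text instruction.
poster_image = "concert_poster.jpg"          # placeholder path
user_instruction = "Add this event to my calendar"

# A multimodal agent could map those inputs to a structured call like this,
# which the host app then executes against its calendar API.
tool_call = {
    "name": "create_calendar_event",
    "arguments": {
        "title": "Jazz Night",               # read from the poster
        "start": "2025-10-03T19:30",
        "location": "City Hall Auditorium",
    },
}

print(json.dumps(tool_call, indent=2))
```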

---

## **Key Features**

- **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception.
- **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput — **20% faster than non-NPU-aware models**.
- **Hardware-Aware Attention** – Attention patterns tuned for the NPU, lowering compute and memory demand.
- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency.
- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders.
- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.

---

## **Performance / Benchmarks**

### Human Evaluation (vs baselines)

- **Vision**: Wins or ties in ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
- **Audio**: Clear lead over baselines, well ahead of Gemma-3n and the Apple foundation model.
- **Text**: Matches or outperforms leading multimodal baselines.

*Human evaluation chart*

### Nexa Attention Speedups

- **9× faster** audio encoding (vs Whisper encoder).
- **3.5× faster** image encoding (vs SigLIP encoder).

*Nexa Attention speedup chart*
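
To make the operator choices listed under Key Features (and detailed in the Architecture Overview below) more concrete, here is a minimal PyTorch-style sketch. It is purely illustrative and assumes nothing about OmniNeural's actual layers: it only contrasts a generic feed-forward block (Linear + GELU, variable sequence length) with an NPU-friendlier block (Conv1d + ReLU at a fixed sequence length, which is easier to compile into a static graph).

```python
# Illustrative sketch only — not OmniNeural's implementation. It contrasts a
# generic transformer-style block with an NPU-friendlier conv/ReLU block.
import torch
import torch.nn as nn


class GenericBlock(nn.Module):
    """Typical feed-forward block: Linear layers + GELU, any sequence length."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x):  # x: (batch, seq, dim); seq may vary per call
        return self.ff(x)


class NPUFriendlyBlock(nn.Module):
    """Conv1d + ReLU with a fixed sequence length, favoring static-shape graphs."""
    def __init__(self, dim: int = 256, seq_len: int = 128):
        super().__init__()
        self.seq_len = seq_len  # fixed length -> static shapes for the compiler
        self.conv = nn.Sequential(
            nn.Conv1d(dim, 4 * dim, kernel_size=3, padding=1),
            nn.ReLU(),  # cheap, NPU-friendly activation (vs GELU/SiLU)
            nn.Conv1d(4 * dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, x):  # x: (batch, seq_len, dim), padded/truncated to seq_len
        x = x.transpose(1, 2)               # (batch, dim, seq_len) for Conv1d
        return self.conv(x).transpose(1, 2)


if __name__ == "__main__":
    x = torch.randn(1, 128, 256)
    print(NPUFriendlyBlock()(x).shape)      # torch.Size([1, 128, 256])
```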

---

## **Architecture Overview**

OmniNeural’s design is tightly coupled with NPU hardware:

- **NPU-friendly ops** (ReLU > GELU/SiLU).
- **Sparse + small tensor multiplications** for efficiency.
- **Convolutional layers** favored over linear layers for better NPU parallelization.
- **Hardware-aware attention** patterns to cut compute cost.
- **Static graph execution** for predictable latency.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/oINYbgXILJgTuKxKc1aO_.png)

---

## **Production Use Cases**

- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
  - Examples: Summarize slides into an email (PC), extract action items from chat (mobile).
  - Benefits: Private, offline, battery-efficient.
- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
  - Examples: Detect risks (child unbuckled, pet left behind, loose objects) and road conditions (fog, construction).
  - Benefits: Decisions run locally in milliseconds.
- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
  - Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
  - Benefits: Works without network connectivity.

---

## How to use

> ⚠️ **Hardware requirement:** OmniNeural-4B currently runs **only on Qualcomm NPUs** (e.g., Snapdragon-powered AI PCs).
> Apple NPU support is planned next.

### 1) Install Nexa-SDK

- Download and follow the steps under the "Deploy" section on Nexa's model page: [Download Windows arm64 SDK](https://sdk.nexa.ai/model/OmniNeural-4B)
- (Other platforms coming soon)

### 2) Get an access token

Create a token in the Model Hub, then log in:

```bash
nexa config set license ''
```

### 3) Run the model

```bash
nexa infer NexaAI/OmniNeural-4B
```

**/mic mode.** Once the model is running, type the command below to record your voice directly in the terminal:

```bash
> /mic
```

For images and audio, simply drag your files into the command line. Remember to leave a space between file paths.

---

## Links & Community

[![Discord](https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white)](https://discord.com/invite/nexa-ai)
[![X (Twitter) Follow](https://img.shields.io/badge/Follow-@nexa_ai-111?logo=x&logoColor=white)](https://x.com/nexa_ai)
[![Website](https://img.shields.io/badge/Website-nexa.ai-0A84FF)](https://nexa.ai)

- **Issues / Feedback:** Use the **HF Discussions** tab, or open an issue on our Discord or the nexa-sdk GitHub.
- **Roadmap & updates:** Follow us on X and Discord.

> If you want to see more **NPU-first, multimodal** releases on HF, please give our model a like ❤️.

## Limitation

The current model is mainly optimized for English. Optimizing other languages is the next step.

---

## **Citation**

```bibtex
@misc{nexaai2025omnineural,
  title={OmniNeural: World’s First NPU-aware Multimodal Model},
  author={Nexa AI},
  year={2025},
  url={https://huggingface.co/NexaAI/OmniNeural-4B},
}
```