---
tags:
- multimodal
- NPU
- On-device
- Snapdragon PC
- Android
license: other
license_name: nexa-research
license_link: LICENSE
---
# **OmniNeural** — World’s First NPU-aware Multimodal Model
## **Overview**
**OmniNeural** is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, automotive, IoT, and robotics platforms.
## Demos
### 📱 Mobile Phone NPU - Demo on Samsung S25 Ultra
The first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running **natively on Snapdragon NPU** for long battery life and low latency.
---
## ✨ PC NPU - Capabilities Highlights
- 🖼️ **Multi-Image Reasoning** – Spot the difference across two images in multi-round dialogue.
- 🤖 **Image + Text → Function Call** – Snap a poster, add a text instruction, and the AI agent creates a calendar event.
- 🎶 **Multi-Audio Comparison** – Tell the difference between two music clips locally.
---
## **Key Features**
- **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception.
- **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput: **20% faster than non-NPU-aware models**.
- **Hardware-Aware Attention** – Attention patterns tuned for NPUs, lowering compute and memory demand.
- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency (see the padding sketch after this list).
- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders.
- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.
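
To make the static-graph point concrete: a compiled NPU graph expects fixed tensor shapes, so variable-length inputs are typically padded (or bucketed) to a fixed size, with a mask marking real data. Below is a minimal Python sketch of that general idea, not OmniNeural's actual preprocessing; `MAX_AUDIO_FRAMES` and `FEATURE_DIM` are made-up values.

```python
import numpy as np

MAX_AUDIO_FRAMES = 3000   # hypothetical fixed bucket size, not the real value
FEATURE_DIM = 80          # hypothetical per-frame feature size

def pad_to_static_shape(frames: np.ndarray):
    """Pad variable-length features to the fixed shape a statically
    compiled graph expects; the mask marks real frames vs. padding."""
    n = min(frames.shape[0], MAX_AUDIO_FRAMES)
    padded = np.zeros((MAX_AUDIO_FRAMES, frames.shape[1]), dtype=frames.dtype)
    padded[:n] = frames[:n]
    mask = np.zeros(MAX_AUDIO_FRAMES, dtype=bool)
    mask[:n] = True
    return padded, mask

# 1234 real frames in, fixed (3000, 80) tensor out
padded, mask = pad_to_static_shape(np.random.randn(1234, FEATURE_DIM).astype(np.float32))
assert padded.shape == (MAX_AUDIO_FRAMES, FEATURE_DIM) and mask.sum() == 1234
```
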
---
## **Performance / Benchmarks**
### Human Evaluation (vs baselines)
- **Vision**: Wins or ties in ~75% of prompts against the Apple foundation model, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
- **Audio**: Clear lead over baselines, well ahead of Gemma-3n and the Apple foundation model.
- **Text**: Matches or outperforms leading multimodal baselines.
---
## **Architecture Overview**
OmniNeural’s design is tightly coupled with NPU hardware; a toy sketch of these choices follows the list:
- **NPU-friendly ops** (ReLU over GELU/SiLU).
- **Sparse + small tensor multiplications** for efficiency.
- **Convolutional layers** favored over linear for better NPU parallelization.
- **Hardware-aware attention** patterns to cut compute cost.
- **Static graph execution** for predictable latency.
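
As a toy illustration of these choices (not OmniNeural's actual implementation), the PyTorch sketch below uses 1×1 convolutions in place of dense linear layers, ReLU instead of GELU/SiLU, and traces the module at one fixed input shape so it can be lowered to a static graph; all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class NPUFriendlyBlock(nn.Module):
    """Toy block mirroring the op choices above: convolutions instead of
    dense linear layers, ReLU instead of GELU/SiLU."""
    def __init__(self, channels: int = 256):
        super().__init__()
        # 1x1 convolutions map well onto NPU vector/tensor units
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=1)
        self.act = nn.ReLU()   # cheap, NPU-native activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, sequence) at a FIXED sequence length,
        # so the whole graph can be captured statically
        return self.conv2(self.act(self.conv1(x)))

# Trace at one fixed shape: a static graph gives predictable latency
block = NPUFriendlyBlock().eval()
example = torch.randn(1, 256, 128)        # fixed (batch, channels, seq) shape
static_graph = torch.jit.trace(block, example)
```

Tracing at a single fixed shape is what yields the stable, predictable latency described above.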

---
## **Production Use Cases**
- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
  - Examples: summarize slides into an email (PC), extract action items from chat (mobile).
  - Benefits: private, offline, battery-efficient.
- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
  - Examples: detect risks (child unbuckled, pet left behind, loose objects) and road conditions (fog, construction).
  - Benefits: decisions run locally in milliseconds.
- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
  - Examples: defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
  - Benefits: works without network connectivity.
---
## How to use
> ⚠️ **Hardware requirement:** OmniNeural-4B currently runs **only on Qualcomm NPUs** (e.g., Snapdragon-powered AI PCs).
> Apple NPU support is planned next.
### 1) Install Nexa-SDK
- Download and follow the steps under the "Deploy" section on Nexa's model page: [Download Windows arm64 SDK](https://sdk.nexa.ai/model/OmniNeural-4B)
- (Other platforms coming soon)
### 2) Get an access token
Create a token in the Model Hub, then log in:
```bash
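# paste the access token you created in the Model Hub between the quotes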
nexa config set license ''
```
### 3) Run the model
Start it from the terminal:
```bash
nexa infer NexaAI/OmniNeural-4B
```
**/mic mode** – Once the model is running, type `/mic` to record your voice directly in the terminal:
```bash
> /mic
```
For images and audio, simply drag your files into the command line; leave a space between file paths.
---
## Links & Community
- [Discord](https://discord.com/invite/nexa-ai)
- [X (Twitter)](https://x.com/nexa_ai)
- [Website](https://nexa.ai)
- **Issues / Feedback:** Use the **HF Discussions** tab, or file an issue in our Discord or the nexa-sdk GitHub repo.
- **Roadmap & updates:** Follow us on X and Discord.
> If you want to see more **NPU-first, multimodal** releases on HF, please give our model a like ❤️.
## Limitations
The current model is mainly optimized for English. Optimizing for other languages is the next step.
---
## **Citation**
```bibtex
@misc{omnineural2025,
  title={OmniNeural: World’s First NPU-aware Multimodal Model},
  author={Nexa AI},
  year={2025},
  url={https://huggingface.co/NexaAI/OmniNeural-4B},
}
```