|
--- |
|
tags: |
|
- multimodal |
|
- NPU |
|
- On-device |
|
- Snapdragon PC |
|
- Android |
|
license: other |
|
license_name: nexa-research |
|
license_link: LICENSE |
|
--- |
|
<p align="center"> |
|
<img alt="omnineural" src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/zRUnoWmw43fl9hrXHg0pE.png"> |
|
</p> |
|
|
|
# **OmniNeural** — World’s First NPU-aware Multimodal Model |
|
|
|
|
|
## **Overview** |
|
**OmniNeural** is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PC, mobile, automotive, IoT, and robotics platforms.
|
|
|
## Demos |
|
|
|
### 📱 Mobile Phone NPU - Demo on Samsung S25 Ultra |
|
The first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running **natively on Snapdragon NPU** for long battery life and low latency. |
|
|
|
<video controls width="720" preload="metadata" |
|
src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/MOBILE_50MB.mp4" |
|
type="video/mp4"></video> |
|
|
|
--- |
|
|
|
## ✨ PC NPU - Capabilities Highlights |
|
|
|
<table> |
|
<tr> |
|
<td width="33%"> |
|
<video controls width="100%" preload="metadata" |
|
src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_demo_2_image.mov"></video> |
|
<p align="center"><b>🖼️ Multi-Image Reasoning</b><br>Spot the difference across two images in multi-round dialogue.</p> |
|
</td> |
|
|
|
<td width="33%"> |
|
<video controls width="100%" preload="metadata" |
|
src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_Demo_Agent.mov"></video> |
|
<p align="center"><b>🤖 Image + Text → Function Call</b><br>Snap a poster, add a text instruction, and the AI agent creates a calendar event.</p>
|
</td> |
|
|
|
<td width="33%"> |
|
<video controls width="100%" preload="metadata" |
|
src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_Demo_Audio.mov"></video> |
|
<p align="center"><b>🎶 Multi-Audio Comparison</b><br>Tell the difference between two music clips locally.</p> |
|
</td> |
|
</tr> |
|
</table> |
|
|
|
|
|
|
|
--- |
|
|
|
## **Key Features** |
|
- **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception. |
|
- **NPU-Optimized Architecture** – Uses ReLU activations, sparse tensor operations, convolutional layers, and static graph execution for maximum throughput — **20% faster than non-NPU-aware models**.

- **Hardware-Aware Attention** – Attention patterns tuned for the NPU, lowering compute and memory demand.

- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency.

- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders.
|
- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient. |
|
|
|
--- |
|
|
|
## **Performance / Benchmarks** |
|
### Human Evaluation (vs baselines) |
|
- **Vision**: Wins or ties on ~75% of prompts against the Apple Foundation Model, Gemma-3n-E4B, and Qwen2.5-Omni-3B.

- **Audio**: Clear lead over all baselines, well ahead of Gemma-3n-E4B and the Apple Foundation Model.
|
- **Text**: Matches or outperforms leading multimodal baselines. |
|
|
|
|
|
<p align="center"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/vsrg43GxTOSAj7q_SI60o.png" width="1560" alt="Human eval chart" /> |
|
</p> |
|
|
|
### Nexa Attention Speedups |
|
- **9× faster** audio encoding (vs Whisper encoder). |
|
- **3.5× faster** image encoding (vs SigLIP encoder). |
|
|
|
|
|
<p align="center"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/1039SN5JBQkS04z4YnoIi.png" width="400" alt="Nexa Attention speedup chart" />
|
</p> |
|
|
|
--- |
|
|
|
## **Architecture Overview** |
|
OmniNeural’s design is tightly coupled with NPU hardware: |
|
- **NPU-friendly ops** (ReLU instead of GELU/SiLU).
|
- **Sparse + small tensor multiplications** for efficiency. |
|
- **Convolutional layers** favored over linear layers for better NPU parallelization.
|
- **Hardware-aware attention** patterns to cut compute cost. |
|
- **Static graph execution** for predictable latency. |
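
Below is a minimal PyTorch-style sketch of the ideas above (ReLU activations, convolutions, and a static input shape via padding). It is purely illustrative: every module, name, and size is a hypothetical stand-in, not OmniNeural's actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (hypothetical sizes and modules, not OmniNeural code).
MAX_LEN = 512  # one fixed sequence length so the compiled NPU graph never changes shape

class NPUFriendlyBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Convolutions map well onto NPU vector/tensor units.
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        # ReLU avoids the transcendental math in GELU/SiLU and emits exact
        # zeros, which sparse tensor paths can exploit.
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(x))

def pad_to_static(x: torch.Tensor) -> torch.Tensor:
    # Pad variable-length input up to MAX_LEN for static graph execution.
    return nn.functional.pad(x, (0, MAX_LEN - x.shape[-1]))

block = NPUFriendlyBlock(channels=64)
tokens = torch.randn(1, 64, 300)        # variable-length multimodal features
out = block(pad_to_static(tokens))      # fixed-shape compute, predictable latency
print(out.shape)                        # torch.Size([1, 64, 512])
```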
|
|
|
|
|
 |
|
|
|
--- |
|
|
|
## **Production Use Cases** |
|
|
|
- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses. |
|
- Examples: Summarize slides into an email (PC), extract action items from chat (mobile).
|
- Benefits: Private, offline, battery-efficient. |
|
|
|
- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**. |
|
- Examples: Detect risks (unbuckled child, pet left behind, loose objects) and road conditions (fog, construction).
|
- Benefits: Decisions run locally in milliseconds. |
|
|
|
- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**. |
|
- Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction. |
|
- Benefits: Works without network connectivity. |
|
|
|
--- |
|
|
|
## How to use |
|
|
|
> ⚠️ **Hardware requirement:** OmniNeural-4B currently runs **only on Qualcomm NPUs** (e.g., Snapdragon-powered AI PCs).
|
> Apple NPU support is planned next. |
|
|
|
### 1) Install Nexa-SDK |
|
|
|
- Download the SDK and follow the steps under the "Deploy" section on Nexa's model page: [Download Windows arm64 SDK](https://sdk.nexa.ai/model/OmniNeural-4B)
|
- (Other platforms coming soon) |
|
|
|
### 2) Get an access token |
|
Create a token in the Model Hub, then log in: |
|
|
|
```bash |
|
nexa config set license '<access_token>' |
|
``` |
|
|
|
### 3) Run the model |
|
Start an interactive session:
|
|
|
```bash |
|
nexa infer NexaAI/OmniNeural-4B |
|
``` |
|
|
|
**Mic mode.** Once the model is running, type `/mic` to record your voice directly in the terminal:
|
```bash |
|
> /mic |
|
``` |
|
|
|
For images and audio, simply drag your files into the command line. Remember to leave a space between file paths.
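
For example, a single turn mixing a text instruction with an audio clip and an image might look like the following (the prompt and file paths are placeholders; substitute your own files):

```bash
> Does the melody in this clip match the mood of this photo? /path/to/clip.mp3 /path/to/photo.jpg
```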
|
|
|
--- |
|
|
|
## Links & Community |
|
|
|
[](https://discord.com/invite/nexa-ai) |
|
|
|
[](https://x.com/nexa_ai) |
|
|
|
[](https://nexa.ai) |
|
|
|
- **Issues / Feedback:** Use the **HF Discussions** tab, or open an issue on our Discord or in the nexa-sdk GitHub repository.
|
- **Roadmap & updates:** Follow us on X and Discord. |
|
|
|
> If you want to see more **NPU-first, multimodal** releases on HF, please give our model a like ❤️. |
|
|
|
## Limitations

The current model is optimized mainly for English. Support for additional languages is planned as a next step.
|
|
|
--- |
|
|
|
## **Citation** |
|
|
|
```bibtex |
|
@misc{omnineural2025,
  title={OmniNeural: World’s First NPU-aware Multimodal Model},
  author={Nexa AI},
  year={2025},
  url={https://huggingface.co/NexaAI/OmniNeural-4B},
}
|
``` |