---
license: cc
tags:
- multimodal
---
# **OmniNeural** — World’s First NPU-Optimized Multimodal Model

## **Overview**
**OmniNeural** is the first multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, vehicles, IoT, and robotics.

By co-designing the software and model architecture with NPU hardware, OmniNeural achieves:
- **Up to 1.5× faster than CPU and 4× faster than GPU** for inference on consumer devices (e.g., Samsung S25 Ultra).
- **2–4× better battery efficiency than CPU and 4–8× better than GPU**.
- **Smooth multitasking**, running large generative AI models without slowing other applications.

This combination of speed, efficiency, and NPU support makes OmniNeural the most practical multimodal foundation for edge intelligence.

---

## **Key Features**
- **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception.
- **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput — **20% faster than non-NPU-aware models**.
- **Hardware-Aware Attention** – Attention patterns tuned for NPUs, lowering compute and memory demand.
- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency (see the sketch after this list).
- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders.
- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.
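
The "Native Static Graph" point relies on a standard trick: pad every variable-length input to a fixed shape and carry a mask, so a single compiled graph serves all requests with predictable latency. The sketch below is our illustration of that idea, not OmniNeural's actual preprocessing; `MAX_LEN` and `pad_to_static` are made-up names for this example.

```python
# Illustrative sketch, not OmniNeural internals: serving variable-length
# inputs through a graph compiled for one fixed shape.
import torch

MAX_LEN = 128  # fixed sequence length baked into the compiled graph (assumption)

def pad_to_static(token_ids: list[int], max_len: int = MAX_LEN):
    """Pad (or truncate) to a fixed length; return ids plus an attention mask
    so padded positions can be ignored downstream."""
    n = min(len(token_ids), max_len)
    ids = token_ids[:n] + [0] * (max_len - n)
    mask = [1] * n + [0] * (max_len - n)
    return torch.tensor([ids]), torch.tensor([mask])

ids, mask = pad_to_static([101, 7592, 2088, 102])
print(ids.shape, mask.shape)  # both torch.Size([1, 128]); the shape never varies
```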

---

## **Use Cases**

- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
  - Examples: *Summarize slides into an email (PC)*, *extract action items from chat (mobile)*.
  - Benefits: private, offline, battery-efficient.

- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
  - Detects in-cabin risks (child unbuckled, pet left behind, loose objects) and road conditions (fog, construction).
  - Decisions run locally in milliseconds.

- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
  - Examples: defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
  - Works without network connectivity.

---

## **Performance / Benchmarks**
### Human Evaluation (vs. baselines)
- **Vision**: Wins or ties on ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
- **Audio**: Clear lead over baselines, especially on Whisper-encoder-style tasks.
- **Text**: Matches or outperforms leading multimodal baselines.





### Nexa Attention Speedups
- **9× faster** audio encoding (vs. Whisper).
- **3.5× faster** image encoding (vs. SigLIP); a rough intuition for these gains is sketched below.


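Nexa Attention's internals are not described in this card, so the following is intuition only: restricting each query to a local window is one common hardware-aware pattern that cuts attention's compute and memory cost. The minimal PyTorch version below is entirely our assumption (window size, dense masking), not the actual kernel.

```python
# Rough illustration only; Nexa Attention's real kernel is not public.
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int = 8):
    """q, k, v: (seq, dim) tensors. Each position attends only to itself
    and the window-1 positions before it."""
    seq, dim = q.shape
    scores = (q @ k.T) / dim ** 0.5  # (seq, seq) similarity scores
    idx = torch.arange(seq)
    # Keep (i, j) only when j is inside the causal window of i.
    keep = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = scores.masked_fill(~keep, float("-inf"))
    # A production kernel would never materialize the full (seq, seq) matrix;
    # this dense version just makes the access pattern visible.
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(16, 32)
print(sliding_window_attention(q, k, v).shape)  # torch.Size([16, 32])
```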
---

## **Architecture Overview**
OmniNeural’s design is tightly coupled with NPU hardware (a code sketch of these choices follows the list):
- **NPU-friendly ops** (ReLU over GELU/SiLU).
- **Sparse and small tensor multiplications** for efficiency.
- **Convolutional layers** favored over linear layers for better NPU parallelization.
- **Hardware-aware attention** patterns to cut compute cost.
- **Static graph execution** for predictable latency.
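
As a minimal sketch of what these choices look like in code (our illustration, not OmniNeural's source): a convolution + ReLU block traced at one fixed input shape, yielding a static graph.

```python
# Illustration of the design points above, not OmniNeural source code.
import torch
import torch.nn as nn

class NPUFriendlyBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # Convolutions map well onto NPU matrix engines.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # ReLU avoids the transcendental math that GELU/SiLU require.
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(x))

# Tracing at one fixed shape produces a static graph with predictable latency.
example = torch.randn(1, 64, 56, 56)  # fixed (batch, C, H, W)
static_graph = torch.jit.trace(NPUFriendlyBlock(), example)
print(static_graph(example).shape)  # torch.Size([1, 64, 56, 56])
```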



---

## **How to use** //TODO

> ⚠️ Note: OmniNeural currently runs on Qualcomm NPUs (Snapdragon devices).
> Apple NPU support is planned for the next release.

**Install via Nexa-SDK:**

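Until the official instructions land here, this is a minimal sketch of the usual Nexa-SDK flow; the package name and command are assumptions, so check the Nexa-SDK documentation for the exact invocation.

```bash
# Assumed Nexa-SDK workflow; verify against the official docs.
pip install nexaai               # assumption: the SDK's Python package name
nexa run NexaAI/OmniNeural-4B    # assumption: standard `nexa run <model>` form
```
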
## **License & Citation**

```bibtex
@misc{omnineural2025,
  title={OmniNeural: NPU-Optimized Multimodal Model for On-Device AI},
  author={Nexa AI},
  year={2025},
  howpublished={\url{https://huggingface.co/NexaAI/OmniNeural-4B}}
}
```

## **Links & Community** //TODO