alanzhuly committed
Commit 57ca287 · verified
1 Parent(s): 8db61f6

Create README.md

  1. README.md +100 -0
README.md ADDED
@@ -0,0 +1,100 @@
+ ---
+ license: cc
+ tags:
+ - multimodal
+ ---
+ # **OmniNeural**: World’s First NPU-Optimized Multimodal Model
+
+ ## **Overview**
+ **OmniNeural** is the first multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, vehicles, IoT, and robotics.
+
+ By co-designing the software and model architecture with NPU hardware, OmniNeural achieves:
+ - **Up to 1.5× faster than CPU and 4× faster than GPU** for inference on consumer devices (e.g., Samsung S25 Ultra).
+ - **2–4× better efficiency than CPU and 4–8× better than GPU** in battery usage.
+ - **Smooth multitasking**, running large generative AI models without slowing other applications.
+
+ This combination of speed, efficiency, and NPU support makes OmniNeural the most practical multimodal foundation for edge intelligence.
+
+ ---
+
+ ## **Key Features**
+ - **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception.
+ - **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput: **20% faster than non-NPU-aware models**.
+ - **Hardware-Aware Attention** – Attention patterns tuned for NPUs, lowering compute and memory demand.
+ - **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency.
+ - **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders.
+ - **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.
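
The *Native Static Graph* feature pairs fixed-shape execution with variable-length inputs. The README does not say how OmniNeural achieves this. One common approach, shown here only as an illustrative sketch, is to pad each input up to one of a few precompiled lengths and carry a mask that marks the real tokens. `BUCKETS`, `PAD_ID`, and `pad_to_bucket` are hypothetical names chosen for this example and are not part of Nexa-SDK.

```python
from typing import List, Tuple

# Hypothetical bucket sizes: a real deployment would match these to the
# input shapes its compiled graphs were built for.
BUCKETS = (8, 16, 32)
PAD_ID = 0  # hypothetical padding token id


def pad_to_bucket(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Pad `tokens` to the smallest bucket that fits; return (padded, mask)."""
    for size in BUCKETS:
        if len(tokens) <= size:
            pad = size - len(tokens)
            padded = tokens + [PAD_ID] * pad
            mask = [1] * len(tokens) + [0] * pad  # 1 = real token, 0 = padding
            return padded, mask
    raise ValueError(
        f"input of length {len(tokens)} exceeds largest bucket {BUCKETS[-1]}"
    )


padded, mask = pad_to_bucket([5, 9, 2])
print(len(padded), sum(mask))  # prints "8 3"
```

Because every input lands on one of a small set of shapes, the graph never has to be recompiled at run time, which is one way latency can stay predictable.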
+
+ ---
+
+ ## **Use Cases**
+
+ - **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
+   - Examples: *Summarize slides into an email (PC)*, *extract action items from chat (mobile)*.
+   - Benefits: Private, offline, battery-efficient.
+
+ - **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
+   - Detects risks (child unbuckled, pet left behind, loose objects) and road conditions (fog, construction).
+   - Decisions run locally in milliseconds.
+
+ - **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
+   - Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
+   - Works without network connectivity.
+
+ ---
+
+ ## **Performance / Benchmarks**
+ ### Human Evaluation (vs. baselines)
+ - **Vision**: Wins or ties on ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
+ - **Audio**: Clear lead over baselines, especially in Whisper-encoder-style tasks.
+ - **Text**: Matches or outperforms leading multimodal baselines.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/nplpd2RWyL_cYj0t-xvhq.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/0BK09jUWUnDqYjsKpdFcR.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/LJNCxyU0OKp7Z4ecLSD9C.png)
+
+ ### Nexa Attention Speedups
+ - **9× faster** audio encoding (vs. Whisper).
+ - **3.5× faster** image encoding (vs. SigLIP).
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/tKZ9zPjjZtVdGW2N3yBHp.png)
+
+ ---
+
+ ## **Architecture Overview**
+ OmniNeural’s design is tightly coupled with NPU hardware:
+ - **NPU-friendly ops** (ReLU preferred over GELU/SiLU).
+ - **Sparse, small tensor multiplications** for efficiency.
+ - **Convolutional layers** favored over linear layers for better NPU parallelization.
+ - **Hardware-aware attention** patterns to cut compute cost.
+ - **Static graph execution** for predictable latency.
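
The README does not describe how convolutions stand in for linear layers. One standard equivalence that may be relevant is that a 1×1 convolution over a channels-first tensor computes the same result as a linear layer applied at every position. The pure-Python sketch below checks that equivalence on a tiny input. All function names and values are illustrative assumptions and are not taken from OmniNeural's code.

```python
from typing import List

Matrix = List[List[float]]


def linear(x: Matrix, w: Matrix) -> Matrix:
    """Per-position linear layer: x is [positions][in], w is [out][in]."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in w]
        for row in x
    ]


def conv1x1(x_cf: Matrix, w: Matrix) -> Matrix:
    """1x1 convolution, channels-first: x_cf is [in][positions] -> [out][positions]."""
    positions = len(x_cf[0])
    return [
        [sum(w_row[c] * x_cf[c][p] for c in range(len(x_cf))) for p in range(positions)]
        for w_row in w
    ]


def transpose(m: Matrix) -> Matrix:
    """Swap the two axes of a rectangular matrix."""
    return [list(col) for col in zip(*m)]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 positions, 2 input channels
w = [[1.0, 0.0], [0.5, -1.0]]             # 2 output channels, 2 input channels

# Same numbers either way; only the memory layout differs.
assert transpose(conv1x1(transpose(x), w)) == linear(x, w)
```

The arithmetic is identical in both forms. Any speed advantage would come from how a given NPU schedules convolution kernels, which this sketch does not measure.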
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/oINYbgXILJgTuKxKc1aO_.png)
+
+ ---
+
+ ## **How to use** //TODO
+
+ > ⚠️ Note: OmniNeural currently runs on Qualcomm NPUs (Snapdragon devices).
+ > Apple NPU support is planned for the next release.
+
+ **Install via Nexa-SDK:**
+
+ ## **License & Citation**
+
+ @misc{omninneural2025,
+ title={OmniNeural: NPU-Optimized Multimodal Model for On-Device AI},
+ author={Nexa AI},
+ year={2025},
+ howpublished={\url{https://huggingface.co/NexaAI/OmniNeural-4B}}
+ }
+
+ ## **Links & Community** //TODO
+