Open to Collab

64 11

Nima Nooshiri

nimanzik

AI & ML interests

None yet

Recent Activity

updated a collection 10 days ago

Hugging Face In-Depth Articles

updated a collection 11 days ago

Hugging Face In-Depth Articles

updated a collection 11 days ago

Hugging Face Playbooks & Guidebooks

Organizations

updated a collection 10 days ago

Hugging Face In-Depth Articles

Collection

2 items • Updated 10 days ago

updated 2 collections 11 days ago

Hugging Face In-Depth Articles

Collection

2 items • Updated 10 days ago

Hugging Face Playbooks & Guidebooks

Collection

8 items • Updated 11 days ago • 2

liked a Space 11 days ago

Token-In, Token-Out Done Right

🧩

Explore an interactive simulation while reading the article

upvoted an article 21 days ago

Article

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

ariG23498, sayakpaul, sergiopaniego, ror, pcuenq

•

24 days ago

• 122

upvoted an article 25 days ago

Article

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

sergiopaniego, ariG23498

•

28 days ago

• 116

liked a Space 26 days ago

physics-intern: an Autonomous Agent for Physics Research

📝

Explore an autonomous AI workflow for physics research

upvoted an article about 1 month ago

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

not-lain

•

Jan 30, 2025

• 351

updated a collection about 1 month ago

Hugging Face Playbooks & Guidebooks

Collection

8 items • Updated 11 days ago • 2

published a model about 2 months ago

nimanzik/totem-reproduction-vqvae

Updated May 6

upvoted 6 articles about 2 months ago

Article

KV Cache from scratch in nanoVLM

ariG23498, kashif, lusxvr, andito, pcuenq

•

Jun 4, 2025

• 120

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

ggerganov, ngxson, allozaur, lysandre, victor, julien-c

•

Feb 20

• 507

Article

Introducing Storage Buckets on the Hugging Face Hub

Wauplin, coyotte508, XciD, victor, julien-c, lhoestq, pierric, Sylvestre, hlarcher, rajatarya, seanses, assafvayner

•

Mar 10

• 196

Article

Running AI agents to automate outreach at scale

nielsr

•

Apr 27

• 15

Article

DeepSeek-V4: a million-token context that agents can actually use

burtenshaw

•

Apr 24

• 50

Article

NaFlex in timm

rwightman

•

Apr 9, 2025

• 3

upvoted an article 2 months ago

Article

LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling

lightonai

•

Feb 12

• 57

reacted to Kseniase's post with 👍 2 months ago

Post

8364

15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention “heads” are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by compressing keys and values into a shared low-rank latent vector, which shrinks the KV cache needed during inference.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.
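
The soft, self- and cross-attention descriptions above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from any of the cited papers: the helper names `softmax` and `attend` and the toy vectors are assumptions made up for the example. It also skips the learned query/key/value projections that real models apply, and uses scaled dot-product scoring as one common choice.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product soft attention over plain Python lists.

    Returns (outputs, weights): one output vector and one weight row per query.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(wj * v[d] for wj, v in zip(w, values))
               for d in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy sequences (hypothetical values, chosen only for illustration).
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.5, 0.5], [1.0, -1.0]]

# Self-attention: queries, keys and values all come from the same sequence.
self_out, self_w = attend(source, source, source)

# Cross-attention: queries come from one sequence, keys/values from another.
cross_out, cross_w = attend(target, source, source)
```

Each row of `self_w` and `cross_w` is a soft attention distribution: every input position gets a positive weight, and the weights in a row sum to 1. Multi-head attention would run several such `attend` calls in parallel, each on its own learned projection of the inputs.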

See other types in the comments 👇

1 reply

upvoted an article 2 months ago

Article

DenseOn with the LateOn: Open State-of-the-Art Single and Multi-Vector Models

lightonai

•

Apr 21

• 40
