Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 166
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 435
Article MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era • Published Jan 15, 2025 • 48
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Paper • 2411.14343 • Published Nov 21, 2024 • 7
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • 71B • Updated Apr 13, 2025 • 4.22k • 2.06k
Reply: Interesting, but how does this approach generalize to arbitrary user query / document domains? Would you need to train a separate network for each domain / dataset?