Open to Collab

Muhammad Umair

umair894

AI & ML interests

Multimodal Reidentification | Feature Upscaling | Cross-modal Alignment | Robust Generalization | PhD UESTC

Recent Activity

upvoted a paper 2 days ago

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

liked a Space 4 days ago

LiquidAI/LFM2.5-VL-450M-WebGPU

liked a Space 4 days ago

allenai/WildDet3D



upvoted a paper 2 days ago

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Paper • 2604.11784 • Published 4 days ago • 129

upvoted a paper 4 days ago

WildDet3D: Scaling Promptable 3D Detection in the Wild

Paper • 2604.08626 • Published 8 days ago • 236

upvoted a paper 10 days ago

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Paper • 2604.04184 • Published 12 days ago • 50

upvoted a paper 11 days ago

A Simple Baseline for Streaming Video Understanding

Paper • 2604.02317 • Published 15 days ago • 72

upvoted an article 11 days ago

Welcome Gemma 4: Frontier multimodal intelligence on device

Article • Published 15 days ago • 852

upvoted a paper 17 days ago

Towards a Medical AI Scientist

Paper • 2603.28589 • Published 18 days ago • 88

upvoted a paper 21 days ago

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published 22 days ago • 131

upvoted a paper 22 days ago

4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video

Paper • 2603.21618 • Published 25 days ago • 15

upvoted a paper 23 days ago

PEARL: Personalized Streaming Video Understanding Model

Paper • 2603.20422 • Published 27 days ago • 40

upvoted a paper 29 days ago

Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models

Paper • 2603.15618 • Published Mar 16 • 21

upvoted 10 papers about 1 month ago

Make it SING: Analyzing Semantic Invariants in Classifiers

Paper • 2603.14610 • Published Mar 15 • 16

AI Can Learn Scientific Taste

Paper • 2603.14473 • Published Mar 15 • 423

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Paper • 2603.12255 • Published Mar 12 • 91

Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

Paper • 2603.03143 • Published Mar 3 • 145

EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

Paper • 2603.04254 • Published Mar 4 • 1

Utonia: Toward One Encoder for All Point Clouds

Paper • 2603.03283 • Published Mar 3 • 185

OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

Paper • 2603.02138 • Published Mar 2 • 151
