ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published 4 days ago • 129
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 8 days ago • 236
AURA: Always-On Understanding and Real-Time Assistance via Video Streams Paper • 2604.04184 • Published 12 days ago • 50
Welcome Gemma 4: Frontier multimodal intelligence on device Article • 15 days ago • 852
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 22 days ago • 131
4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video Paper • 2603.21618 • Published 25 days ago • 15
PEARL: Personalized Streaming Video Understanding Model Paper • 2603.20422 • Published 27 days ago • 40
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models Paper • 2603.15618 • Published Mar 16 • 21
Make it SING: Analyzing Semantic Invariants in Classifiers Paper • 2603.14610 • Published Mar 15 • 16
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published Mar 12 • 91
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing Paper • 2603.03143 • Published Mar 3 • 145
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory Paper • 2603.03269 • Published Mar 3 • 63
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published Mar 8 • 86
EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding Paper • 2603.04254 • Published Mar 4 • 1
OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens Paper • 2603.02138 • Published Mar 2 • 151