Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation Paper • 2508.20470 • Published Aug 28, 2025 • 75
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Paper • 2503.06053 • Published Mar 8, 2025 • 138
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following Paper • 2501.08187 • Published Jan 14, 2025 • 27
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published Jan 22, 2025 • 28
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7, 2025 • 27
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published Jan 6, 2025 • 25
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published Jan 15, 2025 • 30
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong Paper • 2501.09775 • Published Jan 16, 2025 • 32
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published Jan 3, 2025 • 33
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Paper • 2501.06751 • Published Jan 12, 2025 • 32
Personalized Graph-Based Retrieval for Large Language Models Paper • 2501.02157 • Published Jan 4, 2025 • 31
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published Jan 11, 2025 • 31
VideoAuteur: Towards Long Narrative Video Generation Paper • 2501.06173 • Published Jan 10, 2025 • 31
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published Jan 14, 2025 • 34
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published Jan 14, 2025 • 33
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published Jan 23, 2025 • 43
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 46
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Paper • 2501.02955 • Published Jan 6, 2025 • 44