PEARL: Personalized Streaming Video Understanding Model Paper • 2603.20422 • Published 8 days ago • 37 • 4
Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval Paper • 2602.19961 • Published Feb 23 • 2
Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models Paper • 2603.17541 • Published 11 days ago • 20
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents Paper • 2603.18429 • Published 10 days ago • 26
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Paper • 2603.19235 • Published 9 days ago • 93
BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models Paper • 2602.04163 • Published Feb 4 • 10
OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models Paper • 2602.04804 • Published Feb 4 • 49
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published Dec 28, 2025 • 20