Benchmarking Visual State Tracking in Multimodal Video Understanding • Paper • 2606.03920 • Published about 1 month ago • 53