WebMMU Collection WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation • 1 item • Updated Jul 4 • 2
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39
Multimodal foundation world models for generalist embodied agents Paper • 2406.18043 • Published Jun 26, 2024 • 1