OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web • Paper • 2402.17553 • Published Feb 27, 2024
Grounding Language Models to Images for Multimodal Inputs and Outputs • Paper • 2301.13823 • Published Jan 31, 2023