multimodal - an Andyrasika Collection

multimodal

Updated Jul 11

This collection is for multimodal papers.

Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Paper • 2407.10387 • Published Jul 15, 2024 • 8

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 52

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 47

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 157