Ovis: Structural Embedding Alignment for Multimodal Large Language Model Paper • 2405.20797 • Published May 31, 2024 • 30
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees Paper • 2406.07115 • Published Jun 11, 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting Paper • 2406.03496 • Published Jun 5, 2024
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published Oct 10, 2024 • 53
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation Paper • 2412.18928 • Published Dec 25, 2024
PEMF-VVTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm Paper • 2412.03021 • Published Dec 4, 2024 • 1
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published May 5, 2025 • 79
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation Paper • 2502.12579 • Published Feb 18, 2025 • 1
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning Paper • 2503.18533 • Published Mar 24, 2025
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization Paper • 2506.09373 • Published Jun 11, 2025
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance Paper • 2507.18192 • Published Jul 24, 2025 • 7
Ovis2.5 Collection: Our next-generation MLLMs for native-resolution vision and advanced reasoning • 5 items • Updated 6 days ago • 52