Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published Nov 25, 2025 • 32
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published Nov 24, 2025 • 28
CoVT: Chain-of-Visual-Thought Collection Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated Nov 25, 2025 • 6
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models Paper • 2510.12784 • Published Oct 14, 2025 • 19
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning Paper • 2510.11026 • Published Oct 13, 2025 • 17
Reconstruction Alignment Improves Unified Multimodal Models Paper • 2509.07295 • Published Sep 8, 2025 • 40
view article Article Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models Jul 10, 2025 • 53
RecA Collection Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning! • 8 items • Updated Sep 22, 2025 • 14
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Paper • 2504.20690 • Published Apr 29, 2025 • 19
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models Paper • 2503.12885 • Published Mar 17, 2025 • 43
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11, 2024 • 50
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published Jan 16, 2025 • 20
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation Paper • 2410.12669 • Published Oct 16, 2024 • 1
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering Paper • 2501.05131 • Published Jan 9, 2025 • 37