Collections
Collections including paper arxiv:2507.07105
- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 35
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 28
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 127
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 23

- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 245
- MemOS: A Memory OS for AI System
  Paper • 2507.03724 • Published • 151
- 4KAgent: Agentic Any Image to 4K Super-Resolution
  Paper • 2507.07105 • Published • 99
- A Survey on Latent Reasoning
  Paper • 2507.06203 • Published • 89

- yandex/stable-diffusion-3.5-medium-alchemist
  Text-to-Image • Updated • 79 • 4
- Ovis-U1 Technical Report
  Paper • 2506.23044 • Published • 62
- FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
  Paper • 2507.01953 • Published • 19
- LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
  Paper • 2507.01945 • Published • 77

- BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
  Paper • 2503.13434 • Published • 27
- Edit Transfer: Learning Image Editing via Vision In-Context Relations
  Paper • 2503.13327 • Published • 29
- WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
  Paper • 2503.13435 • Published • 18
- MediaTek-Research/Llama-Breeze2-8B-Instruct
  8B • Updated • 10.6k • 46

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 84
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 24

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23