MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds Paper • 2508.14879 • Published 4 days ago • 47
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization Paper • 2508.14811 • Published 4 days ago • 34
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published Nov 17, 2024 • 57
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning Paper • 2508.10419 • Published 11 days ago • 69
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 122
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Apr 30 • 77
view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • Jul 9 • 655
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent Paper • 2506.17612 • Published Jun 21 • 62
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models Paper • 2506.19851 • Published Jun 24 • 59
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published Jun 22 • 65
Light of Normals: Unified Feature Representation for Universal Photometric Stereo Paper • 2506.18882 • Published Jun 23 • 89
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory Paper • 2507.01945 • Published Jul 2 • 77
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • May 21 • 206