Unicorn: Text-Only Data Synthesis for Vision Language Model Training Paper • 2503.22655 • Published Mar 28 • 39
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Paper • 2505.03912 • Published May 6 • 9
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning Paper • 2505.12448 • Published May 18 • 10
VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL Paper • 2505.15791 • Published May 21 • 6
Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors Paper • 2508.08896 • Published Aug 12 • 10
QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning Paper • 2412.15576 • Published Dec 20, 2024
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Paper • 2509.09372 • Published Sep 11 • 242
Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation Paper • 2508.19958 • Published Aug 27
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting Paper • 2510.10637 • Published Oct 12 • 12
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models Paper • 2512.09928 • Published 14 days ago • 11
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published Nov 21 • 25
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Paper • 2509.15212 • Published Sep 18 • 21
Exploring the Evolution of Physics Cognition in Video Generation: A Survey Paper • 2503.21765 • Published Mar 27 • 11
Accelerating Diffusion Transformers with Token-wise Feature Caching Paper • 2410.05317 • Published Oct 5, 2024
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Paper • 2411.17686 • Published Nov 26, 2024 • 19
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction Paper • 2412.06782 • Published Dec 9, 2024 • 7
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11, 2024 • 15
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Paper • 2403.14520 • Published Mar 21, 2024 • 35