NaVILA is a two-level framework that combines a vision-language-action model (VLA) with locomotion skills for legged-robot navigation. The VLA generates high-level language-based commands, while a real-time locomotion policy ensures obstacle avoidance.
@article{cheng2024navila,
title = {{NaVILA: Legged Robot Vision-Language-Action Model for Navigation}},
author = {Cheng, An-Chieh and Ji, Yandong and Yang, Zhaojing and Zou, Xueyan and Kautz, Jan and Biyik, Erdem and Yin, Hongxu and Liu, Sifei and Wang, Xiaolong},
journal = {arXiv preprint arXiv:2412.04453},
year = {2024},
}
We sincerely thank Chengjing Yuan for their support with hardware setup and 3D modeling. We also thank Xuxin Cheng and Jialong Li for their help in setting up the G1 robot, as well as Jiazhao Zhang and Yukang Chen for their valuable discussions.