Showing 1–2 of 2 results for author: Zai, K L

Searching in archive cs.
  1. arXiv:2506.01551

    cs.CV cs.AI cs.CL

    EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation

    Authors: Bingqian Lin, Yunshuang Nie, Khun Loun Zai, Ziming Wei, Mingfei Han, Rongtao Xu, Minzhe Niu, Jianhua Han, Liang Lin, Cewu Lu, Xiaodan Liang

    Abstract: Building Vision-Language Navigation (VLN) agents which can navigate following natural language instructions is a long-standing goal in human-robot interaction applications. Recent studies have revealed the potential of training open-source Large Language Models (LLMs) to unleash LLMs' reasoning ability for improving navigation, and simultaneously mitigate the domain gap between LLMs' training corp…

    Submitted 2 June, 2025; originally announced June 2025.

  2. arXiv:2406.01388

    cs.CV

    AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation

    Authors: Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: As cutting-edge Text-to-Image (T2I) generation models already excel at producing remarkable single images, an even more challenging task, i.e., multi-turn interactive image generation begins to attract the attention of related research communities. This task requires models to interact with users over multiple turns to generate a coherent sequence of images. However, since users may switch subject…

    Submitted 30 May, 2025; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Multi-turn interactive image generation