Showing 1–1 of 1 results for author: Pan, K W

  1. arXiv:2406.19593  [pdf, ps, other]

    cs.CL cs.CV

    SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs

    Authors: Xin Su, Man Luo, Kris W Pan, Tien Pei Chou, Vasudev Lal, Phillip Howard

    Abstract: Multimodal retrieval augmented generation (RAG) plays a crucial role in domains such as knowledge-based visual question answering (KB-VQA), where external knowledge is needed to answer a question. However, existing multimodal LLMs (MLLMs) are not designed for context-augmented generation, limiting their effectiveness in such tasks. While synthetic data generation has recently gained attention for…

    Submitted 9 June, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: ICML 2025 Spotlight Oral