Showing 1–7 of 7 results for author: Aklilu, J

  1. arXiv:2504.02799  [pdf, other]

    cs.CV cs.AI

    Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence

    Authors: Anita Rau, Mark Endo, Josiah Aklilu, Jaewoo Heo, Khaled Saab, Alberto Paderno, Jeffrey Jopling, F. Christopher Holsinger, Serena Yeung-Levy

    Abstract: Large Vision-Language Models offer a new paradigm for AI-driven image understanding, enabling models to perform tasks without task-specific training. This flexibility holds particular promise across medicine, where expert-annotated data is scarce. Yet, VLMs' practical utility in intervention-focused domains--especially surgery, where decision-making is subjective and clinical scenarios are variabl…

    Submitted 3 April, 2025; originally announced April 2025.

  2. arXiv:2503.22727  [pdf, other]

    cs.CL cs.LG

    A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI

    Authors: Alejandro Lozano, Min Woo Sun, James Burgess, Jeffrey J. Nirschl, Christopher Polzak, Yuhui Zhang, Liangyu Chen, Jeffrey Gu, Ivan Lopez, Josiah Aklilu, Anita Rau, Austin Wolfgang Katzer, Collin Chiu, Orr Zohar, Xiaohan Wang, Alfred Seunghoon Song, Chiang Chia-Chun, Robert Tibshirani, Serena Yeung-Levy

    Abstract: Despite the excitement behind biomedical artificial intelligence (AI), access to high-quality, diverse, and large-scale data - the foundation for modern AI systems - is still a bottleneck to unlocking its full potential. To address this gap, we introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset, containing over 6 million scientific articles and 24 millio…

    Submitted 1 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  3. arXiv:2501.07171  [pdf, other]

    cs.CV cs.CL

    BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

    Authors: Alejandro Lozano, Min Woo Sun, James Burgess, Liangyu Chen, Jeffrey J Nirschl, Jeffrey Gu, Ivan Lopez, Josiah Aklilu, Austin Wolfgang Katzer, Collin Chiu, Anita Rau, Xiaohan Wang, Yuhui Zhang, Alfred Seunghoon Song, Robert Tibshirani, Serena Yeung-Levy

    Abstract: The development of vision-language models (VLMs) is driven by large-scale and diverse multimodal datasets. However, progress toward generalist biomedical VLMs is limited by the lack of annotated, publicly accessible datasets across biology and medicine. Existing efforts are restricted to narrow domains, missing the full diversity of biomedical knowledge encoded in scientific literature. To address…

    Submitted 1 April, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

  4. arXiv:2501.03225  [pdf, other]

    cs.CV cs.AI cs.CL cs.CY cs.LG

    Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

    Authors: Yuhui Zhang, Yuchang Su, Yiming Liu, Xiaohan Wang, James Burgess, Elaine Sui, Chenyu Wang, Josiah Aklilu, Alejandro Lozano, Anjiang Wei, Ludwig Schmidt, Serena Yeung-Levy

    Abstract: The rapid development of vision language models (VLMs) demands rigorous and reliable evaluation. However, current visual question answering (VQA) benchmarks often depend on open-ended questions, making accurate evaluation difficult due to the variability in natural language responses. To address this, we introduce AutoConverter, an agentic framework that automatically converts these open-ended que…

    Submitted 9 April, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

    Comments: CVPR 2025

  5. arXiv:2410.14340  [pdf, other]

    cs.CV

    Zero-shot Action Localization via the Confidence of Large Vision-Language Models

    Authors: Josiah Aklilu, Xiaohan Wang, Serena Yeung-Levy

    Abstract: Precise action localization in untrimmed video is vital for fields such as professional sports and minimally invasive surgery, where the delineation of particular motions in recordings can dramatically enhance analysis. But in many cases, large scale datasets with video-label pairs for localization are unavailable, limiting the opportunity to fine-tune video-understanding models. Recent developmen…

    Submitted 24 March, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

  6. arXiv:2403.13206  [pdf, ps, other]

    cs.CV cs.AI

    Depth-guided NeRF Training via Earth Mover's Distance

    Authors: Anita Rau, Josiah Aklilu, F. Christopher Holsinger, Serena Yeung-Levy

    Abstract: Neural Radiance Fields (NeRFs) are trained to minimize the rendering loss of predicted viewpoints. However, the photometric loss often does not provide enough information to disambiguate between different possible geometries yielding the same image. Previous work has thus incorporated depth supervision during NeRF training, leveraging dense predictions from pre-trained depth networks as pseudo-gro…

    Submitted 4 September, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024

  7. arXiv:2401.14555  [pdf, other]

    cs.CV cs.LG

    Revisiting Active Learning in the Era of Vision Foundation Models

    Authors: Sanket Rajan Gupte, Josiah Aklilu, Jeffrey J. Nirschl, Serena Yeung-Levy

    Abstract: Foundation vision or vision-language models are trained on large unlabeled or noisy data and learn robust representations that can achieve impressive zero- or few-shot performance on diverse tasks. Given these properties, they are a natural fit for active learning (AL), which aims to maximize labeling efficiency. However, the full potential of foundation models has not been explored in the context…

    Submitted 24 June, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted to TMLR