Showing 1–32 of 32 results for author: Lao, Q

  1. arXiv:2506.00406  [pdf, other]

    cs.CV

    iDPA: Instance Decoupled Prompt Attention for Incremental Medical Object Detection

    Authors: Huahui Yi, Wei Xu, Ziyuan Qin, Xi Chen, Xiaohu Wu, Kang Li, Qicheng Lao

    Abstract: Existing prompt-based approaches have demonstrated impressive performance in continual learning, leveraging pre-trained large-scale models for classification tasks; however, the tight coupling between foreground-background information and the coupled attention between prompts and image-text tokens present significant challenges in incremental medical object detection tasks, due to the conceptual g…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: accepted to ICML 2025

  2. arXiv:2505.20648  [pdf, other]

    cs.LG cs.AI

    Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning

    Authors: Mengmeng Chen, Xiaohu Wu, Qiqi Liu, Tiantian He, Yew-Soon Ong, Yaochu Jin, Qicheng Lao, Han Yu

    Abstract: Multi-objective optimization (MOO) exists extensively in machine learning, and aims to find a set of Pareto-optimal solutions, called the Pareto front, e.g., it is fundamental for multiple avenues of research in federated learning (FL). Pareto-Front Learning (PFL) is a powerful method implemented using Hypernetworks (PHNs) to approximate the Pareto front. This method enables the acquisition of a m…

    Submitted 26 May, 2025; originally announced May 2025.

  3. arXiv:2503.00744  [pdf, other]

    cs.CV cs.AI

    Confounder-Aware Medical Data Selection for Fine-Tuning Pretrained Vision Models

    Authors: Anyang Ji, Qingbo Kang, Wei Xu, Changfan Wang, Kang Li, Qicheng Lao

    Abstract: The emergence of large-scale pre-trained vision foundation models has greatly advanced the medical imaging field through the pre-training and fine-tuning paradigm. However, selecting appropriate medical data for downstream fine-tuning remains a significant challenge considering its annotation cost, privacy concerns, and the detrimental effects of confounding variables. In this work, we present a c…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 5 pages, 3 figures

  4. arXiv:2502.01201  [pdf, other]

    cs.CV

    One-to-Normal: Anomaly Personalization for Few-shot Anomaly Detection

    Authors: Yiyue Li, Shaoting Zhang, Kang Li, Qicheng Lao

    Abstract: Traditional Anomaly Detection (AD) methods have predominantly relied on unsupervised learning from extensive normal data. Recent AD methods have evolved with the advent of large pre-trained vision-language models, enhancing few-shot anomaly detection capabilities. However, these latest AD methods still exhibit limitations in accuracy improvement. One contributing factor is their direct comparison…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: In The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS2024)

    MSC Class: 68T45; ACM Class: I.2.10

  5. arXiv:2501.02385  [pdf, other]

    cs.CV cs.CL

    Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations

    Authors: Kangyu Zhu, Ziyuan Qin, Huahui Yi, Zekun Jiang, Qicheng Lao, Shaoting Zhang, Kang Li

    Abstract: While mainstream vision-language models (VLMs) have advanced rapidly in understanding image level information, they still lack the ability to focus on specific areas designated by humans. Rather, they typically rely on large volumes of high-quality image-text paired data to learn and generate posterior attention maps. To address this critical issue, we propose leveraging visual prompts: simple visu…

    Submitted 12 February, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

    Comments: Accepted to NAACL 2025 Main Conference

  6. arXiv:2412.05722  [pdf, other]

    cs.CV

    Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent

    Authors: Ziyuan Qin, Dongjie Cheng, Haoyu Wang, Huahui Yi, Yuting Shao, Zhiyuan Fan, Kang Li, Qicheng Lao

    Abstract: Contemporary Text-to-Image (T2I) models frequently depend on qualitative human evaluations to assess the consistency between synthesized images and the text prompts. There is a demand for quantitative and automatic evaluation tools, given that human evaluation lacks reproducibility. We believe that an effective T2I evaluation metric should accomplish the following: detect instances where the gener…

    Submitted 7 December, 2024; originally announced December 2024.

  7. arXiv:2410.19321  [pdf, other]

    cs.GT cs.LG

    Free-Rider and Conflict Aware Collaboration Formation for Cross-Silo Federated Learning

    Authors: Mengmeng Chen, Xiaohu Wu, Xiaoli Tang, Tiantian He, Yew-Soon Ong, Qiqi Liu, Qicheng Lao, Han Yu

    Abstract: Federated learning (FL) is a machine learning paradigm that allows multiple FL participants (FL-PTs) to collaborate on training models without sharing private data. Due to data heterogeneity, negative transfer may occur in the FL training process. This necessitates FL-PT selection based on their data complementarity. In cross-silo FL, organizations that engage in business activities are key source…

    Submitted 31 January, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

  8. arXiv:2410.18248  [pdf, other]

    cs.LG cs.AI

    Fast Inference for Augmented Large Language Models

    Authors: Rana Shahout, Cong Liang, Shiji Xin, Qianru Lao, Yong Cui, Minlan Yu, Michael Mitzenmacher

    Abstract: Augmented Large Language Models (LLMs) enhance the capabilities of standalone LLMs by integrating external data sources through API calls. In interactive LLM applications, efficient scheduling is crucial for maintaining low request completion times, directly impacting user engagement. However, these augmentations introduce scheduling challenges due to the need to manage limited memory for cached i…

    Submitted 25 October, 2024; v1 submitted 23 October, 2024; originally announced October 2024.

  9. arXiv:2410.07286  [pdf, other]

    cs.LG cs.AI

    Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning

    Authors: Zhilong Li, Xiaohu Wu, Xiaoli Tang, Tiantian He, Yew-Soon Ong, Mengmeng Chen, Qiqi Liu, Qicheng Lao, Han Yu

    Abstract: There is growing research interest in measuring the statistical heterogeneity of clients' local datasets. Such measurements are used to estimate the suitability for collaborative training of personalized federated learning (PFL) models. Currently, these research endeavors are taking place in silos and there is a lack of a unified benchmark to provide a fair and convenient comparison among various…

    Submitted 28 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted to FL@FM-NeurIPS'24

  10. arXiv:2409.00695  [pdf, other]

    cs.CV cs.AI

    Curriculum Prompting Foundation Models for Medical Image Segmentation

    Authors: Xiuqi Zheng, Yuhang Zhang, Haoran Zhang, Hongrui Liang, Xueqi Bao, Zhuqing Jiang, Qicheng Lao

    Abstract: Adapting large pre-trained foundation models, e.g., SAM, for medical image segmentation remains a significant challenge. A crucial step involves the formulation of a series of specialized prompts that incorporate specific clinical instructions. Past works have been heavily reliant on a singular type of prompt for each instance, necessitating manual input of an ideally correct prompt, which is less…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Accepted by MICCAI 2024

  11. arXiv:2408.06124  [pdf, other]

    cs.CL

    Utilize Transformers for translating Wikipedia category names

    Authors: Hoang-Thang Ta, Quoc Thang La

    Abstract: On Wikipedia, articles are categorized to aid readers in navigating content efficiently. The manual creation of new categories can be laborious and time-intensive. To tackle this issue, we built language models to translate Wikipedia categories from English to Vietnamese with a dataset containing 15,000 English-Vietnamese category pairs. Subsequently, small to medium-scale Transformer pre-trained…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 5 pages, 1 figure

  12. arXiv:2405.18897  [pdf, other]

    cs.CV

    MLAE: Masked LoRA Experts for Visual Parameter-Efficient Fine-Tuning

    Authors: Junjie Wang, Guangjing Yang, Wentao Chen, Huahui Yi, Xiaohu Wu, Zhouchen Lin, Qicheng Lao

    Abstract: In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely in…

    Submitted 10 October, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Tech report

  13. arXiv:2402.18028  [pdf, other]

    cs.CV

    OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine

    Authors: Xiaosong Wang, Xiaofan Zhang, Guotai Wang, Junjun He, Zhongyu Li, Wentao Zhu, Yi Guo, Qi Dou, Xiaoxiao Li, Dequan Wang, Liang Hong, Qicheng Lao, Tong Ruan, Yukun Zhou, Yixue Li, Jie Zhao, Kang Li, Xin Sun, Lifeng Zhu, Shaoting Zhang

    Abstract: The emerging trend of advancing generalist artificial intelligence, such as GPTv4 and Gemini, has reshaped the landscape of research (academia and industry) in machine learning and many other research areas. However, domain-specific applications of such foundation models (e.g., in medicine) remain untouched or often at their very early stages. It will require an individual set of transfer learning…

    Submitted 3 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Technical Report. Visit https://github.com/openmedlab for more details

  14. arXiv:2402.15759  [pdf]

    cs.CV cs.AI

    TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation

    Authors: Zekun Jiang, Dongjie Cheng, Ziyuan Qin, Jun Gao, Qicheng Lao, Abdullaev Bakhrom Ismoilovich, Urazboev Gayrat, Yuldashov Elyorbek, Bekchanov Habibullo, Defu Tang, LinJing Wei, Kang Li, Le Zhang

    Abstract: This study presents a novel multimodal medical image zero-shot segmentation algorithm named the text-visual-prompt segment anything model (TV-SAM) without any manual annotations. The TV-SAM incorporates and integrates the large language model GPT-4, the vision language model GLIP, and the SAM to autonomously generate descriptive text prompts and visual bounding box prompts from medical images, the…

    Submitted 14 October, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: 13 pages, 5 figures, 4 tables, accepted by BDMA Journal

  15. arXiv:2311.17331  [pdf, other]

    cs.CV

    Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering

    Authors: Zeqing Wang, Wentao Wan, Qiqing Lao, Runmeng Chen, Minjie Lang, Xiao Wang, Keze Wang, Liang Lin

    Abstract: Recently, to comprehensively improve Vision Language Models (VLMs) for Visual Question Answering (VQA), several methods have been proposed to further reinforce the inference capabilities of VLMs to independently tackle VQA tasks rather than some methods that only utilize VLMs as aids to Large Language Models (LLMs). However, these methods ignore the rich common-sense knowledge inside the given VQA…

    Submitted 14 February, 2025; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: 13 pages, 8 figures

  16. arXiv:2311.17048  [pdf, other]

    cs.CV

    Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions

    Authors: Zeyu Han, Fangrui Zhu, Qianru Lao, Huaizu Jiang

    Abstract: Zero-shot referring expression comprehension aims at localizing bounding boxes in an image corresponding to provided textual prompts, which requires: (i) a fine-grained disentanglement of complex visual scene and textual context, and (ii) a capacity to understand relationships among disentangled entities. Unfortunately, existing large vision-language alignment (VLA) models, e.g., CLIP, struggle wi…

    Submitted 9 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: CVPR 2024, Code available at https://github.com/Show-han/Zeroshot_REC

  17. arXiv:2306.08249  [pdf, other]

    cs.CV

    Deblurring Masked Autoencoder is Better Recipe for Ultrasound Image Recognition

    Authors: Qingbo Kang, Jun Gao, Kang Li, Qicheng Lao

    Abstract: Masked autoencoder (MAE) has attracted unprecedented attention and achieves remarkable performance in many vision tasks. It reconstructs random masked image patches (known as proxy task) during pretraining and learns meaningful semantic representations that can be transferred to downstream tasks. However, MAE has not been thoroughly explored in ultrasound imaging. In this work, we investigate the…

    Submitted 13 July, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted by MICCAI 2023

  18. arXiv:2305.18993  [pdf, other]

    cs.CV

    ConES: Concept Embedding Search for Parameter Efficient Tuning Large Vision Language Models

    Authors: Huahui Yi, Ziyuan Qin, Wei Xu, Miaotian Guo, Kun Wang, Shaoting Zhang, Kang Li, Qicheng Lao

    Abstract: Large pre-trained vision-language models have shown great prominence in transferring pre-acquired knowledge to various domains and downstream tasks with appropriate prompting or tuning. Existing prevalent tuning methods can be generally categorized into three genres: 1) prompt engineering by creating suitable prompt texts, which is time-consuming and requires domain expertise; 2) or simply fine-tu…

    Submitted 30 May, 2023; originally announced May 2023.

  19. arXiv:2305.00035  [pdf, other]

    cs.CV cs.AI

    SAM on Medical Images: A Comprehensive Study on Three Prompt Modes

    Authors: Dongjie Cheng, Ziyuan Qin, Zekun Jiang, Shaoting Zhang, Qicheng Lao, Kang Li

    Abstract: The Segment Anything Model (SAM) made an eye-catching debut recently and inspired many researchers to explore its potential and limitation in terms of zero-shot generalization capability. As the first promptable foundation model for segmentation tasks, it was trained on a large dataset with an unprecedented number of images and annotations. This large-scale dataset and its promptable nature endow…

    Submitted 28 April, 2023; originally announced May 2023.

    Comments: 6 pages, 3 figures

  20. arXiv:2303.06580  [pdf, other]

    cs.CV cs.CL cs.LG

    Towards General Purpose Medical AI: Continual Learning Medical Foundation Model

    Authors: Huahui Yi, Ziyuan Qin, Qicheng Lao, Wei Xu, Zekun Jiang, Dequan Wang, Shaoting Zhang, Kang Li

    Abstract: Inevitable domain and task discrepancies in real-world scenarios can impair the generalization performance of the pre-trained deep models for medical data. Therefore, we audaciously propose that we should build a general-purpose medical AI system that can be seamlessly adapted to downstream domains/tasks. Since the domain/task adaption procedures usually involve additional labeling work for the ta…

    Submitted 12 March, 2023; originally announced March 2023.

  21. arXiv:2211.07846  [pdf, other]

    cs.CV

    Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels

    Authors: Tao Pu, Qianru Lao, Hefeng Wu, Tianshui Chen, Liang Lin

    Abstract: As a promising solution of reducing annotation cost, training multi-label models with partial positive labels (MLR-PPL), in which merely few positive labels are known while other are missing, attracts increasing attention. Due to the absence of any negative labels, previous works regard unknown labels as negative and adopt traditional MLR algorithms. To reject noisy labels, recent works regard lar…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.13092

  22. arXiv:2209.15517  [pdf, other]

    cs.CV

    Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study

    Authors: Ziyuan Qin, Huahui Yi, Qicheng Lao, Kang Li

    Abstract: The large-scale pre-trained vision language models (VLM) have shown remarkable domain transfer capability on natural images. However, it remains unknown whether this capability can also apply to the medical image domain. This paper thoroughly studies the knowledge transferability of pre-trained VLMs to the medical domain, where we show that well-designed medical prompts are the key to elicit knowl…

    Submitted 7 February, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Accepted to ICLR2023

  23. arXiv:2206.12987  [pdf, other]

    cs.LG cs.AI

    FlowX: Towards Explainable Graph Neural Networks via Message Flows

    Authors: Shurui Gui, Hao Yuan, Jie Wang, Qicheng Lao, Kang Li, Shuiwang Ji

    Abstract: We investigate the explainability of graph neural networks (GNNs) as a step toward elucidating their working mechanisms. While most current methods focus on explaining graph nodes, edges, or features, we argue that, as the inherent functional mechanism of GNNs, message flows are more natural for performing explainability. To this end, we propose a novel method here, known as FlowX, to explain GNNs…

    Submitted 29 December, 2023; v1 submitted 26 June, 2022; originally announced June 2022.

  24. arXiv:2106.08387  [pdf, other]

    cs.LG cs.CR

    Towards Adversarial Robustness via Transductive Learning

    Authors: Jiefeng Chen, Yang Guo, Xi Wu, Tianqi Li, Qicheng Lao, Yingyu Liang, Somesh Jha

    Abstract: There has been emerging interest to use transductive learning for adversarial robustness (Goldwasser et al., NeurIPS 2020; Wu et al., ICML 2020). Compared to traditional "test-time" defenses, these defense mechanisms "dynamically retrain" the model based on test time input via transductive learning; and theoretically, attacking these defenses boils down to bilevel optimization, which seems to rais…

    Submitted 15 June, 2021; originally announced June 2021.

  25. arXiv:2012.08072  [pdf, other]

    cs.LG cs.CV

    Hypothesis Disparity Regularized Mutual Information Maximization

    Authors: Qicheng Lao, Xiang Jiang, Mohammad Havaei

    Abstract: We propose a hypothesis disparity regularized mutual information maximization (HDMI) approach to tackle unsupervised hypothesis transfer -- as an effort towards unifying hypothesis transfer learning (HTL) and unsupervised domain adaptation (UDA) -- where the knowledge from a source domain is transferred solely through hypotheses and adapted to the target domain in an unsupervised manner. In contra…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: Accepted to AAAI 2021

  26. Conditional Generation of Medical Images via Disentangled Adversarial Inference

    Authors: Mohammad Havaei, Ximeng Mao, Yiping Wang, Qicheng Lao

    Abstract: Synthetic medical image generation has a huge potential for improving healthcare through many applications, from data augmentation for training machine learning systems to preserving patient privacy. Conditional Adversarial Generative Networks (cGANs) use a conditioning factor to generate images and have shown great success in recent years. Intuitively, the information in an image can be divided i…

    Submitted 3 May, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Published in Medical Image Analysis

  27. arXiv:2006.04996  [pdf, other]

    cs.LG cs.CV stat.ML

    Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation

    Authors: Xiang Jiang, Qicheng Lao, Stan Matwin, Mohammad Havaei

    Abstract: We present an approach for unsupervised domain adaptation, with a strong focus on practical considerations of within-domain class imbalance and between-domain class distribution shift, from a class-conditioned domain alignment perspective. Current methods for class-conditioned domain alignment aim to explicitly minimize a loss function based on pseudo-label estimations of the target domain. Howe…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted at ICML2020. For code, see https://github.com/xiangdal/implicit_alignment

    MSC Class: 68T07

  28. arXiv:2003.04382  [pdf, other]

    cs.LG cs.CV stat.ML

    Continuous Domain Adaptation with Variational Domain-Agnostic Feature Replay

    Authors: Qicheng Lao, Xiang Jiang, Mohammad Havaei, Yoshua Bengio

    Abstract: Learning in non-stationary environments is one of the biggest challenges in machine learning. Non-stationarity can be caused by either task drift, i.e., the drift in the conditional distribution of labels given the input data, or the domain drift, i.e., the drift in the marginal distribution of the input data. This paper aims to tackle this challenge in the context of continuous domain adaptation,…

    Submitted 9 March, 2020; originally announced March 2020.

  29. arXiv:2003.03877  [pdf, other]

    cs.CV

    FoCL: Feature-Oriented Continual Learning for Generative Models

    Authors: Qicheng Lao, Mehrzad Mortazavi, Marzieh Tahaei, Francis Dutil, Thomas Fevens, Mohammad Havaei

    Abstract: In this paper, we propose a general framework in continual learning for generative models: Feature-oriented Continual Learning (FoCL). Unlike previous works that aim to solve the catastrophic forgetting problem by introducing regularization in the parameter space or image space, FoCL imposes regularization in the feature space. We show in our experiments that FoCL has faster adaptation to distribu…

    Submitted 8 March, 2020; originally announced March 2020.

  30. arXiv:1908.05324  [pdf, other]

    cs.CV

    Dual Adversarial Inference for Text-to-Image Synthesis

    Authors: Qicheng Lao, Mohammad Havaei, Ahmad Pesaranghader, Francis Dutil, Lisa Di Jorio, Thomas Fevens

    Abstract: Synthesizing images from a given text description involves engaging two types of information: the content, which includes information explicitly described in the text (e.g., color, composition, etc.), and the style, which is usually not well described in the text (e.g., location, quantity, size, etc.). However, in previous works, it is typically treated as a process of generating images only from…

    Submitted 14 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019

  31. arXiv:1905.11567  [pdf, other]

    cs.CV eess.IV

    Case-Based Histopathological Malignancy Diagnosis using Convolutional Neural Networks

    Authors: Qicheng Lao, Thomas Fevens

    Abstract: In practice, histopathological diagnosis of tumor malignancy often requires a human expert to scan through histopathological images at multiple magnification levels, after which a final diagnosis can be accurately determined. However, previous research on such classification tasks using convolutional neural networks primarily determine a diagnosis for a single magnification level. In this paper, w…

    Submitted 27 May, 2019; originally announced May 2019.

    Journal ref: British Machine Vision Conference (BMVC) 2017

  32. arXiv:1806.10128  [pdf, other]

    cs.CV

    Leveraging Disease Progression Learning for Medical Image Recognition

    Authors: Qicheng Lao, Thomas Fevens, Boyu Wang

    Abstract: Unlike natural images, medical images often have intrinsic characteristics that can be leveraged for neural network learning. For example, images that belong to different stages of a disease may continuously follow a certain progression pattern. In this paper, we propose a novel method that leverages disease progression learning for medical image recognition. In our method, sequences of images ord…

    Submitted 1 September, 2018; v1 submitted 26 June, 2018; originally announced June 2018.

    Journal ref: IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2018