Showing 1–7 of 7 results for author: Dhakal, M

  1. arXiv:2505.09021

    cs.SE cs.AI cs.PL

    AI-Mediated Code Comment Improvement

    Authors: Maria Dhakal, Chia-Yi Su, Robert Wallace, Chris Fakhimi, Aakash Bansal, Toby Li, Yu Huang, Collin McMillan

    Abstract: This paper describes an approach to improve code comments along different quality axes by rewriting those comments with customized Artificial Intelligence (AI)-based tools. We conduct an empirical study followed by grounded theory qualitative analysis to determine the quality axes to improve. Then we propose a procedure using a Large Language Model (LLM) to rewrite existing code comments along the…

    Submitted 13 May, 2025; originally announced May 2025.

  2. arXiv:2505.02971

    cs.CV

    Adversarial Robustness Analysis of Vision-Language Models in Medical Image Segmentation

    Authors: Anjila Budathoki, Manish Dhakal

    Abstract: Adversarial attacks have been fairly explored for computer vision and vision-language models. However, the avenue of adversarial attack for the vision language segmentation models (VLSMs) is still under-explored, especially for medical image analysis. Thus, we have investigated the robustness of VLSMs against adversarial attacks for 2D medical images with different modalities with radiology, pho…

    Submitted 5 May, 2025; originally announced May 2025.

  3. arXiv:2410.05239

    cs.CV cs.CL

    TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models

    Authors: Rabin Adhikari, Safal Thapaliya, Manish Dhakal, Bishesh Khanal

    Abstract: Vision-Language Models (VLMs) have shown impressive performance in vision tasks, but adapting them to new domains often requires expensive fine-tuning. Prompt tuning techniques, including textual, visual, and multimodal prompting, offer efficient alternatives by leveraging learnable prompts. However, their application to Vision-Language Segmentation Models (VLSMs) and evaluation under significant…

    Submitted 8 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted at ACCV 2024 (oral presentation)

  4. Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet

    Authors: Manish Dhakal, Arman Chhetri, Aman Kumar Gupta, Prabin Lamichhane, Suraj Pandey, Subarna Shakya

    Abstract: This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text. The model was trained and tested on the OpenSLR (audio, text) dataset. The majority of the audio dataset have silent gaps at both ends which are clipped during dataset preprocessing for a more uniform mapping of audio frames and their corresponding texts. Mel Frequen…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted at 2022 International Conference on Inventive Computation Technologies (ICICT), IEEE

    Journal ref: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 515-521

  5. arXiv:2405.06196

    cs.CV cs.AI cs.CL cs.LG

    VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks

    Authors: Manish Dhakal, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal

    Abstract: Foundation Vision-Language Models (VLMs) trained using large-scale open-domain images and text pairs have recently been adapted to develop Vision-Language Segmentation Models (VLSMs) that allow providing text prompts during inference to guide image segmentation. If robust and powerful VLSMs can be built for medical images, it could aid medical professionals in many clinical tasks where they must s…

    Submitted 27 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted at MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention

  6. arXiv:2309.12829

    cs.CV cs.AI cs.CL cs.LG

    Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography

    Authors: Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari, Bishesh Khanal

    Abstract: Accurate segmentation is essential for echocardiography-based assessment of cardiovascular diseases (CVDs). However, the variability among sonographers and the inherent challenges of ultrasound images hinder precise segmentation. By leveraging the joint representation of image and text modalities, Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted at the 4th International Workshop of Advances in Simplifying Medical UltraSound (ASMUS)

  7. arXiv:2308.07706

    cs.CV cs.AI cs.CL cs.LG

    Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models

    Authors: Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal

    Abstract: Medical image segmentation allows quantifying target structure size and shape, aiding in disease diagnosis, prognosis, surgery planning, and comprehension. Building upon recent advancements in foundation Vision-Language Models (VLMs) from natural image-text pairs, several studies have proposed adapting them to Vision-Language Segmentation Models (VLSMs) that allow using language text as an addition…

    Submitted 20 June, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted at Medical Imaging with Deep Learning (MIDL) 2024 (oral presentation)