Showing 1–50 of 322 results for author: Phan, H

  1. arXiv:2506.07821  [pdf, ps, other]

    math.CO

    A Note on Reconfiguration Graphs of Cliques

    Authors: Quan N. Lam, Huu An Phan, Duc A. Hoang

    Abstract: In a reconfiguration setting, each clique of a graph $G$ is viewed as a set of tokens placed on vertices of $G$ such that no vertex has more than one token and any two tokens are adjacent. Additionally, three well-known reconfiguration rules have been studied in the literature: Token Jumping ($\mathsf{TJ}$, which involves moving a token to any unoccupied vertex), Token Sliding ($\mathsf{TS}$, whic…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 18 pages

  2. arXiv:2506.00736  [pdf, ps, other]

    eess.AS cs.SD

    IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling

    Authors: Kuan-Po Huang, Shu-wen Yang, Huy Phan, Bo-Ru Lu, Byeonggeun Kim, Sashank Macha, Qingming Tang, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang

    Abstract: Text-to-audio generation synthesizes realistic sounds or music given a natural language prompt. Diffusion-based frameworks, including the Tango and the AudioLDM series, represent the state-of-the-art in text-to-audio generation. Despite achieving high audio fidelity, they incur significant inference latency due to the slow diffusion sampling process. MAGNET, a mask-based model operating on discret…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025. Project website: https://audio-impact.github.io/

  3. arXiv:2505.23143  [pdf, ps, other]

    cs.CV

    Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning

    Authors: Jinquan Guan, Qi Chen, Lizhou Liang, Yuhang Liu, Vu Minh Hieu Phan, Minh-Son To, Jian Chen, Yutong Xie

    Abstract: Artificial intelligence (AI)-based chest X-ray (CXR) interpretation assistants have demonstrated significant progress and are increasingly being applied in clinical settings. However, contemporary medical AI models often adhere to a simplistic input-to-output paradigm, directly processing an image and an instruction to generate a result, where the instructions may be integral to the model's archit…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 10 pages (main text), 18 pages (appendix)

  4. arXiv:2505.15123  [pdf, ps, other]

    cs.CV cs.AI

    Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

    Authors: Ta Duc Huy, Duy Anh Huynh, Yutong Xie, Yuankai Qi, Qi Chen, Phi Le Nguyen, Sen Kim Tran, Son Lam Phung, Anton van den Hengel, Zhibin Liao, Minh-Son To, Johan W. Verjans, Vu Minh Hieu Phan

    Abstract: Visual grounding (VG) is the capability to identify the specific regions in an image associated with a particular text description. In medical imaging, VG enhances interpretability by highlighting relevant pathological features corresponding to textual descriptions, improving model transparency and trustworthiness for wider adoption of deep learning models in clinical practice. Current models stru…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Under Review

  5. arXiv:2505.10312  [pdf, ps, other]

    cs.HC cs.CV

    SOS: A Shuffle Order Strategy for Data Augmentation in Industrial Human Activity Recognition

    Authors: Anh Tuan Ha, Hoang Khang Phan, Thai Minh Tien Ngo, Anh Phan Truong, Nhat Tan Le

    Abstract: In the realm of Human Activity Recognition (HAR), obtaining high-quality, high-variance data remains a persistent challenge due to high costs and the inherent variability of real-world activities. This study introduces a dataset generated by deep learning approaches (Attention Autoencoder and conditional Generative Adversarial Networks). Another problem is that data heterogeneity is a critical challe…

    Submitted 15 May, 2025; originally announced May 2025.

  6. arXiv:2505.00744  [pdf, other]

    cs.CV

    Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs

    Authors: Dung Nguyen, Minh Khoi Ho, Huy Ta, Thanh Tam Nguyen, Qi Chen, Kumar Rav, Quy Duong Dang, Satwik Ramchandre, Son Lam Phung, Zhibin Liao, Minh-Son To, Johan Verjans, Phi Le Nguyen, Vu Minh Hieu Phan

    Abstract: Medical Large Multi-modal Models (LMMs) have demonstrated remarkable capabilities in medical data interpretation. However, these models frequently generate hallucinations contradicting source evidence, particularly due to inadequate localization reasoning. This work reveals a critical limitation in current medical LMMs: instead of analyzing relevant pathological regions, they often rely on linguis…

    Submitted 21 May, 2025; v1 submitted 30 April, 2025; originally announced May 2025.

    Comments: Accepted at the International Joint Conference on Artificial Intelligence (IJCAI) 2025

  7. arXiv:2504.21815  [pdf, other]

    eess.AS

    From Aesthetics to Human Preferences: Comparative Perspectives of Evaluating Text-to-Music Systems

    Authors: Huan Zhang, Jinhua Liang, Huy Phan, Wenwu Wang, Emmanouil Benetos

    Abstract: Evaluating generative models remains a fundamental challenge, particularly when the goal is to reflect human preferences. In this paper, we use music generation as a case study to investigate the gap between automatic evaluation metrics and human preferences. We conduct comparative experiments across five state-of-the-art music generation approaches, assessing both perceptual quality and distribut…

    Submitted 30 April, 2025; originally announced April 2025.

  8. arXiv:2504.14757  [pdf, other]

    cs.SE cs.AI

    SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs

    Authors: Minh V. T. Pham, Huy N. Phan, Hoang N. Phan, Cuong Le Chi, Tien N. Nguyen, Nghi D. Q. Bui

    Abstract: Large language models (LLMs) are transforming automated program repair (APR) through agent-based approaches that localize bugs, generate patches, and verify fixes. However, the lack of high-quality, scalable training datasets, especially those with verifiable outputs and intermediate reasoning traces, limits progress, particularly for open-source models. In this work, we present SWE-Synth, a framew…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Work in progress

  9. arXiv:2504.12849  [pdf, ps, other]

    cs.LG

    FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning

    Authors: Phung Lai, Xiaopeng Jiang, Hai Phan, Cristian Borcea, Khang Tran, An Chen, Vijaya Datta Mayyuri, Ruoming Jin

    Abstract: Federated Learning (FL) allows collaborative training among multiple devices without data sharing, thus enabling privacy-sensitive applications on mobile or Internet of Things (IoT) devices, such as mobile health and asset tracking. However, designing an FL system with good model utility that works with low computation/communication overhead on heterogeneous, resource-constrained mobile/IoT device…

    Submitted 9 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Journal ref: The 21st Annual International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT 2025)

  10. arXiv:2504.11943  [pdf, ps, other]

    math.CO cs.DM

    Dividing sums of cycles in the semiring of functional digraphs

    Authors: Florian Bridoux, Christophe Crespelle, Thi Ha Duong Phan, Adrien Richard

    Abstract: Functional digraphs are unlabelled finite digraphs where each vertex has exactly one out-neighbor. They are isomorphic classes of finite discrete-time dynamical systems. Endowed with the direct sum and product, functional digraphs form a semiring with an interesting multiplicative structure. For instance, we do not know if the following division problem can be solved in polynomial time: given two…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 25 pages

  11. arXiv:2504.02043  [pdf, other]

    astro-ph.HE

    Transient gamma rays from the 2021 outburst of the recurrent nova RS Ophiuchi: the effect of gamma-ray absorption

    Authors: Vo Hong Minh Phan, Pierre Cristofari, Enrico Peretti, Vincent Tatischeff, Andrea Ciardi

    Abstract: In 2021, RS Ophiuchi was the first nova to be detected in the very-high-energy (TeV) gamma-ray domain, directly testifying to efficient acceleration of charged particles up to at least the TeV range at the nova shock. Surprisingly, the TeV gamma-ray signal peaks $\sim 2$ days after the GeV signal, and the origin of this delay is still not clearly understood. We investigate the possibility tha…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 10 pages, 5 figures, submitted

  12. arXiv:2504.00339  [pdf, other]

    cs.CL cs.AI

    VNJPTranslate: A comprehensive pipeline for Vietnamese-Japanese translation

    Authors: Hoang Hai Phan, Nguyen Duc Minh Vu, Nam Dang Phuong

    Abstract: Neural Machine Translation (NMT) driven by Transformer architectures has advanced significantly, yet faces challenges with low-resource language pairs like Vietnamese-Japanese (Vi-Ja). Issues include sparse parallel data and handling linguistic/cultural nuances. Recent progress in Large Language Models (LLMs) with strong reasoning, often refined via Reinforcement Learning (RL), enables high-qualit…

    Submitted 31 March, 2025; originally announced April 2025.

  13. arXiv:2503.22974  [pdf, ps, other]

    hep-ph

    General Formulas for Loop-Induced Decays of $A \to Zγγ$ and Their Applications

    Authors: Dzung Tri Tran, Thanh Huy Nguyen, Khiem Hong Phan

    Abstract: Within the framework of the Standard Model Higgs extensions, including the Two-Higgs-Doublet Model with vector-like fermions and the Triplet-Higgs Model, we derive general one-loop contributions to the rare decay process $A \rightarrow Z γγ$. The analytical expressions are formulated with Passarino-Veltman scalar functions, which represent the scalar coefficients of one-loop Lorentz-covariant tens…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: 24 pages, 4 Figures, 5 Tables

    Report number: DTU-2025-03

  14. arXiv:2503.11471  [pdf]

    astro-ph.EP physics.geo-ph

    NH-rich organic compounds from the carbonaceous asteroid (162173) Ryugu: nanoscale spectral and isotopic characterizations

    Authors: L. G. Vacher, V. T. H. Phan, L. Bonal, M. Iskakova, O. Poch, P. Beck, E. Quirico, R. C. Ogliore

    Abstract: The detection of spectral bands at 3.06 µm by MicrOmega, combined with the chemical identification of other NH-containing organic molecules in Ryugu samples, suggests the presence of potential NH-bearing compounds. However, the chemical forms of these NH-rich compounds, whether associated with N-rich organics, ammonium (NH4+) salts, NH4 or NH-organics-bearing phyllosilicates, or other forms, remai…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 figures, 1 table

  15. arXiv:2503.10693  [pdf, other]

    cs.CV eess.IV

    Knowledge Consultation for Semi-Supervised Semantic Segmentation

    Authors: Thuan Than, Nhat-Anh Nguyen-Dang, Dung Nguyen, Salwa K. Al Khatib, Ahmed Elhagry, Hai Phan, Yihui He, Zhiqiang Shen, Marios Savvides, Dang Huynh

    Abstract: Semi-Supervised Semantic Segmentation reduces reliance on extensive annotations by using unlabeled data and state-of-the-art models to improve overall performance. Despite the success of deep co-training methods, their underlying mechanisms remain underexplored. This work revisits Cross Pseudo Supervision with dual heterogeneous backbones and introduces Knowledge Consultation (SegKC) to further en…

    Submitted 12 March, 2025; originally announced March 2025.

  16. arXiv:2503.06873  [pdf, other]

    cs.CV cs.AI cs.LG

    Interactive Medical Image Analysis with Concept-based Similarity Reasoning

    Authors: Ta Duc Huy, Sen Kim Tran, Phan Nguyen, Nguyen Hoang Tran, Tran Bao Sam, Anton van den Hengel, Zhibin Liao, Johan W. Verjans, Minh-Son To, Vu Minh Hieu Phan

    Abstract: The ability to interpret and intervene in model decisions is important for the adoption of computer-aided diagnosis methods in clinical workflows. Recent concept-based methods link the model predictions with interpretable concepts and modify their activation scores to interact with the model. However, these concepts are at the image level, which hinders the model from pinpointing the exact patches th…

    Submitted 11 March, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

    Comments: Accepted CVPR2025

    Journal ref: CVPR 2025

  17. arXiv:2503.06405  [pdf, other]

    cs.SD cs.AI eess.AS

    Heterogeneous bimodal attention fusion for speech emotion recognition

    Authors: Jiachen Luo, Huy Phan, Lin Wang, Joshua Reiss

    Abstract: Multi-modal emotion recognition in conversations is a challenging problem due to the complex and complementary interactions between different modalities. Audio and textual cues are particularly important for understanding emotions from a human perspective. Most existing studies focus on exploring interactions between audio and text modalities at the same representation level. However, a critical i…

    Submitted 31 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

  18. arXiv:2503.05858  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Bimodal Connection Attention Fusion for Speech Emotion Recognition

    Authors: Jiachen Luo, Huy Phan, Lin Wang, Joshua D. Reiss

    Abstract: Multi-modal emotion recognition is challenging due to the difficulty of extracting features that capture subtle emotional differences. Understanding multi-modal interactions and connections is key to building effective bimodal speech emotion recognition systems. In this work, we propose the Bimodal Connection Attention Fusion (BCAF) method, which includes three main modules: the interactive connection…

    Submitted 22 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

  19. arXiv:2502.00016  [pdf]

    cs.CY cs.HC

    Large Language Models for Education: ChemTAsk -- An Open-Source Paradigm for Automated Q&A in the Graduate Classroom

    Authors: Ryann M. Perez, Marie Shimogawa, Yanan Chang, Hoang Anh T. Phan, Jason G. Marmorstein, Evan S. K. Yanagawa, E. James Petersson

    Abstract: Large language models (LLMs) show promise for aiding graduate-level education, but are limited by their training data and potential confabulations. We developed ChemTAsk, an open-source pipeline that combines LLMs with retrieval-augmented generation (RAG) to provide accurate, context-specific assistance. ChemTAsk utilizes course materials, including lecture transcripts and primary publications, to…

    Submitted 6 February, 2025; v1 submitted 9 January, 2025; originally announced February 2025.

    Comments: 38 pages, 3 figures, 1 table

  20. arXiv:2501.16360  [pdf, other]

    cs.LG cs.AI

    Momentum Contrastive Learning with Enhanced Negative Sampling and Hard Negative Filtering

    Authors: Duy Hoang, Huy Ngo, Khoi Pham, Tri Nguyen, Gia Bao, Huy Phan

    Abstract: Contrastive learning has become pivotal in unsupervised representation learning, with frameworks like Momentum Contrast (MoCo) effectively utilizing large negative sample sets to extract discriminative features. However, traditional approaches often overlook the full potential of key embeddings and are susceptible to performance degradation from noisy negative samples in the memory bank. This stud…

    Submitted 20 January, 2025; originally announced January 2025.

  21. arXiv:2501.15239  [pdf, ps, other]

    hep-ph

    One-loop induced contributions to the rare decay of $A_0 \rightarrow h_0h_0γ$ in Two Higgs Doublet Models

    Authors: Dzung Tri Tran, L. T. Hue, Thanh Huy Nguyen, Vo Quoc Phong, Khiem Hong Phan

    Abstract: The analytic expressions for one-loop contributions to the rare decay process $A_0 \rightarrow h_0h_0γ$ within CP-conserving Two Higgs Doublet Models are reported for the first time in this paper. Analytic results are presented in terms of scalar one-loop Passarino-Veltman functions following the standard output of the packages LoopTools and Collier. In this context, physical results for the…

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: 30 pages, 4 Tables, 6 Figures

    Report number: DTU-2025-02

  22. arXiv:2501.04889  [pdf, other]

    math.OC

    Projected proximal gradient trust-region algorithm for nonsmooth optimization

    Authors: Minh N. Dao, Hung M. Phan, Lindon Roberts

    Abstract: We consider trust-region methods for solving optimization problems where the objective is the sum of a smooth, nonconvex function and a nonsmooth, convex regularizer. We extend the global convergence theory of such methods to include worst-case complexity bounds in the case of unbounded model Hessian growth, and introduce a new, simple nonsmooth trust-region subproblem solver based on combining se…

    Submitted 8 January, 2025; originally announced January 2025.

  23. arXiv:2501.03464  [pdf, other]

    cs.SD cs.AI eess.AS

    LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging

    Authors: Shubhr Singh, Emmanouil Benetos, Huy Phan, Dan Stowell

    Abstract: Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph N…

    Submitted 29 January, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

  24. arXiv:2501.01392  [pdf, other]

    eess.IV cs.CV

    ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer

    Authors: Xuyin Qi, Zeyu Zhang, Aaron Berliano Handoko, Huazhan Zheng, Mingxi Chen, Ta Duc Huy, Vu Minh Hieu Phan, Lei Zhang, Linqi Cheng, Shiyu Jiang, Zhiwei Zhang, Zhibin Liao, Yang Zhao, Minh-Son To

    Abstract: Prostate cancer, a growing global health concern, necessitates precise diagnostic tools, with Magnetic Resonance Imaging (MRI) offering high-resolution soft tissue imaging that significantly enhances diagnostic accuracy. Recent advancements in explainable AI and representation learning have significantly improved prostate cancer diagnosis by enabling automated and precise lesion classification. Ho…

    Submitted 2 January, 2025; originally announced January 2025.

  25. arXiv:2412.17610  [pdf, other]

    cs.CV

    Personalized Large Vision-Language Models

    Authors: Chau Pham, Hoang Phan, David Doermann, Yunjie Tian

    Abstract: The personalization model has gained significant attention in image generation yet remains underexplored for large vision-language models (LVLMs). Beyond generic ones, with personalization, LVLMs handle interactive dialogues using referential concepts (e.g., "Mike and Susan are talking.") instead of the generic form (e.g., "a boy and a girl are talking."), making the conversation more customiz…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: A simple way to personalize your LLM

  26. arXiv:2412.11353  [pdf, other]

    physics.optics cond-mat.mes-hall

    Wilson Loop and Topological Properties in 3D Woodpile Photonic Crystal

    Authors: Huyen Thanh Phan, Shun Takahashi, Satoshi Iwamoto, Katsunori Wakabayashi

    Abstract: We numerically study the first- and second-order topological states of electromagnetic (EM) waves in the three-dimensional (3D) woodpile photonic crystal (PhC). Recent studies on 3D PhCs have mainly focused on the observation of the topological states. Here, we not only focus on finding the topological states but also propose a numerical calculation method for topological invariants, which i…

    Submitted 15 December, 2024; originally announced December 2024.

    Comments: 10 pages, 6 figures

    Journal ref: Phys. Rev. B110, 235429 (2024)

  27. arXiv:2412.02542  [pdf, other]

    cs.CV cs.LG

    Unveiling Concept Attribution in Diffusion Models

    Authors: Quang H. Nguyen, Hoang Phan, Khoa D. Doan

    Abstract: Diffusion models have shown remarkable abilities in generating realistic and high-quality images from text prompts. However, a trained model remains largely black-box; little do we know about the roles of its components in exhibiting a concept such as objects or styles. Recent works employ causal tracing to localize knowledge-storing layers in generative models without showing how other layers con…

    Submitted 12 March, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

  28. Fast ground-to-air transition with avian-inspired multifunctional legs

    Authors: Won Dong Shin, Hoang-Vu Phan, Monica A. Daley, Auke J. Ijspeert, Dario Floreano

    Abstract: Most birds can navigate seamlessly between aerial and terrestrial environments. Whereas the forelimbs evolved into wings primarily for flight, the hindlimbs serve diverse functions such as walking, hopping, leaping, and jumping take-off for transitions into flight. These capabilities have inspired engineers to aim for similar multi-modality in aerial robots, expanding their range of applicatio…

    Submitted 3 December, 2024; originally announced December 2024.

    Journal ref: Nature volume 636 pages 86-91 (2024)

  29. arXiv:2411.13802  [pdf, other]

    cs.CL

    SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model

    Authors: Christopher Nguyen, William Nguyen, Atsushi Suzuki, Daisuke Oku, Hong An Phan, Sang Dinh, Zooey Nguyen, Anh Ha, Shruti Raghavan, Huy Vo, Thang Nguyen, Lan Nguyen, Yoshikuni Hirayama

    Abstract: Large Language Models (LLMs) have demonstrated the potential to address some issues within the semiconductor industry. However, they are often general-purpose models that lack the specialized knowledge needed to tackle the unique challenges of this sector, such as the intricate physics and chemistry of semiconductor devices and processes. SemiKong, the first industry-specific LLM for the semicondu…

    Submitted 21 November, 2024; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: On-going work

  30. arXiv:2411.12195  [pdf, other]

    cs.CV

    A Survey of Medical Vision-and-Language Applications and Their Techniques

    Authors: Qi Chen, Ruoshan Zhao, Sinuo Wang, Vu Minh Hieu Phan, Anton van den Hengel, Johan Verjans, Zhibin Liao, Minh-Son To, Yong Xia, Jian Chen, Yutong Xie, Qi Wu

    Abstract: Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Their applications are versatile and have the potential to improve diagnostic accuracy and decision-making for individual patients while also contributing to enhanced public health monitoring, disease surveillance, and p…

    Submitted 18 November, 2024; originally announced November 2024.

  31. arXiv:2411.04168  [pdf, other]

    cs.CV cs.AI

    DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation

    Authors: Hao Phung, Quan Dao, Trung Dao, Hoang Phan, Dimitris Metaxas, Anh Tran

    Abstract: We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space networks, including Mamba, a revolutionary advancement in recurrent neural networks, typically scan input sequences from left to right, they face difficulties i…

    Submitted 10 April, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: Accepted to NeurIPS 2024. Project page: https://vinairesearch.github.io/DiMSUM/

  32. arXiv:2411.02715  [pdf, other]

    cs.CV

    CIT: Rethinking Class-incremental Semantic Segmentation with a Class Independent Transformation

    Authors: Jinchao Ge, Bowen Zhang, Akide Liu, Minh Hieu Phan, Qi Chen, Yangyang Shu, Yang Zhao

    Abstract: Class-incremental semantic segmentation (CSS) requires that a model learn to segment new classes without forgetting how to segment previous ones: this is typically achieved by distilling the current knowledge and incorporating the latest data. However, bypassing iterative distillation by directly transferring outputs of initial classes to the current learning task is not supported in existing clas…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: 11 pages, 5 figures

  33. arXiv:2410.23402  [pdf, other]

    cs.SE

    VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning

    Authors: Cuong Chi Le, Hoang-Chau Truong-Vinh, Huy Nhat Phan, Dung Duy Le, Tien N. Nguyen, Nghi D. Q. Bui

    Abstract: Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static syntax, they often struggle with dynamic reasoning tasks. We introduce VisualCoder, a simple yet effective approach that enhances code reasoning by integrating…

    Submitted 9 February, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: NAACL 2025

  34. arXiv:2410.19793  [pdf, other]

    eess.SP cs.AI cs.HC cs.SD eess.AS q-bio.NC

    Single-word Auditory Attention Decoding Using Deep Learning Model

    Authors: Nhan Duc Thanh Nguyen, Huy Phan, Kaare Mikkelsen, Preben Kidmose

    Abstract: Identifying auditory attention by comparing auditory stimuli and corresponding brain responses is known as auditory attention decoding (AAD). The majority of AAD algorithms utilize the so-called envelope entrainment mechanism, whereby auditory attention is identified by how the envelope of the auditory stream drives variation in the electroencephalography (EEG) signal. However, neural processing…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 5 pages, 3 figures

  35. arXiv:2410.13059  [pdf, other]

    cs.SD eess.AS eess.SP

    AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding

    Authors: Nhan Duc Thanh Nguyen, Huy Phan, Simon Geirnaert, Kaare Mikkelsen, Preben Kidmose

    Abstract: Auditory attention decoding (AAD) is the process of identifying the attended speech in a multi-talker environment using brain signals, typically recorded through electroencephalography (EEG). Over the past decade, AAD has undergone continuous development, driven by its promising application in neuro-steered hearing devices. Most AAD algorithms rely on the increase in neural entrainment to t…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 11 pages, 6 figures

  36. arXiv:2410.06827  [pdf, ps, other]

    hep-ph

    One-loop analytical expressions for $γγ\rightarrow φ_iφ_j$ in Higgs Extensions of the Standard Models and its applications

    Authors: Khiem Hong Phan, Dzung Tri Tran, Thanh Huy Nguyen

    Abstract: General one-loop formulas for loop-induced processes $γγ\rightarrow φ_iφ_j$ with $φ_iφ_j = hh,~hH,~HH$ are presented in the paper. Analytic expressions evaluated in this work are valid for a class of Higgs Extensions of the Standard Models, e.g. Inert Doublet Higgs Models, Two Higgs Doublet Models, Zee-Babu Models as well as Triplet Higgs Models, etc. Analytic expressions for one-loop form factors…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 44 pages, 3 tables and 11 Figures

    Report number: DTU_2024-07

  37. arXiv:2410.04327  [pdf, other]

    cs.LG

    Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning

    Authors: Quyen Tran, Hoang Phan, Minh Le, Tuan Truong, Dinh Phung, Linh Ngo, Thien Nguyen, Nhat Ho, Trung Le

    Abstract: Humans perceive the world as a series of sequential events, which can be hierarchically organized with different levels of abstraction based on conceptual knowledge. Drawing inspiration from human learning behaviors, this work proposes a novel approach to mitigate catastrophic forgetting in Prompt-based Continual Learning models by exploiting the relationships between continuously emerging class d…

    Submitted 8 March, 2025; v1 submitted 5 October, 2024; originally announced October 2024.

  38. arXiv:2409.16299  [pdf, other]

    cs.SE cs.AI

    HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

    Authors: Huy Nhat Phan, Tien N. Nguyen, Phong X. Nguyen, Nghi D. Q. Bui

    Abstract: Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks. Despite recent advancements that have enabled the creation of autonomous software agents utilizing LLMs for end-to-end development tasks, these systems are typically designed for specific SE functions. We introduce HyperAgent, an innovative generalist multi-agent s…

    Submitted 5 November, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 49 pages

  39. arXiv:2409.04104  [pdf, other]

    cs.LG cs.AI cs.CV cs.HC eess.SP

    MixNet: Joining Force of Classical and Modern Approaches Toward the Comprehensive Pipeline in Motor Imagery EEG Classification

    Authors: Phairot Autthasan, Rattanaphon Chaisaen, Huy Phan, Maarten De Vos, Theerawit Wilaiprasitporn

    Abstract: Recent advances in deep learning (DL) have significantly impacted motor imagery (MI)-based brain-computer interface (BCI) systems, enhancing the decoding of electroencephalography (EEG) signals. However, most studies struggle to identify discriminative patterns across subjects during MI tasks, limiting MI classification performance. In this article, we propose MixNet, a novel classification framew…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: Supplementary materials and source codes are available on-line at https://github.com/Max-Phairot-A/MixNet

    Journal ref: IEEE Internet of Things Journal 2024

  40. arXiv:2409.01390  [pdf, other]

    hep-ph

    $(g-2)_{e,μ}$ and Lepton flavor violating decays in a left-right model

    Authors: L. T. Hue, Khiem Hong Phan, T. T. Hong, T. Phong Nguyen, N. H. T. Nha

    Abstract: General expressions for one-loop contributions associated with lepton-flavor violating decays of the standard model-like Higgs boson $h\to e_b^\pm e_a^\mp$ and gauge boson $Z\to e^\pm_b e_a^\mp$ are introduced in the unitary gauge. The results are used to discuss these decays as new physics signals in a minimal left-right symmetric model containing only one bidoublet Higgs and a $SU(2)_R$ Higgs do…

    Submitted 14 December, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: 42 pages, 4 figures. Consistent with the version published by EPJC

  41. arXiv:2409.00662  [pdf, ps, other]

    hep-ph

    Processes $γγ\rightarrow φ_iφ_j$ in Inert Higgs Doublet Models and Two Higgs Doublet Models

    Authors: Khiem Hong Phan, Dzung Tri Tran, Thanh Huy Nguyen

    Abstract: In this paper, we present a phenomenological analysis of one-loop induced processes $γγ\rightarrow φ_iφ_j$, where the CP-even Higgs bosons are denoted as $φ_{i,j} \equiv h,~H$, in high-energy photon-photon collisions, within the frameworks of the Inert Higgs Doublet Model and the Two Higgs Doublet Model. The total cross sections are evaluated as functions of the center-of-mass energy, finding that…

    Submitted 7 April, 2025; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: 31 pages, 8 figures of data, 6 Tables, typos are corrected

    Report number: DTU_2024-04

  42. arXiv:2408.14227  [pdf, other]

    cs.CV

    TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation

    Authors: Anh-Dzung Doan, Vu Minh Hieu Phan, Surabhi Gupta, Markus Wagner, Tat-Jun Chin, Ian Reid

    Abstract: Infrared imaging offers resilience against changing lighting conditions by capturing object temperatures. Yet, in a few scenarios, its lack of visual details compared to daytime visible images poses a significant challenge for human and machine interpretation. This paper proposes a novel diffusion method, dubbed Temporally Consistent Patch Diffusion Models (TC-PDM), for infrared-to-visible video tr…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Technical report

  43. arXiv:2408.13491  [pdf, other]

    cs.CV

    ESA: Annotation-Efficient Active Learning for Semantic Segmentation

    Authors: Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, Bowen Zhang, Akide Liu, Yang Zhao

    Abstract: Active learning enhances annotation efficiency by selecting the most revealing samples for labeling, thereby reducing reliance on extensive human input. Previous methods in semantic segmentation have centered on individual pixels or small areas, neglecting the rich patterns in natural images and the power of advanced pre-trained models. To address these challenges, we propose three key contributio…

    Submitted 24 August, 2024; originally announced August 2024.

  44. arXiv:2408.11800  [pdf, ps, other]

    cs.CL

    WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain

    Authors: Rounak Meyur, Hung Phan, Sridevi Wagle, Jan Strube, Mahantesh Halappanavar, Sameera Horawalavithana, Anurag Acharya, Sai Munikoti

    Abstract: Wind energy project assessments present significant challenges for decision-makers, who must navigate and synthesize hundreds of pages of environmental and scientific documentation. These documents often span different regions and project scales, covering multiple domains of expertise. This process traditionally demands immense time and specialized knowledge from decision-makers. The advent of Lar…

    Submitted 9 June, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 8 pages without Limitation and References

  45. arXiv:2408.02816  [pdf, other]

    cs.SE

    CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning

    Authors: Cuong Chi Le, Hoang Nhat Phan, Huy Nhat Phan, Tien N. Nguyen, Nghi D. Q. Bui

    Abstract: Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine learning-based approach that predicts code coverage and detects runtime errors by learning both static and dynamic dependencies within the code. By using control…

    Submitted 9 February, 2025; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: FORGE 2025

  46. arXiv:2408.02001  [pdf, other]

    cs.CV

    AdaCBM: An Adaptive Concept Bottleneck Model for Explainable and Accurate Diagnosis

    Authors: Townim F. Chowdhury, Vu Minh Hieu Phan, Kewen Liao, Minh-Son To, Yutong Xie, Anton van den Hengel, Johan W. Verjans, Zhibin Liao

    Abstract: The integration of vision-language models such as CLIP and Concept Bottleneck Models (CBMs) offers a promising approach to explaining deep neural network (DNN) decisions using concepts understandable by humans, addressing the black-box concern of DNNs. While CLIP provides both explainability and zero-shot classification capability, its pre-training on generic image and text data may limit its clas…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted at MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention

  47. arXiv:2407.19546  [pdf, other]

    cs.CV

    MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training

    Authors: Biao Wu, Yutong Xie, Zeyu Zhang, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu

    Abstract: Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modeling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second, m…

    Submitted 16 April, 2025; v1 submitted 28 July, 2024; originally announced July 2024.

  48. arXiv:2407.18180  [pdf]

    physics.bio-ph cs.RO

    Passive wing deployment and retraction in beetles and flapping microrobots

    Authors: Hoang-Vu Phan, Hoon Cheol Park, Dario Floreano

    Abstract: Birds, bats and many insects can tuck their wings against their bodies at rest and deploy them to power flight. Whereas birds and bats use well-developed pectoral and wing muscles and tendons, how insects control these movements remains unclear, as mechanisms of wing deployment and retraction vary among insect species. Beetles (Coleoptera) display one of the most complex wing mechanisms. For examp…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 20 pages, 10 figures

    Journal ref: Nature 632 (2024) 1-6

  49. arXiv:2407.14477  [pdf, other]

    cs.LG

    Data-Centric Human Preference Optimization with Rationales

    Authors: Hoang Anh Just, Ming Jin, Anit Sahu, Huy Phan, Ruoxi Jia

    Abstract: Reinforcement learning from human feedback plays a crucial role in aligning language models towards human preferences, traditionally represented through comparisons between pairs or sets of responses within a given context. While many studies have enhanced algorithmic techniques to optimize learning from such data, this work shifts focus to improving preference learning through a data-centric appr…

    Submitted 3 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: Data-Centric Human Preference Learning with Rationales

  50. arXiv:2407.07321  [pdf, ps, other]

    cs.CL

    Benchmarking LLMs for Environmental Review and Permitting

    Authors: Rounak Meyur, Hung Phan, Koby Hayashi, Ian Stewart, Shivam Sharma, Sarthak Chaturvedi, Mike Parker, Dan Nally, Sadie Montgomery, Karl Pazdernik, Ali Jannesari, Mahantesh Halappanavar, Sai Munikoti, Sameera Horawalavithana, Anurag Acharya

    Abstract: The National Environmental Policy Act (NEPA) stands as a foundational piece of environmental legislation in the United States, requiring federal agencies to consider the environmental impacts of their proposed actions. The primary mechanism for achieving this is through the preparation of Environmental Assessments (EAs) and, for significant impacts, comprehensive Environmental Impact Statements (EIS…

    Submitted 11 June, 2025; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: 15 pages