
Showing 1–50 of 93 results for author: Vu, D

  1. arXiv:2507.02187  [pdf, ps, other]

    cs.HC

    VergeIO: Depth-Aware Eye Interaction on Glasses

    Authors: Xiyuxing Zhang, Duc Vu, Chengyi Shen, Yuntao Wang, Yuanchun Shi, Justin Chan

    Abstract: There is growing industry interest in creating unobtrusive designs for electrooculography (EOG) sensing of eye gestures on glasses (e.g. JINS MEME and Apple eyewear). We present VergeIO, the first EOG-based glasses that enable depth-aware eye interaction using vergence with an optimized electrode layout and novel smart glass prototype. It can distinguish between four and six depth-based eye gestu…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2506.22760  [pdf, ps, other]

    cs.CL

    Jan-nano Technical Report

    Authors: Alan Dao, Dinh Bach Vu

    Abstract: Most language models face a fundamental tradeoff where powerful capabilities require substantial computational resources. We shatter this constraint with Jan-nano, a 4B parameter language model that redefines efficiency through radical specialization: instead of trying to know everything, it masters the art of finding anything instantly. Fine-tuned from Qwen3-4B using our novel multi-stage RLVR sy…

    Submitted 28 June, 2025; originally announced June 2025.

  3. arXiv:2506.20944  [pdf, ps, other]

    cs.MM cs.CR

    E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs

    Authors: Van-Hoang Phan, Long-Khanh Pham, Dang Vu, Anh-Duy Tran, Minh-Son Dao

    Abstract: The rapid spread of misinformation in mobile and wireless networks presents critical security challenges. This study introduces a training-free, retrieval-based multimodal fact verification system that leverages pretrained vision-language models and large language models for credibility assessment. By dynamically retrieving and cross-referencing trusted data sources, our approach mitigates vulnera…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted to AsiaCCS 2025 @ SCID

  4. arXiv:2506.14835  [pdf, ps, other]

    cs.CV

    MonoVQD: Monocular 3D Object Detection with Variational Query Denoising and Self-Distillation

    Authors: Kiet Dang Vu, Trung Thai Tran, Duc Dung Nguyen

    Abstract: Precisely localizing 3D objects from a single image constitutes a central challenge in monocular 3D detection. While DETR-like architectures offer a powerful paradigm, their direct application in this domain encounters inherent limitations, preventing optimal performance. Our work addresses these challenges by introducing MonoVQD, a novel framework designed to fundamentally advance DETR-based mono…

    Submitted 14 June, 2025; originally announced June 2025.

  5. arXiv:2506.12437  [pdf]

    cs.HC cs.AI cs.CY

    Feeling Machines: Ethics, Culture, and the Rise of Emotional AI

    Authors: Vivek Chavan, Arsen Cenaj, Shuyuan Shen, Ariane Bar, Srishti Binwani, Tommaso Del Becaro, Marius Funk, Lynn Greschner, Roberto Hung, Stina Klein, Romina Kleiner, Stefanie Krause, Sylwia Olbrych, Vishvapalsinhji Parmar, Jaleh Sarafraz, Daria Soroko, Daksitha Withanage Don, Chang Zhou, Hoang Thuy Duong Vu, Parastoo Semnani, Daniel Weinhardt, Elisabeth Andre, Jörg Krüger, Xavier Fresquet

    Abstract: This paper explores the growing presence of emotionally responsive artificial intelligence through a critical and interdisciplinary lens. Bringing together the voices of early-career researchers from multiple fields, it explores how AI systems that simulate or interpret human emotions are reshaping our interactions in areas such as education, healthcare, mental health, caregiving, and digital life…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: From the Spring School 2025 by AI Grid and SCAI (Sorbonne University), 16 pages

  6. arXiv:2506.09162  [pdf]

    eess.IV cs.CV

    The RSNA Lumbar Degenerative Imaging Spine Classification (LumbarDISC) Dataset

    Authors: Tyler J. Richards, Adam E. Flanders, Errol Colak, Luciano M. Prevedello, Robyn L. Ball, Felipe Kitamura, John Mongan, Maryam Vazirabad, Hui-Ming Lin, Anne Kendell, Thanat Kanthawang, Salita Angkurawaranon, Emre Altinmakas, Hakan Dogan, Paulo Eduardo de Aguiar Kuriki, Arjuna Somasundaram, Christopher Ruston, Deniz Bulja, Naida Spahovic, Jennifer Sommer, Sirui Jiang, Eduardo Moreno Judice de Mattos Farina, Eduardo Caminha Nunes, Michael Brassil, Megan McNamara, et al. (11 additional authors not shown)

    Abstract: The Radiological Society of North America (RSNA) Lumbar Degenerative Imaging Spine Classification (LumbarDISC) dataset is the largest publicly available dataset of adult MRI lumbar spine examinations annotated for degenerative changes. The dataset includes 2,697 patients with a total of 8,593 image series from 8 institutions across 6 countries and 5 continents. The dataset is available for free fo…

    Submitted 10 June, 2025; originally announced June 2025.

  7. arXiv:2505.21441  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Autoencoding Random Forests

    Authors: Binh Duc Vu, Jan Kapar, Marvin Wright, David S. Watson

    Abstract: We propose a principled method for autoencoding with random forests. Our strategy builds on foundational results from nonparametric statistics and spectral graph theory to learn a low-dimensional embedding of the model that optimally represents relationships in the data. We provide exact and approximate solutions to the decoding problem via constrained optimization, split relabeling, and nearest n…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 10 pages main text, 25 pages total. 5 figures main text, 9 figures total

  8. arXiv:2505.17417  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Speechless: Speech Instruction Training Without Speech for Low Resource Languages

    Authors: Alan Dao, Dinh Bach Vu, Huy Hoang Ha, Tuan Le Duc Anh, Shreyas Gopal, Yue Heng Yeo, Warren Keng Hoong Low, Eng Siong Chng, Jia Qi Yip

    Abstract: The rapid growth of voice assistants powered by large language models (LLMs) has highlighted a need for speech instruction data to train these systems. Despite the abundance of speech recognition data, there is a notable scarcity of speech instruction data, which is essential for fine-tuning models to understand and execute spoken commands. Generating high-quality synthetic speech requires a good t…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: This paper was accepted by INTERSPEECH 2025

  9. arXiv:2505.15570  [pdf, ps, other]

    cs.LG eess.SP

    Refining Neural Activation Patterns for Layer-Level Concept Discovery in Neural Network-Based Receivers

    Authors: Marko Tuononen, Duy Vu, Dani Korpi, Vesa Starck, Ville Hautamäki

    Abstract: Concept discovery in neural networks often targets individual neurons or human-interpretable features, overlooking distributed layer-wide patterns. We study the Neural Activation Pattern (NAP) methodology, which clusters full-layer activation distributions to identify such layer-level concepts. Applied to visual object recognition and radio receiver models, we propose improved normalization, distr…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 46 pages, 40 figures, 28 tables, 10 equations, and 5 listings

    MSC Class: 68T07 (Primary) 62H30; 94A05 (Secondary) ACM Class: I.2.6; I.5.3; C.2.1

  10. arXiv:2504.18729  [pdf, other]

    cs.LG

    Multimodal graph representation learning for website generation based on visual sketch

    Authors: Tung D. Vu, Chung Hoang, Truong-Son Hy

    Abstract: The Design2Code problem, which involves converting digital designs into functional source code, is a significant challenge in software development due to its complexity and time-consuming nature. Traditional approaches often struggle with accurately interpreting the intricate visual details and structural relationships inherent in webpage designs, leading to limitations in automation and efficienc…

    Submitted 25 April, 2025; originally announced April 2025.

  11. arXiv:2504.15252  [pdf, other]

    cs.AI cs.CV cs.LG

    SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam

    Authors: Tue Vo, Lakshay Sharma, Tuan Dinh, Khuong Dinh, Trang Nguyen, Trung Phan, Minh Do, Duong Vu

    Abstract: Understanding and monitoring aquatic biodiversity is critical for ecological health and conservation efforts. This paper proposes SuoiAI, an end-to-end pipeline for building a dataset of aquatic invertebrates in Vietnam and employing machine learning (ML) techniques for species classification. We outline the methods for data collection, annotation, and model training, focusing on reducing annotati…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Published as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2025

  12. arXiv:2504.03292  [pdf, other]

    cs.CV

    FaR: Enhancing Multi-Concept Text-to-Image Diffusion via Concept Fusion and Localized Refinement

    Authors: Gia-Nghia Tran, Quang-Huy Che, Trong-Tai Dam Vu, Bich-Nga Pham, Vinh-Tiep Nguyen, Trung-Nghia Le, Minh-Triet Tran

    Abstract: Generating multiple new concepts remains a challenging problem in the text-to-image task. Current methods often overfit when trained on a small number of samples and struggle with attribute leakage, particularly for class-similar subjects (e.g., two specific dogs). In this paper, we introduce Fuse-and-Refine (FaR), a novel approach that tackles these challenges through two key contributions: Conce…

    Submitted 4 April, 2025; originally announced April 2025.

  13. arXiv:2504.00339  [pdf, other]

    cs.CL cs.AI

    VNJPTranslate: A comprehensive pipeline for Vietnamese-Japanese translation

    Authors: Hoang Hai Phan, Nguyen Duc Minh Vu, Nam Dang Phuong

    Abstract: Neural Machine Translation (NMT) driven by Transformer architectures has advanced significantly, yet faces challenges with low-resource language pairs like Vietnamese-Japanese (Vi-Ja). Issues include sparse parallel data and handling linguistic/cultural nuances. Recent progress in Large Language Models (LLMs) with strong reasoning, often refined via Reinforcement Learning (RL), enables high-qualit…

    Submitted 31 March, 2025; originally announced April 2025.

  14. arXiv:2503.18769  [pdf, other]

    cs.CL cs.RO

    AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning

    Authors: Alan Dao, Dinh Bach Vu, Bui Quang Huy

    Abstract: This paper presents AlphaSpace, a novel methodology designed to enhance the spatial reasoning capabilities of language models for robotic manipulation in 3D Cartesian space. AlphaSpace employs a hierarchical semantics-based tokenization strategy that encodes spatial information at both coarse and fine-grained levels. Our approach represents objects with their attributes, positions, and height info…

    Submitted 27 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  15. arXiv:2503.16286  [pdf, other]

    cs.LG

    Explainable Graph-theoretical Machine Learning: with Application to Alzheimer's Disease Prediction

    Authors: Narmina Baghirova, Duy-Thanh Vũ, Duy-Cat Can, Christelle Schneuwly Diaz, Julien Bodlet, Guillaume Blanc, Georgi Hrusanov, Bernard Ries, Oliver Y. Chén

    Abstract: Alzheimer's disease (AD) affects 50 million people worldwide and is projected to affect 152 million by 2050. AD is characterized by cognitive decline due partly to disruptions in metabolic brain connectivity. Thus, early and accurate detection of metabolic brain network impairments is crucial for AD management. Chief to identifying such impairments is FDG-PET data. Despite advancements, most gr…

    Submitted 20 March, 2025; originally announced March 2025.

  16. arXiv:2503.11282  [pdf, other]

    cs.LG q-bio.NC

    OPTIMUS: Predicting Multivariate Outcomes in Alzheimer's Disease Using Multi-modal Data amidst Missing Values

    Authors: Christelle Schneuwly Diaz, Duy-Thanh Vu, Julien Bodelet, Duy-Cat Can, Guillaume Blanc, Haiting Jiang, Lin Yao, Guiseppe Pantaleo, ADNI, Oliver Y. Chén

    Abstract: Alzheimer's disease, a neurodegenerative disorder, is associated with neural, genetic, and proteomic factors while affecting multiple cognitive and behavioral faculties. Traditional AD prediction largely focuses on univariate disease outcomes, such as disease stages and severity. Multimodal data encode broader disease information than a single modality and may, therefore, improve disease predictio…

    Submitted 14 March, 2025; originally announced March 2025.

  17. arXiv:2503.07111  [pdf, other]

    cs.RO cs.CL

    PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM

    Authors: Alan Dao, Dinh Bach Vu, Tuan Le Duc Anh, Bui Quang Huy

    Abstract: This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles using projected representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from rob…

    Submitted 10 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  18. arXiv:2503.04790  [pdf, other]

    cs.CL cs.AI

    SuperRAG: Beyond RAG with Layout-Aware Graph Modeling

    Authors: Jeff Yang, Duy-Khanh Vu, Minh-Tien Nguyen, Xuan-Quang Nguyen, Linh Nguyen, Hung Le

    Abstract: This paper introduces layout-aware graph modeling for multimodal RAG. Unlike traditional RAG methods that mostly deal with flat text chunks, the proposed method takes into account the relationship of multimodalities by using a graph structure. To do that, a graph modeling structure is defined based on document layout parsing. The structure of an input document is retained with the connecti…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: NAACL 2025, Industry Track

  19. arXiv:2502.17909  [pdf, other]

    cs.HC cs.AI

    FactFlow: Automatic Fact Sheet Generation and Customization from Tabular Dataset via AI Chain Design & Implementation

    Authors: Minh Duc Vu, Jieshan Chen, Zhenchang Xing, Qinghua Lu, Xiwei Xu, Qian Fu

    Abstract: With the proliferation of data across various domains, there is a critical demand for tools that enable non-experts to derive meaningful insights without deep data analysis skills. To address this need, existing automatic fact sheet generation tools offer heuristic-based solutions to extract facts and generate stories. However, they inadequately grasp the semantics of data and struggle to generate…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 11 pages, 6 figures

    ACM Class: I.2; H.4

  20. arXiv:2502.16747  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    SQLong: Enhanced NL2SQL for Longer Contexts with LLMs

    Authors: Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Thanh Tien Vu, Don Dharmasiri, Yuan-Fang Li, Long Duong

    Abstract: Open-weight large language models (LLMs) have significantly advanced performance in the Natural Language to SQL (NL2SQL) task. However, their effectiveness diminishes when dealing with large database schemas, as the context length increases. To address this limitation, we present SQLong, a novel and efficient data augmentation framework designed to enhance LLM performance in long-context scenarios…

    Submitted 20 May, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

    Comments: Accepted to Table Representation Learning Workshop at ACL 2025

  21. arXiv:2502.14669  [pdf, other]

    cs.CL

    AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO

    Authors: Alan Dao, Dinh Bach Vu

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in language processing, yet they often struggle with tasks requiring genuine visual spatial reasoning. In this paper, we introduce a novel two-stage training framework designed to equip standard LLMs with visual reasoning abilities for maze navigation. First, we leverage Supervised Fine Tuning (SFT) on a curated dataset of toke…

    Submitted 25 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  22. arXiv:2502.12591  [pdf, other]

    cs.CV cs.CL

    CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base

    Authors: Cong-Duy Nguyen, Xiaobao Wu, Duc Anh Vu, Shuai Zhao, Thong Nguyen, Anh Tuan Luu

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal reasoning capabilities, but they remain susceptible to hallucination, particularly object hallucination where non-existent objects or incorrect attributes are fabricated in generated descriptions. Existing detection methods achieve strong performance but rely heavily on expensive API calls and iterative LVLM-based validat…

    Submitted 18 February, 2025; originally announced February 2025.

  23. arXiv:2501.07192  [pdf, other]

    cs.CR cs.CV

    A4O: All Trigger for One sample

    Authors: Duc Anh Vu, Anh Tuan Tran, Cong Tran, Cuong Pham

    Abstract: Backdoor attacks have become a critical threat to deep neural networks (DNNs), drawing considerable research interest. However, most of the studied attacks employ a single type of trigger. Consequently, proposed backdoor defenders often rely on the assumption that triggers would appear in a unified way. In this paper, we show that this naive assumption can create a loophole, allowing more sophisticated b…

    Submitted 13 January, 2025; originally announced January 2025.

  24. arXiv:2411.18126  [pdf, other]

    cs.CL

    Curriculum Demonstration Selection for In-Context Learning

    Authors: Duc Anh Vu, Nguyen Tran Cong Duy, Xiaobao Wu, Hoang Minh Nhat, Du Mingzhe, Nguyen Thanh Thong, Anh Tuan Luu

    Abstract: Large Language Models (LLMs) have shown strong in-context learning (ICL) abilities with a few demonstrations. However, one critical challenge is how to select demonstrations to elicit the full potential of LLMs. In this paper, we propose Curriculum Demonstration Selection (CDS), a novel demonstration selection method for ICL. Instead of merely using similarity, CDS additionally partitions samples…

    Submitted 15 December, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted at the 40th ACM/SIGAPP Symposium On Applied Computing (SAC 2025), Main Conference

  25. arXiv:2411.11017  [pdf, other]

    cs.CR cs.SE

    A Study of Malware Prevention in Linux Distributions

    Authors: Duc-Ly Vu, Trevor Dunlap, Karla Obermeier-Velazquez, Paul Gibert, John Speed Meyers, Santiago Torres-Arias

    Abstract: Malicious attacks on open source software packages are a growing concern. This concern morphed into a panic-inducing crisis after the revelation of the XZ Utils backdoor, which would have provided the attacker with, according to one observer, a "skeleton key" to the internet. This study therefore explores the challenges of preventing and detecting malware in Linux distribution package repositories…

    Submitted 25 November, 2024; v1 submitted 17 November, 2024; originally announced November 2024.

    Comments: 14 pages, 3 figures, 11 tables

  26. arXiv:2411.00005  [pdf, other]

    cs.SE cs.AI

    Mastering the Craft of Data Synthesis for CodeLLMs

    Authors: Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li

    Abstract: Large language models (LLMs) have shown impressive performance in code understanding and generation, making coding tasks a key focus for researchers due to their practical applications and value as a testbed for LLM evaluation. Data synthesis and filtering techniques have been widely adopted and shown to be highly effective in this context. In this paper, we present a focused survey and tax…

    Submitted 7 February, 2025; v1 submitted 16 October, 2024; originally announced November 2024.

    Comments: Accepted at NAACL 2025

  27. arXiv:2410.17410  [pdf, other]

    cs.SI cs.LG q-bio.NC stat.ML

    Learning Graph Filters for Structure-Function Coupling based Hub Node Identification

    Authors: Meiby Ortiz-Bouza, Duc Vu, Abdullah Karaaslanli, Selin Aviyente

    Abstract: Over the past two decades, tools from network science have been leveraged to characterize the organization of both structural and functional networks of the brain. One such measure of network organization is hub node identification. Hubs are specialized nodes within a network that link distinct brain units corresponding to specialized functional processes. Conventional methods for identifying hub…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: 13 pages, 4 figures

  28. arXiv:2410.15316  [pdf, other]

    cs.CL cs.SD eess.AS

    Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

    Authors: Alan Dao, Dinh Bach Vu, Huy Hoang Ha

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing, but their application to speech-based tasks remains challenging due to the complexities of integrating audio and text modalities. This paper introduces Ichigo, a mixed-modal model that seamlessly processes interleaved sequences of speech and text. Utilizing a tokenized early-fusion approach, Ichigo quantizes speech into…

    Submitted 4 April, 2025; v1 submitted 20 October, 2024; originally announced October 2024.

  29. arXiv:2410.01173  [pdf, other]

    quant-ph cs.DS

    Low depth amplitude estimation without really trying

    Authors: Dinh-Long Vu, Bin Cheng, Patrick Rebentrost

    Abstract: Standard quantum amplitude estimation algorithms provide a quadratic speedup over Monte-Carlo simulations but require a circuit depth that scales as the inverse of the estimation error. In view of the shallow depth in near-term devices, the precision achieved by these algorithms would be low. In this paper we bypass this limitation by performing the classical Monte-Carlo method on the quantum algorithm it…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 31 pages, 7 figures, 2 tables

  30. arXiv:2408.14176  [pdf, other]

    cs.CV cs.AI

    SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

    Authors: Trung Dao, Thuan Hoang Nguyen, Thanh Le, Duc Vu, Khoi Nguyen, Cuong Pham, Anh Tran

    Abstract: In this paper, we aim to enhance the performance of SwiftBrush, a prominent one-step text-to-image diffusion model, to be competitive with its multi-step Stable Diffusion counterpart. Initially, we explore the quality-diversity trade-off between SwiftBrush and SD Turbo: the former excels in image diversity, while the latter excels in image quality. This observation motivates our proposed modificat…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV'24

  31. arXiv:2408.00122  [pdf, other]

    cs.CL

    A Course Shared Task on Evaluating LLM Output for Clinical Questions

    Authors: Yufang Hou, Thy Thy Tran, Doan Nam Long Vu, Yiwen Cao, Kai Li, Lukas Rohde, Iryna Gurevych

    Abstract: This paper presents a shared task that we organized at the Foundations of Language Technology (FoLT) course in 2023/2024 at the Technical University of Darmstadt, which focuses on evaluating the output of Large Language Models (LLMs) in generating harmful answers to health-related clinical questions. We describe the task design considerations and report the feedback we received from the students.…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: accepted at the sixth Workshop on Teaching NLP (co-located with ACL 2024)

  32. arXiv:2407.18789  [pdf, other]

    cs.CL

    Granularity is crucial when applying differential privacy to text: An investigation for neural machine translation

    Authors: Doan Nam Long Vu, Timour Igamberdiev, Ivan Habernal

    Abstract: Applying differential privacy (DP) by means of the DP-SGD algorithm to protect individual data points during training is becoming increasingly popular in NLP. However, the choice of granularity at which DP is applied is often neglected. For example, neural machine translation (NMT) typically operates on the sentence-level granularity. From the perspective of DP, this setup assumes that each senten…

    Submitted 26 September, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted at EMNLP Findings 2024

  33. arXiv:2405.18368  [pdf, other]

    cs.CV

    The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI

    Authors: Maria Correia de Verdier, Rachit Saluja, Louis Gagnon, Dominic LaBella, Ujjwall Baid, Nourel Hoda Tahon, Martha Foltyn-Dumitru, Jikai Zhang, Maram Alafif, Saif Baig, Ken Chang, Gennaro D'Anna, Lisa Deptula, Diviya Gupta, Muhammad Ammar Haider, Ali Hussain, Michael Iv, Marinos Kontzialis, Paul Manning, Farzan Moodi, Teresa Nunes, Aaron Simon, Nico Sollmann, David Vu, Maruf Adewole, et al. (60 additional authors not shown)

    Abstract: Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key r…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 1 table

  34. arXiv:2403.16685  [pdf, other]

    cs.CL cs.CY

    ToXCL: A Unified Framework for Toxic Speech Detection and Explanation

    Authors: Nhat M. Hoang, Xuan Long Do, Duc Anh Do, Duc Anh Vu, Luu Anh Tuan

    Abstract: The proliferation of online toxic speech is a pertinent problem posing threats to demographic groups. While explicit toxic speech contains offensive lexical signals, implicit toxic speech consists of coded or indirect language. Therefore, it is crucial for models not only to detect implicit toxic speech but also to explain its toxicity. This draws a unique need for unified frameworks that can effectively d…

    Submitted 20 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted at NAACL 2024 (Main Conference)

  35. arXiv:2403.09302  [pdf, other]

    eess.IV cs.CV cs.LG

    StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images

    Authors: Robert Jewsbury, Ruoyu Wang, Abhir Bhalerao, Nasir Rajpoot, Quoc Dang Vu

    Abstract: Stain normalization algorithms aim to transform the color and intensity characteristics of a source multi-gigapixel histology image to match those of a target image, mitigating inconsistencies in the appearance of stains used to highlight cellular components in the images. We propose a new approach, StainFuser, which treats this problem as a style transfer task using a novel Conditional Latent Dif…

    Submitted 12 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  36. arXiv:2402.19296  [pdf]

    cs.CV cs.LG

    An AI based Digital Score of Tumour-Immune Microenvironment Predicts Benefit to Maintenance Immunotherapy in Advanced Oesophagogastric Adenocarcinoma

    Authors: Quoc Dang Vu, Caroline Fong, Anderley Gordon, Tom Lund, Tatiany L Silveira, Daniel Rodrigues, Katharina von Loga, Shan E Ahmed Raza, David Cunningham, Nasir Rajpoot

    Abstract: Gastric and oesophageal (OG) cancers are the leading causes of cancer mortality worldwide. In OG cancers, recent studies have shown that PDL1 immune checkpoint inhibitors (ICI) in combination with chemotherapy improve patient survival. However, our understanding of the tumour immune microenvironment in OG cancers remains limited. In this study, we interrogate multiplex immunofluorescence (mIF) i…

    Submitted 29 February, 2024; originally announced February 2024.

  37. GPTVoiceTasker: Advancing Multi-step Mobile Task Efficiency Through Dynamic Interface Exploration and Learning

    Authors: Minh Duc Vu, Han Wang, Zhuang Li, Jieshan Chen, Shengdong Zhao, Zhenchang Xing, Chunyang Chen

    Abstract: Virtual assistants have the potential to play an important role in helping users achieve different tasks. However, these systems face challenges in their real-world usability, characterized by inefficiency and struggles in grasping user intentions. Leveraging recent advances in Large Language Models (LLMs), we introduce GptVoiceTasker, a virtual assistant poised to enhance user experiences and ta…

    Submitted 13 August, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: This paper has been accepted by UIST 2024

  38. arXiv:2312.17205  [pdf, other]

    cs.CV

    EFHQ: Multi-purpose ExtremePose-Face-HQ dataset

    Authors: Trung Tuan Dao, Duc Hong Vu, Cuong Pham, Anh Tran

    Abstract: The existing facial datasets, while having plentiful images at near frontal views, lack images with extreme head poses, leading to the downgraded performance of deep learning models when dealing with profile or pitched faces. This work aims to address this gap by introducing a novel dataset named Extreme Pose Face High-Quality Dataset (EFHQ), which includes a maximum of 450k high-quality images of…

    Submitted 11 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Project Page: https://bomcon123456.github.io/efhq/

  39. arXiv:2312.09982  [pdf, other]

    cs.PL cs.AI cs.LG cs.PF

    ACPO: AI-Enabled Compiler Framework

    Authors: Amir H. Ashouri, Muhammad Asif Manzoor, Duc Minh Vu, Raymond Zhang, Colin Toft, Ziwen Wang, Angel Zhang, Bryan Chan, Tomasz S. Czajkowski, Yaoqing Gao

    Abstract: The key to performance optimization of a program is to decide correctly when a certain transformation should be applied by a compiler. This is an ideal opportunity to apply machine-learning models to speed up the tuning process; while this realization has been around since the late 90s, only recent advancements in ML enabled a practical application of ML to compilers as an end-to-end framework.…

    Submitted 13 January, 2025; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: ACPO (12 pages)

    ACM Class: I.2.5; D.3.0; I.2.6

  40. arXiv:2312.02227  [pdf, other]

    cs.LG cs.CL

    Improving Multimodal Sentiment Analysis: Supervised Angular Margin-based Contrastive Learning for Enhanced Fusion Representation

    Authors: Cong-Duy Nguyen, Thong Nguyen, Duc Anh Vu, Luu Anh Tuan

    Abstract: The effectiveness of a model is heavily reliant on the quality of the fusion representation of multiple modalities in multimodal sentiment analysis. Moreover, each modality is extracted from raw input and integrated with the rest to construct a multimodal representation. Although previous methods have proposed multimodal representations and achieved promising results, most of them focus on forming…

    Submitted 3 December, 2023; originally announced December 2023.

  41. arXiv:2312.01661  [pdf, other]

    cs.CL cs.AI

    ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions

    Authors: Phuoc Pham Van Long, Duc Anh Vu, Nhat M. Hoang, Xuan Long Do, Anh Tuan Luu

    Abstract: Mathematical questioning is crucial for assessing students' problem-solving skills. Since manually creating such questions requires substantial effort, automatic methods have been explored. Existing state-of-the-art models rely on fine-tuning strategies and struggle to generate questions that heavily involve multiple steps of logical and arithmetic reasoning. Meanwhile, large language models (LLMs)…

    Submitted 27 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted at the 39th ACM/SIGAPP Symposium On Applied Computing (SAC 2024), Main Conference

  42. arXiv:2311.14465  [pdf, other]

    cs.CL

    DP-NMT: Scalable Differentially-Private Machine Translation

    Authors: Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal

    Abstract: Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implemen…

    Submitted 24 April, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted at EACL 2024

  43. arXiv:2311.08834  [pdf, ps, other]

    cs.AI

    A* search algorithm for an optimal investment problem in vehicle-sharing systems

    Authors: Ba Luat Le, Layla Martin, Emrah Demir, Duc Minh Vu

    Abstract: We study an optimal investment problem that arises in the context of vehicle-sharing systems. Given a set of locations to build stations, we need to determine i) the sequence of stations to be built and the number of vehicles to acquire in order to obtain the target state where all stations are built, and ii) the number of vehicles to acquire and their allocation in order to maximize the total…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Full version of the conference paper accepted to appear in the proceedings of the 12th International Conference on Computational Data and Social Networks - SCONET2023

  44. arXiv:2311.08111  [pdf, ps, other]

    cs.DM

    Solving Time-Dependent Traveling Salesman Problem with Time Windows under Generic Time-Dependent Travel Cost

    Authors: Duc Minh Vu, Mike Hewitt, Duc Duy Vu

    Abstract: In this paper, we present formulations and an exact method to solve the Time-Dependent Traveling Salesman Problem with Time Windows (TD-TSPTW) under a generic travel cost function where waiting is allowed. A particular case in which the travel cost is a non-decreasing function has been addressed recently. With that assumption, because of both the First-In-First-Out property of the travel time funct…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: Full version with Appendix - accepted to appear in the proceedings of the SCONET2023 conference - https://csonet-conf.github.io/csonet23/index.html

  45. arXiv:2309.01656  [pdf, other]

    cs.CV

    Building Footprint Extraction in Dense Areas using Super Resolution and Frame Field Learning

    Authors: Vuong Nguyen, Anh Ho, Duc-Anh Vu, Nguyen Thi Ngoc Anh, Tran Ngoc Thang

    Abstract: Despite notable results on standard aerial datasets, current state-of-the-art methods fail to produce accurate building footprints in dense areas due to challenging properties posed by these areas and limited data availability. In this paper, we propose a framework to address such issues in polygonal building extraction. First, super resolution is employed to enhance the spatial resolution of aerial imag…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted at The 12th International Conference on Awareness Science and Technology

  46. arXiv:2308.16139  [pdf, other]

    cs.CV cs.DB cs.LG

    MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

    Authors: Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine De Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen, et al. (132 additional authors not shown)

    Abstract: Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of Shape…

    Submitted 12 December, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 16 pages

    MSC Class: 68T01

  47. Capsule network with shortcut routing

    Authors: Dang Thanh Vu, Vo Hoang Trong, Yu Gwang-Hyun, Kim Jin-Young

    Abstract: This study introduces "shortcut routing," a novel routing mechanism in capsule networks that addresses computational inefficiencies by directly activating global capsules from local capsules, eliminating intermediate layers. An attention-based approach with fuzzy coefficients is also explored for improved efficiency. Experimental results on MNIST, smallNORB, and affNIST datasets show comparable cl…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: 8 pages, published at IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences E104.A(8)

  48. arXiv:2305.08409  [pdf, other]

    cs.DC

    Validity Constraints for Data Analysis Workflows

    Authors: Florian Schintke, Ninon De Mecquenem, David Frantz, Vanessa Emanuela Guarino, Marcus Hilbrich, Fabian Lehmann, Rebecca Sattler, Jan Arne Sparka, Daniel Speckhard, Hermann Stolte, Anh Duc Vu, Ulf Leser

    Abstract: Porting a scientific data analysis workflow (DAW) to a cluster infrastructure, a new software stack, or even only a new dataset with some notably different properties is often challenging. Despite the structured definition of the steps (tasks) and their interdependencies during a complex data analysis in the DAW specification, relevant assumptions may remain unspecified and implicit. Such hidden a…

    Submitted 15 May, 2023; originally announced May 2023.

  49. Voicify Your UI: Towards Android App Control with Voice Commands

    Authors: Minh Duc Vu, Han Wang, Zhuang Li, Gholamreza Haffari, Zhenchang Xing, Chunyang Chen

    Abstract: Nowadays, voice assistants help users complete tasks on the smartphone with voice commands, replacing traditional touchscreen interactions when such interactions are inhibited. However, the usability of those tools remains moderate due to the problems in understanding rich language variations in human commands, along with efficiency and comprehensibility issues. Therefore, we introduce Voicify, an…

    Submitted 9 May, 2023; originally announced May 2023.

    Journal ref: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, no. 1 (2023): 1-22

  50. arXiv:2303.06274  [pdf]

    cs.CV cs.LG

    CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting

    Authors: Simon Graham, Quoc Dang Vu, Mostafa Jahanifar, Martin Weigert, Uwe Schmidt, Wenhua Zhang, Jun Zhang, Sen Yang, Jinxi Xiang, Xiyue Wang, Josef Lorenz Rumberger, Elias Baumann, Peter Hirsch, Lihao Liu, Chenyang Hong, Angelica I. Aviles-Rivero, Ayushi Jain, Heeyoung Ahn, Yiyu Hong, Hussam Azzuni, Min Xu, Mohammad Yaqub, Marie-Claire Blache, Benoît Piégu, Bertrand Vernay, et al. (64 additional authors not shown)

    Abstract: Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of repro…

    Submitted 14 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.