
Showing 1–50 of 113 results for author: Pham, M

Searching in archive cs.
  1. arXiv:2506.12525

    cs.RO

    A Spatial Relationship Aware Dataset for Robotics

    Authors: Peng Wang, Minh Huy Pham, Zhihao Guo, Wei Zhou

    Abstract: Robotic task planning in real-world environments requires not only object recognition but also a nuanced understanding of spatial relationships between objects. We present a spatial-relationship-aware dataset of nearly 1,000 robot-acquired indoor images, annotated with object attributes, positions, and detailed spatial relationships. Captured using a Boston Dynamics Spot robot and labelled with a…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 7 pages; 7 figures, 1 table

  2. arXiv:2505.22945

    cs.CL cs.AI

    OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature

    Authors: Alisha Srivastava, Emir Korukluoglu, Minh Nhat Le, Duyen Tran, Chau Minh Pham, Marzena Karpinska, Mohit Iyyer

    Abstract: Large language models (LLMs) are known to memorize and recall English text from their pretraining data. However, the extent to which this ability generalizes to non-English languages or transfers across languages remains unclear. This paper investigates multilingual and cross-lingual memorization in LLMs, probing if memorized content in one language (e.g., English) can be recalled when presented i…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: preprint, 25 pages

  3. arXiv:2505.22024

    cs.SD cs.CV eess.AS

    RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling

    Authors: Long-Khanh Pham, Thanh V. T. Tran, Minh-Tan Pham, Van Nguyen

    Abstract: Lip-to-speech (L2S) synthesis, which reconstructs speech from visual cues, faces challenges in accuracy and naturalness due to limited supervision in capturing linguistic content, accents, and prosody. In this paper, we propose RESOUND, a novel L2S system that generates intelligible and expressive speech from silent talking face videos. Leveraging source-filter theory, our method involves two comp…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: accepted in Interspeech 2025

  4. arXiv:2505.21954

    cs.CV cs.AI

    UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios

    Authors: Le Thien Phuc Nguyen, Zhuoran Yu, Khoa Quang Nhat Cao, Yuwei Guo, Tu Ho Manh Pham, Tuan Tai Nguyen, Toan Ngo Duc Vo, Lucas Poon, Soochahn Lee, Yong Jae Lee

    Abstract: We present UniTalk, a novel dataset specifically designed for the task of active speaker detection, emphasizing challenging scenarios to enhance model generalization. Unlike previously established benchmarks such as AVA, which predominantly features old movies and thus exhibits significant domain gaps, UniTalk focuses explicitly on diverse and difficult real-world conditions. These include underre…

    Submitted 28 May, 2025; originally announced May 2025.

  5. arXiv:2505.18128

    cs.CL

    Frankentext: Stitching random text fragments into long-form narratives

    Authors: Chau Minh Pham, Jenna Russell, Dzung Pham, Mohit Iyyer

    Abstract: We introduce Frankentexts, a new type of long-form narratives produced by LLMs under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim from human writings. This task presents a challenging test of controllable generation, requiring models to satisfy a writing prompt, integrate disparate text fragments, and still produce a coherent narrative. To generate Frankentexts, we i…

    Submitted 28 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  6. arXiv:2505.17013

    cs.LG cs.CV

    When Are Concepts Erased From Diffusion Models?

    Authors: Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen

    Abstract: Concept erasure, the ability to selectively prevent a model from generating specific concepts, has attracted growing interest, with various approaches emerging to address the challenge. However, it remains unclear how thoroughly these methods erase the target concept. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) reducing the likelihood of generatin…

    Submitted 30 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Project Page: https://nyu-dice-lab.github.io/when-are-concepts-erased/

  7. arXiv:2505.14549

    cs.CR cs.AI

    Can Large Language Models Really Recognize Your Name?

    Authors: Dzung Pham, Peter Kairouz, Niloofar Mireshghallah, Eugene Bagdasarian, Chau Minh Pham, Amir Houmansadr

    Abstract: Large language models (LLMs) are increasingly being used to protect sensitive user data. However, current LLM-based privacy solutions assume that these models can reliably detect personally identifiable information (PII), particularly named entities. In this paper, we challenge that assumption by revealing systematic failures in LLM-based privacy tasks. Specifically, we show that modern LLMs regul…

    Submitted 20 May, 2025; originally announced May 2025.

  8. arXiv:2505.03299

    cs.CV cs.AI

    Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach

    Authors: Pierre Adorni, Minh-Tan Pham, Stéphane May, Sébastien Lefèvre

    Abstract: Foundation models constitute a significant advancement in computer vision: after a single, albeit costly, training phase, they can address a wide array of tasks. In the field of Earth observation, over 75 remote sensing vision foundation models have been developed in the past four years. However, none has consistently outperformed the others across all available downstream tasks. To facilitate the…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted at the MORSE workshop of CVPR 2025

  9. arXiv:2505.00831

    cs.RO cs.CL

    SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation

    Authors: Quang P. M. Pham, Khoi T. N. Nguyen, Nhi H. Doan, Cuong A. Pham, Kentaro Inui, Dezhen Song

    Abstract: Efficient path planning in robotics, particularly within large-scale, dynamic environments, remains a significant hurdle. While Large Language Models (LLMs) offer strong reasoning capabilities, their high computational cost and limited adaptability in dynamic scenarios hinder real-time deployment on edge devices. We present SmallPlan -- a novel framework leveraging LLMs as teacher models to train…

    Submitted 11 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: Paper is under review

  10. arXiv:2504.14757

    cs.SE cs.AI

    SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs

    Authors: Minh V. T. Pham, Huy N. Phan, Hoang N. Phan, Cuong Le Chi, Tien N. Nguyen, Nghi D. Q. Bui

    Abstract: Large language models (LLMs) are transforming automated program repair (APR) through agent-based approaches that localize bugs, generate patches, and verify fixes. However, the lack of high-quality, scalable training datasets, especially those with verifiable outputs and intermediate reasoning traces, limits progress, particularly for open-source models. In this work, we present SWE-Synth, a framew…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Work in progress

  11. arXiv:2503.11508

    cs.CR

    Leveraging Angle of Arrival Estimation against Impersonation Attacks in Physical Layer Authentication

    Authors: Thuy M. Pham, Linda Senigagliesi, Marco Baldi, Rafael F. Schaefer, Gerhard P. Fettweis, Arsenia Chorti

    Abstract: In this paper, we investigate the utilization of the angle of arrival (AoA) as a feature for robust physical layer authentication (PLA). While most of the existing approaches to PLA focus on common features of the physical layer of communication channels, such as channel frequency response, channel impulse response or received signal strength, the use of AoA in this domain has not yet been studied…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 11 pages, 8 figures, submitted to IEEE Transactions on Information Forensics and Security

  12. arXiv:2503.07919

    cs.AI cs.CL cs.LG

    BEARCUBS: A benchmark for computer-using web agents

    Authors: Yixiao Song, Katherine Thai, Chau Minh Pham, Yapei Chang, Mazin Nadaf, Mohit Iyyer

    Abstract: Modern web agents possess computer use abilities that allow them to interact with webpages by sending commands to a virtual keyboard and mouse. While such agents have considerable potential to assist human users with complex tasks, evaluating their capabilities in real-world settings poses a major challenge. To this end, we introduce BEARCUBS, a "small but mighty" benchmark of 111 information-seek…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 16 pages

  13. arXiv:2503.00592

    cs.LG

    SolidMark: Evaluating Image Memorization in Generative Models

    Authors: Nicky Kriplani, Minh Pham, Gowthami Somepalli, Chinmay Hegde, Niv Cohen

    Abstract: Recent works have shown that diffusion models are able to memorize training images and emit them at generation time. However, the metrics used to evaluate memorization and its mitigation techniques suffer from dataset-dependent biases and struggle to detect whether a given specific image has been memorized or not. This paper begins with a comprehensive exploration of issues surrounding memorizat…

    Submitted 1 March, 2025; originally announced March 2025.

  14. arXiv:2502.14854

    cs.CL

    CLIPPER: Compression enables long-context synthetic data generation

    Authors: Chau Minh Pham, Yapei Chang, Mohit Iyyer

    Abstract: LLM developers are increasingly reliant on synthetic data, but generating high-quality data for complex long-context reasoning tasks remains challenging. We introduce CLIPPER, a compression-based approach for generating synthetic data tailored to narrative claim verification - a task that requires reasoning over a book to verify a given claim. Instead of generating claims directly from the raw tex…

    Submitted 20 February, 2025; originally announced February 2025.

  15. arXiv:2502.13028

    cs.CL

    Whose story is it? Personalizing story generation by inferring author styles

    Authors: Nischal Ashok Kumar, Chau Minh Pham, Mohit Iyyer, Andrew Lan

    Abstract: Personalization is critical for improving user experience in interactive writing and educational applications, yet remains understudied in story generation. We study the task of personalizing story generation, where our goal is to mimic an author's writing style, given other stories written by them. We collect Mythos, a dataset of 3.6k stories from 112 authors, with an average of 16 stories per au…

    Submitted 21 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: preprint, 55 pages

  16. arXiv:2501.19219

    cs.GT

    Advancing Differentiable Economics: A Neural Network Framework for Revenue-Maximizing Combinatorial Auction Mechanisms

    Authors: Mai Pham, Vikrant Vaze, Peter Chin

    Abstract: Differentiable economics, which uses neural networks as function approximators and gradient-based optimization in automated mechanism design (AMD), marked a significant breakthrough with the introduction of RegretNet \citep{regretnet_paper}. It combines the flexibility of deep learning with a regret-based approach to relax incentive compatibility, allowing for approximations of revenue-maximizing…

    Submitted 31 January, 2025; originally announced January 2025.

  17. arXiv:2501.08335

    cs.CL cs.AI

    MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish

    Authors: Xin Huang, Tarun Kumar Vangani, Minh Duc Pham, Xunlong Zou, Bin Wang, Zhengyuan Liu, Ai Ti Aw

    Abstract: Multilingual large language models (MLLMs) have shown impressive capabilities across a variety of languages. However, efficacy can differ greatly between different language families, especially for those with limited linguistic resources. This report presents MERaLiON-TextLLM, a series of open-source language models specifically tailored to improve understanding and generation in Chinese, Indonesi…

    Submitted 21 January, 2025; v1 submitted 21 December, 2024; originally announced January 2025.

  18. arXiv:2412.09829   

    cs.CY

    Speech-based Multimodel Pipeline for Vietnamese Services Quality Assessment

    Authors: Quang-Anh N. D., Minh-Duc Pham, Thai Kim Dinh

    Abstract: In the evolving landscape of customer service within the digital economy, traditional methods of service quality assessment have shown significant limitations. This research proposes a novel deep-learning approach to service quality assessment, focusing on the Vietnamese service sector. By leveraging a multi-modal pipeline that transcends traditional evaluation methods, the research addresses the…

    Submitted 18 December, 2024; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: I am writing to request the withdrawal of my preprint due to the discovery of significant inaccuracies in the results. These errors could mislead future research and applications, which compromises the integrity of my work. I believe withdrawing the paper is essential to uphold scientific standards and prevent the dissemination of misleading information. Thank you for your understanding.

  19. arXiv:2412.08683

    cs.SD cs.CV eess.AS

    Emotional Vietnamese Speech-Based Depression Diagnosis Using Dynamic Attention Mechanism

    Authors: Quang-Anh N. D., Manh-Hung Ha, Thai Kim Dinh, Minh-Duc Pham, Ninh Nguyen Van

    Abstract: Major depressive disorder is a prevalent and serious mental health condition that negatively impacts your emotions, thoughts, actions, and overall perception of the world. It is difficult to determine whether a person is depressed because the symptoms of depression may not be apparent. However, a person's voice can be one of the factors from which signs of depression can be recognized. People who are depress…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: 9 pages, 5 figures

  20. arXiv:2411.17536

    cs.CV

    Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning

    Authors: Hoàng-Ân Lê, Paul Berg, Minh-Tan Pham

    Abstract: Object detection and semantic segmentation are both scene understanding tasks, yet they differ in data structure and information level. Object detection requires box coordinates for object instances while semantic segmentation requires pixel-wise class labels. Making use of one task's information to train the other would be beneficial for multi-task partially supervised learning where each training…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Accepted for publication at BMVC 2024

  21. arXiv:2411.10509

    cs.CV cs.LG

    TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding

    Authors: Quang P. M. Pham, Khoi T. N. Nguyen, Lan C. Ngo, Truong Do, Dezhen Song, Truong-Son Hy

    Abstract: Scene graphs have proven to be highly effective for various scene understanding tasks due to their compact and explicit representation of relational information. However, current methods often overlook the critical importance of preserving symmetry when generating scene graphs from 3D point clouds, which can lead to reduced accuracy and robustness, particularly when dealing with noisy, multi-view…

    Submitted 2 March, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.00609

  22. arXiv:2411.09944

    cs.CL

    SlimLM: An Efficient Small Language Model for On-Device Document Assistance

    Authors: Thang M. Pham, Phat T. Nguyen, Seunghyun Yoon, Viet Dac Lai, Franck Dernoncourt, Trung Bui

    Abstract: While small language models (SLMs) show promise for mobile deployment, their real-world performance and applications on smartphones remain underexplored. We present SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices. Through extensive experiments on a Samsung Galaxy S24, we identify the optimal trade-offs between model size (ranging from 125M to 7B parameters), co…

    Submitted 25 November, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

  23. arXiv:2411.00967

    cs.CV cs.RO

    Raspberry PhenoSet: A Phenology-based Dataset for Automated Growth Detection and Yield Estimation

    Authors: Parham Jafary, Anna Bazangeya, Michelle Pham, Lesley G. Campbell, Sajad Saeedi, Kourosh Zareinia, Habiba Bougherara

    Abstract: The future of the agriculture industry is intertwined with automation. Accurate fruit detection, yield estimation, and harvest time estimation are crucial for optimizing agricultural practices. These tasks can be carried out by robots to reduce labour costs and improve the efficiency of the process. To do so, deep learning models should be trained to perform knowledge-based tasks, which outlines t…

    Submitted 1 November, 2024; originally announced November 2024.

  24. arXiv:2410.18572

    cs.CL cs.AI cs.LG

    Taipan: Efficient and Expressive State Space Language Models with Selective Attention

    Authors: Chien Van Nguyen, Huy Huu Nguyen, Thang M. Pham, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Ryan A. Rossi, Trung Bui, Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen

    Abstract: Efficient long-context language modeling remains a significant challenge in Natural Language Processing (NLP). While Transformers dominate language tasks, they struggle with long sequences due to quadratic computational complexity in training and linearly scaling memory costs during inference. Recent State Space Models (SSMs) such as Mamba offer alternatives with constant memory usage, but they un…

    Submitted 24 October, 2024; originally announced October 2024.

  25. Scaling Analysis in a Multi-Energy System

    Authors: Jan Soeren Schwarz, Minh Cong Pham, Quoc Tuan Tran, Kai Heussen

    Abstract: This paper presents a scaling study on the planning phase of a multi-energy system (MES), which is becoming increasingly prominent in the energy sector. The research aims to investigate the interactions and challenges associated with integrating heat and electrical systems and scaling their components. In this context, interactions between these two domains are investigated and the size of the dist…

    Submitted 23 October, 2024; originally announced October 2024.

    Comments: 6 pages, 9 figures, conference proceedings Asia Meeting on Environment and Electrical Engineering (EEE-AM) 2023

    Journal ref: 2023 Asia Meeting on Environment and Electrical Engineering (EEE-AM), Hanoi, Vietnam, 2023, pp. 01-06

  26. A Toolbox for Design of Experiments for Energy Systems in Co-Simulation and Hardware Tests

    Authors: Jan Sören Schwarz, Leonard Enrique Ramos Perez, Minh Cong Pham, Kai Heussen, Quoc Tuan Tran

    Abstract: In the context of highly complex energy system experiments, sensitivity analysis is gaining importance for investigating the effects that changing parameterization has on the outcome. Thus, how an experiment is designed is crucial to using the available resources efficiently. This paper describes the functionality of a toolbox designed to support the users in design of experiment for (co-)simulat…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: 7 pages, 6 figures, 2 tables, conference proceedings of OSMSES 2024

    Journal ref: 2024 Open Source Modelling and Simulation of Energy Systems (OSMSES), Vienna, Austria, 2024, pp. 1-7

  27. arXiv:2410.09190

    cs.LG

    Time to Retrain? Detecting Concept Drifts in Machine Learning Systems

    Authors: Tri Minh Triet Pham, Karthikeyan Premkumar, Mohamed Naili, Jinqiu Yang

    Abstract: With the boom of machine learning (ML) techniques, software practitioners build ML systems to process the massive volume of streaming data for diverse software engineering tasks such as failure prediction in AIOps. Trained using historical data, such ML models encounter performance degradation caused by concept drift, i.e., data and inter-relationship (concept) changes between training and product…

    Submitted 11 October, 2024; originally announced October 2024.

  28. arXiv:2410.02131

    cs.LG cs.CL

    Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners

    Authors: Hung Manh Pham, Aaqib Saeed, Dong Ma

    Abstract: The accurate interpretation of Electrocardiogram (ECG) signals is pivotal for diagnosing cardiovascular diseases. Integrating ECG signals with accompanying textual reports further holds immense potential to enhance clinical diagnostics by combining physiological data and qualitative insights. However, this integration faces significant challenges due to inherent modality disparities and the scarci…

    Submitted 7 May, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted at ICML 2025

  29. arXiv:2409.16897

    cs.CV

    HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space

    Authors: Jacob Fein-Ashley, Ethan Feng, Minh Pham

    Abstract: Data representation in non-Euclidean spaces has proven effective for capturing hierarchical and complex relationships in real-world datasets. Hyperbolic spaces, in particular, provide efficient embeddings for hierarchical structures. This paper introduces the Hyperbolic Vision Transformer (HVT), a novel extension of the Vision Transformer (ViT) that integrates hyperbolic geometry. While traditiona…

    Submitted 25 September, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

  30. arXiv:2409.10824

    cs.RO

    Robustness of LiDAR-Based Pose Estimation: Evaluating and Improving Odometry and Localization Under Common Point Cloud Corruptions

    Authors: Bo Yang, Tri Minh Triet Pham, Jinqiu Yang

    Abstract: Accurate and reliable pose estimation, i.e., determining the precise position and orientation of autonomous robots and vehicles, is critical for tasks like navigation and mapping. LiDAR is a widely used sensor for pose estimation, with odometry and localization being two primary tasks. LiDAR odometry estimates the relative motion between consecutive scans, while LiDAR localization aligns real-time…

    Submitted 4 March, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

  31. arXiv:2409.00827

    math.CO cs.DM

    Log-concavity of the independence polynomials of $\mathbf{W}_{p}$ graphs

    Authors: Do Trong Hoang, Vadim E. Levit, Eugen Mandrescu, My Hanh Pham

    Abstract: A graph $G$ is called a $\mathbf{W}_{p}$ graph if $n\geq p$ and every $p$ pairwise disjoint independent sets of $G$ are contained within $p$ pairwise disjoint maximum independent sets. In this paper, we establish that every $\mathbf{W}_{p}$ graph $G$ is $p$-quasi-regularizable if and only if $n\geq (p+1)\alpha$, where $\alpha$ is the independence number of $G$. This finding ensures that the independence polynomial of…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 16 pages, 2 figures

    MSC Class: 05C31; 05C69 (Primary) 05C05; 05C48 (Secondary)

    ACM Class: G.2.1; G.2.2

  32. arXiv:2409.00518

    cs.CV cs.AI

    Mapping earth mounds from space

    Authors: Baki Uzun, Shivam Pande, Gwendal Cachin-Bernard, Minh-Tan Pham, Sébastien Lefèvre, Rumais Blatrix, Doyle McKey

    Abstract: Regular patterns of vegetation are considered widespread landscapes, although their global extent has never been estimated. Among them, spotted landscapes are of particular interest in the context of climate change. Indeed, regularly spaced vegetation spots in semi-arid shrublands result from extreme resource depletion and prefigure catastrophic shift of the ecosystem to a homogeneous desert, whil…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 6 pages, 4 figures, 3 tables

  33. arXiv:2408.13686

    cs.SE

    Perception-Guided Fuzzing for Simulated Scenario-Based Testing of Autonomous Driving Systems

    Authors: Tri Minh Triet Pham, Bo Yang, Jinqiu Yang

    Abstract: Autonomous Driving Systems (ADS) have made huge progress and started on-road testing or even commercializing trials. ADS are complex and difficult to test: they receive input data from multiple sensors and make decisions using a combination of multiple deep neural network models and code logic. The safety of ADS is of utmost importance as their misbehavior can result in costly catastrophes, includ…

    Submitted 24 August, 2024; originally announced August 2024.

  34. arXiv:2408.13653

    cs.SE

    Evaluating the Robustness of LiDAR-based 3D Obstacles Detection and Its Impacts on Autonomous Driving Systems

    Authors: Tri Minh Triet Pham, Bo Yang, Jinqiu Yang

    Abstract: Autonomous driving systems (ADSs) require real-time input from multiple sensors to make time-sensitive decisions using deep neural networks. This makes the correctness of these decisions crucial to ADSs' adoption as errors can cause significant loss. Sensors such as LiDAR are sensitive to environmental changes and built-in inaccuracies and may fluctuate between frames. While there has been extensi…

    Submitted 24 August, 2024; originally announced August 2024.

  35. arXiv:2408.13561

    cs.CV eess.IV

    Variational Autoencoder for Anomaly Detection: A Comparative Study

    Authors: Huy Hoang Nguyen, Cuong Nhat Nguyen, Xuan Tung Dao, Quoc Trung Duong, Dzung Pham Thi Kim, Minh-Tan Pham

    Abstract: This paper aims to conduct a comparative analysis of contemporary Variational Autoencoder (VAE) architectures employed in anomaly detection, elucidating their performance and behavioral characteristics within this specific task. The architectural configurations under consideration encompass the original VAE baseline, the VAE with a Gaussian Random Field prior (VAE-GRF), and the VAE incorporating a…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 6 pages; accepted to IEEE ICCE 2024 for poster presentation

  36. arXiv:2407.10663

    cs.CV cs.AI

    Spatio-temporal neural distance fields for conditional generative modeling of the heart

    Authors: Kristine Sørensen, Paula Diez, Jan Margeta, Yasmin El Youssef, Michael Pham, Jonas Jalili Pedersen, Tobias Kühl, Ole de Backer, Klaus Kofoed, Oscar Camara, Rasmus Paulsen

    Abstract: The rhythmic pumping motion of the heart stands as a cornerstone in life, as it circulates blood to the entire human body through a series of carefully timed contractions of the individual chambers. Changes in the size, shape and movement of the chambers can be important markers for cardiac disease and modeling this in relation to clinical demography or disease is therefore of interest. Existing m…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted for MICCAI 2024

  37. arXiv:2407.08792

    cs.CR

    ProxyGPT: Enabling User Anonymity in LLM Chatbots via (Un)Trustworthy Volunteer Proxies

    Authors: Dzung Pham, Jade Sheffey, Chau Minh Pham, Amir Houmansadr

    Abstract: Popular large language model (LLM) chatbots such as ChatGPT and Claude require users to create an account with an email or a phone number before allowing full access to their services. This practice ties users' personally identifiable information (PII) to their sensitive conversational data, thus posing significant privacy risks. Unfortunately, existing private LLM solutions based on cryptography…

    Submitted 11 June, 2025; v1 submitted 11 July, 2024; originally announced July 2024.

  38. arXiv:2407.02873

    cs.RO

    Robot Shape and Location Retention in Video Generation Using Diffusion Models

    Authors: Peng Wang, Zhihao Guo, Abdul Latheef Sait, Minh Huy Pham

    Abstract: Diffusion models have marked a significant milestone in the enhancement of image and video generation technologies. However, generating videos that precisely retain the shape and location of moving objects such as robots remains a challenge. This paper presents diffusion models specifically tailored to generate videos that accurately maintain the shape and location of mobile robots. This developme…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 10 figures

  39. arXiv:2407.00609

    cs.CV cs.LG

    ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding

    Authors: Quang P. M. Pham, Khoi T. N. Nguyen, Lan C. Ngo, Truong Do, Truong Son Hy

    Abstract: Scene graphs have been proven to be useful for various scene understanding tasks due to their compact and explicit nature. However, existing approaches often neglect the importance of maintaining the symmetry-preserving property when generating scene graphs from 3D point clouds. This oversight can diminish the accuracy and robustness of the resulting scene graphs, especially when handling noisy, m…

    Submitted 30 June, 2024; originally announced July 2024.

  40. arXiv:2406.19928

    cs.CL cs.HC cs.IR

    Interactive Topic Models with Optimal Transport

    Authors: Garima Dhanania, Sheshera Mysore, Chau Minh Pham, Mohit Iyyer, Hamed Zamani, Andrew McCallum

    Abstract: Topic models are widely used to analyze document collections. While they are valuable for discovering latent topics in a corpus when analysts are unfamiliar with the corpus, analysts also commonly start with an understanding of the content present in a corpus. This may be through categories obtained from an initial pass over the corpus or a desire to analyze the corpus through a predefined set of…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Pre-print; Work in progress

  41. arXiv:2406.19371

    cs.CL

    Suri: Multi-constraint Instruction Following for Long-form Text Generation

    Authors: Chau Minh Pham, Simeng Sun, Mohit Iyyer

    Abstract: Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. Because of prohibitive challe…

    Submitted 1 October, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP'24 (Findings)

  42. arXiv:2405.15394

    cs.CV

    Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets

    Authors: Hoàng-Ân Lê, Minh-Tan Pham

    Abstract: Partial multi-task learning where training examples are annotated for one of the target tasks is a promising idea in remote sensing as it allows combining datasets annotated for different tasks and predicting more tasks with fewer network parameters. The naïve approach to partial multi-task learning is sub-optimal due to the lack of all-task annotations for learning joint representations. This pap…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted for oral presentation at IGARSS 2024

  43. arXiv:2404.08079  [pdf, other]

    cs.LG cs.CV math.OC

    DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

    Authors: Nastaran Saadati, Minh Pham, Nasla Saleem, Joshua R. Waite, Aditya Balu, Zhanhong Jiang, Chinmay Hegde, Soumik Sarkar

    Abstract: Recent advances in decentralized deep learning algorithms have demonstrated cutting-edge performance on various tasks with large pre-trained models. However, a pivotal prerequisite for achieving this level of competitiveness is the significant communication and computation overheads when updating these models, which prohibits the applications of them to real-world scenarios. To address this issue,…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 accepted paper, 22 pages, 12 figures

  44. arXiv:2404.03631  [pdf, other]

    cs.CV

    Robust Concept Erasure Using Task Vectors

    Authors: Minh Pham, Kelly O. Marshall, Chinmay Hegde, Niv Cohen

    Abstract: With the rapid growth of text-to-image models, a variety of techniques have been suggested to prevent undesirable image generations. Yet, these methods often only protect against specific user prompts and have been shown to allow unsafe generations with other inputs. Here we focus on unconditionally erasing a concept from a text-to-image model rather than conditioning the erasure on the user's pro…

    Submitted 19 February, 2025; v1 submitted 4 April, 2024; originally announced April 2024.

  45. arXiv:2403.16958  [pdf, other]

    cs.CV

    TwinLiteNetPlus: A Real-Time Multi-Task Segmentation Model for Autonomous Driving

    Authors: Quang-Huy Che, Duc-Tri Le, Minh-Quan Pham, Vinh-Tiep Nguyen, Duc-Khai Lam

    Abstract: Semantic segmentation is crucial for autonomous driving, particularly for the tasks of Drivable Area and Lane Segmentation, ensuring safety and navigation. To address the high computational costs of current state-of-the-art (SOTA) models, this paper introduces TwinLiteNetPlus, a model capable of balancing efficiency and accuracy. TwinLiteNetPlus incorporates standard and depth-wise separable dilat…

    Submitted 10 April, 2025; v1 submitted 25 March, 2024; originally announced March 2024.

  46. arXiv:2403.13698  [pdf]

    cs.CV eess.IV

    Insight Into the Collocation of Multi-Source Satellite Imagery for Multi-Scale Vessel Detection

    Authors: Tran-Vu La, Minh-Tan Pham, Marco Chini

    Abstract: Ship detection from satellite imagery using Deep Learning (DL) is an indispensable solution for maritime surveillance. However, applying DL models trained on one dataset to others having differences in spatial resolution and radiometric features requires many adjustments. To overcome this issue, this paper focused on the DL models trained on datasets that consist of different optical images and a…

    Submitted 23 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 5 pages, accepted to IGARSS 2024

  47. arXiv:2403.13575  [pdf, other]

    cs.CV

    Leveraging feature communication in federated learning for remote sensing image classification

    Authors: Anh-Kiet Duong, Hoàng-Ân Lê, Minh-Tan Pham

    Abstract: In the realm of Federated Learning (FL) applied to remote sensing image classification, this study introduces and assesses several innovative communication strategies. Our exploration includes feature-centric communication, pseudo-weight amalgamation, and a combined method utilizing both weights and features. Experiments conducted on two public scene classification datasets unveil the effectivenes…

    Submitted 23 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 5 pages, to appear in IGARSS 2024

  48. arXiv:2403.05297  [pdf, other]

    cs.CV cs.AI cs.CL

    PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

    Authors: Thang M. Pham, Peijie Chen, Tin Nguyen, Seunghyun Yoon, Trung Bui, Anh Totti Nguyen

    Abstract: CLIP-based classifiers rely on the prompt containing a {class name} that is known to the text encoder. Therefore, they perform poorly on new classes or the classes whose names rarely appear on the Internet (e.g., scientific names of birds). For fine-grained classification, we propose PEEB - an explainable and editable classifier to (1) express the class name into a set of text descriptors that des…

    Submitted 12 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Findings of NAACL 2024 (long paper)

  49. arXiv:2402.17745  [pdf, other]

    physics.comp-ph cs.CV physics.optics

    Low-light phase retrieval with implicit generative priors

    Authors: Raunak Manekar, Elisa Negrini, Minh Pham, Daniel Jacobs, Jaideep Srivastava, Stanley J. Osher, Jianwei Miao

    Abstract: Phase retrieval (PR) is fundamentally important in scientific imaging and is crucial for nanoscale techniques like coherent diffractive imaging (CDI). Low radiation dose imaging is essential for applications involving radiation-sensitive samples. However, most PR methods struggle in low-dose scenarios due to high shot noise. Recent advancements in optical data acquisition setups, such as in-situ C…

    Submitted 23 August, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    MSC Class: 68T10 68T07 78A46

  50. arXiv:2402.10239  [pdf, other]

    hep-ph cs.LG hep-ex

    A Language Model for Particle Tracking

    Authors: Andris Huang, Yash Melkani, Paolo Calafiura, Alina Lazar, Daniel Thomas Murnane, Minh-Tuan Pham, Xiangyang Ju

    Abstract: Particle tracking is crucial for almost all physics analysis programs at the Large Hadron Collider. Deep learning models are pervasively used in particle tracking related tasks. However, the current practice is to design and train one deep learning model for one task with supervised learning techniques. The trained models work well for tasks they are trained on but show no or little generalization…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 7 pages, 3 figures, A Proceeding of the Connecting the Dots Workshop (CTD 2023)

    Report number: PROC-CTD2023-33