
Showing 1–50 of 209 results for author: Tran, M

Searching in archive cs.
  1. arXiv:2505.09475  [pdf]

    cs.RO

    aUToPath: Unified Planning and Control for Autonomous Vehicles in Urban Environments Using Hybrid Lattice and Free-Space Search

    Authors: Tanmay P. Patel, Connor Wilson, Ellina R. Zhang, Morgan Tran, Chang Keun Paik, Steven L. Waslander, Timothy D. Barfoot

    Abstract: This paper presents aUToPath, a unified online framework for global path-planning and control to address the challenge of autonomous navigation in cluttered urban environments. A key component of our framework is a novel hybrid planner that combines pre-computed lattice maps with dynamic free-space sampling to efficiently generate optimal driveable corridors in cluttered scenarios. Our system also…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 9 pages, 10 figures. Tanmay P. Patel, Connor Wilson, and Ellina R. Zhang contributed equally

  2. arXiv:2504.21344  [pdf, ps, other]

    cs.CV cs.AI q-bio.QM

    Vision-Language Model-Based Semantic-Guided Imaging Biomarker for Early Lung Cancer Detection

    Authors: Luoting Zhuang, Seyed Mohammad Hossein Tabatabaei, Ramin Salehi-Rad, Linh M. Tran, Denise R. Aberle, Ashley E. Prosper, William Hsu

    Abstract: Objective: A number of machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologi…

    Submitted 30 April, 2025; originally announced April 2025.

  3. arXiv:2504.04010  [pdf, other]

    cs.CV cs.LG

    DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

    Authors: Maksim Siniukov, Di Chang, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani

    Abstract: Generating naturalistic and nuanced listener motions for extended interactions remains an open problem. Existing methods often rely on low-dimensional motion codes for facial behavior generation followed by photorealistic rendering, limiting both visual fidelity and expressive richness. To address these challenges, we introduce DiTaiListener, powered by a video diffusion model with multimodal cond…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Project page: https://havent-invented.github.io/DiTaiListener

    ACM Class: I.4.9

  4. arXiv:2504.03292  [pdf, other]

    cs.CV

    FaR: Enhancing Multi-Concept Text-to-Image Diffusion via Concept Fusion and Localized Refinement

    Authors: Gia-Nghia Tran, Quang-Huy Che, Trong-Tai Dam Vu, Bich-Nga Pham, Vinh-Tiep Nguyen, Trung-Nghia Le, Minh-Triet Tran

    Abstract: Generating multiple new concepts remains a challenging problem in the text-to-image task. Current methods often overfit when trained on a small number of samples and struggle with attribute leakage, particularly for class-similar subjects (e.g., two specific dogs). In this paper, we introduce Fuse-and-Refine (FaR), a novel approach that tackles these challenges through two key contributions: Conce…

    Submitted 4 April, 2025; originally announced April 2025.

  5. arXiv:2504.02060  [pdf, other]

    cs.CV cs.IR

    LSC-ADL: An Activity of Daily Living (ADL)-Annotated Lifelog Dataset Generated via Semi-Automatic Clustering

    Authors: Minh-Quan Ho-Le, Duy-Khang Ho, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran

    Abstract: Lifelogging involves continuously capturing personal data through wearable cameras, providing an egocentric view of daily activities. Lifelog retrieval aims to search and retrieve relevant moments from this data, yet existing methods largely overlook activity-level annotations, which capture temporal relationships and enrich semantic understanding. In this work, we introduce LSC-ADL, an ADL-annota…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 11 pages, 4 figures

  6. arXiv:2503.18267  [pdf, other]

    cs.CV

    Enhancing Dataset Distillation via Non-Critical Region Refinement

    Authors: Minh-Tuan Tran, Trung Le, Xuan-May Le, Thanh-Toan Do, Dinh Phung

    Abstract: Dataset distillation has become a popular method for compressing large datasets into smaller, more efficient representations while preserving critical information for model training. Data features are broadly categorized into two types: instance-specific features, which capture unique, fine-grained details of individual examples, and class-general features, which represent shared, broad patterns a…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  7. arXiv:2503.17116  [pdf, other]

    cs.MM cs.AI cs.CV cs.IR

    The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding

    Authors: Luca Rossetto, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Björn Þór Jónsson, Onanong Kongmeesub, Hoang-Bao Le, Stevan Rudinac, Klaus Schöffmann, Florian Spiess, Allie Tran, Minh-Triet Tran, Quang-Linh Tran, Cathal Gurrin

    Abstract: Egocentric video has seen increased interest in recent years, as it is used in a range of areas. However, most existing datasets are limited to a single perspective. In this paper, we present the CASTLE 2024 dataset, a multimodal collection containing ego- and exo-centric (i.e., first- and third-person perspective) video and audio from 15 time-aligned sources, as well as other sensor streams and a…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 7 pages, 6 figures, dataset available via https://castle-dataset.github.io/

  8. arXiv:2503.15105  [pdf, ps, other]

    math.NA cs.LG math.OC

    Control, Optimal Transport and Neural Differential Equations in Supervised Learning

    Authors: Minh-Nhat Phung, Minh-Binh Tran

    Abstract: From the perspective of control theory, neural differential equations (neural ODEs) have become an important tool for supervised learning. In the fundamental work of Ruiz-Balet and Zuazua (SIAM REVIEW 2023), the authors pose an open problem regarding the connection between control theory, optimal transport theory, and neural differential equations. More precisely, they inquire how one can quantify…

    Submitted 26 March, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

  9. arXiv:2503.07470  [pdf, other]

    cs.IR cs.AI cs.LG

    Advancing Vietnamese Information Retrieval with Learning Objective and Benchmark

    Authors: Phu-Vinh Nguyen, Minh-Nam Tran, Long Nguyen, Dien Dinh

    Abstract: With the rapid development of natural language processing, many language models have been invented for multiple tasks. One important task is information retrieval (IR), which requires models to retrieve relevant documents. Despite its importance in many real-life applications, especially in retrieval augmented generation (RAG) systems, this task lacks Vietnamese benchmarks. This situation causes d…

    Submitted 10 March, 2025; originally announced March 2025.

    Journal ref: PACLIC38-2024

  10. arXiv:2503.06571  [pdf, other]

    cs.LG cs.AI

    SHIP: A Shapelet-based Approach for Interpretable Patient-Ventilator Asynchrony Detection

    Authors: Xuan-May Le, Ling Luo, Uwe Aickelin, Minh-Tuan Tran, David Berlowitz, Mark Howard

    Abstract: Patient-ventilator asynchrony (PVA) is a common and critical issue during mechanical ventilation, affecting up to 85% of patients. PVA can result in clinical complications such as discomfort, sleep disruption, and potentially more severe conditions like ventilator-induced lung injury and diaphragm dysfunction. Traditional PVA management, which relies on manual adjustments by healthcare providers,…

    Submitted 12 March, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

    Comments: Accepted at PAKDD 2025

  11. Enhancing Autonomous Vehicle-Pedestrian Interaction in Shared Spaces: The Impact of Intended Path-Projection

    Authors: Le Yue, Tram Thi Minh Tran, Xinyan Yu, Marius Hoggenmueller

    Abstract: External Human-Machine Interfaces (eHMIs) are critical for seamless interactions between autonomous vehicles (AVs) and pedestrians in shared spaces. However, they often struggle to adapt to these environments, where pedestrian movement is fluid and right-of-way is ambiguous. To address these challenges, we propose PaveFlow, an eHMI that projects the AV's intended path onto the ground in real time,…

    Submitted 6 March, 2025; originally announced March 2025.

  12. arXiv:2503.04850  [pdf, other]

    cs.CR cs.LG

    Slow is Fast! Dissecting Ethereum's Slow Liquidity Drain Scams

    Authors: Minh Trung Tran, Nasrin Sohrabi, Zahir Tari, Qin Wang, Xiaoyu Xia

    Abstract: We identify the slow liquidity drain (SLID) scam, an insidious and highly profitable threat to decentralized finance (DeFi), posing a large-scale, persistent, and growing risk to the ecosystem. Unlike traditional scams such as rug pulls or honeypots (USENIX Sec'19, USENIX Sec'23), SLID gradually siphons funds from liquidity pools over extended periods, making detection significantly more challengi…

    Submitted 10 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

  13. From Everyday Technologies to Augmented Reality: An Autoethnographic Study of Presence and Engagement

    Authors: Tram Thi Minh Tran

    Abstract: Digital technologies are reshaping how people experience their surroundings, often pulling focus toward virtual spaces and making it harder to stay present and engaged. Wearable augmented reality (AR), by embedding digital information into the physical world, may further immerse users in digital layers. Yet paradoxically, it also holds the potential to support presence and engagement. To explore t…

    Submitted 3 March, 2025; originally announced March 2025.

  14. Peek into the 'White-Box': A Field Study on Bystander Engagement with Urban Robot Uncertainty

    Authors: Xinyan Yu, Marius Hoggenmueller, Tram Thi Minh Tran, Yiyuan Wang, Qiuming Zhang, Martin Tomitsch

    Abstract: Uncertainty inherently exists in the autonomous decision-making process of robots. Involving humans in resolving this uncertainty not only helps robots mitigate it but is also crucial for improving human-robot interactions. However, in public urban spaces filled with unpredictability, robots often face heightened uncertainty without direct human collaborators. This study investigates how robots ca…

    Submitted 28 February, 2025; originally announced March 2025.

  15. Doraemon's Gadget Lab: Unpacking Human Needs and Interaction Design in Speculative Technology

    Authors: Tram Thi Minh Tran

    Abstract: Speculative technologies in science fiction have long inspired advancements in Human-Computer Interaction (HCI). Doraemon, a Japanese manga featuring a robotic cat from the 22nd century, presents an extensive collection of futuristic gadgets, an underexplored source of speculative technologies. This study systematically analyses 379 of these gadgets, categorising them into 33 subcategories within 1…

    Submitted 28 February, 2025; originally announced March 2025.

  16. Wearable AR in Everyday Contexts: Insights from a Digital Ethnography of YouTube Videos

    Authors: Tram Thi Minh Tran, Shane Brown, Oliver Weidlich, Soojeong Yoo, Callum Parker

    Abstract: With growing investment in consumer augmented reality (AR) headsets and glasses, wearable AR is moving from niche applications to everyday use. However, current research primarily examines AR in controlled settings, offering limited insights into its use in real-world daily life. To address this gap, we adopt a digital ethnographic approach, analysing 27 hours of 112 YouTube videos featuring early…

    Submitted 11 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  17. arXiv:2501.12501  [pdf, other]

    eess.AS cs.SD

    A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data

    Authors: Minh Tran, Yutong Pang, Debjyoti Paul, Laxmi Pandey, Kevin Jiang, Jinxi Guo, Ke Li, Shun Zhang, Xuedong Zhang, Xin Lei

    Abstract: We introduce DAS (Domain Adaptation with Synthetic data), a novel domain adaptation framework for pre-trained ASR model, designed to efficiently adapt to various language-defined domains without requiring any real data. In particular, DAS first prompts large language models (LLMs) to generate domain-specific texts before converting these texts to speech via text-to-speech technology. The synthetic…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: ICASSP 2025

  18. arXiv:2501.03717  [pdf, other]

    cs.CV cs.AI cs.GR

    Materialist: Physically Based Editing Using Single-Image Inverse Rendering

    Authors: Lezhong Wang, Duc Minh Tran, Ruiqi Cui, Thomson TG, Manmohan Chandraker, Jeppe Revall Frisvad

    Abstract: To perform image editing based on single-view, inverse physically based rendering, we present a method combining a learning-based approach with progressive differentiable rendering. Given an image, our method leverages neural networks to predict initial material properties. Progressive differentiable rendering is then used to optimize the environment map and refine the material properties with the…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: code will be available at github.com/lez-s/Materialist

  19. arXiv:2501.00865  [pdf, other]

    cs.CL cs.LG

    Negative to Positive Co-learning with Aggressive Modality Dropout

    Authors: Nicholas Magal, Minh Tran, Riku Arakawa, Suzanne Nie

    Abstract: This paper aims to document an effective way to improve multimodal co-learning by using aggressive modality dropout. We find that by using aggressive modality dropout we are able to reverse negative co-learning (NCL) to positive co-learning (PCL). Aggressive modality dropout can be used to "prep" a multimodal model for unimodal deployment, and dramatically increases model performance during negati…

    Submitted 1 January, 2025; originally announced January 2025.

  20. arXiv:2412.18149  [pdf, other]

    cs.CV

    Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction

    Authors: Xiao Guo, Manh Tran, Jiaxin Cheng, Xiaoming Liu

    Abstract: The text-to-image (T2I) personalization diffusion model can generate images of the novel concept based on the user input text caption. However, existing T2I personalized methods either require test-time fine-tuning or fail to generate images that align well with the given text caption. In this work, we propose a new T2I personalization diffusion model, Dense-Face, which can generate face images wi…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: 15 figures, 5 tables

  21. Robots in the Wild: Contextually-Adaptive Human-Robot Interactions in Urban Public Environments

    Authors: Xinyan Yu, Yiyuan Wang, Tram Thi Minh Tran, Yi Zhao, Julie Stephany Berrio Perez, Marius Hoggenmuller, Justine Humphry, Lian Loke, Lynn Masuda, Callum Parker, Martin Tomitsch, Stewart Worrall

    Abstract: The increasing transition of human-robot interaction (HRI) context from controlled settings to dynamic, real-world public environments calls for enhanced adaptability in robotic systems. This can go beyond algorithmic navigation or traditional HRI strategies in structured settings, requiring the ability to navigate complex public urban systems containing multifaceted dynamics and various socio-tec…

    Submitted 9 December, 2024; v1 submitted 5 December, 2024; originally announced December 2024.

  22. Gesture Classification in Artworks Using Contextual Image Features

    Authors: Azhar Hussian, Mathias Zinnen, Thi My Hang Tran, Andreas Maier, Vincent Christlein

    Abstract: Recognizing gestures in artworks can add a valuable dimension to art understanding and help to acknowledge the role of the sense of smell in cultural heritage. We propose a method to recognize smell gestures in historical artworks. We show that combining local features with global image context improves classification performance notably on different backbones.

    Submitted 4 December, 2024; originally announced December 2024.

    Journal ref: Digital Humanities Conference, Arlington, USA, 2024, pp.287-290

  23. arXiv:2412.01147  [pdf, other]

    cs.CV

    A2VIS: Amodal-Aware Approach to Video Instance Segmentation

    Authors: Minh Tran, Thang Pham, Winston Bounsavy, Tri Nguyen, Ngan Le

    Abstract: Handling occlusion remains a significant challenge for video instance-level tasks like Multiple Object Tracking (MOT) and Video Instance Segmentation (VIS). In this paper, we propose a novel framework, Amodal-Aware Video Instance Segmentation (A2VIS), which incorporates amodal representations to achieve a reliable and comprehensive understanding of both visible and occluded parts of objects in a v…

    Submitted 9 April, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: Accepted to IMAVIS. Project page: https://uark-aicv.github.io/A2VIS

  24. arXiv:2411.17132  [pdf, other]

    cs.LG

    Improving Resistance to Noisy Label Fitting by Reweighting Gradient in SAM

    Authors: Hoang-Chau Luong, Thuc Nguyen-Quang, Minh-Triet Tran

    Abstract: Noisy labels pose a substantial challenge in machine learning, often resulting in overfitting and poor generalization. Sharpness-Aware Minimization (SAM), as demonstrated in Foret et al. (2021), improves generalization over traditional Stochastic Gradient Descent (SGD) in classification tasks with noisy labels by implicitly slowing noisy learning. While SAM's ability to generalize in noisy environ…

    Submitted 26 November, 2024; originally announced November 2024.

  25. arXiv:2411.17046  [pdf, other]

    cs.CV

    Large-Scale Data-Free Knowledge Distillation for ImageNet via Multi-Resolution Data Generation

    Authors: Minh-Tuan Tran, Trung Le, Xuan-May Le, Jianfei Cai, Mehrtash Harandi, Dinh Phung

    Abstract: Data-Free Knowledge Distillation (DFKD) is an advanced technique that enables knowledge transfer from a teacher model to a student model without relying on original training data. While DFKD methods have achieved success on smaller datasets like CIFAR10 and CIFAR100, they encounter challenges on larger, high-resolution datasets such as ImageNet. A primary issue with previous approaches is their ge…

    Submitted 25 November, 2024; originally announced November 2024.

  26. arXiv:2410.14983  [pdf, other]

    cs.CV

    D-SarcNet: A Dual-stream Deep Learning Framework for Automatic Analysis of Sarcomere Structures in Fluorescently Labeled hiPSC-CMs

    Authors: Huyen Le, Khiet Dang, Nhung Nguyen, Mai Tran, Hieu Pham

    Abstract: Human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) are a powerful tool in advancing cardiovascular research and clinical applications. The maturation of sarcomere organization in hiPSC-CMs is crucial, as it supports the contractile function and structural integrity of these cells. Traditional methods for assessing this maturation like manual annotation and feature extraction ar…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: Accepted for oral presentation at IEEE International Conference on Bioinformatics and Biomedicine 2024 (IEEE BIBM 2024)

  27. Advancing VR Simulators for Autonomous Vehicle-Pedestrian Interactions: A Focus on Multi-Entity Scenarios

    Authors: Tram Thi Minh Tran, Callum Parker

    Abstract: Recent research has increasingly focused on how autonomous vehicles (AVs) communicate with pedestrians in complex traffic situations involving multiple vehicles and pedestrians. VR is emerging as an effective tool to simulate these multi-entity scenarios, offering a safe and controlled study environment. Despite its growing use, there is a lack of thorough investigation into the effectiveness of t…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted to Transportation Research Part F: Traffic Psychology and Behaviour

  28. arXiv:2409.18476  [pdf]

    cs.CV

    Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models

    Authors: Nguyen Gia Bach, Chanh Minh Tran, Eiji Kamioka, Phan Xuan Tan

    Abstract: Underwater vision is crucial for autonomous underwater vehicles (AUVs), and enhancing degraded underwater images in real-time on a resource-constrained AUV is a key challenge due to factors like light absorption and scattering, or the sufficient model computational complexity to resolve such factors. Traditional image enhancement techniques lack adaptability to varying underwater conditions, while…

    Submitted 27 September, 2024; originally announced September 2024.

  29. arXiv:2409.18256  [pdf, other]

    cs.CV

    Amodal Instance Segmentation with Diffusion Shape Prior Estimation

    Authors: Minh Tran, Khoa Vo, Tri Nguyen, Ngan Le

    Abstract: Amodal Instance Segmentation (AIS) presents an intriguing challenge, including the segmentation prediction of both visible and occluded parts of objects within images. Previous methods have often relied on shape prior information gleaned from training data to enhance amodal segmentation. However, these approaches are susceptible to overfitting and disregard object category details. Recent advancem…

    Submitted 4 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: ACCV2024; Project page: https://uark-aicv.github.io/AISDiff

  30. arXiv:2409.13563  [pdf, other]

    cs.CR cs.ET cs.SE

    Proxion: Uncovering Hidden Proxy Smart Contracts for Finding Collision Vulnerabilities in Ethereum

    Authors: Cheng-Kang Chen, Wen-Yi Chu, Muoi Tran, Laurent Vanbever, Hsu-Chun Hsiao

    Abstract: The proxy design pattern allows Ethereum smart contracts to be simultaneously immutable and upgradeable, in which an original contract is split into a proxy contract containing the data storage and a logic contract containing the implementation logic. This architecture is known to have security issues, namely function collisions and storage collisions between the proxy and logic contracts, and has…

    Submitted 20 September, 2024; originally announced September 2024.

  31. arXiv:2409.06481  [pdf, other]

    cs.CV

    NeIn: Telling What You Don't Want

    Authors: Nhat-Tan Bui, Dinh-Hieu Hoang, Quoc-Huy Trinh, Minh-Triet Tran, Truong Nguyen, Susan Gauch

    Abstract: Negation is a fundamental linguistic concept used by humans to convey information that they do not desire. Despite this, minimal research has focused on negation within text-guided image editing. This lack of research means that vision-language models (VLMs) for image editing may struggle to understand negation, implying that they struggle to provide accurate results. One barrier to achieving huma…

    Submitted 5 April, 2025; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: Accepted to CVPR 2025 Workshop SyntaGen. Project page: https://tanbuinhat.github.io/NeIn/

  32. arXiv:2409.05014  [pdf, other]

    cs.CE cs.SE

    Analyzing Challenges in Deployment of the SLSA Framework for Software Supply Chain Security

    Authors: Mahzabin Tamanna, Sivana Hamer, Mindy Tran, Sascha Fahl, Yasemin Acar, Laurie Williams

    Abstract: In 2023, Sonatype reported a 200% increase in software supply chain attacks, including major build infrastructure attacks. To secure the software supply chain, practitioners can follow security framework guidance like the Supply-chain Levels for Software Artifacts (SLSA). However, recent surveys and industry summits have shown that despite growing interest, the adoption of SLSA is not widespread.…

    Submitted 4 December, 2024; v1 submitted 8 September, 2024; originally announced September 2024.

  33. arXiv:2407.13159  [pdf, other]

    cs.CV

    Attenuation-Aware Weighted Optical Flow with Medium Transmission Map for Learning-based Visual Odometry in Underwater terrain

    Authors: Bach Nguyen Gia, Chanh Minh Tran, Kamioka Eiji, Tan Phan Xuan

    Abstract: This paper addresses the challenge of improving learning-based monocular visual odometry (VO) in underwater environments by integrating principles of underwater optical imaging to manipulate optical flow estimation. Leveraging the inherent properties of underwater imaging, the novel wflow-TartanVO is introduced, enhancing the accuracy of VO systems for autonomous underwater vehicles (AUVs). The pr…

    Submitted 18 July, 2024; originally announced July 2024.

  34. arXiv:2407.04327  [pdf, other]

    cs.CV

    TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking

    Authors: Thuc Nguyen-Quang, Minh-Triet Tran

    Abstract: Multi-object tracking (MOT) in computer vision remains a significant challenge, requiring precise localization and continuous tracking of multiple objects in video sequences. The emergence of data sets that emphasize robust reidentification, such as DanceTrack, has highlighted the need for effective solutions. While memory-based approaches have shown promise, they often suffer from high computatio…

    Submitted 15 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  35. arXiv:2406.19871  [pdf, other]

    cs.LG cs.NI eess.SY

    Koopman based trajectory model and computation offloading for high mobility paradigm in ISAC enabled IoT system

    Authors: Minh-Tuan Tran

    Abstract: User experience on mobile devices is constrained by limited battery capacity and processing power, but 6G technology advancements are diving rapidly into mobile technical evolution. Mobile edge computing (MEC) offers a solution, offloading computationally intensive tasks to edge cloud servers, reducing battery drain compared to local processing. The upcoming integrated sensing and communication in…

    Submitted 28 June, 2024; originally announced June 2024.

    MSC Class: 52-08 ACM Class: C.2

  36. arXiv:2406.14819  [pdf, other]

    cs.CV

    SAM-EG: Segment Anything Model with Egde Guidance framework for efficient Polyp Segmentation

    Authors: Quoc-Huy Trinh, Hai-Dang Nguyen, Bao-Tram Nguyen Ngoc, Debesh Jha, Ulas Bagci, Minh-Triet Tran

    Abstract: Polyp segmentation, a critical concern in medical imaging, has prompted numerous proposed methods aimed at enhancing the quality of segmented masks. While current state-of-the-art techniques produce impressive results, the size and computational cost of these models pose challenges for practical industry applications. Recently, the Segment Anything Model (SAM) has been proposed as a robust foundat…

    Submitted 20 June, 2024; originally announced June 2024.

  37. What is in the Chrome Web Store? Investigating Security-Noteworthy Browser Extensions

    Authors: Sheryl Hsu, Manda Tran, Aurore Fass

    Abstract: This paper is the first attempt at providing a holistic view of the Chrome Web Store (CWS). We leverage historical data provided by ChromeStats to study global trends in the CWS and security implications. We first highlight the extremely short life cycles of extensions: roughly 60% of extensions stay in the CWS for one year. Second, we define and show that Security-Noteworthy Extensions (SNE) are…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Published in ACM AsiaCCS 2024

    Journal ref: ACM AsiaCCS 2024

  38. arXiv:2406.11146  [pdf, other]

    cs.HC

    Designing Interactions with Autonomous Physical Systems

    Authors: Marius Hoggenmueller, Tram Thi Minh Tran, Luke Hespanhol, Martin Tomitsch

    Abstract: In this position paper, we present a collection of four different prototyping approaches which we have developed and applied to prototype and evaluate interfaces for and interactions around autonomous physical systems. Further, we provide a classification of our approaches aiming to support other researchers and designers in choosing appropriate prototyping platforms and representations.

    Submitted 16 June, 2024; originally announced June 2024.

  39. arXiv:2406.09837  [pdf, other]

    cs.LG

    TabularFM: An Open Framework For Tabular Foundational Models

    Authors: Quan M. Tran, Suong N. Hoang, Lam M. Nguyen, Dzung Phan, Hoang Thanh Lam

    Abstract: Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  40. Context-Based Interface Prototyping: Understanding the Effect of Prototype Representation on User Feedback

    Authors: Marius Hoggenmueller, Martin Tomitsch, Luke Hespanhol, Tram Thi Minh Tran, Stewart Worrall, Eduardo Nebot

    Abstract: The rise of autonomous systems in cities, such as automated vehicles (AVs), requires new approaches for prototyping and evaluating how people interact with those systems through context-based user interfaces, such as external human-machine interfaces (eHMIs). In this paper, we present a comparative study of three prototype representations (real-world VR, computer-generated VR, real-world video) of…

    Submitted 12 June, 2024; originally announced June 2024.

  41. arXiv:2406.00307  [pdf, other]

    cs.CV

    HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

    Authors: Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le

    Abstract: Current video-language models (VLMs) rely extensively on instance-level alignment between video and language modalities, which presents two major limitations: (1) visual reasoning disobeys the natural perception that humans do in first-person perspective, leading to a lack of reasoning interpretation; and (2) learning is limited in capturing inherent fine-grained relationships between two modaliti…

    Submitted 1 November, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to NeurIPS 2024

  42. arXiv:2405.17926  [pdf, other]

    cs.CV

    SarcNet: A Novel AI-based Framework to Automatically Analyze and Score Sarcomere Organizations in Fluorescently Tagged hiPSC-CMs

    Authors: Huyen Le, Khiet Dang, Tien Lai, Nhung Nguyen, Mai Tran, Hieu Pham

    Abstract: Quantifying sarcomere structure organization in human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) is crucial for understanding cardiac disease pathology, improving drug screening, and advancing regenerative medicine. Traditional methods, such as manual annotation and Fourier transform analysis, are labor-intensive, error-prone, and lack high-throughput capabilities. In this st…

    Submitted 28 October, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  43. arXiv:2405.14608  [pdf, other]

    cs.LG cs.AI

    ShapeFormer: Shapelet Transformer for Multivariate Time Series Classification

    Authors: Xuan-May Le, Ling Luo, Uwe Aickelin, Minh-Tuan Tran

    Abstract: Multivariate time series classification (MTSC) has attracted significant research attention due to its diverse real-world applications. Recently, exploiting transformers for MTSC has achieved state-of-the-art performance. However, existing methods focus on generic features, providing a comprehensive understanding of data, but they ignore class-specific features crucial for learning the representat…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted at KDD 2024

  44. arXiv:2405.04489  [pdf, other]

    cs.CV

    S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling

    Authors: Minh Tran, Adrian De Luis, Haitao Liao, Ying Huang, Roy McCann, Alan Mantooth, Jack Cothren, Ngan Le

    Abstract: As the impact of climate change escalates, the global necessity to transition to sustainable energy sources becomes increasingly evident. Renewable energies have emerged as a viable solution for users, with Photovoltaic energy being a favored choice for small installations due to its reliability and efficiency. Accurate mapping of PV installations is crucial for understanding the extension of its…

    Submitted 30 April, 2025; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: IEEE Transactions on Smart Grid

  45. arXiv:2404.18705  [pdf, other]

    cs.IT eess.SP

    Wireless Information and Energy Transfer in the Era of 6G Communications

    Authors: Constantinos Psomas, Konstantinos Ntougias, Nikita Shanin, Dongfang Xu, Kenneth MacSporran Mayer, Nguyen Minh Tran, Laura Cottatellucci, Kae Won Choi, Dong In Kim, Robert Schober, Ioannis Krikidis

    Abstract: Wireless information and energy transfer (WIET) represents an emerging paradigm which employs controllable transmission of radio-frequency signals for the dual purpose of data communication and wireless charging. As such, WIET is widely regarded as an enabler of envisioned 6G use cases that rely on energy-sustainable Internet-of-Things (IoT) networks, such as smart cities and smart grids. Meeting…

    Submitted 16 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Proceedings of the IEEE, 36 pages, 33 figures

  46. arXiv:2404.11429  [pdf, other]

    cs.CV

    CarcassFormer: An End-to-end Transformer-based Framework for Simultaneous Localization, Segmentation and Classification of Poultry Carcass Defect

    Authors: Minh Tran, Sang Truong, Arthur F. A. Fernandes, Michael T. Kidd, Ngan Le

    Abstract: In the food industry, assessing the quality of poultry carcasses during processing is a crucial step. This study proposes an effective approach for automating the assessment of carcass quality without requiring skilled labor or inspector involvement. The proposed system is based on machine learning (ML) and computer vision (CV) techniques, enabling automated defect detection and carcass quality as…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted to Poultry Science Journal

  47. arXiv:2404.08590  [pdf, other]

    cs.CV cs.AI

    Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding

    Authors: Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Referring image segmentation is a challenging task that involves generating pixel-wise segmentation masks based on natural language descriptions. The complexity of this task increases with the intricacy of the sentences provided. Existing methods have relied mostly on visual features to generate the segmentation masks while treating text features as supporting components. However, this under-utili…

    Submitted 4 November, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: This paper is accepted in WACV 2025

  48. arXiv:2404.04564  [pdf, other]

    cs.CV cs.AI

    Enhancing Video Summarization with Context Awareness

    Authors: Hai-Dang Huynh-Lam, Ngoc-Phuong Ho-Thi, Minh-Triet Tran, Trung-Nghia Le

    Abstract: Video summarization is a crucial research area that aims to efficiently browse and retrieve relevant information from the vast amount of video content available today. With the exponential growth of multimedia data, the ability to extract meaningful representations from videos has become essential. Video summarization techniques automatically generate concise summaries by selecting keyframes, shot…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 115 pages, 1 supplementary paper, undergraduate thesis report at US-VNUHCM

  49. Cluster-based Video Summarization with Temporal Context Awareness

    Authors: Hai-Dang Huynh-Lam, Ngoc-Phuong Ho-Thi, Minh-Triet Tran, Trung-Nghia Le

    Abstract: In this paper, we present TAC-SUM, a novel and efficient training-free approach for video summarization that addresses the limitations of existing cluster-based models by incorporating temporal context. Our method partitions the input video into temporally consecutive segments with clustering information, enabling the injection of temporal awareness into the clustering process, setting it apart fr…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 14 pages, 6 figures, accepted in PSIVT 2023

  50. Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments

    Authors: Hieu Nguyen, Cong-Hoang Ta, Phuong-Thuy Le-Nguyen, Minh-Triet Tran, Trung-Nghia Le

    Abstract: This paper presents a simple yet efficient ensemble learning framework for Vietnamese scene text spotting. Leveraging the power of ensemble learning, which combines multiple models to yield more accurate predictions, our approach aims to significantly enhance the performance of scene text spotting in challenging urban settings. Through experimental evaluations on the VinText dataset, our proposed…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: RIVF 2023

    Journal ref: In 2023 RIVF International Conference on Computing and Communication Technologies (RIVF) (pp. 177-182). IEEE