-
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
Authors:
Seymanur Akti,
Tuan Nam Nguyen,
Alexander Waibel
Abstract:
Expressive voice conversion aims to transfer both speaker identity and expressive attributes from a target speech to a given source speech. In this work, we improve over a self-supervised, non-autoregressive framework with a conditional variational autoencoder, focusing on reducing source timbre leakage and improving linguistic-acoustic disentanglement for better style transfer. To minimize style leakage, we use multilingual discrete speech units for content representation and reinforce embeddings with augmentation-based similarity loss and mix-style layer normalization. To enhance expressivity transfer, we incorporate local F0 information via cross-attention and extract style embeddings enriched with global pitch and energy features. Experiments show our model outperforms baselines in emotion and speaker similarity, demonstrating superior style adaptation and reduced source style leakage.
Submitted 4 June, 2025;
originally announced June 2025.
-
KIT's Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization
Authors:
Zhaolin Li,
Yining Liu,
Danni Liu,
Tuan Nam Nguyen,
Enes Yavuz Ugan,
Tu Anh Dinh,
Carlos Mullov,
Alexander Waibel,
Jan Niehues
Abstract:
This paper presents KIT's submissions to the IWSLT 2025 low-resource track. We develop both cascaded systems, consisting of Automatic Speech Recognition (ASR) and Machine Translation (MT) models, and end-to-end (E2E) Speech Translation (ST) systems for three language pairs: Bemba, North Levantine Arabic, and Tunisian Arabic into English. Building upon pre-trained models, we fine-tune our systems with different strategies to utilize resources efficiently. This study further explores system enhancement with synthetic data and model regularization. Specifically, we investigate MT-augmented ST by generating translations from ASR data using MT models. For North Levantine, which lacks parallel ST training data, a system trained solely on synthetic data slightly surpasses the cascaded system trained on real data. We also explore augmentation using text-to-speech models by generating synthetic speech from MT data, demonstrating the benefits of synthetic data in improving both ASR and ST performance for Bemba. Additionally, we apply intra-distillation to enhance model performance. Our experiments show that this approach consistently improves results across ASR, MT, and ST tasks, as well as across different pre-trained models. Finally, we apply Minimum Bayes Risk decoding to combine the cascaded and end-to-end systems, achieving an improvement of approximately 1.5 BLEU points.
Submitted 26 May, 2025;
originally announced May 2025.
-
LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models
Authors:
Yan Wang,
Ling Ding,
Tien N Nguyen,
Shaohua Wang,
Yanan Zheng
Abstract:
Large Language Models for code often entail significant computational complexity, which grows with the length of the input code sequence. We propose LeanCode for code simplification to reduce training and prediction time, leveraging code contexts when using attention scores to represent the tokens' importance. We advocate for the selective removal of tokens based on the average context-aware attention scores rather than average scores across all inputs. LeanCode uses the attention scores of 'CLS' tokens within the encoder for classification tasks, such as code search. It also employs the encoder-decoder attention scores to determine token significance for sequence-to-sequence tasks like code summarization. Our evaluation shows LeanCode's superiority over the state-of-the-art methods DietCode and Slimcode, with improvements of 60% and 16% for code search, and 29% and 27% for code summarization, respectively.
Submitted 8 June, 2025; v1 submitted 20 May, 2025;
originally announced May 2025.
-
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
Authors:
Quang P. M. Pham,
Khoi T. N. Nguyen,
Nhi H. Doan,
Cuong A. Pham,
Kentaro Inui,
Dezhen Song
Abstract:
Efficient path planning in robotics, particularly within large-scale, dynamic environments, remains a significant hurdle. While Large Language Models (LLMs) offer strong reasoning capabilities, their high computational cost and limited adaptability in dynamic scenarios hinder real-time deployment on edge devices. We present SmallPlan -- a novel framework leveraging LLMs as teacher models to train lightweight Small Language Models (SLMs) for high-level path planning tasks. In SmallPlan, the SLMs provide optimal action sequences to navigate across scene graphs that compactly represent full-scaled 3D scenes. The SLMs are trained in a simulation-powered, interleaved manner with LLM-guided supervised fine-tuning (SFT) and reinforcement learning (RL). This strategy not only enables SLMs to successfully complete navigation tasks but also makes them aware of important factors like travel distance and number of trials. Through experiments, we demonstrate that the fine-tuned SLMs perform competitively with larger models like GPT-4o on sequential path planning, without suffering from hallucination and overfitting. SmallPlan is resource-efficient, making it well-suited for edge-device deployment and advancing practical autonomous robotics. Our source code is available here: https://github.com/quangpham2006/SmallPlan
Submitted 11 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
A tissue-informed deep learning-based method for positron range correction in preclinical 68Ga PET imaging
Authors:
Nerea Encina-Baranda,
Robert J. Paneque-Yunta,
Javier Lopez-Rodriguez,
Edwin C. Pratt,
Trong Nghia Nguyen,
Jan Grimm,
Alejandro Lopez-Montes,
Joaquin L. Herraiz
Abstract:
Positron range (PR) limits spatial resolution and quantitative accuracy in PET imaging, particularly for high-energy positron-emitting radionuclides like 68Ga. We propose a deep learning method using 3D residual encoder-decoder convolutional neural networks (3D RED-CNNs), incorporating tissue-dependent anatomical information through a u-map-dependent loss function. Models were trained with realistic simulations and, using initial PET and CT data, generated positron range corrected images. We validated the models in simulations and real acquisitions. Three 3D RED-CNN architectures, Single-channel, Two-channel, and DualEncoder, were trained on simulated PET datasets and evaluated on synthetic and real PET acquisitions from 68Ga-FH and 68Ga-PSMA-617 mouse studies. Performance was compared to a standard Richardson-Lucy-based positron range correction (RL-PRC) method using metrics such as mean absolute error (MAE), structural similarity index (SSIM), contrast recovery (CR), and contrast-to-noise ratio (CNR). CNN-based methods achieved up to 19 percent SSIM improvement and 13 percent MAE reduction compared to RL-PRC. The Two-Channel model achieved the highest CR and CNR, recovering lung activity with 97 percent agreement to ground truth versus 77 percent for RL-PRC. Noise levels remained stable for CNN models (approximately 5.9 percent), while RL-PRC increased noise by 5.8 percent. In preclinical acquisitions, the Two-Channel model achieved the highest CNR across tissues while maintaining the lowest noise level (9.6 percent). Although no ground truth was available for real data, tumor delineation and spillover artifacts improved with the Two-Channel model. These findings highlight the potential of CNN-based PRC to enhance quantitative PET imaging, particularly for 68Ga. Future work will improve model generalization through domain adaptation and hybrid training strategies.
Submitted 27 April, 2025;
originally announced April 2025.
-
Combining Static and Dynamic Approaches for Mining and Testing Constraints for RESTful API Testing
Authors:
Hieu Huynh,
Tri Le,
Vu Nguyen,
Tien N. Nguyen
Abstract:
In API testing, deriving logical constraints on API response bodies is crucial in generating the test cases to cover various aspects of RESTful APIs. However, existing approaches are limited to dynamic analysis in which constraints are extracted from the execution of APIs as part of the system under test. The key limitation of such a dynamic approach is under-estimation: the inputs in API executions are not sufficiently diverse to uncover actual constraints on API response bodies. In this paper, we propose to combine a novel static analysis approach (in which the constraints for API response bodies are mined from API specifications) with the dynamic approach (which relies on API execution data). We leverage large language models (LLMs) to comprehend the API specifications, mine constraints for response bodies, and generate test cases. To reduce LLMs' hallucination, we apply an Observation-Confirmation (OC) scheme which uses initial prompts to contextualize constraints. Our empirical results show that LLMs with OC prompting achieve high precision in constraint mining, with an average of 91.2%. When combining static and dynamic analysis, our tool, RBCTest, achieves a precision of 78.5%. RBCTest detects 107 constraints that the dynamic approach misses and 46 more precise constraints. We also use its generated test cases to detect 21 mismatches between the API specification and actual response data for 8 real-world APIs. Four of the mismatches were, in fact, reported in developers' forums.
Submitted 24 April, 2025;
originally announced April 2025.
-
The extended adjoint state and nonlinearity in correlation-based passive imaging
Authors:
Tram Thi Ngoc Nguyen
Abstract:
This article investigates the physics-based passive imaging problem, wherein one infers an unknown medium using ambient noise and correlation of the noise signal. We develop a general backpropagation framework via the so-called extended adjoint state, suitable for any linear PDE; crucially, this approach reduces by half the number of required PDE solves. Applications to several different PDE models demonstrate the universality of our method. In addition, we analyze the nonlinearity of the correlated model, revealing a surprising tangential cone condition-like structure, thereby advancing the state of the art towards a convergence guarantee for regularized reconstruction in passive imaging.
Submitted 23 April, 2025;
originally announced April 2025.
-
Towards Test Generation from Task Description for Mobile Testing with Multi-modal Reasoning
Authors:
Hieu Huynh,
Hai Phung,
Hao Pham,
Tien N. Nguyen,
Vu Nguyen
Abstract:
In Android GUI testing, generating an action sequence for a task that can be replayed as a test script is common. Generating sequences of actions and respective test scripts from task goals described in natural language can eliminate the need for manually writing test scripts. However, existing approaches based on large language models (LLMs) often struggle with identifying the final action, and either end prematurely or continue past the final screen. In this paper, we introduce VisiDroid, a multi-modal, LLM-based, multi-agent framework that iteratively determines the next action and leverages visual images of screens to detect the task's completeness. The multi-modal approach enhances our model in two significant ways. First, this approach enables it to avoid prematurely terminating a task when textual content alone provides misleading indications of task completion. Additionally, visual input helps the tool avoid errors when changes in the GUI do not directly affect functionality toward task completion, such as adjustments to font sizes or colors. Second, the multi-modal approach also ensures that the tool does not progress beyond the final screen, which might lack explicit textual indicators of task completion but could display a visual element indicating task completion, which is common in GUI apps. Our evaluation shows that VisiDroid achieves an accuracy of 87.3%, outperforming the best baseline by a relative 23.5%. We also demonstrate that our multi-modal framework with images and texts enables the LLM to better determine when a task is completed.
Submitted 22 April, 2025;
originally announced April 2025.
-
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs
Authors:
Minh V. T. Pham,
Huy N. Phan,
Hoang N. Phan,
Cuong Le Chi,
Tien N. Nguyen,
Nghi D. Q. Bui
Abstract:
Large language models (LLMs) are transforming automated program repair (APR) through agent-based approaches that localize bugs, generate patches, and verify fixes. However, the lack of high-quality, scalable training datasets, especially those with verifiable outputs and intermediate reasoning traces, limits progress, particularly for open-source models. In this work, we present SWE-Synth, a framework for synthesizing realistic, verifiable, and process-aware bug-fix datasets at the repository level. SWE-Synth leverages LLM agents to simulate debugging workflows, producing not only bug-fix pairs but also test cases and structured repair trajectories. Compared to manually curated datasets, our method scales with minimal human effort while preserving contextual richness and correctness. Experiments show that models trained on SWE-Synth outperform those trained on real-world datasets by 2.3% on SWE-Bench Lite. Our results highlight the potential of synthetic, agent-generated data to advance the state of the art in APR and software engineering automation.
Submitted 20 April, 2025;
originally announced April 2025.
-
Toward Generation of Test Cases from Task Descriptions via History-aware Planning
Authors:
Duy Cao,
Phu Nguyen,
Vy Le,
Tien N. Nguyen,
Vu Nguyen
Abstract:
In automated web testing, generating test scripts from natural language task descriptions is crucial for enhancing the test generation process. This activity involves creating the correct sequences of actions to form test scripts for future testing activities. Current state-of-the-art approaches are limited in generating these action sequences, as they either demand substantial manual effort for human demonstrations or fail to consider the history of previous web content and actions to decide the next action. In this paper, we introduce HxAgent, an iterative large language model agent planning approach that determines the next action based on: 1) observations of the current contents and feasible actions, 2) short-term memory of previous web states and actions, and 3) long-term experience with (in)correct action sequences. The agent generates a sequence of actions to perform a given task, which is effectively an automated test case to verify the task. We conducted an extensive empirical evaluation of HxAgent using two datasets. On the MiniWoB++ dataset, our approach achieves 97% exact-match accuracy that is comparable to the best baselines while eliminating the need for human demonstrations required by those methods. For complex tasks requiring navigation through multiple actions and screens, HxAgent achieves an average 82% exact-match. On the second dataset, comprising 350 task instances across seven popular websites, including YouTube, LinkedIn, Facebook, and Google, HxAgent achieves high performance, with 87% of the action sequences exactly matching the ground truth and a prefix-match of 93%, outperforming the baseline by 59%.
Submitted 19 April, 2025;
originally announced April 2025.
-
Demo: ViolentUTF as An Accessible Platform for Generative AI Red Teaming
Authors:
Tam n. Nguyen
Abstract:
The rapid integration of Generative AI (GenAI) into various applications necessitates robust risk management strategies, which include Red Teaming (RT), an evaluation method for simulating adversarial attacks. Unfortunately, RT for GenAI is often hindered by technical complexity, lack of user-friendly interfaces, and inadequate reporting features. This paper introduces ViolentUTF, an accessible, modular, and scalable platform for GenAI red teaming. Through intuitive interfaces (Web GUI, CLI, API, MCP) powered by LLMs and for LLMs, ViolentUTF aims to empower non-technical domain experts and students alongside technical experts, and to facilitate comprehensive security evaluation by unifying capabilities from RT frameworks like Microsoft PyRIT, Nvidia Garak, and its own specialized evaluators. ViolentUTF is being used for evaluating the robustness of a flagship LLM-based product in a large US Government department. It also demonstrates effectiveness in evaluating LLMs' cross-domain reasoning capability between cybersecurity and behavioral psychology.
Submitted 29 April, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
When Less Is More: A Sparse Facial Motion Structure For Listening Motion Learning
Authors:
Tri Tung Nguyen Nguyen,
Quang Tien Dam,
Dinh Tuan Tran,
Joo-Ho Lee
Abstract:
Effective human behavior modeling is critical for successful human-robot interaction. Current state-of-the-art approaches for predicting listening head behavior during dyadic conversations employ continuous-to-discrete representations, where a continuous facial motion sequence is converted into discrete latent tokens. However, non-verbal facial motion presents unique challenges owing to its temporal variance and multi-modal nature. State-of-the-art discrete motion token representations struggle to capture underlying non-verbal facial patterns, making training the listening head inefficient and yielding low-fidelity generated motion. This study proposes a novel method for representing and predicting non-verbal facial motion by encoding long sequences into a sparse sequence of keyframes and transition frames. By identifying crucial motion steps and interpolating intermediate frames, our method preserves the temporal structure of motion while enhancing instance-wise diversity during the learning process. Additionally, we apply this novel sparse representation to the task of listening head prediction, demonstrating its contribution to improving the explanation of facial motion patterns.
Submitted 8 April, 2025;
originally announced April 2025.
-
SEA-LION: Southeast Asian Languages in One Network
Authors:
Raymond Ng,
Thanh Ngan Nguyen,
Yuli Huang,
Ngee Chia Tai,
Wai Yi Leong,
Wei Qi Leong,
Xianbin Yong,
Jian Gang Ngui,
Yosephine Susanto,
Nicholas Cheng,
Hamsawardhini Rengarajan,
Peerat Limkonchotiwat,
Adithya Venkatadri Hulagadri,
Kok Wai Teng,
Yeo Yeow Tong,
Bryan Siow,
Wei Yi Teo,
Wayne Lau,
Choon Meng Tan,
Brandon Ong,
Zhi Hao Ong,
Jann Railey Montalan,
Adwin Chan,
Sajeban Antonyrex,
Ren Lee
, et al. (6 additional authors not shown)
Abstract:
Recently, Large Language Models (LLMs) have dominated much of the artificial intelligence scene with their ability to process and generate natural languages. However, the majority of LLM research and development remains English-centric, leaving low-resource languages such as those in the Southeast Asian (SEA) region under-represented. To address this representation gap, we introduce Llama-SEA-LION-v3-8B-IT and Gemma-SEA-LION-v3-9B-IT, two cutting-edge multilingual LLMs designed for SEA languages. The SEA-LION family of LLMs supports 11 SEA languages, namely English, Chinese, Indonesian, Vietnamese, Malay, Thai, Burmese, Lao, Filipino, Tamil, and Khmer. Our work leverages large-scale multilingual continued pre-training with a comprehensive post-training regime involving multiple stages of instruction fine-tuning, alignment, and model merging. Evaluation results on multilingual benchmarks indicate that our models achieve state-of-the-art performance across LLMs supporting SEA languages. We open-source the models to benefit the wider SEA community.
Submitted 15 April, 2025; v1 submitted 8 April, 2025;
originally announced April 2025.
-
Soft X-ray high-harmonic generation in an anti-resonant hollow core fiber driven by a 3 $μ$m ultrafast laser
Authors:
Drew Morrill,
Will Hettel,
Daniel Carlson,
Benjamin Shearer,
Clay Klein,
Jeremy Thurston,
Grzegorz Golba,
Rae Larsen,
Gabriella Seifert,
James Uhrich,
Daniel Lesko,
Tin Nghia Nguyen,
Gunnar Arisholm,
Jonathan Knight,
Scott Diddams,
Margaret Murnane,
Henry Kapteyn,
Michaël Hemmer
Abstract:
High-harmonic upconversion driven by a mid-infrared femtosecond laser can generate coherent soft X-ray beams in a tabletop-scale setup. Here, we report on a compact ytterbium-pumped optical parametric chirped pulse amplifier (OPCPA) laser system seeded by an all-fiber front-end and employing periodically-poled lithium niobate (PPLN) nonlinear media operated near the pulse fluence limits of current commercially available PPLN crystals. The OPCPA delivers 3 $μ$m wavelength pulses with 775 $μ$J energy at 1 kHz repetition rate, with transform-limited 120 fs pulse duration, diffraction-limited beam quality, and ultrahigh 0.33% rms energy stability over >18 hours. Using this laser, we generate soft X-ray high harmonics (HHG) in argon gas by focusing into a low-loss, high-pressure gas-filled anti-resonant hollow core fiber (ARHCF), generating coherent light at photon energies up to the argon L-edge (250 eV) and carbon K-edge (284 eV), with high beam quality and ~1% rms energy stability. This work demonstrates soft X-ray HHG in a high-efficiency guided-wave phase matched geometry, overcoming the high losses inherent to mid-IR propagation in unstructured waveguides, or the short interaction lengths of gas cells or jets. The ARHCF can operate long term without damage, and with the repetition rate, stability and robustness required for demanding applications in spectro-microscopy and imaging. Finally, we discuss routes for maximizing the soft X-ray HHG flux by driving He at higher laser intensities using either 1.5 $μ$m or 3 $μ$m - the signal and idler wavelengths of the laser.
Submitted 1 April, 2025;
originally announced April 2025.
-
Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring
Authors:
Fraol Batole,
Abhiram Bellur,
Malinda Dilhara,
Mohammed Raihan Ullah,
Yaroslav Zharov,
Timofey Bryksin,
Kai Ishikawa,
Haifeng Chen,
Masaharu Morimoto,
Shota Motoura,
Takeo Hosomi,
Tien N. Nguyen,
Hridesh Rajan,
Nikolaos Tsantalis,
Danny Dig
Abstract:
MOVEMETHOD is a hallmark refactoring. Despite a plethora of research tools that recommend which methods to move and where, these recommendations do not align with how expert developers perform MOVEMETHOD. Given the extensive training of Large Language Models and their reliance upon naturalness of code, they should expertly recommend which methods are misplaced in a given class and which classes are better hosts. Our formative study of 2016 LLM recommendations revealed that LLMs give expert suggestions, yet they are unreliable: up to 80% of the suggestions are hallucinations. We introduce the first fully LLM-powered assistant for MOVEMETHOD refactoring that automates its whole end-to-end lifecycle, from recommendation to execution. We designed novel solutions that automatically filter LLM hallucinations using static analysis from IDEs and a novel workflow that requires LLMs to be self-consistent, critique, and rank refactoring suggestions. As MOVEMETHOD refactoring requires global, project-level reasoning, we solved the limited context size of LLMs by employing refactoring-aware retrieval-augmented generation (RAG). Our approach, MM-assist, synergistically combines the strengths of the LLM, IDE, static analysis, and semantic relevance. In our thorough, multi-methodology empirical evaluation, we compare MM-assist with the previous state-of-the-art approaches. MM-assist significantly outperforms them: (i) on a benchmark widely used by other researchers, our Recall@1 and Recall@3 show a 1.7x improvement; (ii) on a corpus of 210 recent refactorings from open-source software, our Recall rates improve by at least 2.4x. Lastly, we conducted a user study with 30 experienced participants who used MM-assist to refactor their own code for one week. They rated 82.8% of MM-assist recommendations positively. This shows that MM-assist is both effective and useful.
Submitted 26 March, 2025;
originally announced March 2025.
-
Generalization performance of neural mapping schemes for the space-time interpolation of satellite-derived ocean colour datasets
Authors:
Thi Thuy Nga Nguyen,
Clément Dorffer,
Frédéric Jourdin,
Ronan Fablet
Abstract:
Neural mapping schemes have become appealing approaches to deliver gap-free satellite-derived products for sea surface tracers. The generalization performance of these learning-based approaches naturally arises as a key challenge. This is particularly true for satellite-derived ocean colour products given the variety of bio-optical variables of interest, as well as the diversity of processes and scales involved. Considering region-specific and parameter-specific neural mapping schemes will result in substantial training costs. This study addresses generalization performance of neural mapping schemes to deliver gap-free satellite-derived ocean colour products. We develop a comprehensive experimental framework using real multi-sensor ocean colour datasets for two regions (the Mediterranean Sea and the North Sea) and a representative set of bio-optical parameters (Chlorophyll-a concentration, suspended particulate matter concentration, particulate backscattering coefficient). We consider several neural mapping schemes, and we report excellent generalization performance across regions and bio-optical parameters without any fine-tuning using appropriate dataset-specific normalization procedures. We discuss further how these results provide new insights towards the large-scale deployment of neural schemes for the processing of satellite-derived ocean colour datasets beyond case-study-specific demonstrations.
Submitted 14 March, 2025;
originally announced March 2025.
-
Observation-only learning of neural mapping schemes for gappy satellite-derived ocean colour parameters
Authors:
Clément Dorffer,
Frédéric Jourdin,
Thi Thuy Nga Nguyen,
Rodolphe Devillers,
David Mouillot,
Ronan Fablet
Abstract:
Monitoring optical properties of coastal and open ocean waters is crucial to assessing the health of marine ecosystems. Deep learning offers a promising approach to address these ecosystem dynamics, especially in scenarios where gap-free ground-truth data is lacking, which poses a challenge for designing effective training frameworks. Using an advanced neural variational data assimilation scheme (called 4DVarNet), we introduce a comprehensive training framework designed to effectively train directly on gappy data sets. Using the Mediterranean Sea as a case study, our experiments not only highlight the high performance of the chosen neural network in reconstructing gap-free images from gappy datasets but also demonstrate its superior performance over state-of-the-art algorithms such as DInEOF and Direct Inversion, whether using CNN or UNet architectures.
Submitted 14 March, 2025;
originally announced March 2025.
-
Interactive Holographic Visualization for 3D Facial Avatar
Authors:
Tri Tung Nguyen Nguyen,
Fujii Yasuyuki,
Dinh Tuan Tran,
Joo-Ho Lee
Abstract:
Traditional methods for visualizing dynamic human expressions, particularly in medical training, often rely on flat-screen displays or static mannequins, which have proven inefficient for realistic simulation. In response, we propose a platform that leverages a 3D interactive facial avatar capable of displaying non-verbal feedback, including pain signals. This avatar is projected onto a stereoscopic, view-dependent 3D display, offering a more immersive and realistic simulated patient experience for pain assessment practice. However, there is no existing solution that dynamically predicts and projects interactive 3D facial avatars in real-time. To overcome this, we emphasize the need for a 3D display projection system that can project the facial avatar holographically, allowing users to interact with the avatar from any viewpoint. By incorporating 3D Gaussian Splatting (3DGS) and real-time view-dependent calibration, we significantly improve the training environment for accurate pain recognition and assessment.
Submitted 11 February, 2025;
originally announced February 2025.
-
Fast Beam Placement for Ultra-Dense LEO Networks
Authors:
Trinh Van Chien,
Nguyen Minh Quan,
Tri Nhu Do,
Cuong Le,
Tan N. Nguyen,
Symeon Chatzinotas
Abstract:
Low Earth orbit (LEO) satellites have brought about significant improvements in wireless communications, characterized by low latency and reduced transmission loss compared to geostationary orbit (GSO) satellites. Ultra-dense LEO satellites can serve many users by generating active beams that effectively cover their locations. The beam placement problem is challenging but important for efficiently allocating resources with a large number of users. This paper formulates and solves a fast beam placement optimization problem for ultra-dense satellite systems to enhance the link budget with a minimum number of active beams (NABs). To achieve this goal and balance load among beams within polynomial time, we propose two algorithms for large user groups exploiting modified K-means clustering and graph theory. Numerical results illustrate the effectiveness of the proposals in terms of the statistical channel gain-to-noise ratio and computation time over state-of-the-art benchmarks.
Submitted 11 December, 2024;
originally announced December 2024.
-
Traction force microscopy for linear and nonlinear elastic materials as a parameter identification inverse problem
Authors:
Gesa Sarnighausen,
Tram Thi Ngoc Nguyen,
Thorsten Hohage,
Mangalika Sinha,
Sarah Koester,
Timo Betz,
Ulrich Sebastian Schwarz,
Anne Wald
Abstract:
Traction force microscopy is a method widely used in biophysics and cell biology to determine forces that biological cells apply to their environment. In the experiment, the cells adhere to a soft elastic substrate, which is then deformed in response to cellular traction forces. The inverse problem consists in computing the traction stress applied by the cell from microscopy measurements of the substrate deformations. In this work, we consider a linear model, in which 3D forces are applied at a 2D interface, called 2.5D traction force microscopy, and a nonlinear pure 2D model, from which we directly obtain a linear pure 2D model. All models lead to a linear resp. nonlinear parameter identification problem for a boundary value problem of elasticity. We analyze the respective forward operators and conclude with some numerical experiments for simulated and experimental data.
Submitted 29 November, 2024;
originally announced November 2024.
-
TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding
Authors:
Quang P. M. Pham,
Khoi T. N. Nguyen,
Lan C. Ngo,
Truong Do,
Dezhen Song,
Truong-Son Hy
Abstract:
Scene graphs have proven to be highly effective for various scene understanding tasks due to their compact and explicit representation of relational information. However, current methods often overlook the critical importance of preserving symmetry when generating scene graphs from 3D point clouds, which can lead to reduced accuracy and robustness, particularly when dealing with noisy, multi-view data. Furthermore, a major limitation of prior approaches is the lack of temporal modeling to capture time-dependent relationships among dynamically evolving entities in a scene. To address these challenges, we propose the Temporal Equivariant Scene Graph Neural Network (TESGNN), consisting of two key components: (1) an Equivariant Scene Graph Neural Network (ESGNN), which extracts information from 3D point clouds to generate scene graphs while preserving crucial symmetry properties, and (2) a Temporal Graph Matching Network, which fuses scene graphs generated by ESGNN across multiple time sequences into a unified global representation using an approximate graph-matching algorithm. Our combined architecture TESGNN outperforms current state-of-the-art methods in scene graph generation, achieving higher accuracy and faster training convergence. Moreover, we show that leveraging the symmetry-preserving property produces a more stable and accurate global scene representation compared to existing approaches. Finally, it is computationally efficient and easily implementable using existing frameworks, making it well-suited for real-time applications in robotics and computer vision. This approach paves the way for more robust and scalable solutions to complex multi-view scene understanding challenges. Our source code is publicly available at: https://github.com/HySonLab/TESGraph
Submitted 2 March, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
Fixing the Loose Brake: Exponential-Tailed Stopping Time in Best Arm Identification
Authors:
Kapilan Balagopalan,
Tuan Ngo Nguyen,
Yao Zhao,
Kwang-Sung Jun
Abstract:
The best arm identification problem requires identifying the best alternative (i.e., arm) in active experimentation using the smallest number of experiments (i.e., arm pulls), which is crucial for cost-efficient and timely decision-making processes. In the fixed confidence setting, an algorithm must stop data-dependently and return the estimated best arm with a correctness guarantee. Since this stopping time is random, we desire its distribution to have light tails. Unfortunately, many existing studies focus on high probability or in expectation bounds on the stopping time, which allow heavy tails and, for high probability bounds, even not stopping at all. We first prove that this never-stopping event can indeed happen for some popular algorithms. Motivated by this, we propose algorithms that provably enjoy an exponential-tailed stopping time, which improves upon the polynomial tail bound reported by Kalyanakrishnan et al. (2012). The first algorithm is based on a fixed budget algorithm called Sequential Halving along with a doubling trick. The second algorithm is a meta algorithm that takes in any fixed confidence algorithm with a high probability stopping guarantee and turns it into one that enjoys an exponential-tailed stopping time. Our results imply that there is much more to be desired for contemporary fixed confidence algorithms.
Submitted 4 November, 2024;
originally announced November 2024.
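The abstract above builds its first algorithm on Sequential Halving, a fixed-budget method. As a rough illustration only, the following sketch shows the textbook form of Sequential Halving: split the budget evenly over about log2(K) rounds, pull every surviving arm equally often in each round, and keep the better half. It does not include the doubling trick or reproduce the paper's algorithm; the function name `sequential_halving` and the `pull` callback interface are assumptions made for this example.

```python
import math


def sequential_halving(pull, n_arms, budget):
    """Textbook Sequential Halving (fixed budget); a sketch, not the paper's method.

    pull(arm) is a hypothetical callback returning one reward sample for `arm`.
    """
    survivors = list(range(n_arms))
    n_rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(n_rounds):
        if len(survivors) == 1:
            break
        # Spread this round's share of the budget evenly over surviving arms.
        pulls_per_arm = max(1, budget // (len(survivors) * n_rounds))
        means = {
            arm: sum(pull(arm) for _ in range(pulls_per_arm)) / pulls_per_arm
            for arm in survivors
        }
        # Keep the better half (rounded up), judged on this round's samples only.
        survivors.sort(key=lambda arm: means[arm], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]
```

With noiseless rewards, for example `sequential_halving(lambda a: [0.1, 0.2, 0.9, 0.3][a], 4, 64)`, the routine returns the index of the arm with the largest mean. The stopping time here is fixed by the budget, which is why a fixed-confidence variant needs an outer loop such as the doubling trick mentioned in the abstract.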
-
HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search
Authors:
Tuan Ngo Nguyen,
Jay Barrett,
Kwang-Sung Jun
Abstract:
We study the problem of estimating the value of the largest mean among K distributions via samples from them (rather than estimating which distribution has the largest mean), which arises from various machine learning tasks including Q-learning and Monte Carlo Tree Search (MCTS). While there have been a few proposed algorithms, their performance analyses have been limited to their biases rather than a precise error metric. In this paper, we propose a novel algorithm called HAVER (Head AVERaging) and analyze its mean squared error. Our analysis reveals that HAVER has compelling performance in two respects. First, HAVER estimates the maximum mean as well as the oracle who knows the identity of the best distribution and reports its sample mean. Second, perhaps surprisingly, HAVER exhibits even better rates than this oracle when there are many distributions near the best one. Both of these improvements are the first of their kind in the literature, and we also prove that the naive algorithm that reports the largest empirical mean does not achieve these bounds. Finally, we confirm our theoretical findings via numerical experiments where we implement HAVER in bandit, Q-learning, and MCTS algorithms. In these experiments, HAVER consistently outperforms the baseline methods, demonstrating its effectiveness across different applications.
Submitted 28 April, 2025; v1 submitted 1 November, 2024;
originally announced November 2024.
-
VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning
Authors:
Cuong Chi Le,
Hoang-Chau Truong-Vinh,
Huy Nhat Phan,
Dung Duy Le,
Tien N. Nguyen,
Nghi D. Q. Bui
Abstract:
Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static syntax, they often struggle with dynamic reasoning tasks. We introduce VisualCoder, a simple yet effective approach that enhances code reasoning by integrating multimodal Chain-of-Thought (CoT) reasoning with a visual Control Flow Graph (CFG). By aligning code snippets with their corresponding CFGs, VisualCoder provides deeper insights into execution flows. We address challenges in multimodal CoT integration through a reference mechanism, ensuring consistency between code and its execution path, thereby improving performance in program behavior prediction, error detection, and output generation.
Submitted 9 February, 2025; v1 submitted 30 October, 2024;
originally announced October 2024.
-
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
Authors:
Tuan Nam Nguyen,
Seymanur Akti,
Ngoc Quan Pham,
Alexander Waibel
Abstract:
Previous approaches to accent conversion (AC) mainly aimed at making non-native speech sound more native while maintaining the original content and speaker identity. However, non-native speakers sometimes have pronunciation issues, which can make it difficult for listeners to understand them. Hence, we developed a new AC approach that not only focuses on accent conversion but also improves the pronunciation of non-native accented speakers. By providing the non-native audio and the corresponding transcript, we generate the ideal ground-truth audio with native-like pronunciation and the original duration and prosody. This ground-truth data aids the model in learning a direct mapping between accented and native speech. We utilize the end-to-end VITS framework to achieve high-quality waveform reconstruction for the AC task. As a result, our system not only produces audio that closely resembles native accents while retaining the original speaker's identity but also improves pronunciation, as demonstrated by evaluation results.
Submitted 4 March, 2025; v1 submitted 19 October, 2024;
originally announced October 2024.
-
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
Authors:
Tuan Nam Nguyen,
Ngoc Quan Pham,
Alexander Waibel
Abstract:
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity. Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent. This paper presents a promising AC model that can convert many accents into a native accent to overcome these issues. Our approach utilizes discrete units, derived from clustering self-supervised representations of native speech, as an intermediary target for accent conversion. Leveraging multi-speaker text-to-speech synthesis, it transforms these discrete representations back into native speech while retaining the speaker identity. Additionally, we develop an efficient data augmentation method to train the system without demanding a lot of non-native resources. Our system is shown to improve non-native speaker fluency, sound like a native accent, and preserve the original speaker identity well.
Submitted 30 September, 2024;
originally announced October 2024.
-
Fuel tax loss in a world of electric mobility: A window of opportunity for congestion pricing
Authors:
Thi Ngoc Nguyen,
Felix Muesgens
Abstract:
The continued transition towards electric mobility will decrease energy tax revenues worldwide, which has substantial implications for government funds. At the same time, demand for transportation is ever increasing, which in turn increases congestion problems. Combining both challenges, this paper assesses the effectiveness of congestion pricing as a sustainable revenue stream to offset fuel tax loss in 2030 while simultaneously enhancing efficiency in the transport sector. A congestion-based toll that is road-and-time-variant is simulated for the greater Berlin area in Germany using the multi-agent transport simulation (MATSim) software. Through the simulation results, this paper quantifies the impacts of the toll on governmental revenue, traffic management, the environment, and social welfare, as well as its distribution effects. We find that the revenue from congestion tolls in a metropolitan area can compensate for the reduction in passenger car fuel tax. Furthermore, a remarkable welfare surplus is observed. The toll also successfully incentivises transport users to adjust their travel behaviour, which reduces traffic delay time by 28%. CO2 emissions, a key metric for decarbonisation of the transport sector, decrease by more than 5%. The analysis of the distribution effects suggests that a redistribution plan with a focus on the middle-low-income residents and the outer boroughs could help the policy gain more public acceptance.
Submitted 30 September, 2024;
originally announced September 2024.
-
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Authors:
Huy Nhat Phan,
Tien N. Nguyen,
Phong X. Nguyen,
Nghi D. Q. Bui
Abstract:
Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks. Despite recent advancements that have enabled the creation of autonomous software agents utilizing LLMs for end-to-end development tasks, these systems are typically designed for specific SE functions. We introduce HyperAgent, an innovative generalist multi-agent system designed to tackle a wide range of SE tasks across different programming languages by mimicking the workflows of human developers. HyperAgent features four specialized agents (Planner, Navigator, Code Editor, and Executor) capable of handling the entire lifecycle of SE tasks, from initial planning to final verification. HyperAgent sets new benchmarks in diverse SE tasks, including GitHub issue resolution on the renowned SWE-Bench benchmark, outperforming robust baselines. Furthermore, HyperAgent demonstrates exceptional performance in repository-level code generation (RepoExec) and fault localization and program repair (Defects4J), often surpassing state-of-the-art baselines.
Submitted 5 November, 2024; v1 submitted 9 September, 2024;
originally announced September 2024.
-
PainDiffusion: Learning to Express Pain
Authors:
Quang Tien Dam,
Tri Tung Nguyen Nguyen,
Yuki Endo,
Dinh Tuan Tran,
Joo-Ho Lee
Abstract:
Accurate pain expression synthesis is essential for improving clinical training and human-robot interaction. Current Robotic Patient Simulators (RPSs) lack realistic pain facial expressions, limiting their effectiveness in medical training. In this work, we introduce PainDiffusion, a generative model that synthesizes naturalistic facial pain expressions. Unlike traditional heuristic or autoregressive methods, PainDiffusion operates in a continuous latent space, ensuring smoother and more natural facial motion while supporting indefinite-length generation via diffusion forcing. Our approach incorporates intrinsic characteristics such as pain expressiveness and emotion, allowing for personalized and controllable pain expression synthesis. We train and evaluate our model using the BioVid HeatPain Database. Additionally, we integrate PainDiffusion into a robotic system to assess its applicability in real-time rehabilitation exercises. Qualitative studies with clinicians reveal that PainDiffusion produces realistic pain expressions, with a 31.2% (std 4.8%) preference rate against ground-truth recordings. Our results suggest that PainDiffusion can serve as a viable alternative to real patients in clinical training and simulation, bridging the gap between synthetic and naturalistic pain expression. Code and videos are available at: https://damtien444.github.io/paindf/
Submitted 4 March, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Bi-level regularization via iterative mesh refinement for aeroacoustics
Authors:
Christian Aarset,
Tram Thi Ngoc Nguyen
Abstract:
In this work, we illustrate the connection between adaptive mesh refinement for finite element discretized PDEs and the recently developed bi-level regularization algorithm. By adaptive mesh refinement according to data noise, regularization effect and convergence are immediate consequences. We moreover demonstrate its numerical advantages over the classical Landweber algorithm in terms of time and reconstruction quality for the example of the Helmholtz equation in an aeroacoustic setting.
Submitted 31 October, 2024; v1 submitted 10 September, 2024;
originally announced September 2024.
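For readers unfamiliar with the classical Landweber algorithm used as the comparison baseline in the abstract above, here is a minimal sketch for a linear problem A x = y. It implements the standard iteration x_{k+1} = x_k + w * A^T (y - A x_k), which converges for step sizes 0 < w < 2 / ||A||^2. This is only the textbook baseline, not the bi-level scheme of the paper; the function name `landweber`, the list-of-lists matrix format, and the zero initial guess are assumptions for illustration.

```python
def landweber(A, y, step, n_iter):
    """Classical Landweber iteration for A x = y (illustrative sketch).

    A is a dense matrix given as a list of rows; step must satisfy
    0 < step < 2 / ||A||^2 for the iteration to converge.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n  # zero initial guess (an assumption of this sketch)
    for _ in range(n_iter):
        # Residual r = y - A x
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step x <- x + step * A^T r
        x = [x[j] + step * sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
    return x
```

With noisy data the iteration is usually stopped early, so the iteration count `n_iter` plays the role of the regularization parameter.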
-
Sequential bi-level regularized inversion with application to hidden reaction law discovery
Authors:
Tram Thi Ngoc Nguyen
Abstract:
In this article, we develop and present a novel regularization scheme for ill-posed inverse problems governed by nonlinear time-dependent partial differential equations (PDEs). In our recent work, we introduced a bi-level regularization framework. This study significantly improves upon the bi-level algorithm by sequentially initializing the lower-level problem, yielding accelerated convergence and a demonstrable multi-scale effect, while retaining the regularizing effect and allowing the use of inexact PDE solvers. Moreover, by collecting the lower-level trajectory, we uncover an interesting connection to the incremental load method. The sequential bi-level approach illustrates its universality through several reaction-diffusion applications, in which the nonlinear reaction law needs to be determined. We moreover prove that the proposed tangential cone condition is satisfied.
Submitted 29 May, 2025; v1 submitted 5 September, 2024;
originally announced September 2024.
-
CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning
Authors:
Cuong Chi Le,
Hoang Nhat Phan,
Huy Nhat Phan,
Tien N. Nguyen,
Nghi D. Q. Bui
Abstract:
Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine learning-based approach that predicts code coverage and detects runtime errors by learning both static and dynamic dependencies within the code. By using control flow graphs (CFGs), CodeFlow effectively represents all possible execution paths and the statistic relations between different statements, providing a more comprehensive understanding of program behaviors. CodeFlow constructs CFGs to represent possible execution paths and learns vector representations (embeddings) for CFG nodes, capturing static control-flow dependencies. Additionally, it learns dynamic dependencies by leveraging execution traces, which reflect the impacts among statements during execution. This combination enables CodeFlow to accurately predict code coverage and identify runtime errors. Our empirical evaluation demonstrates that CodeFlow significantly improves code coverage prediction accuracy and effectively localizes runtime errors, outperforming state-of-the-art models.
Submitted 9 February, 2025; v1 submitted 5 August, 2024;
originally announced August 2024.
-
Segment-Based Test Case Prioritization: A Multi-objective Approach
Authors:
Hieu Huynh,
Nhu Pham,
Tien N. Nguyen,
Vu Nguyen
Abstract:
Regression testing of software is a crucial but time-consuming task, especially in the context of user interface (UI) testing where multiple microservices must be validated simultaneously. Test case prioritization (TCP) is a cost-efficient solution to address this by scheduling test cases in an execution order that maximizes an objective function, generally aimed at increasing the fault detection rate. While several techniques have been proposed for TCP, most rely on source code information which is usually not available for UI testing. In this paper, we introduce a multi-objective optimization approach to prioritize UI test cases, using evolutionary search algorithms and four coverage criteria focusing on web page elements as objectives for the optimization problem. Our method, which does not require source code information, is evaluated using two evolutionary algorithms (AGE-MOEA and NSGA-II) and compared with other TCP methods on a self-collected dataset of 11 test suites. The results show that our approach significantly outperforms other methods in terms of Average Percentage of Faults Detected (APFD) and APFD with Cost (APFDc), achieving the highest scores of 87.8% and 79.2%, respectively. We also introduce a new dataset and demonstrate the significant improvement of our approach over existing ones via empirical experiments. The paper's contributions include the application of web page segmentation in TCP, the construction of a new dataset for UI TCP, and empirical comparisons that demonstrate the improvement of our approach.
Submitted 1 August, 2024;
originally announced August 2024.
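The abstract above reports results in terms of APFD. As a reminder of what that metric measures, the sketch below computes the standard definition APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the 1-based position of the first test in the ordering that detects fault i. The function name `apfd` and the dictionary input format are assumptions for this example, and the sketch assumes every fault is detected by at least one test in the ordering. The cost-aware variant APFDc is not covered.

```python
def apfd(ordering, faults_detected_by):
    """Average Percentage of Faults Detected for one test ordering (sketch).

    ordering: list of test identifiers in execution order.
    faults_detected_by: dict mapping test identifier -> set of fault identifiers.
    Assumes at least one fault exists and each fault is found by some test
    that appears in `ordering`.
    """
    n = len(ordering)
    all_faults = set().union(*faults_detected_by.values())
    m = len(all_faults)
    first_position = {}
    for position, test in enumerate(ordering, start=1):
        for fault in faults_detected_by.get(test, ()):
            # Record only the earliest position at which each fault is found.
            first_position.setdefault(fault, position)
    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)
```

An ordering that runs fault-revealing tests earlier yields a higher score, which is the quantity a prioritization method tries to maximize.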
-
Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning
Authors:
Thuy Ngoc Nguyen,
Kasturi Jamale,
Cleotilde Gonzalez
Abstract:
Large Language Models (LLMs) have demonstrated their capabilities across various tasks, from language translation to complex reasoning. Understanding and predicting human behavior and biases are crucial for artificial intelligence (AI) assisted systems to provide useful assistance, yet it remains an open question whether these models can achieve this. This paper addresses this gap by leveraging the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks. These tasks involve balancing between exploitative and exploratory actions and handling delayed feedback, both essential for simulating real-life decision processes. We compare the performance of LLMs with a cognitive instance-based learning (IBL) model, which imitates human experiential decision-making. Our findings indicate that LLMs excel at rapidly incorporating feedback to enhance prediction accuracy. In contrast, the cognitive IBL model better accounts for human exploratory behaviors and effectively captures loss aversion bias, i.e., the tendency to choose a sub-optimal goal with fewer step-cost penalties rather than exploring to find the optimal choice, even with limited experience. The results highlight the benefits of integrating LLMs with cognitive architectures, suggesting that this synergy could enhance the modeling and understanding of complex human decision-making patterns.
Submitted 5 August, 2024; v1 submitted 12 July, 2024;
originally announced July 2024.
-
Rectifier: Code Translation with Corrector via LLMs
Authors:
Xin Yin,
Chao Ni,
Tien N. Nguyen,
Shaohua Wang,
Xiaohu Yang
Abstract:
Software migration is garnering increasing attention with the evolution of software and society. Early studies mainly relied on handcrafted translation rules to translate between two languages, a process that is error-prone and time-consuming. In recent years, researchers have begun to explore the use of pre-trained large language models (LLMs) in code translation. However, code translation is a complex task, and LLMs make mistakes when performing it: they all produce certain types of errors, which include (1) compilation error, (2) runtime error, (3) functional error, and (4) non-terminating execution. We found that the root causes of these errors are very similar (e.g. failure to import packages, errors in loop boundaries, operator errors, and more). In this paper, we propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors. It learns from errors generated by existing LLMs and can be widely applied to correct errors generated by any LLM. The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability, and cross experiments also demonstrate the robustness of our method.
Submitted 10 July, 2024;
originally announced July 2024.
-
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding
Authors:
Quang P. M. Pham,
Khoi T. N. Nguyen,
Lan C. Ngo,
Truong Do,
Truong Son Hy
Abstract:
Scene graphs have been proven to be useful for various scene understanding tasks due to their compact and explicit nature. However, existing approaches often neglect the importance of maintaining the symmetry-preserving property when generating scene graphs from 3D point clouds. This oversight can diminish the accuracy and robustness of the resulting scene graphs, especially when handling noisy, multi-view 3D data. This work, to the best of our knowledge, is the first to implement an Equivariant Graph Neural Network in semantic scene graph generation from 3D point clouds for scene understanding. Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, demonstrating a significant improvement in scene estimation with faster convergence. ESGNN demands low computational resources and is easy to implement from available frameworks, paving the way for real-time applications such as robotics and computer vision.
Submitted 30 June, 2024;
originally announced July 2024.
-
Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity
Authors:
Tam N. Nguyen
Abstract:
Large Language Models (LLMs) have the potential to enhance Agent-Based Modeling by better representing complex interdependent cybersecurity systems, improving cybersecurity threat modeling and risk management. However, evaluating LLMs in this context is crucial for legal compliance and effective application development. Existing LLM evaluation frameworks often overlook the human factor and cognitive computing capabilities essential for interdependent cybersecurity. To address this gap, I propose OllaBench, a novel evaluation framework that assesses LLMs' accuracy, wastefulness, and consistency in answering scenario-based information security compliance and non-compliance questions. OllaBench is built on a foundation of 24 cognitive behavioral theories and empirical evidence from 38 peer-reviewed papers. OllaBench was used to evaluate 21 LLMs, including both open-weight and commercial models from OpenAI, Anthropic, Google, Microsoft, Meta, and others. The results reveal that while commercial LLMs have the highest overall accuracy scores, there is significant room for improvement. Smaller low-resolution open-weight LLMs are not far behind in performance, and there are significant differences in token efficiency and consistency among the evaluated models. OllaBench provides a user-friendly interface and supports a wide range of LLM platforms, making it a valuable tool for researchers and solution developers in the field of human-centric interdependent cybersecurity and beyond.
Submitted 10 June, 2024;
originally announced June 2024.
-
Geometry-aware framework for deep energy method: an application to structural mechanics with hyperelastic materials
Authors:
Thi Nguyen Khoa Nguyen,
Thibault Dairay,
Raphaël Meunier,
Christophe Millet,
Mathilde Mougeot
Abstract:
Physics-Informed Neural Networks (PINNs) have gained considerable interest in diverse engineering domains thanks to their capacity to integrate physical laws into deep learning models. Recently, geometry-aware PINN-based approaches that employ the strong form of underlying physical system equations have been developed with the aim of integrating geometric information into PINNs. Despite ongoing research, the assessment of PINNs in problems with various geometries remains an active area of investigation. In this work, we introduce a novel physics-informed framework named the Geometry-Aware Deep Energy Method (GADEM) for solving structural mechanics problems on different geometries. As the weak form of the physical system equation (or the energy-based approach) has demonstrated clear advantages compared to the strong form for solving solid mechanics problems, GADEM employs the weak form and aims to infer the solution on multiple shapes of geometries. Integrating a geometry-aware framework into an energy-based method results in an effective physics-informed deep learning model in terms of accuracy and computational cost. Different ways to represent the geometric information and to encode the geometric latent vectors are investigated in this work. We introduce a loss function of GADEM which is minimized based on the potential energy of all considered geometries. An adaptive learning method is also employed for the sampling of collocation points to enhance the performance of GADEM. We present some applications of GADEM to solve solid mechanics problems, including a loading simulation of a toy tire involving contact mechanics and large deformation hyperelasticity. The numerical results of this work demonstrate the remarkable capability of GADEM to infer the solution on various and new shapes of geometries using only one trained model.
Submitted 6 May, 2024;
originally announced May 2024.
-
Emerging Technologies for 6G Non-Terrestrial-Networks: From Academia to Industrial Applications
Authors:
Cong T. Nguyen,
Yuris Mulya Saputra,
Nguyen Van Huynh,
Tan N. Nguyen,
Dinh Thai Hoang,
Diep N Nguyen,
Van-Quan Pham,
Miroslav Voznak,
Symeon Chatzinotas,
Dinh-Hieu Tran
Abstract:
Terrestrial networks form the fundamental infrastructure of modern communication systems, serving more than 4 billion users globally. However, terrestrial networks are facing a wide range of challenges, from coverage and reliability to interference and congestion. As the demands of the 6G era are expected to be much higher, it is crucial to address these challenges to ensure a robust and efficient communication infrastructure for the future. To address these problems, the Non-Terrestrial Network (NTN) has emerged as a promising solution. NTNs are communication networks that leverage airborne (e.g., unmanned aerial vehicles) and spaceborne vehicles (e.g., satellites) to facilitate ultra-reliable communications and connectivity with high data rates and low latency over expansive regions. This article aims to provide a comprehensive survey on the utilization of network slicing, Artificial Intelligence/Machine Learning (AI/ML), and Open Radio Access Network (ORAN) to address diverse challenges of NTNs from the perspectives of both academia and industry. In particular, we first provide an in-depth tutorial on NTN and the key enabling technologies including network slicing, AI/ML, and ORAN. Then, we provide a comprehensive survey on how network slicing and AI/ML have been leveraged to overcome the challenges that NTNs are facing. Moreover, we present how ORAN can be utilized for NTNs. Finally, we highlight important challenges, open issues, and future research directions of NTN in the 6G era.
Submitted 3 July, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion
Authors:
Huy N. Phan,
Hoang N. Phan,
Tien N. Nguyen,
Nghi D. Q. Bui
Abstract:
Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies, which can result in less precise completions. To overcome these limitations, we present RepoHyper, a multifaceted framework designed to address the complex challenges associated with repository-level code completion. Central to RepoHyper is the Repo-level Semantic Graph (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories. Furthermore, RepoHyper leverages an Expand and Refine retrieval method, including a graph expansion and a link prediction algorithm applied to the RSG, enabling the effective retrieval and prioritization of relevant code snippets. Our evaluations show that RepoHyper markedly outperforms existing techniques in repository-level code completion, showcasing enhanced accuracy across various datasets when compared to several strong baselines. Our implementation of RepoHyper can be found at https://github.com/FSoft-AI4Code/RepoHyper.
Submitted 14 August, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Inferring solar differential rotation and viscosity via passive imaging with inertial waves
Authors:
Tram Thi Ngoc Nguyen,
Thorsten Hohage,
Damien Fournier,
Laurent Gizon
Abstract:
The recent discovery of inertial waves on the surface of the Sun offers new possibilities to learn about the solar interior. These waves are long-lived with a period on the order of the Sun rotation period ($\sim$27 days) and are sensitive to parameters deep inside the Sun. They are excited by turbulent convection, leading to a passive imaging problem. In this work, we present the forward and inverse problem of reconstructing viscosity and differential rotation on the Sun from cross-covariance observations of these inertial waves.
Submitted 22 March, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Integrating LLMs for Explainable Fault Diagnosis in Complex Systems
Authors:
Akshay J. Dave,
Tat Nghia Nguyen,
Richard B. Vilim
Abstract:
This paper introduces an integrated system designed to enhance the explainability of fault diagnostics in complex systems, such as nuclear power plants, where operator understanding is critical for informed decision-making. By combining a physics-based diagnostic tool with a Large Language Model, we offer a novel solution that not only identifies faults but also provides clear, understandable explanations of their causes and implications. The system's efficacy is demonstrated through application to a molten salt facility, showcasing its ability to elucidate the connections between diagnosed faults and sensor data, answer operator queries, and evaluate historical sensor anomalies. Our approach underscores the importance of merging model-based diagnostics with advanced AI to improve the reliability and transparency of autonomous systems.
Submitted 8 February, 2024;
originally announced February 2024.
-
Data-Driven Evidence-Based Syntactic Sugar Design
Authors:
David OBrien,
Robert Dyer,
Tien N. Nguyen,
Hridesh Rajan
Abstract:
Programming languages are essential tools for developers, and their evolution plays a crucial role in supporting the activities of developers. One instance of programming language evolution is the introduction of syntactic sugars, which are additional syntax elements that provide alternative, more readable code constructs. However, the process of designing and evolving a programming language has traditionally been guided by anecdotal experiences and intuition. Recent advances in tools and methodologies for mining open-source repositories have enabled developers to make data-driven software engineering decisions. In light of this, this paper proposes an approach for motivating data-driven programming evolution by applying frequent subgraph mining techniques to a large dataset of 166,827,154 open-source Java methods. The dataset is mined by generalizing Java control-flow graphs to capture broad programming language usages and instances of duplication. Frequent subgraphs are then extracted to identify potentially impactful opportunities for new syntactic sugars. Our diverse results demonstrate the benefits of the proposed technique by identifying new syntactic sugars involving a variety of programming constructs that could be implemented in Java, thus simplifying frequent code idioms. This approach can potentially provide valuable insights for Java language designers, and serve as a proof-of-concept for data-driven programming language design and evolution.
Submitted 1 February, 2024;
originally announced February 2024.
-
Representation formulas for maximal monotone operators of type (D) in Banach spaces whose dual spaces are strictly convex
Authors:
Nguyen B. Tran,
Tran N. Nguyen,
Huynh M. Hien
Abstract:
This work deals with a maximal monotone operator $A$ of type (D) in a Banach space whose dual space is strictly convex. We establish some representations for the value $Ax$ at a given point $x$ via its values at nearby points of $x$. We show that the faces of $Ax$ are contained in the set of all weak$^*$ convergent limits of bounded nets of the operator at nearby points of $x$, then we obtain a representation for $Ax$ by use of this set. In addition, representations for the support function of $Ax$ based on the minimal-norm selection of the operator in certain Banach spaces are given.
Submitted 30 December, 2023;
originally announced January 2024.
-
FPGA-based residual amplitude modulation suppression and control for compact atomic clocks
Authors:
Tin Nghia Nguyen,
Thomas R. Schibli
Abstract:
We designed an FPGA fabric to provide phase modulation techniques to lock lasers to optical frequency references. The method incorporates an active residual-amplitude-modulation (RAM) suppression scheme that relies on complex modulation. All the required servos to construct an optical atomic clock are incorporated onto the same low-cost, commercial FPGA chip. We demonstrate a reliable, long-term RAM suppression of 60 dB, with the remaining RAM level at -100 dBc, and an improved stability of three decades when applied on a two-photon rubidium clock.
Submitted 31 October, 2023;
originally announced November 2023.
-
Large Language Models for Scientific Synthesis, Inference and Explanation
Authors:
Yizhen Zheng,
Huan Yee Koh,
Jiaxin Ju,
Anh T. N. Nguyen,
Lauren T. May,
Geoffrey I. Webb,
Shirui Pan
Abstract:
Large language models are a form of artificial intelligence system whose primary knowledge consists of the statistical patterns, semantic relationships, and syntactical structures of language. Despite their limited forms of "knowledge", these systems are adept at numerous complex tasks including creative writing, storytelling, translation, question-answering, summarization, and computer code generation. However, they have yet to demonstrate advanced applications in natural science. Here we show how large language models can perform scientific synthesis, inference, and explanation. We present a method for using general-purpose large language models to make inferences from scientific datasets of the form usually associated with special-purpose machine learning algorithms. We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature. When a conventional machine learning system is augmented with this synthesized and inferred knowledge it can outperform the current state of the art across a range of benchmark tasks for predicting molecular properties. This approach has the further advantage that the large language model can explain the machine learning system's predictions. We anticipate that our framework will open new avenues for AI to accelerate the pace of scientific discovery.
Submitted 11 October, 2023;
originally announced October 2023.
-
Bi-level iterative regularization for inverse problems in nonlinear PDEs
Authors:
Tram Thi Ngoc Nguyen
Abstract:
We investigate the ill-posed inverse problem of recovering unknown spatially dependent parameters in nonlinear evolution PDEs. We propose a bi-level Landweber scheme, where the upper-level parameter reconstruction embeds a lower-level state approximation. This can be seen as combining the classical reduced setting and the newer all-at-once setting, allowing us to, respectively, utilize well-posedness of the parameter-to-state map, and to bypass having to solve nonlinear PDEs exactly. Using this, we derive stopping rules for lower- and upper-level iterations and convergence of the bi-level method. We discuss application to parameter identification for the Landau-Lifshitz-Gilbert equation in magnetic particle imaging.
Submitted 5 February, 2024; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Learning in Cooperative Multiagent Systems Using Cognitive and Machine Models
Authors:
Thuy Ngoc Nguyen,
Duy Nhat Phan,
Cleotilde Gonzalez
Abstract:
Developing effective Multi-Agent Systems (MAS) is critical for many applications requiring collaboration and coordination with humans. Despite the rapid advance of Multi-Agent Deep Reinforcement Learning (MADRL) in cooperative MAS, one major challenge is the simultaneous learning and interaction of independent agents in dynamic environments in the presence of stochastic rewards. State-of-the-art MADRL models struggle to perform well in Coordinated Multi-agent Object Transportation Problems (CMOTPs), wherein agents must coordinate with each other and learn from stochastic rewards. In contrast, humans often learn rapidly to adapt to nonstationary environments that require coordination among people. In this paper, motivated by the demonstrated ability of cognitive models based on Instance-Based Learning Theory (IBLT) to capture human decisions in many dynamic decision making tasks, we propose three variants of Multi-Agent IBL models (MAIBL). The idea of these MAIBL algorithms is to combine the cognitive mechanisms of IBLT and the techniques of MADRL models to deal with coordination MAS in stochastic environments from the perspective of independent learners. We demonstrate that the MAIBL models exhibit faster learning and achieve better coordination in a dynamic CMOTP task with various settings of stochastic rewards compared to current MADRL models. We discuss the benefits of integrating cognitive insights into MADRL models.
Submitted 17 August, 2023;
originally announced August 2023.
-
Batch Clipping and Adaptive Layerwise Clipping for Differential Private Stochastic Gradient Descent
Authors:
Toan N. Nguyen,
Phuong Ha Nguyen,
Lam M. Nguyen,
Marten Van Dijk
Abstract:
Each round in Differential Private Stochastic Gradient Descent (DPSGD) transmits a sum of clipped gradients obfuscated with Gaussian noise to a central server, which uses this to update a global model that often represents a deep neural network. Since the clipped gradients are computed separately, which we call Individual Clipping (IC), deep neural networks like resnet-18 cannot use Batch Normalization Layers (BNL), which are a crucial component in deep neural networks for achieving high accuracy. To utilize BNL, we introduce Batch Clipping (BC) where, instead of clipping single gradients as in the original DPSGD, we average and clip batches of gradients. Moreover, the model entries of different layers have different sensitivities to the added Gaussian noise. Therefore, Adaptive Layerwise Clipping methods (ALC), where each layer has its own adaptively finetuned clipping constant, have been introduced and studied, but so far without rigorous DP proofs. In this paper, we propose a new ALC and provide rigorous DP proofs for both BC and ALC. Experiments show that our modified DPSGD with BC and ALC for CIFAR-10 with resnet-18 converges while DPSGD with IC and ALC does not.
Submitted 21 July, 2023;
originally announced July 2023.
-
Credit Assignment: Challenges and Opportunities in Developing Human-like AI Agents
Authors:
Thuy Ngoc Nguyen,
Chase McDonald,
Cleotilde Gonzalez
Abstract:
Temporal credit assignment is crucial for learning and skill development in natural and artificial intelligence. While computational methods like the TD approach in reinforcement learning have been proposed, it's unclear if they accurately represent how humans handle feedback delays. Cognitive models intend to represent the mental steps by which humans solve problems and perform a number of tasks, but limited research in cognitive science has addressed the credit assignment problem in humans and cognitive models. Our research uses a cognitive model based on a theory of decisions from experience, Instance-Based Learning Theory (IBLT), to test different credit assignment mechanisms in a goal-seeking navigation task with varying levels of decision complexity. Instance-Based Learning (IBL) models simulate the process of making sequential choices with different credit assignment mechanisms, including a new IBL-TD model that combines the IBL decision mechanism with the TD approach. We found that (1) an IBL model that gives equal credit assignment to all decisions is able to match human performance better than other models, including IBL-TD and Q-learning; (2) IBL-TD and Q-learning models underperform compared to humans initially, but eventually, they outperform humans; (3) humans are influenced by decision complexity, while models are not. Our study provides insights into the challenges of capturing human behavior and the potential opportunities to use these models in future AI systems to support human activities.
Submitted 16 July, 2023;
originally announced July 2023.