-
Hierarchical Neural Collapse Detection Transformer for Class Incremental Object Detection
Authors:
Duc Thanh Pham,
Hong Dang Nguyen,
Nhat Minh Nguyen Quoc,
Linh Ngo Van,
Sang Dinh Viet,
Duc Anh Nguyen
Abstract:
Recently, object detection models have witnessed notable performance improvements, particularly with transformer-based models. However, new objects frequently appear in the real world, requiring detection models to continually learn without suffering from catastrophic forgetting. Although Incremental Object Detection (IOD) has emerged to address this challenge, existing models are still not practical due to their limited performance and prolonged inference time. In this paper, we introduce a novel framework for IOD, called Hier-DETR: Hierarchical Neural Collapse Detection Transformer, ensuring both efficiency and competitive performance by leveraging Neural Collapse for imbalanced datasets and the hierarchical relations among class labels.
Submitted 10 June, 2025;
originally announced June 2025.
-
ChemGraph: An Agentic Framework for Computational Chemistry Workflows
Authors:
Thang D. Pham,
Aditya Tanikanti,
Murat Keçeli
Abstract:
Atomistic simulations are essential tools in chemistry and materials science, accelerating the discovery of novel catalysts, energy storage materials, and pharmaceuticals. However, running these simulations remains challenging due to the wide range of computational methods, diverse software ecosystems, and the need for expert knowledge and manual effort for the setup, execution, and validation stages. In this work, we present ChemGraph, an agentic framework powered by artificial intelligence and state-of-the-art simulation tools to streamline and automate computational chemistry and materials science workflows. ChemGraph leverages graph neural network-based foundation models for accurate yet computationally efficient calculations and large language models (LLMs) for natural language understanding, task planning, and scientific reasoning to provide an intuitive and interactive interface. Users can perform tasks such as molecular structure generation, single-point energy, geometry optimization, vibrational analysis, and thermochemistry calculations with methods ranging from tight-binding and machine learning interatomic potentials to density functional theory or wave function theory-based methods. We evaluate ChemGraph across 13 benchmark tasks and demonstrate that smaller LLMs (GPT-4o-mini, Claude-3.5-haiku, Qwen2.5-14B) perform well on simple workflows, while more complex tasks benefit from using larger models like GPT-4o. Importantly, we show that decomposing complex tasks into smaller subtasks through a multi-agent framework enables smaller LLM models to match or exceed GPT-4o's performance in specific scenarios.
Submitted 3 June, 2025;
originally announced June 2025.
-
Leveraging Novel Ensemble Learning Techniques and Landsat Multispectral Data for Estimating Olive Yields in Tunisia
Authors:
Mohamed Kefi,
Tien Dat Pham,
Thin Nguyen,
Mark G. Tjoelker,
Viola Devasirvatham,
Kenichi Kashiwagi
Abstract:
Olive production is an important tree crop in Mediterranean climates. However, olive yield varies significantly due to climate change. Accurately estimating yield using remote sensing and machine learning remains a complex challenge. In this study, we developed a streamlined pipeline for olive yield estimation in the Kairouan and Sousse governorates of Tunisia. We extracted features from multispectral reflectance bands, vegetation indices derived from Landsat-8 OLI and Landsat-9 OLI-2 satellite imagery, along with digital elevation model data. These spatial features were combined with ground-based field survey data to form a structured tabular dataset. We then developed an automated ensemble learning framework, implemented using AutoGluon to train and evaluate multiple machine learning models, select optimal combinations through stacking, and generate robust yield predictions using five-fold cross-validation. The results demonstrate strong predictive performance from both sensors, with Landsat-8 OLI achieving R² = 0.8635 and RMSE = 1.17 tons per ha, and Landsat-9 OLI-2 achieving R² = 0.8378 and RMSE = 1.32 tons per ha. This study highlights a scalable, cost-effective, and accurate method for olive yield estimation, with potential applicability across diverse agricultural regions globally.
Submitted 25 May, 2025;
originally announced June 2025.
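For readers unfamiliar with the two metrics quoted in the olive-yield abstract above, here is a minimal sketch of how a coefficient of determination and an RMSE are typically computed. The function names and the sample yields are hypothetical and made up for illustration; they are not data or code from the study, which reports using AutoGluon for its modeling.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up yields in tons per ha, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(r2_score(observed, predicted))
print(rmse(observed, predicted))
```

Because RMSE carries the unit of the target, it can be read directly as a typical error in tons per ha, while R² is unitless.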
-
Brightness-Invariant Tracking Estimation in Tagged MRI
Authors:
Zhangxing Bian,
Shuwen Wei,
Xiao Liang,
Yuan-Chiao Lu,
Samuel W. Remedios,
Fangxu Xing,
Jonghye Woo,
Dzung L. Pham,
Aaron Carass,
Philip V. Bayly,
Jiachen Zhuo,
Ahmed Alshareef,
Jerry L. Prince
Abstract:
Magnetic resonance (MR) tagging is an imaging technique for noninvasively tracking tissue motion in vivo by creating a visible pattern of magnetization saturation (tags) that deforms with the tissue. Due to longitudinal relaxation and progression to steady-state, the tags and tissue brightnesses change over time, which makes tracking with optical flow methods error-prone. Although Fourier methods can alleviate these problems, they are also sensitive to brightness changes as well as spectral spreading due to motion. To address these problems, we introduce the brightness-invariant tracking estimation (BRITE) technique for tagged MRI. BRITE disentangles the anatomy from the tag pattern in the observed tagged image sequence and simultaneously estimates the Lagrangian motion. The inherent ill-posedness of this problem is addressed by leveraging the expressive power of denoising diffusion probabilistic models to represent the probabilistic distribution of the underlying anatomy and the flexibility of physics-informed neural networks to estimate biologically-plausible motion. A set of tagged MR images of a gel phantom was acquired with various tag periods and imaging flip angles to demonstrate the impact of brightness variations and to validate our method. The results show that BRITE achieves more accurate motion and strain estimates as compared to other state of the art methods, while also being resistant to tag fading.
Submitted 23 May, 2025;
originally announced May 2025.
-
Frankentext: Stitching random text fragments into long-form narratives
Authors:
Chau Minh Pham,
Jenna Russell,
Dzung Pham,
Mohit Iyyer
Abstract:
We introduce Frankentexts, a new type of long-form narratives produced by LLMs under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim from human writings. This task presents a challenging test of controllable generation, requiring models to satisfy a writing prompt, integrate disparate text fragments, and still produce a coherent narrative. To generate Frankentexts, we instruct the model to produce a draft by selecting and combining human-written passages, then iteratively revise the draft while maintaining a user-specified copy ratio. We evaluate the resulting Frankentexts along three axes: writing quality, instruction adherence, and detectability. Gemini-2.5-Pro performs surprisingly well on this task: 81% of its Frankentexts are coherent and 100% relevant to the prompt. Notably, up to 59% of these outputs are misclassified as human-written by detectors like Pangram, revealing limitations in AI text detectors. Human annotators can sometimes identify Frankentexts through their abrupt tone shifts and inconsistent grammar between segments, especially in longer generations. Beyond presenting a challenging generation task, Frankentexts invite discussion on building effective detectors for this new grey zone of authorship, provide training data for mixed authorship detection, and serve as a sandbox for studying human-AI co-writing processes.
Submitted 28 May, 2025; v1 submitted 23 May, 2025;
originally announced May 2025.
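The Frankentext abstract above refers to a user-specified copy ratio but does not say how it is measured. The sketch below is one possible approximation, assuming whitespace tokenization and counting a token as copied when it falls inside any n-gram that also occurs in a source passage. The function `copy_ratio`, its parameter `n`, and the sample strings are all hypothetical and are not taken from the paper.

```python
def copy_ratio(text, sources, n=3):
    """Share of tokens in `text` covered by an n-gram found in any source.

    Assumption: whitespace tokens and exact n-gram matching stand in for
    "copied verbatim"; the paper may define this differently.
    """
    tokens = text.split()
    if len(tokens) < n:
        return 0.0

    # Collect every n-gram that appears in the human-written sources.
    source_ngrams = set()
    for src in sources:
        src_tokens = src.split()
        for i in range(len(src_tokens) - n + 1):
            source_ngrams.add(tuple(src_tokens[i:i + n]))

    # Mark each token that lies inside at least one matching n-gram.
    covered = [False] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in source_ngrams:
            for j in range(i, i + n):
                covered[j] = True

    return sum(covered) / len(tokens)


sources = ["the cat sat on the mat", "a storm rolled in at dusk"]
draft = "the cat sat on the mat while a storm rolled in"
print(copy_ratio(draft, sources))
```

In this example only the connecting word "while" is not covered, so ten of the eleven tokens count as copied.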
-
Can Large Language Models Really Recognize Your Name?
Authors:
Dzung Pham,
Peter Kairouz,
Niloofar Mireshghallah,
Eugene Bagdasarian,
Chau Minh Pham,
Amir Houmansadr
Abstract:
Large language models (LLMs) are increasingly being used to protect sensitive user data. However, current LLM-based privacy solutions assume that these models can reliably detect personally identifiable information (PII), particularly named entities. In this paper, we challenge that assumption by revealing systematic failures in LLM-based privacy tasks. Specifically, we show that modern LLMs regularly overlook human names even in short text snippets due to ambiguous contexts, which cause the names to be misinterpreted or mishandled. We propose AMBENCH, a benchmark dataset of seemingly ambiguous human names, leveraging the name regularity bias phenomenon, embedded within concise text snippets along with benign prompt injections. Our experiments on modern LLMs tasked to detect PII as well as specialized tools show that recall of ambiguous names drops by 20-40% compared to more recognizable names. Furthermore, ambiguous human names are four times more likely to be ignored in supposedly privacy-preserving summaries generated by LLMs when benign prompt injections are present. These findings highlight the underexplored risks of relying solely on LLMs to safeguard user privacy and underscore the need for a more systematic investigation into their privacy failure modes.
Submitted 20 May, 2025;
originally announced May 2025.
-
Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language
Authors:
Dinh Nam Pham,
Eleftherios Avramidis
Abstract:
Sign Language Recognition (SLR) systems primarily focus on manual gestures, but non-manual features such as mouth movements, specifically mouthing, provide valuable linguistic information. This work directly classifies mouthing instances to their corresponding words in the spoken language while exploring the potential of transfer learning from Visual Speech Recognition (VSR) to mouthing recognition in German Sign Language. We leverage three VSR datasets: one in English, one in German with unrelated words and one in German containing the same target words as the mouthing dataset, to investigate the impact of task similarity in this setting. Our results demonstrate that multi-task learning improves both mouthing recognition and VSR accuracy as well as model robustness, suggesting that mouthing recognition should be treated as a distinct but related task to VSR. This research contributes to the field of SLR by proposing knowledge transfer from VSR to SLR datasets with limited mouthing annotations.
Submitted 19 May, 2025;
originally announced May 2025.
-
Development and evaluation of a deep learning algorithm for German word recognition from lip movements
Authors:
Dinh Nam Pham,
Torsten Rahne
Abstract:
When reading lips, many people benefit from additional visual information from the lip movements of the speaker, which is, however, very error prone. Algorithms for lip reading with artificial intelligence based on artificial neural networks significantly improve word recognition but are not available for the German language. A total of 1806 video clips with only one German-speaking person each were selected, split into word segments, and assigned to word classes using speech-recognition software. In 38,391 video segments with 32 speakers, 18 polysyllabic, visually distinguishable words were used to train and validate a neural network. The 3D Convolutional Neural Network and Gated Recurrent Units models and a combination of both models (GRUConv) were compared, as were different image sections and color spaces of the videos. The accuracy was determined in 5000 training epochs. Comparison of the color spaces did not reveal any relevant different correct classification rates in the range from 69% to 72%. With a cut to the lips, a significantly higher accuracy of 70% was achieved than when cut to the entire speaker's face (34%). With the GRUConv model, the maximum accuracies were 87% with known speakers and 63% in the validation with unknown speakers. The neural network for lip reading, which was first developed for the German language, shows a very high level of accuracy, comparable to English-language algorithms. It works with unknown speakers as well and can be generalized with more word classes.
Submitted 22 April, 2025;
originally announced April 2025.
-
Persistent Homology-induced Graph Ensembles for Time Series Regressions
Authors:
Viet The Nguyen,
Duy Anh Pham,
An Thai Le,
Jans Peter,
Gunther Gust
Abstract:
The effectiveness of Spatio-temporal Graph Neural Networks (STGNNs) in time-series applications is often limited by their dependence on fixed, hand-crafted input graph structures. Motivated by the insight from Topological Data Analysis (TDA) that real-world data exhibits multi-scale patterns, we construct several graphs using Persistent Homology Filtration, a mathematical framework describing the multiscale structural properties of data points. Then, we use the constructed graphs as inputs to create an ensemble of Graph Neural Networks. The ensemble aggregates the signals from the individual learners via an attention-based routing mechanism, thus systematically encoding the inherent multiscale structures of data. Four different real-world experiments on seismic activity prediction and traffic forecasting (PEMS-BAY, METR-LA) demonstrate that our approach consistently outperforms single-graph baselines while providing interpretable insights.
Submitted 19 March, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
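The persistent-homology ensemble abstract above mentions aggregating individual learners through an attention-based routing mechanism without giving details. As a rough illustration of the general idea only, the sketch below weights each learner's prediction by a softmax over relevance scores. In a trained model those scores would be learned; here they are hypothetical inputs, and `softmax` and `aggregate` are illustrative names, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(predictions, scores):
    """Attention-style weighted sum of per-learner predictions.

    Assumption: one scalar prediction and one scalar score per learner.
    """
    weights = softmax(scores)
    return sum(w * p for w, p in zip(weights, predictions))


# Two hypothetical learners, one per constructed graph.
print(aggregate([1.0, 3.0], [0.0, 0.0]))    # equal scores give a plain average
print(aggregate([1.0, 3.0], [100.0, 0.0]))  # a dominant score routes to learner 0
```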
-
ECLARE: Efficient cross-planar learning for anisotropic resolution enhancement
Authors:
Samuel W. Remedios,
Shuwen Wei,
Shuo Han,
Jinwei Zhang,
Aaron Carass,
Kurt G. Schilling,
Dzung L. Pham,
Jerry L. Prince,
Blake E. Dewey
Abstract:
In clinical imaging, magnetic resonance (MR) image volumes are often acquired as stacks of 2D slices with decreased scan times, improved signal-to-noise ratio, and image contrasts unique to 2D MR pulse sequences. While this is sufficient for clinical evaluation, automated algorithms designed for 3D analysis perform poorly on multi-slice 2D MR volumes, especially those with thick slices and gaps between slices. Super-resolution (SR) methods aim to address this problem, but previous methods do not address all of the following: slice profile shape estimation, slice gap, domain shift, and non-integer or arbitrary upsampling factors. In this paper, we propose ECLARE (Efficient Cross-planar Learning for Anisotropic Resolution Enhancement), a self-SR method that addresses each of these factors. ECLARE uses a slice profile estimated from the multi-slice 2D MR volume, trains a network to learn the mapping from low-resolution to high-resolution in-plane patches from the same volume, and performs SR with anti-aliasing. We compared ECLARE to cubic B-spline interpolation, SMORE, and other contemporary SR methods. We used realistic and representative simulations so that quantitative performance against ground truth can be computed, and ECLARE outperformed all other methods in both signal recovery and downstream tasks. Importantly, as ECLARE does not use external training data it cannot suffer from domain shift between training and testing. Our code is open-source and available at https://www.github.com/sremedios/eclare.
Submitted 21 May, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
A new framework for prognostics in decentralized industries: Enhancing fairness, security, and transparency through Blockchain and Federated Learning
Authors:
T. Q. D. Pham,
K. D. Tran,
Khanh T. P. Nguyen,
X. V. Tran,
L. Köehl,
K. P. Tran
Abstract:
As global industries transition towards Industry 5.0, predictive maintenance (PM) remains crucial for cost-effective operations, resilience, and minimizing downtime in increasingly smart manufacturing environments. In this chapter, we explore how the integration of Federated Learning (FL) and blockchain (BC) technologies enhances the prediction of machinery's Remaining Useful Life (RUL) within decentralized and human-centric industrial ecosystems. Traditional centralized data approaches raise concerns over privacy, security, and scalability, especially as artificial intelligence (AI)-driven smart manufacturing becomes more prevalent. This chapter leverages FL to enable localized model training across multiple sites while utilizing BC to ensure trust, transparency, and data integrity across the network. This BC-integrated FL framework optimizes RUL predictions, enhances data privacy and security, establishes transparency, and promotes collaboration in decentralized manufacturing. It addresses key challenges such as maintaining privacy and security, ensuring transparency and fairness, and incentivizing participation in decentralized networks. Experimental validation using the NASA CMAPSS dataset demonstrates the model's effectiveness in real-world scenarios, and we extend our findings to the broader research community through open-source code on GitHub, inviting collaborative development to drive innovation in Industry 5.0.
Submitted 8 April, 2025; v1 submitted 17 February, 2025;
originally announced March 2025.
-
SHACL-SKOS Based Knowledge Representation of Material Safety Data Sheet (SDS) for the Pharmaceutical Industry
Authors:
Brian Lu,
Dennis Pham,
Ti-Chiun Chang,
Michael Lovette,
Terri Bui,
Stephen Ma
Abstract:
We report the development of a knowledge representation and reasoning (KRR) system built on hybrid SHACL-SKOS ontologies for globally harmonized system (GHS) material Safety Data Sheets (SDS) to enhance chemical safety communication and regulatory compliance. SDS are comprehensive documents containing safety and handling information for chemical substances. Thus, they are an essential part of workplace safety and risk management. However, the vast number of Safety Data Sheets from multiple organizations, manufacturers, and suppliers that produce and distribute chemicals makes it challenging to centralize and access SDS documents through a single repository. To address the underlying issues of data exchange related to chemical shipping and handling, we construct SDS-related controlled vocabulary and conditions validated by SHACL, and knowledge systems of similar domains linked via SKOS. The resulting hybrid ontologies aim to provide standardized yet adaptable representations of SDS information, facilitating better data sharing, retrieval, and integration across various platforms. This paper outlines our SHACL-SKOS system architectural design and showcases our implementation for an industrial application streamlining the generation of a composite shipping cover sheet.
Submitted 11 February, 2025;
originally announced February 2025.
-
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition
Authors:
Guanghao Ye,
Khiem Duc Pham,
Xinzhi Zhang,
Sivakanth Gopi,
Baolin Peng,
Beibin Li,
Janardhan Kulkarni,
Huseyin A. Inan
Abstract:
Recent AI advancements, such as OpenAI's new models, are transforming LLMs into LRMs (Large Reasoning Models) that perform reasoning during inference, taking extra time and compute for higher-quality outputs. We aim to uncover the algorithmic framework for training LRMs. Methods like self-consistency, PRM, and AlphaZero suggest reasoning as guided search. We ask: what is the simplest, most scalable way to enable search in LLMs?
We propose a post-training framework called Reinforcement Learning via Self-Play (RLSP). RLSP involves three steps: (1) supervised fine-tuning with human or synthetic demonstrations of the reasoning process, (2) using an exploration reward signal to encourage diverse and efficient reasoning behaviors, and (3) RL training with an outcome verifier to ensure correctness while preventing reward hacking. Our key innovation is to decouple exploration and correctness signals during PPO training, carefully balancing them to improve performance and efficiency.
Empirical studies in the math domain show that RLSP improves reasoning. On the Llama-3.1-8B-Instruct model, RLSP can boost performance by 23% on the MATH-500 test set; on AIME 2024 math problems, Qwen2.5-32B-Instruct improved by 10% due to RLSP. However, a more important finding of this work is that the models trained using RLSP, even with the simplest exploration reward that encourages the model to take more intermediate steps, showed several emergent behaviors such as backtracking, exploration of ideas, and verification. These findings demonstrate that the RLSP framework might be enough to enable the emergence of complex reasoning abilities in LLMs when scaled. Lastly, we propose a theory as to why the RLSP search strategy is more suitable for LLMs, inspired by a remarkable result that says CoT provably increases the computational power of LLMs, which grows with the number of steps in CoT \cite{li2024chain,merrill2023expresssive}.
Submitted 10 February, 2025;
originally announced February 2025.
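The RLSP abstract above describes decoupling an exploration signal from a correctness signal and balancing the two during PPO training. The sketch below illustrates only that reward-shaping idea, under stated assumptions: the exploration bonus is a capped step count (the abstract mentions a reward that encourages more intermediate steps), the verifier is an exact string match, and the weight `alpha` and cap `max_steps` are hypothetical values. It is not the authors' reward function and omits PPO entirely.

```python
def exploration_reward(num_steps, max_steps=32):
    """Hypothetical bonus in [0, 1] that grows with intermediate steps, capped."""
    return min(num_steps, max_steps) / max_steps


def outcome_reward(answer, reference):
    """Stand-in outcome verifier: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def combined_reward(answer, reference, num_steps, alpha=0.2):
    """Keep the two signals separate, then mix them with weight `alpha`.

    Assumption: a small alpha keeps correctness dominant, so long but
    wrong reasoning cannot outscore a correct answer.
    """
    correctness = outcome_reward(answer, reference)
    exploration = exploration_reward(num_steps)
    return correctness + alpha * exploration


print(combined_reward("42", "42", num_steps=16))  # correct, moderate exploration
print(combined_reward("41", "42", num_steps=32))  # wrong, maximal exploration
```

With these hypothetical settings a wrong answer can earn at most `alpha`, which is one simple way to limit reward hacking through length alone.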
-
How Do Developers Use Code Suggestions in Pull Request Reviews?
Authors:
Abir Bouraffa,
Yen Dieu Pham,
Walid Maalej
Abstract:
GitHub introduced the suggestion feature to enable reviewers to explicitly suggest code modifications in pull requests. These suggestions make the reviewers' feedback more actionable for the submitters and represent valuable knowledge for newcomers. Still, little is known about how code review suggestions are used by developers, what impact they have on pull requests, and how they are influenced by social coding dynamics. To bridge this knowledge gap, we conducted an empirical study on pull requests from 46 engineered GitHub projects, in which developers used code review suggestions. We applied an open coding approach to uncover the types of suggestions and their usage frequency. We also mined pull request characteristics and assessed the impact of using suggestions on merge rate, resolution time, and code complexity. Furthermore, we conducted a survey with contributors of the studied projects to gain insights about the influence of social factors on the usage and acceptance of code review suggestions. We were able to uncover four suggestion types: code style suggestions, improvements, fixes, and documentation, with improvements being the most frequent. We found that the use of suggestions positively affects the merge rate of pull requests but significantly increases resolution time without leading to a decrease in code complexity. Our survey results show that suggestions are more likely to be used by reviewers when the submitter is a newcomer. The results also show that developers mostly search for suggestions when tracking rationale or looking for code examples. Our work provides insights on the usage of code suggestions and their potential as a knowledge sharing tool.
Submitted 7 February, 2025;
originally announced February 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Local Control Networks (LCNs): Optimizing Flexibility in Neural Network Data Pattern Capture
Authors:
Hy Nguyen,
Duy Khoa Pham,
Srikanth Thudumu,
Hung Du,
Rajesh Vasa,
Kon Mouzakis
Abstract:
The widespread use of Multi-layer perceptrons (MLPs) often relies on a fixed activation function (e.g., ReLU, Sigmoid, Tanh) for all nodes within the hidden layers. While effective in many scenarios, this uniformity may limit the network's ability to capture complex data patterns. We argue that employing the same activation function at every node is suboptimal and propose leveraging different activation functions at each node to increase flexibility and adaptability. To achieve this, we introduce Local Control Networks (LCNs), which leverage B-spline functions to enable distinct activation curves at each node. Our mathematical analysis demonstrates the properties and benefits of LCNs over conventional MLPs. In addition, we demonstrate that more complex architectures, such as Kolmogorov-Arnold Networks (KANs), are unnecessary in certain scenarios, and LCNs can be a more efficient alternative. Empirical experiments on various benchmarks and datasets validate our theoretical findings. In computer vision tasks, LCNs achieve marginal improvements over MLPs and outperform KANs by approximately 5%, while also being more computationally efficient than KANs. In basic machine learning tasks, LCNs show a 1% improvement over MLPs and a 0.6% improvement over KANs. For symbolic formula representation tasks, LCNs perform on par with KANs, with both architectures outperforming MLPs. Our findings suggest that diverse activations at the node level can lead to improved performance and efficiency.
Submitted 25 April, 2025; v1 submitted 23 January, 2025;
originally announced January 2025.
-
MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish
Authors:
Xin Huang,
Tarun Kumar Vangani,
Minh Duc Pham,
Xunlong Zou,
Bin Wang,
Zhengyuan Liu,
Ai Ti Aw
Abstract:
Multilingual large language models (MLLMs) have shown impressive capabilities across a variety of languages. However, efficacy can differ greatly between different language families, especially for those with limited linguistic resources. This report presents MERaLiON-TextLLM, a series of open-source language models specifically tailored to improve understanding and generation in Chinese, Indonesian, Malay, and Singlish. The initial released model is built on Llama-3-8B-Base and refined through a meticulously crafted process of continued pre-training and weight merging. Our approach achieves performance improvements across benchmarks in these languages, exceeding the capabilities of the official Llama-3 models. We provide the model checkpoints as a resource to support further research and development in cross-lingual language understanding.
Submitted 21 January, 2025; v1 submitted 21 December, 2024;
originally announced January 2025.
-
AdaCS: Adaptive Normalization for Enhanced Code-Switching ASR
Authors:
The Chuong Chu,
Vu Tuan Dat Pham,
Kien Dao,
Hoang Nguyen,
Quoc Hung Truong
Abstract:
Intra-sentential code-switching (CS) refers to the alternation between languages that happens within a single utterance and is a significant challenge for Automatic Speech Recognition (ASR) systems, for example when a Vietnamese speaker uses foreign proper names or specialized terms within their speech. ASR systems often struggle to accurately transcribe intra-sentential CS due to their training on monolingual data and the unpredictable nature of CS. This issue is even more pronounced for low-resource languages, where limited data availability hinders the development of robust models. In this study, we propose AdaCS, a normalization model that integrates an adaptive bias attention module (BAM) into an encoder-decoder network. This novel approach provides a robust solution to CS ASR in unseen domains, thereby significantly enhancing our contribution to the field. By utilizing BAM to both identify and normalize CS phrases, AdaCS enhances its adaptive capabilities with a biased list of words provided during inference. Our method demonstrates impressive performance and the ability to handle unseen CS phrases across various domains. Experiments show that AdaCS outperforms the previous state-of-the-art method on Vietnamese CS ASR normalization, with considerable WER reductions of 56.2% and 36.8% on the two proposed test sets.
Submitted 13 January, 2025;
originally announced January 2025.
-
SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation
Authors:
Duc-Hai Pham,
Tung Do,
Phong Nguyen,
Binh-Son Hua,
Khoi Nguyen,
Rang Nguyen
Abstract:
We propose SharpDepth, a novel approach to monocular metric depth estimation that combines the metric accuracy of discriminative depth estimation methods (e.g., Metric3D, UniDepth) with the fine-grained boundary sharpness typically achieved by generative methods (e.g., Marigold, Lotus). Traditional discriminative models trained on real-world data with sparse ground-truth depth can accurately predict metric depth but often produce over-smoothed or low-detail depth maps. Generative models, in contrast, are trained on synthetic data with dense ground truth, generating depth maps with sharp boundaries yet only providing relative depth with low accuracy. Our approach bridges these limitations by integrating metric accuracy with detailed boundary preservation, resulting in depth predictions that are both metrically precise and visually sharp. Our extensive zero-shot evaluations on standard depth estimation benchmarks confirm SharpDepth's effectiveness, showing its ability to achieve both high depth accuracy and detailed representation, making it well-suited for applications requiring high-quality depth perception across diverse, real-world environments.
Submitted 27 November, 2024;
originally announced November 2024.
-
Equivariant Polynomial Functional Networks
Authors:
Thieu N. Vo,
Viet-Hoang Tran,
Tho Tran Huu,
An Nguyen The,
Thanh Tran,
Minh-Khoi Nguyen-Nhat,
Duy-Tung Pham,
Tan Minh Nguyen
Abstract:
Neural Functional Networks (NFNs) have gained increasing interest due to their wide range of applications, including extracting information from implicit representations of data, editing network weights, and evaluating policies. A key design principle of NFNs is their adherence to the permutation and scaling symmetries inherent in the connectionist structure of the input neural networks. Recent NFNs have been proposed with permutation and scaling equivariance based on either graph-based message-passing mechanisms or parameter-sharing mechanisms. However, graph-based equivariant NFNs suffer from high memory consumption and long running times. On the other hand, parameter-sharing-based NFNs built upon equivariant linear layers exhibit lower memory consumption and faster running time, yet their expressivity is limited due to the large size of the symmetric group of the input neural networks. The challenge of designing a permutation and scaling equivariant NFN that maintains low memory consumption and running time while preserving expressivity remains unresolved. In this paper, we propose a novel solution with the development of MAGEP-NFN (Monomial mAtrix Group Equivariant Polynomial NFN). Our approach follows the parameter-sharing mechanism but differs from previous works by constructing a nonlinear equivariant layer represented as a polynomial in the input weights. This polynomial formulation enables us to incorporate additional relationships between weights from different input hidden layers, enhancing the model's expressivity while keeping memory consumption and running time low, thereby addressing the aforementioned challenge. We provide empirical evidence demonstrating that MAGEP-NFN achieves competitive performance and efficiency compared to existing baselines.
Submitted 5 October, 2024;
originally announced October 2024.
-
Equivariant Neural Functional Networks for Transformers
Authors:
Viet-Hoang Tran,
Thieu N. Vo,
An Nguyen The,
Tho Tran Huu,
Minh-Khoi Nguyen-Nhat,
Thanh Tran,
Duy-Tung Pham,
Tan Minh Nguyen
Abstract:
This paper systematically explores neural functional networks (NFNs) for transformer architectures. NFNs are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data and have proven valuable for tasks such as learnable optimizers, implicit data representations, and weight editing. While NFNs have been extensively developed for MLPs and CNNs, no prior work has addressed their design for transformers, despite the importance of transformers in modern deep learning. This paper aims to address this gap by providing a systematic study of NFNs for transformers. We first determine the maximal symmetric group of the weights in a multi-head attention module as well as a necessary and sufficient condition under which two sets of hyperparameters of the multi-head attention module define the same function. We then define the weight space of transformer architectures and its associated group action, which leads to the design principles for NFNs in transformers. Based on these, we introduce Transformer-NFN, an NFN that is equivariant under this group action. Additionally, we release a dataset of more than 125,000 Transformer model checkpoints trained on two datasets with two different tasks, providing a benchmark for evaluating Transformer-NFN and encouraging further research on transformer training and performance.
Submitted 7 March, 2025; v1 submitted 5 October, 2024;
originally announced October 2024.
-
Demystifying the Token Dynamics of Deep Selective State Space Models
Authors:
Thieu N Vo,
Tung D. Pham,
Xin T. Tong,
Tan Minh Nguyen
Abstract:
Selective state space models (SSM), such as Mamba, have gained prominence for their effectiveness in modeling sequential data. Despite their outstanding empirical performance, a comprehensive theoretical understanding of deep selective SSM remains elusive, hindering their further development and adoption for applications that need high fidelity. In this paper, we investigate the dynamical properties of tokens in a pre-trained Mamba model. In particular, we derive the dynamical system governing the continuous-time limit of the Mamba model and characterize the asymptotic behavior of its solutions. In the one-dimensional case, we prove that only one of the following two scenarios happens: either all tokens converge to zero, or all tokens diverge to infinity. We provide criteria based on model parameters to determine when each scenario occurs. For the convergent scenario, we empirically verify that this scenario negatively impacts the model's performance. For the divergent scenario, we prove that different tokens will diverge to infinity at different rates, thereby contributing unequally to the updates during model training. Based on these investigations, we propose two refinements for the model: excluding the convergent scenario and reordering tokens based on their importance scores, both aimed at improving practical performance. Our experimental results validate these refinements, offering insights into enhancing Mamba's effectiveness in real-world applications.
Submitted 7 March, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization
Authors:
Duy-Tung Pham,
Thien Trang Nguyen Vu,
Tung Nguyen,
Linh Ngo Van,
Duc Anh Nguyen,
Thien Huu Nguyen
Abstract:
Recent advances in neural topic models have concentrated on two primary directions: the integration of the inference network (encoder) with a pre-trained language model (PLM) and the modeling of the relationship between words and topics in the generative model (decoder). However, the use of large PLMs significantly increases inference costs, making them less practical for situations requiring low inference times. Furthermore, it is crucial to simultaneously model the relationships between topics and words as well as the interrelationships among topics themselves. In this work, we propose a novel framework called NeuroMax (Neural Topic Model with Maximizing Mutual Information with Pretrained Language Model and Group Topic Regularization) to address these challenges. NeuroMax maximizes the mutual information between the topic representation obtained from the encoder in neural topic models and the representation derived from the PLM. Additionally, NeuroMax employs optimal transport to learn the relationships between topics by analyzing how information is transported among them. Experimental results indicate that NeuroMax reduces inference time, generates more coherent topics and topic groups, and produces more representative document embeddings, thereby enhancing performance on downstream tasks.
Submitted 29 September, 2024;
originally announced September 2024.
-
Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models
Authors:
Duy Khoa Pham,
Bao Quoc Vo
Abstract:
The rapid advancement of large language models (LLMs) has significantly impacted various domains, including healthcare and biomedicine. However, the phenomenon of hallucination, where LLMs generate outputs that deviate from factual accuracy or context, poses a critical challenge, especially in high-stakes domains. This paper conducts a scoping study of existing techniques for mitigating hallucinations in knowledge-based tasks in general and in medical domains in particular. Key methods covered in the paper include Retrieval-Augmented Generation (RAG)-based techniques, iterative feedback loops, supervised fine-tuning, and prompt engineering. These techniques, while promising in general contexts, require further adaptation and optimization for the medical domain due to its unique demands for up-to-date, specialized knowledge and strict adherence to medical guidelines. Addressing these challenges is crucial for developing trustworthy AI systems that enhance clinical decision-making and patient safety as well as the accuracy of biomedical scientific research.
Submitted 25 August, 2024;
originally announced August 2024.
-
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese
Authors:
Khang T. Doan,
Bao G. Huynh,
Dung T. Hoang,
Thuc D. Pham,
Nhat H. Pham,
Quan T. M. Nguyen,
Bang Q. Vo,
Suong N. Hoang
Abstract:
In this report, we introduce Vintern-1B, a reliable 1-billion-parameter multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for a range of applications, including optical character recognition (OCR), document extraction, and general question-answering in Vietnamese contexts. The model is fine-tuned on an extensive dataset of over 3 million image-question-answer pairs, achieving robust performance and reliable results across multiple Vietnamese language benchmarks like OpenViVQA and ViTextVQA. Vintern-1B is small enough to fit into various on-device applications easily. Additionally, we have open-sourced several Vietnamese vision question answering (VQA) datasets for text and diagrams, created with Gemini 1.5 Flash. Our models are available at: https://huggingface.co/5CD-AI/Vintern-1B-v2.
Submitted 23 August, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance
Authors:
Duc-Hai Pham,
Duc-Dung Nguyen,
Anh Pham,
Tuan Ho,
Phong Nguyen,
Khoi Nguyen,
Rang Nguyen
Abstract:
Accurate prediction of 3D semantic occupancy from 2D visual images is vital in enabling autonomous agents to comprehend their surroundings for planning and navigation. State-of-the-art methods typically employ fully supervised approaches, necessitating a huge labeled dataset acquired through expensive LiDAR sensors and meticulous voxel-wise labeling by human annotators. The resource-intensive nature of this annotating process significantly hampers the application and scalability of these methods. We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data. Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues, facilitating a more efficient training process. Our framework exhibits notable properties: (1) Generalizability, applicable to various 3D semantic scene completion approaches, including 2D-3D lifting and 3D-2D transformer methods. (2) Effectiveness, as demonstrated through experiments on SemanticKITTI and NYUv2, wherein our method achieves up to 85% of the fully-supervised performance using only 10% labeled data. This approach not only reduces the cost and labor associated with data annotation but also demonstrates the potential for broader adoption in camera-based systems for 3D semantic occupancy prediction.
Submitted 9 January, 2025; v1 submitted 21 August, 2024;
originally announced August 2024.
-
ProxyGPT: Enabling User Anonymity in LLM Chatbots via (Un)Trustworthy Volunteer Proxies
Authors:
Dzung Pham,
Jade Sheffey,
Chau Minh Pham,
Amir Houmansadr
Abstract:
Popular large language model (LLM) chatbots such as ChatGPT and Claude require users to create an account with an email or a phone number before allowing full access to their services. This practice ties users' personally identifiable information (PII) to their sensitive conversational data, thus posing significant privacy risks. Unfortunately, existing private LLM solutions based on cryptography or trusted execution environments (TEEs) remain unpopular due to their prohibitive computational expense and platform restrictions. To enable practical user anonymity in LLM chatbots, we propose ProxyGPT, a privacy-enhancing system that leverages browser interaction proxies to submit user queries on their behalf. Unlike traditional proxy systems, ProxyGPT operates at the "user" layer by proxying user interactions with the browser in identity-required environments, thus easily supporting a wide range of chatbot services. We prevent malicious proxies by performing regular integrity audits using modern web proof protocols for TLS data provenance. We further utilize state-of-the-art LLM prompt guards on the proxy's side to mitigate unwanted user requests. Additionally, we incorporate a give-and-take economy based on Chaum's blind-signature e-cash to incentivize ProxyGPT users to proxy for others. Our system evaluation and user study demonstrate the practicality of our approach, as each chat request only takes a few additional seconds on average to fully complete. To the best of our knowledge, ProxyGPT is the first comprehensive proxy-based solution for privacy-preserving AI chatbots.
Submitted 11 June, 2025; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos
Authors:
Duc Pham,
Matthew Hansen,
Félicie Dhellemmes,
Jens Krause,
Pia Bideau
Abstract:
Easily accessible sensors, like drones with diverse onboard sensors, have greatly expanded the study of animal behavior in natural environments. Yet, analyzing vast, unlabeled video data, often spanning hours, remains a challenge for machine learning, especially in computer vision. Existing approaches often analyze only a few frames. Our focus is on long-term animal behavior analysis. To address this challenge, we utilize classical probabilistic methods for state estimation, such as particle filtering. By incorporating recent advancements in semantic object segmentation, we enable continuous tracking of rapidly evolving object formations, even in scenarios with limited data availability. Particle filters offer a provably optimal algorithmic structure for recursively adding new incoming information. We propose a novel approach for tracking schools of fish in the open ocean from drone videos. Our framework does not only perform classical object tracking in 2D; it also tracks the position and spatial expansion of the fish school in world coordinates by fusing video data and the drone's onboard sensor information (GPS and IMU). The presented framework for the first time allows researchers to study the collective behavior of fish schools in their natural social and environmental context in a non-invasive and scalable way.
Submitted 11 June, 2024;
originally announced June 2024.
-
Evolutionary Multi-Objective Optimisation for Fairness-Aware Self Adjusting Memory Classifiers in Data Streams
Authors:
Pivithuru Thejan Amarasinghe,
Diem Pham,
Binh Tran,
Su Nguyen,
Yuan Sun,
Damminda Alahakoon
Abstract:
This paper introduces a novel approach, evolutionary multi-objective optimisation for fairness-aware self-adjusting memory classifiers, designed to enhance fairness in machine learning algorithms applied to data stream classification. With the growing concern over discrimination in algorithmic decision-making, particularly in dynamic data stream environments, there is a need for methods that ensure fair treatment of individuals across sensitive attributes like race or gender. The proposed approach addresses this challenge by integrating the strengths of the self-adjusting memory K-Nearest-Neighbour algorithm with evolutionary multi-objective optimisation. This combination allows the new approach to efficiently manage concept drift in streaming data and leverage the flexibility of evolutionary multi-objective optimisation to maximise accuracy and minimise discrimination simultaneously. We demonstrate the effectiveness of the proposed approach through extensive experiments on various datasets, comparing its performance against several baseline methods in terms of accuracy and fairness metrics. Our results show that the proposed approach maintains competitive accuracy and significantly reduces discrimination, highlighting its potential as a robust solution for fairness-aware data stream classification. Further analyses also confirm the effectiveness of the strategies to trigger evolutionary multi-objective optimisation and adapt classifiers in the proposed approach.
Submitted 18 April, 2024;
originally announced April 2024.
-
Prioritized Multi-Tenant Traffic Engineering for Dynamic QoS Provisioning in Autonomous SDN-OpenFlow Edge Networks
Authors:
Mohammad Sajid Shahriar,
Faisal Ahmed,
Genshe Chen,
Khanh D. Pham,
Suresh Subramaniam,
Motoharu Matsuura,
Hiroshi Hasegawa,
Shih-Chun Lin
Abstract:
This letter indicates the critical need for prioritized multi-tenant quality-of-service (QoS) management by emerging mobile edge systems, particularly for high-throughput beyond fifth-generation networks. Existing traffic engineering tools utilize complex functions baked into closed, proprietary infrastructures, largely limiting design flexibility, scalability, and adaptiveness. Hence, this study introduces a software-defined networking (SDN)-based dynamic QoS provisioning scheme that prioritizes multi-tenant network traffic while focusing on the base station-edge cloud scenario. The designed scheme first separates control and data planes and enables traffic management automation using SDN programmability. It then implements dynamic QoS management via the SDN-OpenFlow protocol, which ensures ample bandwidth for multiple priority flows and efficiently manages the remaining bandwidth for non-priority traffic. Empirical experiments are conducted with a Mininet network emulator and an OpenDayLight controller. Performance evaluation validates the proposed scheme's effectiveness in meeting multi-tenant QoS criteria, offering a robust solution for traffic prioritization in SDN-based edge networks.
Submitted 23 March, 2024;
originally announced March 2024.
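The bandwidth policy summarized in the abstract above (priority flows served first, the remainder managed for non-priority traffic) can be illustrated with a small sketch. This is not the authors' scheme and does not use any OpenFlow or OpenDayLight API: the function name `allocate_bandwidth`, the tenant names, the equal split of leftover capacity, and all numbers are assumptions made only for illustration.

```python
def allocate_bandwidth(capacity, priority_demands, best_effort_flows):
    """Grant priority flows their demand in order, then split what is left.

    capacity          -- total link capacity (e.g. in Mbit/s); hypothetical value
    priority_demands  -- list of (flow_name, demand), highest priority first
    best_effort_flows -- list of non-priority flow names
    """
    allocation = {}
    remaining = capacity

    # Priority flows are served first, each capped by what is still available.
    for name, demand in priority_demands:
        granted = min(demand, remaining)
        allocation[name] = granted
        remaining -= granted

    # Assumption: leftover capacity is shared equally among non-priority flows.
    share = remaining / len(best_effort_flows) if best_effort_flows else 0.0
    for name in best_effort_flows:
        allocation[name] = share

    return allocation


plan = allocate_bandwidth(
    capacity=100,
    priority_demands=[("tenant_a", 40), ("tenant_b", 30)],
    best_effort_flows=["bulk_1", "bulk_2"],
)
print(plan)
```

In a real SDN deployment the resulting numbers would be translated into queue or meter configurations by the controller; that step is outside this sketch.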
-
Is Registering Raw Tagged-MR Enough for Strain Estimation in the Era of Deep Learning?
Authors:
Zhangxing Bian,
Ahmed Alshareef,
Shuwen Wei,
Junyu Chen,
Yuli Wang,
Jonghye Woo,
Dzung L. Pham,
Jiachen Zhuo,
Aaron Carass,
Jerry L. Prince
Abstract:
Magnetic Resonance Imaging with tagging (tMRI) has long been utilized for quantifying tissue motion and strain during deformation. However, a phenomenon known as tag fading, a gradual decrease in tag visibility over time, often complicates post-processing. The first contribution of this study is to model tag fading by considering the interplay between $T_1$ relaxation and the repeated application of radio frequency (RF) pulses during serial imaging sequences. This is a factor that has been overlooked in prior research on tMRI post-processing. Further, we have observed an emerging trend of utilizing raw tagged MRI within a deep learning-based (DL) registration framework for motion estimation. In this work, we evaluate and analyze the impact of commonly used image similarity objectives in training DL registrations on raw tMRI. This is then compared with the Harmonic Phase-based approach, a traditional approach which is claimed to be robust to tag fading. Our findings, derived from both simulated images and an actual phantom scan, reveal the limitations of various similarity losses in raw tMRI and emphasize caution in registration tasks where image intensity changes over time.
Submitted 30 January, 2024;
originally announced January 2024.
-
Improving Graph Convolutional Networks with Transformer Layer in social-based items recommendation
Authors:
Thi Linh Hoang,
Tuan Dung Pham,
Viet Cuong Ta
Abstract:
In this work, we have proposed an approach for improving the GCN for predicting ratings in social networks. Our model is expanded from the standard model with several layers of transformer architecture. The main focus of the paper is on the encoder architecture for node embedding in the network. Using the embedding layer from the graph-based convolution layer, the attention mechanism could rearrange the feature space to get a more efficient embedding for the downstream task. The experiments showed that our proposed architecture achieves better performance than GCN on the traditional link prediction task.
Submitted 12 January, 2024;
originally announced January 2024.
-
Solving Label Variation in Scientific Information Extraction via Multi-Task Learning
Authors:
Dong Pham,
Xanh Ho,
Quang-Thuy Ha,
Akiko Aizawa
Abstract:
Scientific Information Extraction (ScientificIE) is a critical task that involves the identification of scientific entities and their relationships. The complexity of this task is compounded by the necessity for domain-specific knowledge and the limited availability of annotated data. Two of the most popular datasets for ScientificIE are SemEval-2018 Task-7 and SciERC. They have overlapping samples and differ in their annotation schemes, which leads to conflicts. In this study, we first introduced a novel approach based on multi-task learning to address label variations. We then proposed a soft labeling technique that converts inconsistent labels into probabilistic distributions. The experimental results demonstrated that the proposed method can enhance the model's robustness to label noise and improve the end-to-end performance in both ScientificIE tasks. The analysis revealed that label variations can be particularly effective in handling ambiguous instances. Furthermore, the richness of the information captured by label variations can potentially reduce data size requirements. The findings highlight the importance of releasing variation labels and promote future research on other tasks in other domains. Overall, this study demonstrates the effectiveness of multi-task learning and the potential of label variations to enhance the performance of ScientificIE.
Submitted 25 December, 2023;
originally announced December 2023.
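
The soft labeling idea mentioned in the abstract above (turning inconsistent labels into probability distributions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `soft_label` and the relation names in the usage note are hypothetical, and the sketch simply assumes that each annotation scheme contributes one vote per sample.

```python
from collections import Counter


def soft_label(annotations, label_set):
    """Convert possibly conflicting labels for one sample into a distribution.

    annotations: labels assigned to the same sample by different
                 annotation schemes (assumption: one vote per scheme).
    label_set:   all labels the model can predict.
    """
    counts = Counter(annotations)
    # Only votes for labels in label_set are counted.
    total = sum(counts[label] for label in label_set)
    if total == 0:
        raise ValueError("no annotation matches the label set")
    # Each label receives the fraction of votes it obtained.
    return {label: counts[label] / total for label in label_set}
```

For example, a sample labeled `USAGE` in one dataset and `PART_WHOLE` in the other (both names are placeholders) would receive probability 0.5 for each of those labels and 0.0 for every other label in the set.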
-
Towards an accurate and generalizable multiple sclerosis lesion segmentation model using self-ensembled lesion fusion
Authors:
Jinwei Zhang,
Lianrui Zuo,
Blake E. Dewey,
Samuel W. Remedios,
Dzung L. Pham,
Aaron Carass,
Jerry L. Prince
Abstract:
Automatic multiple sclerosis (MS) lesion segmentation using multi-contrast magnetic resonance (MR) images provides improved efficiency and reproducibility compared to manual delineation. Current state-of-the-art automatic MS lesion segmentation methods utilize modified U-Net-like architectures. However, in the literature, dedicated architecture modifications were always required to maximize their performance. In addition, the best-performing methods have not proven to be generalizable to diverse test datasets with contrast variations and image artifacts. In this work, we developed an accurate and generalizable MS lesion segmentation model using the well-known U-Net architecture without further modification. A novel test-time self-ensembled lesion fusion strategy is proposed that not only achieved the best performance using the ISBI 2015 MS segmentation challenge data but also demonstrated robustness across various self-ensemble parameter choices. Moreover, equipped with instance normalization rather than batch normalization widely used in literature, the model trained on the ISBI challenge data generalized well on clinical test datasets from different scanners.
Submitted 3 December, 2023;
originally announced December 2023.
-
Gendec: A Machine Learning-based Framework for Gender Detection from Japanese Names
Authors:
Duong Tien Pham,
Luan Thanh Nguyen
Abstract:
Every human has their own name, a fundamental aspect of their identity and cultural heritage. The name often conveys a wealth of information, including details about an individual's background, ethnicity, and, especially, their gender. By detecting gender through the analysis of names, researchers can unlock valuable insights into linguistic patterns and cultural norms, which can be applied to practical applications. Hence, this work presents a novel dataset for Japanese name gender detection comprising 64,139 full names in romaji, hiragana, and kanji forms, along with their biological genders. Moreover, we propose Gendec, a framework for gender detection from Japanese names that leverages diverse approaches, including traditional machine learning techniques or cutting-edge transfer learning models, to predict the gender associated with Japanese names accurately. Through a thorough investigation, the proposed framework is expected to be effective and serve potential applications in various domains.
Submitted 18 November, 2023;
originally announced November 2023.
-
RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Adversarial Data Manipulation
Authors:
Dzung Pham,
Shreyas Kulkarni,
Amir Houmansadr
Abstract:
Federated learning has emerged as a promising privacy-preserving solution for machine learning domains that rely on user interactions, particularly recommender systems and online learning to rank. While there has been substantial research on the privacy of traditional federated learning, little attention has been paid to the privacy properties of these interaction-based settings. In this work, we show that users face an elevated risk of having their private interactions reconstructed by the central server when the server can control the training features of the items that users interact with. We introduce RAIFLE, a novel optimization-based attack framework where the server actively manipulates the features of the items presented to users to increase the success rate of reconstruction. Our experiments with federated recommendation and online learning-to-rank scenarios demonstrate that RAIFLE is significantly more powerful than existing reconstruction attacks like gradient inversion, achieving high performance consistently in most settings. We discuss the pros and cons of several possible countermeasures to defend against RAIFLE in the context of interaction-based federated learning. Our code is open-sourced at https://github.com/dzungvpham/raifle.
Submitted 1 March, 2025; v1 submitted 29 October, 2023;
originally announced October 2023.
-
Enhancing Accuracy-Privacy Trade-off in Differentially Private Split Learning
Authors:
Ngoc Duy Pham,
Khoa Tran Phan,
Naveen Chilamkurti
Abstract:
Split learning (SL) aims to protect user data privacy by distributing deep models between client-server and keeping private data locally. Only processed or `smashed' data can be transmitted from the clients to the server during the SL process. However, recently proposed model inversion attacks can recover the original data from the smashed data. In order to enhance privacy protection against such attacks, a strategy is to adopt differential privacy (DP), which involves safeguarding the smashed data at the expense of some accuracy loss. This paper presents the first investigation into the impact on accuracy when training multiple clients in SL with various privacy requirements. Subsequently, we propose an approach that reviews the DP noise distributions of other clients during client training to address the identified accuracy degradation. We also examine the application of DP to the local model of SL to gain insights into the trade-off between accuracy and privacy. Specifically, findings reveal that introducing noise in the later local layers offers the most favorable balance between accuracy and privacy. Drawing from our insights in the shallower layers, we propose an approach to reduce the size of smashed data to minimize data leakage while maintaining higher accuracy, optimizing the accuracy-privacy trade-off. Additionally, a smaller size of smashed data reduces communication overhead on the client side, mitigating one of the notable drawbacks of SL. Experiments with popular datasets demonstrate that our proposed approaches provide an optimal trade-off for incorporating DP into SL, ultimately enhancing training accuracy for multi-client SL with varying privacy requirements.
Submitted 15 October, 2024; v1 submitted 22 October, 2023;
originally announced October 2023.
-
Algorithmic Foundations of Inexact Computing
Authors:
John Augustine,
Dror Fried,
Krishna V. Palem,
Duc-Hung Pham,
Anshumali Shrivastava
Abstract:
Inexact computing, also referred to as approximate computing, is a style of designing algorithms and computing systems wherein the accuracy or correctness of algorithms executing on them is deliberately traded for significant resource savings. Significant progress has been reported in this regard, in hardware as well as in software and custom algorithms that exploit this approach, resulting in some loss in solution quality (accuracy) while garnering disproportionately high savings. However, these approaches tended to be ad hoc and were tied to specific algorithms and technologies. Consequently, a principled approach to designing and analyzing algorithms was lacking.
In this paper, we provide a novel model which allows us to characterize the behavior of algorithms designed to be inexact, as well as characterize opportunities and benefits that this approach offers. Our methods are therefore amenable to standard asymptotic analysis and provide a clean, unified abstraction through which an algorithm's design and analysis can be conducted. With this as a backdrop, we show that inexactness can be significantly beneficial for some fundamental problems, in that the quality of a solution can be exponentially better if one exploits inexactness when compared to approaches that are agnostic and are unable to exploit this approach. We show that such gains are possible in the context of evaluating Boolean functions rooted in the theory of Boolean functions and their spectra, PAC learning, and sorting. Formally, this is accomplished by introducing the twin concepts of inexactness-aware and inexactness-oblivious approaches to designing algorithms, and the exponential gains are shown in the context of taking the ratio of the quality of the solution using the "aware" approach to the "oblivious" approach.
Submitted 29 May, 2023;
originally announced May 2023.
-
Look how they have grown: Non-destructive Leaf Detection and Size Estimation of Tomato Plants for 3D Growth Monitoring
Authors:
Yuning Xing,
Dexter Pham,
Henry Williams,
David Smith,
Ho Seok Ahn,
JongYoon Lim,
Bruce A. MacDonald,
Mahla Nejati
Abstract:
Smart farming is a growing field as technology advances. Plant characteristics are crucial indicators for monitoring plant growth. Research has been done to estimate characteristics like leaf area index, leaf disease, and plant height. However, few methods have been applied to non-destructive measurements of leaf size. In this paper, an automated non-destructive image-based measuring system is presented, which uses 2D and 3D data obtained using a Zivid 3D camera, creating 3D virtual representations (digital twins) of the tomato plants. Leaves are detected from corresponding 2D RGB images and mapped to their 3D point cloud using the detected leaf masks; the leaf point cloud is then passed to a plane-fitting algorithm that extracts the leaf size, providing data for growth monitoring. The performance of the measurement platform has been evaluated through a comprehensive trial on real-world tomato plants, with quantified performance metrics compared to ground truth measurements. Three tomato leaf and height datasets (including 50+ 3D point cloud files of tomato plants) were collected and open-sourced in this project. The proposed leaf size estimation method demonstrates an RMSE value of 4.47 mm and an R^2 value of 0.87. The overall measurement system (leaf detection and size estimation algorithms combined) delivers an RMSE value of 8.13 mm and an R^2 value of 0.899.
Submitted 7 April, 2023;
originally announced April 2023.
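
The two metrics quoted in the abstract above, RMSE and R^2, follow standard definitions. The sketch below shows how they are commonly computed from paired ground-truth and estimated values; the function names are illustrative, and the code assumes the usual coefficient-of-determination formula rather than anything specific to the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and estimated values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Undefined when all ground-truth values are identical (ss_tot == 0).
    return 1 - ss_res / ss_tot
```

RMSE is expressed in the unit of the measurement (millimetres for leaf size), while R^2 is unitless and reaches 1 only when every estimate matches its ground-truth value.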
-
Tailoring Requirements Engineering for Responsible AI
Authors:
Walid Maalej,
Yen Dieu Pham,
Larissa Chazette
Abstract:
Requirements Engineering (RE) is the discipline for identifying, analyzing, as well as ensuring the implementation and delivery of user, technical, and societal requirements. Recently reported issues concerning the acceptance of Artificial Intelligence (AI) solutions after deployment, e.g. in the medical, automotive, or scientific domains, stress the importance of RE for designing and delivering Responsible AI systems. In this paper, we argue that RE should not only be carefully conducted but also tailored for Responsible AI. We outline related challenges for research and practice.
Submitted 21 February, 2023;
originally announced February 2023.
-
Rapid Design of Top-Performing Metal-Organic Frameworks with Qualitative Representations of Building Blocks
Authors:
Yigitcan Comlek,
Thang Duc Pham,
Randall Snurr,
Wei Chen
Abstract:
Data-driven materials design often encounters challenges where systems require or possess qualitative (categorical) information. Metal-organic frameworks (MOFs) are an example of such material systems. The representation of MOFs through different building blocks makes it a challenge for designers to incorporate qualitative information into design optimization. Furthermore, the large number of potential building blocks leads to a combinatorial challenge, with millions of possible MOFs that could be explored through time consuming physics-based approaches. In this work, we integrated Latent Variable Gaussian Process (LVGP) and Multi-Objective Batch-Bayesian Optimization (MOBBO) to identify top-performing MOFs adaptively, autonomously, and efficiently without any human intervention. Our approach provides three main advantages: (i) no specific physical descriptors are required and only building blocks that construct the MOFs are used in global optimization through qualitative representations, (ii) the method is application and property independent, and (iii) the latent variable approach provides an interpretable model of qualitative building blocks with physical justification. To demonstrate the effectiveness of our method, we considered a design space with more than 47,000 MOF candidates. By searching only ~1% of the design space, LVGP-MOBBO was able to identify all MOFs on the Pareto front and more than 97% of the 50 top-performing designs for the CO$_2$ working capacity and CO$_2$/N$_2$ selectivity properties. Finally, we compared our approach with the Random Forest algorithm and demonstrated its efficiency, interpretability, and robustness.
Submitted 17 February, 2023;
originally announced February 2023.
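
The abstract above refers to identifying designs on the Pareto front of two properties. As a minimal illustration of that concept only (not of LVGP-MOBBO itself), the following sketch filters a list of candidates down to the non-dominated ones. It assumes every objective is to be maximized; `pareto_front` is an illustrative name.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming all objectives are maximized.

    A point q dominates p if q is at least as good as p in every objective
    and strictly better in at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] >= p[k] for k in range(len(p)))
            and any(q[k] > p[k] for k in range(len(p)))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front
```

This brute-force check compares every pair of candidates, so it scales quadratically with the number of points. That is acceptable for small illustrative sets, though a large design space would typically call for a more efficient method.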
-
A PM2.5 concentration prediction framework with vehicle tracking system: From cause to effect
Authors:
Chuong D. Le,
Hoang V. Pham,
Duy A. Pham,
An D. Le,
Hien B. Vo
Abstract:
Air pollution is an emerging problem that needs to be solved in both developed and developing countries. In Vietnam, air pollution is a concerning issue in big cities such as Hanoi and Ho Chi Minh City, where it comes mostly from vehicles such as cars and motorbikes. To tackle the problem, the paper focuses on developing a solution that can estimate the emitted PM2.5 pollutants by counting the number of vehicles in traffic. We first surveyed recent object detection models and developed our own traffic surveillance system. The observed traffic density showed a similar trend to the measured PM2.5 with a certain time lag, suggesting a relation between traffic density and PM2.5. We further express this relationship with a mathematical model that can estimate the PM2.5 value based on the observed traffic density. The estimated result showed a strong correlation with the measured PM2.5 plots in the urban area context.
Submitted 4 December, 2022;
originally announced December 2022.
-
Split Learning without Local Weight Sharing to Enhance Client-side Data Privacy
Authors:
Ngoc Duy Pham,
Tran Khoa Phan,
Alsharif Abuadbba,
Yansong Gao,
Doan Nguyen,
Naveen Chilamkurti
Abstract:
Split learning (SL) aims to protect user data privacy by distributing deep models between client and server and keeping private data local. In SL training with multiple clients, the local model weights are shared among the clients for local model updates. This paper first reveals, through model inversion attacks, that local weight sharing among the clients in SL exacerbates data privacy leakage. Then, to reduce the data privacy leakage, we propose and analyze privacy-enhanced SL (P-SL), that is, SL without local weight sharing. We further propose parallelized P-SL to expedite the training process by duplicating multiple server-side model instances without compromising accuracy. Finally, we explore P-SL with late participating clients and devise a server-side cache-based training method to address the forgetting phenomenon in SL when late clients join. Experimental results demonstrate that P-SL helps reduce up to 50% of client-side data leakage, which essentially achieves a better privacy-accuracy trade-off than the current trend of using differential privacy mechanisms. Moreover, P-SL and its cache-based version achieve accuracy comparable to baseline SL under various data distributions, while costing less computation and communication. Additionally, caching-based training in P-SL mitigates the negative effect of forgetting, stabilizes the learning, and enables practical and low-complexity training in a dynamic environment with late-arriving clients.
Submitted 21 July, 2024; v1 submitted 30 November, 2022;
originally announced December 2022.
-
Reusable Self-Attention-based Recommender System for Fashion
Authors:
Marjan Celikik,
Jacek Wasilewski,
Sahar Mbarek,
Pablo Celayes,
Pierre Gagliardi,
Duy Pham,
Nour Karessli,
Ana Peleteiro Ramallo
Abstract:
A large number of empirical studies on applying self-attention models in the domain of recommender systems are based on offline evaluation and metrics computed on standardized datasets, without insights on how these models perform in real life scenarios. Moreover, many of them do not consider information such as item and customer metadata, although deep-learning recommenders live up to their full potential only when numerous features of heterogeneous types are included. Also, typically recommendation models are designed to serve well only a single use case, which increases modeling complexity and maintenance costs, and may lead to inconsistent customer experience. In this work, we present a reusable Attention-based Fashion Recommendation Algorithm (AFRA), that utilizes various interaction types with different fashion entities such as items (e.g., shirt), outfits and influencers, and their heterogeneous features. Moreover, we leverage temporal and contextual information to address both short and long-term customer preferences. We show its effectiveness on outfit recommendation use cases, in particular: 1) personalized ranked feed; 2) outfit recommendations by style; 3) similar item recommendation and 4) in-session recommendations inspired by most recent customer actions. We present both offline and online experimental results demonstrating substantial improvements in customer retention and engagement.
Submitted 29 November, 2022;
originally announced November 2022.
-
Outfit Generation and Recommendation -- An Experimental Study
Authors:
Marjan Celikik,
Matthias Kirmse,
Timo Denk,
Pierre Gagliardi,
Sahar Mbarek,
Duy Pham,
Ana Peleteiro Ramallo
Abstract:
Over the past years, fashion-related challenges have gained a lot of attention in the research community. Outfit generation and recommendation, i.e., the composition of a set of items of different types (e.g., tops, bottom, shoes, accessories) that go well together, are among the most challenging ones. That is because items have to be both compatible amongst each other and also personalized to match the taste of the customer. Recently there has been a plethora of work targeted at tackling these problems by adopting various techniques and algorithms from the machine learning literature. However, to date, there is no extensive comparison of the performance of the different algorithms for outfit generation and recommendation. In this paper, we close this gap by providing a broad evaluation and comparison of various algorithms, including both personalized and non-personalized approaches, using online, real-world user data from one of Europe's largest fashion stores. We present the adaptations we made to some of those models to make them suitable for personalized outfit generation. Moreover, we provide insights for models that have not yet been evaluated on this task, specifically, GPT, BERT and Seq-to-Seq LSTM.
Submitted 29 November, 2022;
originally announced November 2022.
-
Multi-fidelity Gaussian Process for Biomanufacturing Process Modeling with Small Data
Authors:
Yuan Sun,
Winton Nathan-Roberts,
Tien Dung Pham,
Ellen Otte,
Uwe Aickelin
Abstract:
In biomanufacturing, developing an accurate model to simulate the complex dynamics of bioprocesses is an important yet challenging task. This is partially due to the uncertainty associated with bioprocesses, high data acquisition cost, and lack of data availability to learn complex relations in bioprocesses. To deal with these challenges, we propose to use a statistical machine learning approach, multi-fidelity Gaussian process, for process modelling in biomanufacturing. Gaussian process regression is a well-established technique based on probability theory which can naturally consider uncertainty in a dataset via Gaussian noise, and multi-fidelity techniques can make use of multiple sources of information with different levels of fidelity, thus suitable for bioprocess modeling with small data. We apply the multi-fidelity Gaussian process to solve two significant problems in biomanufacturing, bioreactor scale-up and knowledge transfer across cell lines, and demonstrate its efficacy on real-world datasets.
Submitted 26 November, 2022;
originally announced November 2022.
-
Interpretable HER2 scoring by evaluating clinical Guidelines through a weakly supervised, constrained Deep Learning Approach
Authors:
Manh Dan Pham,
Cyprien Tilmant,
Stéphanie Petit,
Isabelle Salmon,
Saima Ben Hadj,
Rutger H. J. Fick
Abstract:
The evaluation of the Human Epidermal growth factor Receptor-2 (HER2) expression is an important prognostic biomarker for breast cancer treatment selection. However, HER2 scoring has notoriously high interobserver variability due to stain variations between centers and the need to estimate visually the staining intensity in specific percentages of tumor area. In this paper, focusing on the interpretability of HER2 scoring by a pathologist, we propose a semi-automatic, two-stage deep learning approach that directly evaluates the clinical HER2 guidelines defined by the American Society of Clinical Oncology/ College of American Pathologists (ASCO/CAP). In the first stage, we segment the invasive tumor over the user-indicated Region of Interest (ROI). Then, in the second stage, we classify the tumor tissue into four HER2 classes. For the classification stage, we use weakly supervised, constrained optimization to find a model that classifies cancerous patches such that the tumor surface percentage meets the guidelines specification of each HER2 class. We end the second stage by freezing the model and refining its output logits in a supervised way to all slide labels in the training set. To ensure the quality of our dataset's labels, we conducted a multi-pathologist HER2 scoring consensus. For the assessment of doubtful cases where no consensus was found, our model can help by interpreting its HER2 class percentages output. We achieve a performance of 0.78 in F1-score on the test set while keeping our model interpretable for the pathologist, hopefully contributing to interpretable AI models in digital pathology.
Submitted 17 November, 2022;
originally announced November 2022.
-
Deep filter bank regression for super-resolution of anisotropic MR brain images
Authors:
Samuel W. Remedios,
Shuo Han,
Yuan Xue,
Aaron Carass,
Trac D. Tran,
Dzung L. Pham,
Jerry L. Prince
Abstract:
In 2D multi-slice magnetic resonance (MR) acquisition, the through-plane signals are typically of lower resolution than the in-plane signals. While contemporary super-resolution (SR) methods aim to recover the underlying high-resolution volume, the estimated high-frequency information is implicit via end-to-end data-driven training rather than being explicitly stated and sought. To address this, we reframe the SR problem statement in terms of perfect reconstruction filter banks, enabling us to identify and directly estimate the missing information. In this work, we propose a two-stage approach to approximate the completion of a perfect reconstruction filter bank corresponding to the anisotropic acquisition of a particular scan. In stage 1, we estimate the missing filters using gradient descent and in stage 2, we use deep networks to learn the mapping from coarse coefficients to detail coefficients. In addition, the proposed formulation does not rely on external training data, circumventing the need for domain shift correction. Under our approach, SR performance is improved particularly in "slice gap" scenarios, likely due to the constrained solution space imposed by the framework.
Submitted 6 September, 2022;
originally announced September 2022.
-
Using Chatbots to Teach Languages
Authors:
Yu Li,
Chun-Yen Chen,
Dian Yu,
Sam Davidson,
Ryan Hou,
Xun Yuan,
Yinghua Tan,
Derek Pham,
Zhou Yu
Abstract:
This paper reports on progress towards building an online language learning tool to provide learners with conversational experience by using dialog systems as conversation practice partners. Our system can adapt to users' language proficiency on the fly. We also provide automatic grammar error feedback to help users learn from their mistakes. According to our first adopters, our system is entertaining and useful. Furthermore, we will provide the learning technology community a large-scale conversation dataset on language learning and grammar correction. Our next step is to make our system more adaptive to user profile information by using reinforcement learning algorithms.
Submitted 31 July, 2022;
originally announced August 2022.
-
Maximizing Entanglement Routing Rate in Quantum Networks: Approximation Algorithms
Authors:
Tu N. Nguyen,
Dung H. P. Nguyen,
Dang H. Pham,
Bing-Hong Liu,
Hoa N. Nguyen
Abstract:
There will be a fast-paced shift from conventional network systems to novel quantum networks that are supported by quantum entanglement and teleportation, key technologies of the quantum era, to enable secured data transmissions in the next generation of the Internet. Despite this prospect, migration to quantum networks cannot be done at once, especially on the aspect of quantum routing. In this paper, we study the maximizing entangled routing rate (MERR) problem. In particular, given a set of demands, we try to determine entangled routing paths for the maximum number of demands in the quantum network while meeting the network's fidelity. We first formulate the MERR problem using an integer linear programming (ILP) model to capture the traffic pattern for all demands in the network. We then leverage the theory of relaxation of ILP to devise two efficient algorithms, HBRA and RRA, with provable approximation ratios for the objective function. To deal with the challenge of the combinatorial optimization problem in large-scale networks, we also propose the path-length-based approach (PLBA) to solve the MERR problem. Using both simulations and an open quantum network simulator platform to conduct experiments with real-world topologies and traffic matrices, we evaluate the performance of our algorithms and demonstrate their success in maximizing the entangled routing rate.
Submitted 18 July, 2022;
originally announced July 2022.