-
Nonlinear Control of a Quadrotor UAV Using Backstepping-Based Sliding Mode Technique
Authors:
Thien Nhan Vo
Abstract:
This paper presents the development of a sliding mode controller using the backstepping approach. The controller is employed to synthesize tracking errors and Lyapunov functions. A novel state-space representation is formulated by incorporating the dynamics of the quadrotor and accounting for non-holonomic constraints. The proposed sliding mode controller effectively addresses system nonlinearities and improves tracking of predefined trajectories. Simulation results are presented graphically to demonstrate the controller's performance.
Submitted 17 June, 2025;
originally announced June 2025.
-
LLM-as-a-Judge for Reference-less Automatic Code Validation and Refinement for Natural Language to Bash in IT Automation
Authors:
Ngoc Phuoc An Vo,
Brent Paulovicks,
Vadim Sheinin
Abstract:
In an effort to automatically evaluate and select the best model and improve code quality for automatic incident remediation in IT Automation, it is crucial to verify if the generated code for remediation action is syntactically and semantically correct and whether it can be executed correctly as intended. There are three approaches: 1) conventional methods use surface form similarity metrics (token match, exact match, etc.) which have numerous limitations, 2) execution-based evaluation focuses more on code functionality based on pass/fail judgments for given test-cases, and 3) LLM-as-a-Judge employs LLMs for automated evaluation to judge if it is a correct answer for a given problem based on pre-defined metrics. In this work, we focused on enhancing LLM-as-a-Judge using bidirectional functionality matching and logic representation for reference-less automatic validation and refinement for Bash code generation to select the best model for automatic incident remediation in IT Automation. We used execution-based evaluation as ground-truth to evaluate our LLM-as-a-Judge metrics. Results show high accuracy and agreement with execution-based evaluation (and up to 8% over baseline). Finally, we built Reflection code agents to utilize judgments and feedback from our evaluation metrics which achieved significant improvement (up to 24% increase in accuracy) for automatic code refinement.
Submitted 12 June, 2025;
originally announced June 2025.
-
Heart Rate Classification in ECG Signals Using Machine Learning and Deep Learning
Authors:
Thien Nhan Vo
Abstract:
This study addresses the classification of heartbeats from ECG signals through two distinct approaches: traditional machine learning utilizing hand-crafted features and deep learning via transformed images of ECG beats. The dataset underwent preprocessing steps, including downsampling, filtering, and normalization, to ensure consistency and relevance for subsequent analysis. In the first approach, features such as heart rate variability (HRV), mean, variance, and RR intervals were extracted to train various classifiers, including SVM, Random Forest, AdaBoost, LSTM, Bi-directional LSTM, and LightGBM. The second approach involved transforming ECG signals into images using Gramian Angular Field (GAF), Markov Transition Field (MTF), and Recurrence Plots (RP), with these images subsequently classified using CNN architectures like VGG and Inception.
Experimental results demonstrate that the LightGBM model achieved the highest performance, with an accuracy of 99% and an F1 score of 0.94, outperforming the image-based CNN approach (F1 score of 0.85). Models such as SVM and AdaBoost yielded significantly lower scores, indicating limited suitability for this task. The findings underscore the superior ability of hand-crafted features to capture temporal and morphological variations in ECG signals compared to image-based representations of individual beats. Future investigations may benefit from incorporating multi-lead ECG signals and temporal dependencies across successive beats to enhance classification accuracy further.
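The Gramian Angular Field transform mentioned in the abstract can be illustrated with a minimal sketch. This uses the standard summation-field definition (rescale to [-1, 1], take the arccosine, then form cos(phi_i + phi_j)); the function name and the toy `beat` values are hypothetical, and the paper's actual preprocessing may differ.

```python
import math


def gramian_angular_summation_field(series):
    """Return the GASF matrix (list of lists) for a 1-D numeric sequence."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against tiny floating-point overshoot.
    scaled = [max(-1.0, min(1.0, v)) for v in scaled]
    # Polar encoding: each sample becomes an angle.
    phi = [math.acos(v) for v in scaled]
    # Entry (i, j) is cos(phi_i + phi_j), which makes the matrix symmetric.
    return [[math.cos(a + b) for b in phi] for a in phi]


# Toy three-sample "beat" (hypothetical values, not real ECG data).
beat = [0.0, 0.5, 1.0]
gasf = gramian_angular_summation_field(beat)
```

The resulting square matrix can be treated as a single-channel image and passed to a CNN, which is the role such transforms play in an image-based pipeline.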
Submitted 16 June, 2025; v1 submitted 2 June, 2025;
originally announced June 2025.
-
Maximizing the Promptness of Metaverse Systems using Edge Computing by Deep Reinforcement Learning
Authors:
Tam Ninh Thi-Thanh,
Trinh Van Chien,
Hung Tran,
Nguyen Hoai Son,
Van Nhan Vo
Abstract:
Metaverse and Digital Twin (DT) have attracted much academic and industrial attention as approaches to the future digital world. This paper introduces the advantages of deep reinforcement learning (DRL) in assisting a Metaverse system based on Digital Twin. We assume that the system includes several Metaverse User devices collecting data from the real world to transfer it into the virtual world, a Metaverse Virtual Access Point (MVAP) undertaking the processing of data, and an edge computing server that receives the offloaded data from the MVAP. The proposed model works under a dynamic environment with various parameters changing over time. The experimental results show that our proposed DRL algorithm is suitable for offloading tasks to ensure the promptness of DT in a dynamic environment.
Submitted 3 June, 2025;
originally announced June 2025.
-
Network Digital Twin for 6G and Beyond: An End-to-End View Across Multi-Domain Network Ecosystems
Authors:
Dinh-Hieu Tran,
Nazar Waheed,
Yuris Mulya Saputra,
Xingqin Lin,
Cong T. Nguyen,
Tedros Salih Abdu,
Van Nhan Vo,
Van-Quan Pham,
Madyan Alsenwi,
Abuzar Babikir Mohammad Adam,
Symeon Chatzinotas,
Eva Lagaunas,
Hung Tran,
Tu Ho Dac,
Nguyen Van Huynh
Abstract:
With the rapid development of technology, the number of smart mobile users is increasing, accompanied by growing demands from applications such as virtual/augmented reality (VR/XR), remote surgery, autonomous vehicles, and real-time holographic communications, all of which require high transmission rates and ultra-low latency in 6G and beyond networks (6G+). This poses enormous challenges in efficiently deploying large-scale networks, including network design, planning, troubleshooting, optimization, and maintenance, without affecting the user experience. Network Digital Twin (NDT) has emerged as a potential solution, enabling the creation of a virtual model that reflects the actual network, supporting the simulation of various network designs, applying diverse operating policies, and reproducing complex fault scenarios under real-world conditions. This motivates our study, in which we provide a comprehensive survey of NDT in the context of 6G+, covering areas such as radio access networks (RAN), transport networks, 5G core networks and beyond (5GCORE+), cloud/edge computing, applications (blockchain, health system, manufacturing, security, and vehicular networks), non-terrestrial networks (NTNs), and quantum networks, from both academic and industrial perspectives. In particular, we are the first to provide an in-depth guide to the use of RAN and 5GCORE+ for NDT. Then, we provide an extensive review of foundation technologies such as transport networks, cloud/edge computing, applications, NTNs, and quantum networks in NDT. Finally, we discuss the key challenges, open issues, and future research directions for NDT in the context of 6G+.
Submitted 2 June, 2025;
originally announced June 2025.
-
UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios
Authors:
Le Thien Phuc Nguyen,
Zhuoran Yu,
Khoa Quang Nhat Cao,
Yuwei Guo,
Tu Ho Manh Pham,
Tuan Tai Nguyen,
Toan Ngo Duc Vo,
Lucas Poon,
Soochahn Lee,
Yong Jae Lee
Abstract:
We present UniTalk, a novel dataset specifically designed for the task of active speaker detection, emphasizing challenging scenarios to enhance model generalization. Unlike previously established benchmarks such as AVA, which predominantly features old movies and thus exhibits significant domain gaps, UniTalk focuses explicitly on diverse and difficult real-world conditions. These include underrepresented languages, noisy backgrounds, and crowded scenes - such as multiple visible speakers speaking concurrently or in overlapping turns. It contains over 44.5 hours of video with frame-level active speaker annotations across 48,693 speaking identities, and spans a broad range of video types that reflect real-world conditions. Through rigorous evaluation, we show that state-of-the-art models, while achieving nearly perfect scores on AVA, fail to reach saturation on UniTalk, suggesting that the ASD task remains far from solved under realistic conditions. Nevertheless, models trained on UniTalk demonstrate stronger generalization to modern "in-the-wild" datasets like Talkies and ASW, as well as to AVA. UniTalk thus establishes a new benchmark for active speaker detection, providing researchers with a valuable resource for developing and evaluating versatile and resilient models.
Dataset: https://huggingface.co/datasets/plnguyen2908/UniTalk-ASD
Code: https://github.com/plnguyen2908/UniTalk-ASD-code
Submitted 28 May, 2025;
originally announced May 2025.
-
Knowledge Distillation for Enhancing Walmart E-commerce Search Relevance Using Large Language Models
Authors:
Hongwei Shang,
Nguyen Vo,
Nitin Yadav,
Tian Zhang,
Ajit Puthenputhussery,
Xunfan Cai,
Shuyi Chen,
Prijith Chandran,
Changsung Kang
Abstract:
Ensuring the products displayed in e-commerce search results are relevant to users' queries is crucial for improving the user experience. With their advanced semantic understanding, deep learning models have been widely used for relevance matching in search tasks. While large language models (LLMs) offer superior ranking capabilities, it is challenging to deploy LLMs in real-time systems due to their high inference latency. To leverage the ranking power of LLMs while meeting the low-latency demands of production systems, we propose a novel framework that distills a high-performing LLM into a more efficient, low-latency student model. To help the student model learn more effectively from the teacher model, we first train the teacher LLM as a classification model with soft targets. Then, we train the student model to capture the relevance margin between pairs of products for a given query using mean squared error loss. Instead of using the same training data as the teacher model, we significantly expand the student model dataset by generating unlabeled data and labeling it with the teacher model predictions. Experimental results show that the student model performance continues to improve as the size of the augmented training data increases. In fact, with enough augmented data, the student model can outperform the teacher model. The student model has been successfully deployed in production at Walmart.com with significantly positive metrics.
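One plausible reading of the margin-based distillation objective described above is sketched below: for each pair of products returned for the same query, the student's score difference is regressed onto the teacher's score difference with mean squared error. The function name, the pair construction, and the toy scores are assumptions made for illustration; the abstract does not specify these details.

```python
def pairwise_margin_mse(student_scores, teacher_scores, pairs):
    """Mean squared error between student and teacher score margins.

    `pairs` is a list of (i, j) index pairs referring to products
    retrieved for the same query (a hypothetical input format).
    """
    if not pairs:
        raise ValueError("at least one product pair is required")
    total = 0.0
    for i, j in pairs:
        student_margin = student_scores[i] - student_scores[j]
        teacher_margin = teacher_scores[i] - teacher_scores[j]
        total += (student_margin - teacher_margin) ** 2
    return total / len(pairs)


# Toy relevance scores for three products under one query (made-up numbers).
teacher = [0.9, 0.4, 0.1]
student = [0.8, 0.5, 0.1]
pairs = [(0, 1), (0, 2), (1, 2)]
loss = pairwise_margin_mse(student, teacher, pairs)
```

Because only differences are compared, this loss is unchanged if every student score is shifted by the same constant, which suits ranking use cases where relative order matters more than absolute values.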
Submitted 11 May, 2025;
originally announced May 2025.
-
Active Learning for Multi-class Image Classification
Authors:
Thien Nhan Vo
Abstract:
A principal bottleneck in image classification is the large number of training examples needed to train a classifier. Using active learning, we can reduce the number of training examples needed to teach a CNN classifier by strategically selecting examples. Assigning values to image examples using different uncertainty metrics allows the model to identify and select high-value examples in a smaller training set size. We demonstrate results for digit recognition and fruit classification on the MNIST and Fruits360 data sets. We formally compare results for four different uncertainty metrics. Finally, we observe active learning is also effective on simpler (binary) classification tasks, but marked improvement from random sampling is more evident on more difficult tasks. We show active learning is a viable algorithm for image classification problems.
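Uncertainty-based selection can be sketched as follows. The abstract does not name its four metrics, so the three shown here (least confidence, margin, and entropy) are common textbook choices used as an assumption, and the function names and the toy `pool` of predicted class probabilities are hypothetical.

```python
import math


def least_confidence(probs):
    """Higher when the top predicted class has low probability."""
    return 1.0 - max(probs)


def margin_uncertainty(probs):
    """Higher when the two most likely classes are close together."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def entropy(probs):
    """Shannon entropy of the predicted class distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k, score=entropy):
    """Return indices of the k pool examples with the highest uncertainty."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Predicted class probabilities for three unlabeled images (made-up numbers).
pool = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25], [0.70, 0.20, 0.10]]
chosen = select_most_uncertain(pool, k=1)
```

In an active learning loop, the selected indices would be sent for labeling, added to the training set, and the classifier retrained before the next round of scoring.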
Submitted 10 May, 2025;
originally announced May 2025.
-
Deep Learning for On-Street Parking Violation Prediction
Authors:
Thien Nhan Vo
Abstract:
Illegal parking and the lack of available parking spaces are among the biggest issues faced in many large cities. These issues can have a significant impact on the quality of life of citizens. On-street parking systems have been designed to this end, aiming to ensure that parking spaces will be available for the local population, while also providing easy access to parking for people visiting the city center. However, these systems are often affected by illegal parking, providing incorrect information regarding the availability of parking spaces. Even though this can be mitigated using sensors for detecting the presence of cars in various parking sectors, the cost of these implementations is usually prohibitively large. In this paper, we investigate an indirect way of predicting parking violations at a fine-grained level, equipping such parking systems with a valuable tool for providing more accurate information to citizens. To this end, we employed a Deep Learning (DL)-based model to predict fine-grained parking violation rates for on-street parking systems. Moreover, we developed a data augmentation and smoothing technique for further improving the accuracy of DL models in the presence of missing and noisy data. We demonstrate, using experiments on real data collected in Thessaloniki, Greece, that the developed system can indeed provide accurate parking violation predictions.
Submitted 10 May, 2025;
originally announced May 2025.
-
3D Brain MRI Classification for Alzheimer Diagnosis Using CNN with Data Augmentation
Authors:
Thien Nhan Vo,
Bac Nam Ho
Abstract:
A three-dimensional convolutional neural network was developed to classify T1-weighted brain MRI scans as healthy or Alzheimer. The network comprises 3D convolution, pooling, batch normalization, dense ReLU layers, and a sigmoid output. Using stochastic noise injection and five-fold cross-validation, the model achieved test set accuracy of 0.912 and area under the ROC curve of 0.961, an improvement of approximately 0.027 over resizing alone. Sensitivity and specificity both exceeded 0.90. These results align with prior work reporting up to 0.10 gain via synthetic augmentation. The findings demonstrate the effectiveness of simple augmentation for 3D MRI classification and motivate future exploration of advanced augmentation methods and architectures such as 3D U-Net and vision transformers.
Submitted 17 June, 2025; v1 submitted 6 May, 2025;
originally announced May 2025.
-
Understand the Effect of Importance Weighting in Deep Learning on Dataset Shift
Authors:
Thien Nhan Vo
Abstract:
We evaluate the effectiveness of importance weighting in deep neural networks under label shift and covariate shift. On synthetic 2D data (linearly separable and moon-shaped) using logistic regression and MLPs, we observe that weighting strongly affects decision boundaries early in training but fades with prolonged optimization. On CIFAR-10 with various class imbalances, only L2 regularization (not dropout) helps preserve weighting effects. In a covariate-shift experiment, importance weighting yields no significant performance gain, highlighting challenges on complex data. Our results call into question the practical utility of importance weighting for real-world distribution shifts.
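For the label-shift case, importance weighting is commonly implemented by reweighting each training example's loss by the ratio of target to training class frequency. The sketch below assumes the target class prior is known; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def label_shift_weights(train_labels, target_prior):
    """Per-class weight w(y) = p_target(y) / p_train(y)."""
    counts = Counter(train_labels)
    n = len(train_labels)
    return {y: target_prior[y] / (counts[y] / n) for y in counts}


def weighted_mean_loss(losses, labels, weights):
    """Average of per-example losses, each scaled by its class weight."""
    return sum(weights[y] * loss for loss, y in zip(losses, labels)) / len(losses)


# Imbalanced training labels (3:1) with a balanced target prior (assumed known).
train_labels = [0, 0, 0, 1]
target_prior = {0: 0.5, 1: 0.5}
weights = label_shift_weights(train_labels, target_prior)

# Made-up per-example losses to show the reweighted objective.
per_example_losses = [0.2, 0.1, 0.3, 1.0]
objective = weighted_mean_loss(per_example_losses, train_labels, weights)
```

Here the minority class receives weight 2.0 and the majority class 2/3, so a single minority example contributes as much to the objective as three majority examples combined.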
Submitted 17 June, 2025; v1 submitted 6 May, 2025;
originally announced May 2025.
-
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Authors:
Khang H. N. Vo,
Duc P. T. Nguyen,
Thong Nguyen,
Tho T. Quan
Abstract:
This paper focuses on multimodal alignment within the realm of Artificial Intelligence, particularly in text and image modalities. The semantic gap between the textual and visual modalities undermines the effectiveness of multimodal fusion. Therefore, we introduce Text-Image Joint Embedding Predictive Architecture (TI-JEPA), an innovative pre-training strategy that leverages the energy-based model (EBM) framework to capture complex cross-modal relationships. TI-JEPA uses the flexibility of EBM in self-supervised learning to facilitate the compatibility between textual and visual elements. Through extensive experiments across multiple benchmarks, we demonstrate that TI-JEPA achieves state-of-the-art performance on the multimodal sentiment analysis task (and potentially on a wide range of multimodal-based tasks, such as Visual Question Answering), outperforming existing pre-training methodologies. Our findings highlight the potential of using an energy-based framework in advancing multimodal fusion and suggest significant improvements for downstream applications.
Submitted 8 March, 2025;
originally announced March 2025.
-
On the Second Hardy-Littlewood Conjecture
Authors:
Bittu Chahal,
Ertan Elma,
Nic Fellini,
Akshaa Vatwani,
Do Nhat Tan Vo
Abstract:
The second Hardy-Littlewood conjecture asserts that the prime counting function $π(x)$ satisfies the subadditive inequality
\begin{align*}
π(x+y)\leqslant π(x)+π(y)
\end{align*} for all integers $x,y\geqslant 2$. By linking the subadditivity of $π(x)$ to the error term in the Prime Number Theorem, we obtain unconditional improvements on the range of $y$ for which $π(x)$ is known to be subadditive. Moreover, assuming the Riemann Hypothesis, we show that for all $ε>0$, there exists $x_ε \geqslant 2$ such that for all $x\geqslant x_ε$ and $y$ in the range \begin{align*}
\frac{(2+ε)\sqrt{x}\log^2x}{8π}\leqslant y\leqslant x, \end{align*} the inequality $π(x+y)\leqslant π(x) + π(y)$ holds.
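As a small numerical illustration of the subadditive inequality stated above, the following sketch counts primes with a sieve and searches for pairs that violate $π(x+y)\leqslant π(x)+π(y)$ in a small range. The function names and the bound of 200 are arbitrary choices made here, and a finite check of this kind says nothing about the conjecture in general.

```python
def prime_counts(limit):
    """Return a list pi where pi[n] is the number of primes <= n, for n <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    # Sieve of Eratosthenes.
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    counts = []
    running = 0
    for n in range(limit + 1):
        running += 1 if is_prime[n] else 0
        counts.append(running)
    return counts


def subadditivity_violations(bound):
    """List pairs 2 <= y <= x <= bound with pi(x + y) > pi(x) + pi(y)."""
    pi = prime_counts(2 * bound)
    return [
        (x, y)
        for x in range(2, bound + 1)
        for y in range(2, x + 1)
        if pi[x + y] > pi[x] + pi[y]
    ]


violations = subadditivity_violations(200)
```

Restricting to y <= x loses nothing because the inequality is symmetric in x and y; at this scale the search is expected to return an empty list.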
Submitted 4 March, 2025;
originally announced March 2025.
-
Stable Recovery of Regularized Linear Inverse Problems
Authors:
Tran T. A. Nghia,
Huy N. Pham,
Nghia V. Vo
Abstract:
Recovering a low-complexity signal from its noisy observations by regularization methods is a cornerstone of inverse problems and compressed sensing. Stable recovery ensures that the original signal can be approximated linearly by optimal solutions of the corresponding Morozov or Tikhonov regularized optimization problems. In this paper, we propose new characterizations for stable recovery in finite-dimensional spaces, uncovering the role of nonsmooth second-order information. These insights enable a deeper understanding of stable recovery and its practical implications. As a consequence, we apply our theory to derive new sufficient conditions for stable recovery of the analysis group sparsity problems, including the group sparsity and isotropic total variation problems. Numerical experiments on these two problems give favorable results about using our conditions to test stable recovery.
Submitted 29 May, 2025; v1 submitted 15 December, 2024;
originally announced December 2024.
-
A Clifford Algebraic Approach to E(n)-Equivariant High-order Graph Neural Networks
Authors:
Viet-Hoang Tran,
Thieu N. Vo,
Tho Tran Huu,
Tan Minh Nguyen
Abstract:
Designing neural network architectures that can handle data symmetry is crucial. This is especially important for geometric graphs whose properties are equivariant under Euclidean transformations. Current equivariant graph neural networks (EGNNs), particularly those using message passing, have a limitation in expressive power. Recent high-order graph neural networks can overcome this limitation, yet they lack equivariance properties, representing a notable drawback in certain applications in chemistry and physical sciences. In this paper, we introduce the Clifford Group Equivariant Graph Neural Networks (CG-EGNNs), a novel EGNN that enhances high-order message passing by integrating high-order local structures in the context of Clifford algebras. As a key benefit of using Clifford algebras, CG-EGNN can learn functions that capture equivariance from positional features. By adopting the high-order message passing mechanism, CG-EGNN gains richer information from neighbors, thus improving model performance. Furthermore, we establish the universality property of the $k$-hop message passing framework, showcasing greater expressive power of CG-EGNNs with an additional $k$-hop message passing mechanism. We empirically validate that CG-EGNNs outperform previous methods on various benchmarks including n-body, CMU motion capture, and MD17, highlighting their effectiveness in geometric deep learning.
Submitted 13 March, 2025; v1 submitted 6 October, 2024;
originally announced October 2024.
-
Equivariant Polynomial Functional Networks
Authors:
Thieu N. Vo,
Viet-Hoang Tran,
Tho Tran Huu,
An Nguyen The,
Thanh Tran,
Minh-Khoi Nguyen-Nhat,
Duy-Tung Pham,
Tan Minh Nguyen
Abstract:
Neural Functional Networks (NFNs) have gained increasing interest due to their wide range of applications, including extracting information from implicit representations of data, editing network weights, and evaluating policies. A key design principle of NFNs is their adherence to the permutation and scaling symmetries inherent in the connectionist structure of the input neural networks. Recent NFNs have been proposed with permutation and scaling equivariance based on either graph-based message-passing mechanisms or parameter-sharing mechanisms. However, graph-based equivariant NFNs suffer from high memory consumption and long running times. On the other hand, parameter-sharing-based NFNs built upon equivariant linear layers exhibit lower memory consumption and faster running time, yet their expressivity is limited due to the large size of the symmetric group of the input neural networks. The challenge of designing a permutation and scaling equivariant NFN that maintains low memory consumption and running time while preserving expressivity remains unresolved. In this paper, we propose a novel solution with the development of MAGEP-NFN (Monomial mAtrix Group Equivariant Polynomial NFN). Our approach follows the parameter-sharing mechanism but differs from previous works by constructing a nonlinear equivariant layer represented as a polynomial in the input weights. This polynomial formulation enables us to incorporate additional relationships between weights from different input hidden layers, enhancing the model's expressivity while keeping memory consumption and running time low, thereby addressing the aforementioned challenge. We provide empirical evidence demonstrating that MAGEP-NFN achieves competitive performance and efficiency compared to existing baselines.
Submitted 5 October, 2024;
originally announced October 2024.
-
Equivariant Neural Functional Networks for Transformers
Authors:
Viet-Hoang Tran,
Thieu N. Vo,
An Nguyen The,
Tho Tran Huu,
Minh-Khoi Nguyen-Nhat,
Thanh Tran,
Duy-Tung Pham,
Tan Minh Nguyen
Abstract:
This paper systematically explores neural functional networks (NFN) for transformer architectures. NFN are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data and have proven valuable for tasks such as learnable optimizers, implicit data representations, and weight editing. While NFN have been extensively developed for MLP and CNN, no prior work has addressed their design for transformers, despite the importance of transformers in modern deep learning. This paper aims to address this gap by providing a systematic study of NFN for transformers. We first determine the maximal symmetric group of the weights in a multi-head attention module as well as a necessary and sufficient condition under which two sets of hyperparameters of the multi-head attention module define the same function. We then define the weight space of transformer architectures and its associated group action, which leads to the design principles for NFN in transformers. Based on these, we introduce Transformer-NFN, an NFN that is equivariant under this group action. Additionally, we release a dataset of more than 125,000 Transformer model checkpoints trained on two datasets with two different tasks, providing a benchmark for evaluating Transformer-NFN and encouraging further research on transformer training and performance.
Submitted 7 March, 2025; v1 submitted 5 October, 2024;
originally announced October 2024.
-
Demystifying the Token Dynamics of Deep Selective State Space Models
Authors:
Thieu N Vo,
Tung D. Pham,
Xin T. Tong,
Tan Minh Nguyen
Abstract:
Selective state space models (SSM), such as Mamba, have gained prominence for their effectiveness in modeling sequential data. Despite their outstanding empirical performance, a comprehensive theoretical understanding of deep selective SSM remains elusive, hindering their further development and adoption for applications that need high fidelity. In this paper, we investigate the dynamical properties of tokens in a pre-trained Mamba model. In particular, we derive the dynamical system governing the continuous-time limit of the Mamba model and characterize the asymptotic behavior of its solutions. In the one-dimensional case, we prove that only one of the following two scenarios happens: either all tokens converge to zero, or all tokens diverge to infinity. We provide criteria based on model parameters to determine when each scenario occurs. For the convergent scenario, we empirically verify that this scenario negatively impacts the model's performance. For the divergent scenario, we prove that different tokens will diverge to infinity at different rates, thereby contributing unequally to the updates during model training. Based on these investigations, we propose two refinements for the model: excluding the convergent scenario and reordering tokens based on their importance scores, both aimed at improving practical performance. Our experimental results validate these refinements, offering insights into enhancing Mamba's effectiveness in real-world applications.
Submitted 7 March, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
SplitVAEs: Decentralized scenario generation from siloed data for stochastic optimization problems
Authors:
H M Mohaimanul Islam,
Huynh Q. N. Vo,
Paritosh Ramanan
Abstract:
Stochastic optimization problems in large-scale multi-stakeholder networked systems (e.g., power grids and supply chains) rely on data-driven scenarios to encapsulate complex spatiotemporal interdependencies. However, centralized aggregation of stakeholder data is challenging due to the existence of data silos resulting from computational and logistical bottlenecks. In this paper, we present SplitVAEs, a decentralized scenario generation framework that leverages variational autoencoders to generate high-quality scenarios without moving stakeholder data. With the help of experiments on distributed memory systems, we demonstrate the broad applicability of SplitVAEs in a variety of domain areas that are dominated by a large number of stakeholders. Our experiments indicate that SplitVAEs can learn spatial and temporal interdependencies in large-scale networks to generate scenarios that match the joint historical distribution of stakeholder data in a decentralized manner. Our experiments show that SplitVAEs deliver robust performance compared to centralized, state-of-the-art benchmark methods while significantly reducing data transmission costs, leading to a scalable, privacy-enhancing alternative to scenario generation.
Submitted 30 January, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Monomial Matrix Group Equivariant Neural Functional Networks
Authors:
Viet-Hoang Tran,
Thieu N. Vo,
Tho H. Tran,
An T. Nguyen,
Tan M. Nguyen
Abstract:
Neural functional networks (NFNs) have recently gained significant attention due to their diverse applications, ranging from predicting network generalization and network editing to classifying implicit neural representations. Previous NFN designs often depend on permutation symmetries in neural networks' weights, which traditionally arise from the unordered arrangement of neurons in hidden layers. However, these designs do not take into account the weight scaling symmetries of $\operatorname{ReLU}$ networks, and the weight sign-flipping symmetries of $\sin$ or $\tanh$ networks. In this paper, we extend the study of the group action on the network weights from the group of permutation matrices to the group of monomial matrices by incorporating scaling/sign-flipping symmetries. In particular, we encode these scaling/sign-flipping symmetries by designing corresponding equivariant and invariant layers. We name our new family of NFNs the Monomial Matrix Group Equivariant Neural Functional Networks (Monomial-NFN). Because of the expansion of the symmetries, Monomial-NFN has far fewer independent trainable parameters than the baseline NFNs in the literature, thus enhancing the model's efficiency. Moreover, for fully connected and convolutional neural networks, we theoretically prove that all groups that leave these networks invariant while acting on their weight spaces are subgroups of the monomial matrix group. We provide empirical evidence to demonstrate the advantages of our model over existing baselines, achieving competitive performance and efficiency.
Submitted 13 March, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
-
The Lynchpin of In-Memory Computing: A Benchmarking Framework for Vector-Matrix Multiplication in RRAMs
Authors:
Md Tawsif Rahman Chowdhury,
Huynh Quang Nguyen Vo,
Paritosh Ramanan,
Murat Yildirim,
Gozde Tutuncuoglu
Abstract:
The Von Neumann bottleneck, a fundamental challenge in conventional computer architecture, arises from the inability to execute fetch and data operations simultaneously due to a shared bus linking processing and memory units. This bottleneck significantly limits system performance, increases energy consumption, and exacerbates computational complexity. Emerging technologies such as Resistive Random Access Memories (RRAMs), leveraging crossbar arrays, offer promising alternatives for addressing the demands of data-intensive computational tasks through in-memory computing of analog vector-matrix multiplication (VMM) operations. However, the propagation of errors due to device and circuit-level imperfections remains a significant challenge. In this study, we introduce MELISO (In-Memory Linear Solver), a comprehensive end-to-end VMM benchmarking framework tailored for RRAM-based systems. MELISO evaluates the error propagation in VMM operations, analyzing the impact of RRAM device metrics on error magnitude and distribution. This paper introduces the MELISO framework and demonstrates its utility in characterizing and mitigating VMM error propagation using state-of-the-art RRAM device metrics.
Submitted 9 September, 2024;
originally announced September 2024.
-
Deep Reinforcement Learning for Network Energy Saving in 6G and Beyond Networks
Authors:
Dinh-Hieu Tran,
Nguyen Van Huynh,
Soumeya Kaada,
Van Nhan Vo,
Eva Lagunas,
Symeon Chatzinotas
Abstract:
Network energy saving has received great attention from operators and vendors as a way to reduce energy consumption and CO2 emissions as well as to significantly reduce costs for mobile network operators. However, the design of energy-saving networks must also satisfy the mobile users' (MUs) QoS requirements, such as throughput requirements (TR). This work considers a mobile cellular network with many ground base stations (GBSs), some of which are turned off, either intentionally for network energy saving (NES) or because of a crash, so the MUs located in these outage GBSs are not served in time. Based on this observation, we formulate the problem of maximizing the total achievable throughput in the network by optimizing the GBSs' antenna tilt and adaptive transmission power, subject to a given number of MUs being served. An MU is considered successfully served if its Reference Signal Received Power (RSRP) and throughput requirement are satisfied. The formulated optimization problem is difficult to solve because of its multiple binary variables and non-convex constraints, along with random throughput requirements and random placement of MUs. We propose a Deep Q-learning-based algorithm to help the network learn the uncertainty and dynamics of the transmission environment. Extensive simulation results show that our proposed algorithm achieves much better performance than the benchmark schemes.
Submitted 20 August, 2024;
originally announced August 2024.
-
Enhancing Incremental Summarization with Structured Representations
Authors:
EunJeong Hwang,
Yichao Zhou,
James Bradley Wendt,
Beliz Gunel,
Nguyen Vo,
Jing Xie,
Sandeep Tata
Abstract:
Large language models (LLMs) often struggle with processing extensive input contexts, which can lead to redundant, inaccurate, or incoherent summaries. Recent methods have used unstructured memory to incrementally process these contexts, but they still suffer from information overload due to the volume of unstructured data handled. In our study, we introduce structured knowledge representations ($GU_{json}$), which significantly improve summarization performance by 40% and 14% across two public datasets. Most notably, we propose the Chain-of-Key strategy ($CoK_{json}$) that dynamically updates or augments these representations with new information, rather than recreating the structured memory for each new source. This method further enhances performance by 7% and 4% on the datasets.
Submitted 20 July, 2024;
originally announced July 2024.
-
Noether's normalization in skew polynomial rings
Authors:
Elad Paran,
Thieu N. Vo
Abstract:
We study Noether's normalization lemma for finitely generated algebras over a division algebra. In its classical form, the lemma states that if $I$ is a proper ideal of the ring $R=F[t_1,\ldots,t_n]$ of polynomials over a field $F$, then the quotient ring $R/I$ is a finite extension of a polynomial ring over $F$. We prove that the lemma holds when $R=D[t_1,\ldots,t_n]$ is the ring of polynomials in $n$ central variables over a division algebra $D$. We provide examples demonstrating that Noether's normalization may fail for the skew polynomial ring $D[t_1,\ldots,t_n;σ_1,\ldots,σ_n]$ with respect to commuting automorphisms $σ_1,\ldots,σ_n$ of $D$. We give a sufficient condition on $σ_1,\ldots,σ_n$ under which the normalization lemma holds for such a ring. In the case where $D=F$ is a field, this sufficient condition is proved to be necessary.
Submitted 17 July, 2024;
originally announced July 2024.
-
Execution-Based Evaluation of Natural Language to Bash and PowerShell for Incident Remediation
Authors:
Ngoc Phuoc An Vo,
Brent Paulovicks,
Vadim Sheinin
Abstract:
Given recent advancements in Large Language Models (LLMs), code generation tasks attract immense attention for their wide application in different domains. In an effort to evaluate and select the best model to automatically remediate system incidents discovered by Application Performance Monitoring (APM) platforms, it is crucial to verify whether the generated code is syntactically and semantically correct, and whether it can be executed correctly as intended. However, current methods for evaluating the quality of code generated by LLMs rely heavily on surface form similarity metrics (e.g., BLEU, ROUGE, and exact/partial match), which have numerous limitations. In contrast, execution-based evaluation focuses more on code functionality and does not constrain the code generation to any fixed solution. Nevertheless, designing and implementing such an execution-based evaluation platform is not a trivial task. Several works have created execution-based evaluation platforms for popular programming languages such as SQL, Python, and Java, but there have been limited or no attempts for scripting languages such as Bash and PowerShell. In this paper, we present the first execution-based evaluation platform, for which we created three test suites (125 handcrafted test cases in total) to evaluate Bash (both single-line commands and multiple-line scripts) and PowerShell code generated by LLMs. We benchmark seven closed and open-source LLMs using our platform with different techniques (zero-shot vs. few-shot learning).
Submitted 16 December, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
ViTHSD: Exploiting Hatred by Targets for Hate Speech Detection on Vietnamese Social Media Texts
Authors:
Cuong Nhat Vo,
Khanh Bao Huynh,
Son T. Luu,
Trong-Hop Do
Abstract:
The growth of social networks makes toxic content spread rapidly. Hate speech detection is a task that helps decrease the number of harmful comments. Given the diversity of the hate speech created by users, it is necessary to interpret hate speech in addition to detecting it. Hence, we propose a methodology to construct a system for targeted hate speech detection from online streaming texts from social media. We first introduce ViTHSD, a targeted hate speech detection dataset for Vietnamese social media texts. The dataset contains 10K comments; each comment is labeled for specific targets with three levels: clean, offensive, and hate. There are 5 targets in the dataset, and each target is labeled with the corresponding level manually by humans following strict annotation guidelines. The inter-annotator agreement obtained on the dataset is 0.45 by Cohen's Kappa index, which indicates a moderate level of agreement. Then, we construct a baseline for this task by combining the Bi-GRU-LSTM-CNN with a pre-trained language model to leverage the text representation power of BERTology. Finally, we suggest a methodology to integrate the baseline model for targeted hate speech detection into an online streaming system for practical application in preventing hateful and offensive content on social media.
Submitted 8 February, 2025; v1 submitted 30 April, 2024;
originally announced April 2024.
-
STRUM-LLM: Attributed and Structured Contrastive Summarization
Authors:
Beliz Gunel,
James B. Wendt,
Jing Xie,
Yichao Zhou,
Nguyen Vo,
Zachary Fisher,
Sandeep Tata
Abstract:
Users often struggle with decision-making between two options (A vs B), as it usually requires time-consuming research across multiple web pages. We propose STRUM-LLM that addresses this challenge by generating attributed, structured, and helpful contrastive summaries that highlight key differences between the two options. STRUM-LLM identifies helpful contrast: the specific attributes along which the two options differ significantly and which are most likely to influence the user's decision. Our technique is domain-agnostic, and does not require any human-labeled data or fixed attribute list as supervision. STRUM-LLM attributes all extractions back to the input sources along with textual evidence, and it does not have a limit on the length of input sources that it can process. STRUM-LLM Distilled has 100x more throughput than the models with comparable performance while being 10x smaller. In this paper, we provide extensive evaluations for our method and lay out future directions for our currently deployed system.
Submitted 25 March, 2024;
originally announced March 2024.
-
Improving Vietnamese-English Medical Machine Translation
Authors:
Nhu Vo,
Dat Quoc Nguyen,
Dung D. Le,
Massimo Piccardi,
Wray Buntine
Abstract:
Machine translation for Vietnamese-English in the medical domain is still an under-explored research area. In this paper, we introduce MedEV -- a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. We conduct extensive experiments comparing Google Translate, ChatGPT (gpt-3.5-turbo), state-of-the-art Vietnamese-English neural machine translation models and pre-trained bilingual/multilingual sequence-to-sequence models on our new MedEV dataset. Experimental results show that the best performance is achieved by fine-tuning "vinai-translate" for each translation direction. We publicly release our dataset to promote further research.
Submitted 28 March, 2024;
originally announced March 2024.
-
E(3)-Equivariant Mesh Neural Networks
Authors:
Thuan Trang,
Nhat Khang Ngo,
Daniel Levy,
Thieu N. Vo,
Siamak Ravanbakhsh,
Truong Son Hy
Abstract:
Triangular meshes are widely used to represent three-dimensional objects. As a result, many recent works have addressed the need for geometric deep learning on 3D meshes. However, we observe that the complexities in many of these architectures do not translate to practical performance, and simple deep models for geometric graphs are competitive in practice. Motivated by this observation, we minimally extend the update equations of E(n)-Equivariant Graph Neural Networks (EGNNs) (Satorras et al., 2021) to incorporate mesh face information, and further improve them to account for long-range interactions through hierarchy. The resulting architecture, Equivariant Mesh Neural Network (EMNN), outperforms other, more complicated equivariant methods on mesh tasks, with a fast run-time and no expensive pre-processing. Our implementation is available at https://github.com/HySonLab/EquiMesh
Submitted 18 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Domain Adaptation of a State of the Art Text-to-SQL Model: Lessons Learned and Challenges Found
Authors:
Irene Manotas,
Octavian Popescu,
Ngoc Phuoc An Vo,
Vadim Sheinin
Abstract:
There are many recent advanced developments for the Text-to-SQL task, where the Picard model is one of the top performing models as measured by the Spider dataset competition. However, bringing Text-to-SQL systems to realistic use-cases through domain adaptation remains a tough challenge. To analyze how well the base T5 Language Model and Picard perform on query structures different from those of the Spider dataset, we fine-tuned the base model on the Spider data and on independent databases (DB). To avoid accessing the DB content online during inference, we also present an alternative way to disambiguate the values in an input question using a rule-based approach that relies on an intermediate representation of the semantic concepts of an input question. In our results, we show in which cases T5 and Picard can deliver good performance, share the lessons learned, and discuss current domain adaptation challenges.
Submitted 8 December, 2023;
originally announced December 2023.
-
A skew Newton-Puiseux Theorem
Authors:
Elad Paran,
Thieu N. Vo
Abstract:
We prove a skew generalization of the Newton-Puiseux theorem for the field $F = \bigcup_{n=1}^\infty \mathbb{C}((x^\frac{1}{n}))$ of Puiseux series: For any positive real number $α$, we consider the $\mathbb{C}$-automorphism $σ$ of $F$ given by $x \mapsto αx$, and prove that every non-constant polynomial in the skew polynomial ring $F[t,σ]$ factors into a product of linear terms. This generalizes the classical theorem where $σ= {\rm id}$, and gives the first concrete example of a field of characteristic $0$ that is algebraically closed with respect to a non-trivial automorphism -- a notion studied in works of Aryapoor and of Smith. Our result also resolves an open question of Aryapoor concerning such fields. A key ingredient in the proof is a new variant of Hensel's lemma.
Submitted 29 November, 2023;
originally announced November 2023.
-
Design equivariant neural networks for 3D point cloud
Authors:
Thuan N. A. Trang,
Thieu N. Vo,
Khuong D. Nguyen
Abstract:
This work seeks to improve the generalization and robustness of existing neural networks for 3D point clouds by inducing group equivariance under general group transformations. The main challenge when designing equivariant models for point clouds is how to trade off the performance of the model against its complexity. Existing equivariant models are either too complicated to implement or have very high complexity. The main aim of this study is to build a general procedure to introduce the group equivariance property to SOTA models for 3D point clouds. The group equivariant models built from our procedure are simple to implement, have lower complexity than the existing ones, and preserve the strengths of the original SOTA backbone. The results of the experiments on object classification show that our methods are superior to other group equivariant models in performance and complexity. Moreover, our method also helps to improve the mIoU of semantic segmentation models. Overall, by using a combination of only-finite-rotation equivariance and augmentation, our models can outperform existing full $SO(3)$-equivariance models with much lower complexity and GPU memory usage. The proposed procedure is general and forms a fundamental approach to group equivariant neural networks. We believe that it can be easily adapted to other SOTA models in the future.
Submitted 1 May, 2022;
originally announced May 2022.
-
Periodontitis and preeclampsia in pregnancy: A systematic review and meta-analysis
Authors:
Quynh-Anh Le,
Rahena Akhter,
Kimberly M. Coulton,
Ngoc T. N Vo,
Le T. Y Duong,
Hoang V. Nong,
Albert Yaacoub,
George Condous,
Joerg Eberhard,
Ralph Nanan
Abstract:
Objectives: A conflicting body of evidence suggests that localized periodontal inflammation spreads systemically during pregnancy, inducing adverse pregnancy outcomes. This systematic review and meta-analysis aimed to specifically evaluate the relationship between periodontitis and preeclampsia. Methods: Electronic searches were carried out in Medline, PubMed, and the Cochrane Controlled Clinical Trial Register to identify and select observational case-control and cohort studies that analyzed the association between periodontal disease and preeclampsia. The PRISMA guidelines and the MOOSE checklist were followed. Results: Thirty studies, comprising six cohort studies and twenty-four case-control studies, were selected. Periodontitis was significantly associated with increased risk for preeclampsia, especially in a subgroup analysis of cohort studies and a subgroup analysis of lower-middle-income countries. Conclusion: Periodontitis appears to be a significant risk factor for preeclampsia, which might be even more pronounced in lower-middle-income countries.
Submitted 9 August, 2021;
originally announced August 2021.
-
VinaFood21: A Novel Dataset for Evaluating Vietnamese Food Recognition
Authors:
Thuan Trong Nguyen,
Thuan Q. Nguyen,
Dung Vo,
Vi Nguyen,
Ngoc Ho,
Nguyen D. Vo,
Kiet Van Nguyen,
Khang Nguyen
Abstract:
Vietnam is an attractive tourist destination with its stunning and pristine landscapes and its top-rated, unique food and drink. Among thousands of Vietnamese dishes, foreigners and native people are interested in easy-to-eat tastes and easy-to-do recipes, along with reasonable prices, mouthwatering flavors, and popularity. Because the dishes are diverse yet often closely resemble one another, and because quality Vietnamese food datasets are lacking, it is hard to implement an automatic system that classifies Vietnamese food and thereby makes it easier for people to discover Vietnamese food. This paper introduces a new Vietnamese food dataset named VinaFood21, which consists of 13,950 images corresponding to 21 dishes. We use 10,044 images for model training and 6,682 test images to classify each food in the VinaFood21 dataset, and achieve an average accuracy of 74.81% when fine-tuning CNN EfficientNet-B0. (https://github.com/nguyenvd-uit/uit-together-dataset)
Submitted 5 August, 2021;
originally announced August 2021.
-
Classification of 7-dimensional solvable Lie algebras having 5-dimensional nilradicals
Authors:
Vu A. Le,
Tuan A. Nguyen,
Tu T. C. Nguyen,
Tuyen T. M. Nguyen,
Thieu N. Vo
Abstract:
This paper presents a classification of 7-dimensional real and complex indecomposable solvable Lie algebras having some 5-dimensional nilradicals. Afterwards, we combine our results with those of Rubin and Winternitz (1993), Ndogmo and Winternitz (1994), Snobl and Winternitz (2005, 2009), Snobl and Karásek (2010) to obtain a complete classification of 7-dimensional real and complex indecomposable solvable Lie algebras with 5-dimensional nilradicals. In association with Gong (1998), Parry (2007), Hindeleh and Thompson (2008), we achieve a classification of 7-dimensional real and complex indecomposable solvable Lie algebras.
Submitted 8 July, 2021;
originally announced July 2021.
-
Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions
Authors:
Abhijit Guha Roy,
Jie Ren,
Shekoofeh Azizi,
Aaron Loh,
Vivek Natarajan,
Basil Mustafa,
Nick Pawlowski,
Jan Freyberg,
Yuan Liu,
Zach Beaver,
Nam Vo,
Peggy Bui,
Samantha Winter,
Patricia MacWilliams,
Greg S. Corrado,
Umesh Telang,
Yun Liu,
Taylan Cemgil,
Alan Karthikesalingam,
Balaji Lakshminarayanan,
Jim Winkens
Abstract:
We develop and rigorously evaluate a deep learning based system that can accurately classify skin conditions while detecting rare conditions for which there is not enough data available for training a confident classifier. We frame this task as an out-of-distribution (OOD) detection problem. Our novel approach, hierarchical outlier detection (HOD), assigns multiple abstention classes for each training outlier class and jointly performs a coarse classification of inliers vs. outliers, along with fine-grained classification of the individual classes. We demonstrate the effectiveness of the HOD loss in conjunction with modern representation learning approaches (BiT, SimCLR, MICLe) and explore different ensembling strategies for further improving the results. We perform an extensive subgroup analysis over conditions of varying risk levels and different skin types to investigate how the OOD detection performance changes over each subgroup and demonstrate the gains of our framework in comparison to baselines. Finally, we introduce a cost metric to approximate downstream clinical impact. We use this cost metric to compare the proposed method against a baseline system, thereby making a stronger case for the overall system effectiveness in a real-world deployment scenario.
Submitted 8 April, 2021;
originally announced April 2021.
-
Recognizing and Splitting Conditional Sentences for Automation of Business Processes Management
Authors:
Ngoc Phuoc An Vo,
Irene Manotas,
Octavian Popescu,
Algimantas Cerniauskas,
Vadim Sheinin
Abstract:
Business Process Management (BPM) is the discipline responsible for discovering, analyzing, redesigning, monitoring, and controlling business processes. One of the most crucial tasks of BPM is discovering and modelling business processes from text documents. In this paper, we present our system that resolves an end-to-end problem consisting of 1) recognizing conditional sentences from technical documents, 2) finding boundaries to extract conditional and resultant clauses from each conditional sentence, and 3) categorizing each resultant clause as Action or Consequence, which later helps to generate new steps in our business process model automatically. We created a new dataset and three models to solve this problem. Our best model achieved very promising results of 83.82, 87.84, and 85.75 for Precision, Recall, and F1, respectively, for extracting Condition, Action, and Consequence clauses using the Exact Match metric.
Submitted 1 April, 2021;
originally announced April 2021.
-
Secrecy Performance of Small-Cell Networks with Transmitter Selection and Unreliable Backhaul under Spectrum Sharing Environment
Authors:
Jinghua Zhang,
Chinmoy Kundu,
Octavia A. Dobre,
Emi Garcia-Palacios,
Nguyen-Son Vo
Abstract:
We investigate the secrecy performance of an underlay small-cell cognitive radio network under unreliable backhaul connections. The small-cell network shares the same spectrum with the primary network, ensuring that a desired outage probability constraint is always met in the primary network. To improve the security of the small-cell cognitive network, we propose three sub-optimal small-cell transmitter selection schemes, namely sub-optimal transmitter selection, minimal interference selection, and minimal eavesdropping selection. Closed-form expressions of the non-zero secrecy rate, secrecy outage probability, and ergodic secrecy capacity are provided for the schemes along with asymptotic expressions. We also propose an optimal selection scheme and compare performances with the sub-optimal selection schemes. Computable expressions for the non-zero secrecy rate and secrecy outage probability are presented for the optimal selection scheme. Our results show that by increasing the primary transmitter's power and the number of small-cell transmitters, the system performance improves. The selection scheme, the backhaul reliability, and the primary user quality-of-service constraint also have a significant impact on secrecy performance.
Submitted 7 March, 2021;
originally announced March 2021.
-
Testing isomorphism of complex and real Lie algebras
Authors:
Tuan A. Nguyen,
Vu A. Le,
Thieu N. Vo
Abstract:
In this paper, we give algorithms for determining the existence of an isomorphism between two finite-dimensional Lie algebras and for computing such an isomorphism in the affirmative case. We also provide algorithms for determining algebraic relations of parameters in order to decide whether two parameterized Lie algebras are isomorphic. All Lie algebras are considered over a field $\mathbb{F}$, where $\mathbb{F}=\mathbb{C}$ or $\mathbb{F}=\mathbb{R}$. Several illustrative examples are given to show the applicability and the effectiveness of the proposed algorithms.
Submitted 21 February, 2021;
originally announced February 2021.
-
Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection
Authors:
Nguyen Vo,
Kyumin Lee
Abstract:
The wide spread of fake news and misinformation in various domains, ranging from politics and economics to public health, has created an urgent need to automatically fact-check information. A recent trend in fake news detection is to utilize evidence from external sources. However, existing evidence-aware fake news detection methods focused on either only word-level attention or evidence-level attention, which may result in suboptimal performance. In this paper, we propose a Hierarchical Multi-head Attentive Network to fact-check textual claims. Our model jointly combines multi-head word-level attention and multi-head document-level attention, which aid explanation at both the word level and the evidence level. Experiments on two real-world datasets show that our model outperforms seven state-of-the-art baselines. Improvements over baselines are from 6% to 18%. Our source code and datasets are released at https://github.com/nguyenvo09/EACL2021.
Submitted 4 February, 2021;
originally announced February 2021.
-
Simplified DOM Trees for Transferable Attribute Extraction from the Web
Authors:
Yichao Zhou,
Ying Sheng,
Nguyen Vo,
Nick Edmonds,
Sandeep Tata
Abstract:
There has been a steady need to precisely extract structured knowledge from the web (i.e. HTML documents). Given a web page, extracting a structured object along with various attributes of interest (e.g. price, publisher, author, and genre for a book) can facilitate a variety of downstream applications such as large-scale knowledge base construction, e-commerce product search, and personalized recommendation. Considering each web page is rendered from an HTML DOM tree, existing approaches formulate the problem as a DOM tree node tagging task. However, they either rely on computationally expensive visual feature engineering or are incapable of modeling the relationship among the tree nodes. In this paper, we propose a novel transferable method, Simplified DOM Trees for Attribute Extraction (SimpDOM), to tackle the problem by efficiently retrieving useful context for each node by leveraging the tree structure. We study two challenging experimental settings: (i) intra-vertical few-shot extraction, and (ii) cross-vertical few-shot extraction with out-of-domain knowledge, to evaluate our approach. Extensive experiments on the SWDE public dataset show that SimpDOM outperforms the state-of-the-art (SOTA) method by 1.44% on the F1 score. We also find that utilizing knowledge from a different vertical (cross-vertical extraction) is surprisingly useful and helps beat the SOTA by a further 1.37%.
Submitted 7 January, 2021;
originally announced January 2021.
-
FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents
Authors:
Bill Yuchen Lin,
Ying Sheng,
Nguyen Vo,
Sandeep Tata
Abstract:
Extracting structured data from HTML documents is a long-studied problem with a broad range of applications like augmenting knowledge bases, supporting faceted search, and providing domain-specific experiences for key verticals like shopping and movies. Previous approaches have either required a small number of examples for each target site or relied on carefully handcrafted heuristics built over visual renderings of websites. In this paper, we present a novel two-stage neural approach, named FreeDOM, which overcomes both these limitations. The first stage learns a representation for each DOM node in the page by combining both the text and markup information. The second stage captures longer range distance and semantic relatedness using a relational neural network. By combining these stages, FreeDOM is able to generalize to unseen sites after training on a small number of seed sites from that vertical without requiring expensive hand-crafted features over visual renderings of the page. Through experiments on a public dataset with 8 different verticals, we show that FreeDOM beats the previous state of the art by nearly 3.7 F1 points on average without requiring features over rendered pages or expensive hand-crafted features.
Submitted 21 October, 2020;
originally announced October 2020.
-
Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News
Authors:
Nguyen Vo,
Kyumin Lee
Abstract:
Although many fact-checking systems have been developed in academia and industry, fake news is still proliferating on social media. These systems mostly focus on fact-checking but usually neglect online users who are the main drivers of the spread of misinformation. How can we use fact-checked information to improve users' consciousness of fake news to which they are exposed? How can we stop users from spreading fake news? To tackle these questions, we propose a novel framework to search for fact-checking articles, which address the content of an original tweet (that may contain misinformation) posted by online users. The search can directly warn fake news posters and online users (e.g. the posters' followers) about misinformation, discourage them from spreading fake news, and scale up verified content on social media. Our framework uses both text and images to search for fact-checking articles, and achieves promising results on real-world datasets. Our code and datasets are released at https://github.com/nguyenvo09/EMNLP2020.
Submitted 7 October, 2020;
originally announced October 2020.
-
On the problem of classifying solvable Lie algebras having small codimensional derived algebras
Authors:
Hoa Q. Duong,
Vu A. Le,
Tuan A. Nguyen,
Hai T. T. Cao,
Thieu N. Vo
Abstract:
This paper concerns the problem of classifying finite-dimensional real solvable Lie algebras whose derived algebras are of codimension 1 or 2. On the one hand, we present an effective method to classify all $(n+1)$-dimensional real solvable Lie algebras having 1-codimensional derived algebras provided that a full classification of $n$-dimensional nilpotent Lie algebras is given. On the other hand, the problem of classifying all $(n+2)$-dimensional real solvable Lie algebras having 2-codimensional derived algebras is proved to be wild. In this case, we provide a method to classify a subclass of the considered Lie algebras which are extended from their derived algebras by a pair of derivations containing at least one inner derivation.
Submitted 10 March, 2020;
originally announced March 2020.
-
Towards Reading Beyond Faces for Sparsity-Aware 4D Affect Recognition
Authors:
Muzammil Behzad,
Nhat Vo,
Xiaobai Li,
Guoying Zhao
Abstract:
In this paper, we present a sparsity-aware deep network for automatic 4D facial expression recognition (FER). Given 4D data, we first propose a novel augmentation method to combat the data limitation problem for deep learning. This is achieved by projecting the input data into RGB and depth map images and then iteratively performing randomized channel concatenation. Using the information encoded in the given 3D landmarks, we also introduce an effective way to capture the facial muscle movements from three orthogonal planes (TOP), the TOP-landmarks over multi-views. Importantly, we then present a sparsity-aware deep network to compute the sparse representations of convolutional features over multi-views. This is not only effective for a higher recognition accuracy but is also computationally convenient. For training, the TOP-landmarks and sparse representations are used to train a long short-term memory (LSTM) network. The refined predictions are achieved when the learned features collaborate over multi-views. Extensive experimental results achieved on the BU-4DFE dataset show the significance of our method over the state-of-the-art methods by reaching a promising accuracy of 99.69% for 4D FER.
Submitted 19 August, 2020; v1 submitted 8 February, 2020;
originally announced February 2020.
-
A Bayesian Filter for Multi-view 3D Multi-object Tracking with Occlusion Handling
Authors:
Jonah Ong,
Ba Tuong Vo,
Ba Ngu Vo,
Du Yong Kim,
Sven Nordholm
Abstract:
This paper proposes an online multi-camera multi-object tracker that only requires monocular detector training, independent of the multi-camera configurations, allowing seamless extension/deletion of cameras without retraining effort. The proposed algorithm has a linear complexity in the total number of detections across the cameras, and hence scales gracefully with the number of cameras. It operates in the 3D world frame, and provides 3D trajectory estimates of the objects. The key innovation is a high fidelity yet tractable 3D occlusion model, amenable to optimal Bayesian multi-view multi-object filtering, which seamlessly integrates, into a single Bayesian recursion, the sub-tasks of track management, state estimation, clutter rejection, and occlusion/misdetection handling. The proposed algorithm is evaluated on the latest WILDTRACKS dataset, and demonstrated to work in very crowded scenes on a new dataset.
Submitted 27 October, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Attributed Multi-Relational Attention Network for Fact-checking URL Recommendation
Authors:
Di You,
Nguyen Vo,
Kyumin Lee,
Qiang Liu
Abstract:
To combat fake news, researchers have mostly focused on detecting fake news, while journalists have built and maintained fact-checking sites (e.g., Snopes.com and Politifact.com). However, fake news dissemination has been greatly promoted via social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, in this paper we propose a deep-learning based fact-checking URL recommender system to mitigate the impact of fake news on social media sites such as Twitter and Facebook. In particular, our proposed framework consists of a multi-relational attentive module and a heterogeneous graph attention network to learn complex/semantic relationships between user-URL pairs, user-user pairs, and URL-URL pairs. Extensive experiments on a real-world dataset show that our proposed framework outperforms eight state-of-the-art recommendation models, achieving at least 3% to 5.3% improvement.
Submitted 7 January, 2020;
originally announced January 2020.
-
Landmarks-assisted Collaborative Deep Framework for Automatic 4D Facial Expression Recognition
Authors:
Muzammil Behzad,
Nhat Vo,
Xiaobai Li,
Guoying Zhao
Abstract:
We propose a novel landmarks-assisted collaborative end-to-end deep framework for automatic 4D FER. Using 4D face scan data, we calculate its various geometrical images, and afterwards use rank pooling to generate their dynamic images encapsulating important facial muscle movements over time. In addition, the given 3D landmarks are projected on a 2D plane as binary images, and convolutional layers are used to extract sequences of feature vectors for every landmark video. During the training stage, the dynamic images are used to train an end-to-end deep network, while the feature vectors of landmark images are used to train a long short-term memory (LSTM) network. The final improved set of expression predictions is obtained when the dynamic and landmark images collaborate over multi-views using the proposed deep framework. Performance results obtained from extensive experimentation on the widely-adopted BU-4DFE database under globally used settings show that our proposed collaborative framework outperforms the state-of-the-art 4D FER methods and reaches a promising classification accuracy of 96.7%, demonstrating its effectiveness.
Submitted 7 February, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Learning from Fact-checkers: Analysis and Generation of Fact-checking Language
Authors:
Nguyen Vo,
Kyumin Lee
Abstract:
In fighting against fake news, many fact-checking systems comprised of human-based fact-checking sites (e.g., snopes.com and politifact.com) and automatic detection systems have been developed in recent years. However, online users still keep sharing fake news even when it has been debunked. This means that early fake news detection may be insufficient and that we need another complementary approach to mitigate the spread of misinformation. In this paper, we introduce a novel application of text generation for combating fake news. In particular, we (1) leverage online users named fact-checkers, who cite fact-checking sites as credible evidence to fact-check information in public discourse; (2) analyze linguistic characteristics of fact-checking tweets; and (3) propose and build a deep learning framework to generate responses with fact-checking intention to increase the fact-checkers' engagement in fact-checking activities. Our analysis reveals that the fact-checkers tend to refute misinformation and use formal language (e.g. few swear words and little Internet slang). Our framework successfully generates relevant responses, and outperforms competing models by achieving up to 30% improvements. Our qualitative study also confirms the superiority of our generated responses compared with responses generated from the existing models.
Submitted 4 October, 2019;
originally announced October 2019.
-
Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network
Authors:
Muzammil Behzad,
Nhat Vo,
Xiaobai Li,
Guoying Zhao
Abstract:
This paper proposes a novel 4D Facial Expression Recognition (FER) method using Collaborative Cross-domain Dynamic Image Network (CCDN). Given 4D data of face scans, we first compute its geometrical images, and then combine their correlated information in the proposed cross-domain image representations. The acquired set is then used to generate cross-domain dynamic images (CDI) via rank pooling that encapsulates facial deformations over time in terms of a single image. For the training phase, these CDIs are fed into an end-to-end deep learning model, and the resultant predictions collaborate over multi-views for performance gain in expression classification. Furthermore, we propose a 4D augmentation scheme that not only expands the training data scale but also introduces significant facial muscle movement patterns to improve the FER performance. Results from extensive experiments on the commonly used BU-4DFE dataset under widely adopted settings show that our proposed method outperforms the state-of-the-art 4D FER methods by achieving an accuracy of 96.5%, indicating its effectiveness.
Submitted 7 February, 2020; v1 submitted 6 May, 2019;
originally announced May 2019.