-
Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting
Authors:
Youngsik Yun,
Jeongmin Bae,
Hyunseung Son,
Seoha Kim,
Hahyun Lee,
Gun Bang,
Youngjung Uh
Abstract:
Online reconstruction of dynamic scenes is significant as it enables learning scenes from live-streaming video inputs, while existing offline dynamic reconstruction methods rely on recorded video inputs. However, previous online reconstruction approaches have primarily focused on efficiency and rendering quality, overlooking the temporal consistency of their results, which often contain noticeable artifacts in static regions. This paper identifies that errors such as noise in real-world recordings contribute to temporal inconsistency in online reconstruction. We propose a method that enhances temporal consistency in online reconstruction from observations with temporal inconsistency, which is inevitable in cameras. We show that our method restores the ideal observation by subtracting the learned error. We demonstrate that applying our method to various baselines significantly enhances both temporal consistency and rendering quality across datasets. Code, video results, and checkpoints are available at https://bbangsik13.github.io/OR2.
Submitted 2 May, 2025;
originally announced May 2025.
-
BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation
Authors:
Van Nguyen Nguyen,
Stephen Tyree,
Andrew Guo,
Mederic Fourmy,
Anas Gouda,
Taeyeop Lee,
Sungphill Moon,
Hyeontae Son,
Lukas Ranftl,
Jonathan Tremblay,
Eric Brachmann,
Bertram Drost,
Vincent Lepetit,
Carsten Rother,
Stan Birchfield,
Jiri Matas,
Yann Labbe,
Martin Sundermeyer,
Tomas Hodan
Abstract:
We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods need to onboard objects just from provided reference videos. Second, we defined a new, more practical 6D object detection task where identities of objects visible in a test image are not provided as input. Third, we introduced new BOP-H3 datasets recorded with high-resolution sensors and AR/VR headsets, closely resembling real-world scenarios. The BOP-H3 datasets include 3D models and onboarding videos to support both model-based and model-free tasks. Participants competed on seven challenge tracks. Notably, the best 2024 method for model-based 6D localization of unseen objects (FreeZeV2.1) achieves 22% higher accuracy on BOP-Classic-Core than the best 2023 method (GenFlow), and is only 4% behind the best 2023 method for seen objects (GPose2023), although it is significantly slower (24.9 s vs 2.7 s per image). A more practical 2024 method for this task is Co-op, which takes only 0.8 s per image and is 13% more accurate than GenFlow. Methods have similar rankings on 6D detection as on 6D localization but higher run time. On model-based 2D detection of unseen objects, the best 2024 method (MUSE) achieves 21–29% relative improvement compared to the best 2023 method (CNOS). However, the 2D detection accuracy for unseen objects is still 35% behind the accuracy for seen objects (GDet2023), and the 2D detection stage is consequently the main bottleneck of existing pipelines for 6D localization/detection of unseen objects. The online evaluation system stays open and is available at http://bop.felk.cvut.cz/
Submitted 23 April, 2025; v1 submitted 3 April, 2025;
originally announced April 2025.
-
Co-op: Correspondence-based Novel Object Pose Estimation
Authors:
Sungphill Moon,
Hyeontae Son,
Dongcheol Hur,
Sangwook Kim
Abstract:
We propose Co-op, a novel method for accurately and robustly estimating the 6DoF pose of objects unseen during training from a single RGB image. Our method requires only the CAD model of the target object and can precisely estimate its pose without any additional fine-tuning. While existing model-based methods suffer from inefficiency due to using a large number of templates, our method enables fast and accurate estimation with a small number of templates. This improvement is achieved by finding semi-dense correspondences between the input image and the pre-rendered templates. Our method achieves strong generalization performance by leveraging a hybrid representation that combines patch-level classification and offset regression. Additionally, our pose refinement model estimates probabilistic flow between the input image and the rendered image, refining the initial estimate to an accurate pose using a differentiable PnP layer. We demonstrate that our method not only estimates object poses rapidly but also outperforms existing methods by a large margin on the seven core datasets of the BOP Challenge, achieving state-of-the-art accuracy.
Submitted 22 March, 2025;
originally announced March 2025.
-
Introducing Verification Task of Set Consistency with Set-Consistency Energy Networks
Authors:
Mooho Song,
Hyeryung Son,
Jay-Yoon Lee
Abstract:
Examining logical inconsistencies among multiple statements (such as collections of sentences or question-answer pairs) is a crucial challenge in machine learning, particularly for ensuring the safety and reliability of models. Traditional methods that rely on pairwise comparisons often fail to capture inconsistencies that only emerge when more than two statements are evaluated collectively. To address this gap, we introduce the task of set-consistency verification, an extension of natural language inference (NLI) that assesses the logical coherence of entire sets rather than isolated pairs. Building on this task, we present the Set-Consistency Energy Network (SC-Energy), a novel model that employs a contrastive loss framework to learn the compatibility among a collection of statements. Our approach not only efficiently verifies inconsistencies and pinpoints the specific statements responsible for logical contradictions, but also significantly outperforms existing methods, including prompting-based LLMs. Furthermore, we release two new datasets, Set-LConVQA and Set-SNLI, for the set-consistency verification task.
Submitted 19 March, 2025; v1 submitted 12 March, 2025;
originally announced March 2025.
-
SparseVoxFormer: Sparse Voxel-based Transformer for Multi-modal 3D Object Detection
Authors:
Hyeongseok Son,
Jia He,
Seung-In Park,
Ying Min,
Yunhao Zhang,
ByungIn Yoo
Abstract:
Most previous 3D object detection methods that leverage the multi-modality of LiDAR and cameras utilize the Bird's Eye View (BEV) space for intermediate feature representation. However, this space uses a low x, y-resolution and sacrifices z-axis information to reduce the overall feature resolution, which may reduce accuracy. To tackle the problem of using low-resolution features, this paper focuses on the sparse nature of LiDAR point cloud data. From our observation, the number of occupied cells in the 3D voxels constructed from LiDAR data can be even fewer than the number of total cells in the BEV map, despite the voxels' significantly higher resolution. Based on this, we introduce a novel sparse voxel-based transformer network for 3D object detection, dubbed SparseVoxFormer. Instead of performing BEV feature extraction, we directly leverage sparse voxel features as the input for a transformer-based detector. Moreover, with regard to the camera modality, we introduce an explicit modality fusion approach that involves projecting 3D voxel coordinates onto 2D images and collecting the corresponding image features. Thanks to these components, our approach can leverage geometrically richer multi-modal features while even reducing the computational cost. Beyond the proof-of-concept level, we further focus on facilitating better multi-modal fusion and flexible control over the number of sparse features. Finally, thorough experimental results demonstrate that utilizing a significantly smaller number of sparse features drastically reduces computational costs in a 3D object detector while enhancing both overall and long-range performance.
Submitted 11 March, 2025;
originally announced March 2025.
-
ELM-DeepONets: Backpropagation-Free Training of Deep Operator Networks via Extreme Learning Machines
Authors:
Hwijae Son
Abstract:
Deep Operator Networks (DeepONets) are among the most prominent frameworks for operator learning, grounded in the universal approximation theorem for operators. However, training DeepONets typically requires significant computational resources. To address this limitation, we propose ELM-DeepONets, an Extreme Learning Machine (ELM) framework for DeepONets that leverages the backpropagation-free nature of ELM. By reformulating DeepONet training as a least-squares problem for newly introduced parameters, the ELM-DeepONet approach significantly reduces training complexity. Validation on benchmark problems, including nonlinear ODEs and PDEs, demonstrates that the proposed method not only achieves superior accuracy but also drastically reduces computational costs. This work offers a scalable and efficient alternative for operator learning in scientific computing.
Submitted 16 January, 2025;
originally announced January 2025.
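Illustrative sketch (not taken from the paper): the backpropagation-free idea behind an Extreme Learning Machine can be shown on ordinary function regression. The hidden layer is drawn at random and left fixed, so only the output weights remain, and they are the solution of a linear least-squares problem. The names `elm_fit` and `elm_predict`, the hidden width, and the sine target are assumptions chosen for illustration; the ELM-DeepONet formulation for branch and trunk networks is not reproduced here.

```python
import numpy as np

def elm_fit(x, y, hidden=200, seed=0):
    """Fit a single-hidden-layer ELM: random fixed features, least-squares readout."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(x.shape[1], hidden))  # random input weights, never trained
    b = rng.normal(size=hidden)                # random biases, never trained
    H = np.tanh(x @ W + b)                     # hidden-layer feature matrix
    # Only the output weights are learned, via one linear least-squares solve.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def elm_predict(x, W, b, beta):
    """Evaluate the fitted ELM at the rows of x."""
    return np.tanh(x @ W + b) @ beta

# Hypothetical toy target: y = sin(pi * x) on [-1, 1].
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(np.pi * x)
W, b, beta = elm_fit(x, y)
train_rmse = float(np.sqrt(np.mean((elm_predict(x, W, b, beta) - y) ** 2)))
```

Because no gradient steps are involved, the training cost in this sketch is dominated by a single least-squares solve.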
-
Scalable Quantum-Inspired Optimization through Dynamic Qubit Compression
Authors:
Co Tran,
Quoc-Bao Tran,
Hy Truong Son,
Thang N Dinh
Abstract:
Hard combinatorial optimization problems, often mapped to Ising models, promise potential solutions with quantum advantage but are constrained by limited qubit counts in near-term devices. We present an innovative quantum-inspired framework that dynamically compresses large Ising models to fit available quantum hardware of different sizes. Thus, we aim to bridge the gap between large-scale optimization and current hardware capabilities. Our method leverages a physics-inspired GNN architecture to capture complex interactions in Ising models and accurately predict alignments among neighboring spins (aka qubits) at ground states. By progressively merging such aligned spins, we can reduce the model size while preserving the underlying optimization structure. It also provides a natural trade-off between the solution quality and size reduction, meeting different hardware constraints of quantum computing devices. Extensive numerical studies on Ising instances of diverse topologies show that our method can reduce instance size at multiple levels with virtually no losses in solution quality on the latest D-Wave quantum annealers.
Submitted 24 December, 2024;
originally announced December 2024.
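Illustrative sketch (not taken from the paper): merging two spins that are assumed to be aligned can be written as a small algebraic reduction of the Ising model. With energy E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j and the assumption s_j = s_i, spin j is folded into spin i by adding its field and couplings to those of i, and the term J_ij becomes a constant offset. The function names below are hypothetical, and the GNN that predicts which spins are aligned is not modeled; the pair (i, j) is simply given.

```python
import numpy as np

def ising_energy(h, J, s):
    """Energy with symmetric, zero-diagonal J; the 0.5 counts each pair once."""
    return float(h @ s + 0.5 * s @ J @ s)

def merge_aligned(h, J, i, j):
    """Fold spin j into spin i under the assumption s_j == s_i.

    Returns the reduced fields, reduced couplings, and a constant energy offset.
    """
    h2, J2 = h.copy(), J.copy()
    offset = J2[i, j]        # J_ij * s_i * s_j equals J_ij when the spins agree
    h2[i] += h2[j]           # fields add
    J2[i, :] += J2[j, :]     # couplings of j are inherited by i (rows)
    J2[:, i] += J2[:, j]     # ... and columns, keeping J symmetric
    J2[i, i] = 0.0           # remove the self-coupling created by the merge
    keep = [k for k in range(len(h)) if k != j]
    return h2[keep], J2[np.ix_(keep, keep)], offset

# Hypothetical random instance with spins 1 and 4 forced to agree.
rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n))
J = np.triu(A, 1)
J = J + J.T
h = rng.normal(size=n)
s = rng.choice([-1.0, 1.0], size=n)
i, j = 1, 4
s[j] = s[i]

h_r, J_r, offset = merge_aligned(h, J, i, j)
s_r = np.delete(s, j)
e_full = ising_energy(h, J, s)
e_reduced = ising_energy(h_r, J_r, s_r) + offset
```

For any configuration in which the two spins agree, `e_full` and `e_reduced` coincide, so the reduced model preserves the ordering of those configurations while using one fewer variable.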
-
Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models
Authors:
Hyegang Son,
Yonglak Son,
Changhoon Kim,
Young Geun Kim
Abstract:
Transformer-based large-scale pre-trained models achieve great success. Fine-tuning is the standard practice for leveraging these models in downstream tasks. Among the fine-tuning methods, adapter-tuning provides parameter-efficient fine-tuning by introducing lightweight trainable modules while keeping most pre-trained parameters frozen. However, existing adapter-tuning methods still impose substantial resource usage. Through our investigation, we show that each adapter unequally contributes to both task performance and resource usage. Motivated by this insight, we propose Selective Adapter FrEezing (SAFE), which gradually freezes less important adapters early to reduce unnecessary resource usage while maintaining performance. In our experiments, SAFE reduces memory usage, computation amount, and training time by 42.85%, 34.59%, and 11.82%, respectively, while achieving comparable or better task performance compared to the baseline. We also demonstrate that SAFE induces a regularization effect, thereby smoothing the loss landscape, which enables the model to generalize better by avoiding sharp minima.
Submitted 15 May, 2025; v1 submitted 26 November, 2024;
originally announced December 2024.
-
Physics-Informed Deep Inverse Operator Networks for Solving PDE Inverse Problems
Authors:
Sung Woong Cho,
Hwijae Son
Abstract:
Inverse problems involving partial differential equations (PDEs) can be seen as discovering a mapping from measurement data to unknown quantities, often framed within an operator learning approach. However, existing methods typically rely on large amounts of labeled training data, which is impractical for most real-world applications. Moreover, these supervised models may fail to capture the underlying physical principles accurately. To address these limitations, we propose a novel architecture called Physics-Informed Deep Inverse Operator Networks (PI-DIONs), which can learn the solution operator of PDE-based inverse problems without labeled training data. We extend the stability estimates established in the inverse problem literature to the operator learning framework, thereby providing a robust theoretical foundation for our method. These estimates guarantee that the proposed model, trained on a finite sample and grid, generalizes effectively across the entire domain and function space. Extensive experiments are conducted to demonstrate that PI-DIONs can effectively and accurately learn the solution operators of the inverse problems without the need for labeled data.
Submitted 7 February, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
Rethinking Top Probability from Multi-view for Distracted Driver Behaviour Localization
Authors:
Quang Vinh Nguyen,
Vo Hoang Thanh Son,
Chau Truong Vinh Hoang,
Duc Duy Nguyen,
Nhat Huy Nguyen Minh,
Soo-Hyung Kim
Abstract:
The naturalistic driving action localization task aims to recognize and comprehend human behaviors and actions from video data captured during real-world driving scenarios. Previous studies have shown great action localization performance by applying a recognition model followed by probability-based post-processing. Nevertheless, the probabilities provided by the recognition model frequently contain confusing information, which poses a challenge for post-processing. In this work, we adopt an action recognition model based on self-supervised learning to detect distracted activities and give potential action probabilities. Subsequently, a constraint ensemble strategy takes advantage of multi-camera views to provide robust predictions. Finally, we introduce a conditional post-processing operation to locate distracted behaviours and action temporal boundaries precisely. On test set A2, our method ranks sixth on the public leaderboard of track 3 of the 2024 AI City Challenge.
Submitted 19 November, 2024;
originally announced November 2024.
-
ISDNN: A Deep Neural Network for Channel Estimation in Massive MIMO systems
Authors:
Do Hai Son,
Vu Tung Lam,
Tran Thi Thuy Quynh
Abstract:
Massive Multiple-Input Multiple-Output (massive MIMO) technology stands as a cornerstone of 5G and beyond. Despite the remarkable advancements offered by massive MIMO technology, the extreme number of antennas introduces challenges during the channel estimation (CE) phase. In this paper, we propose a single-step Deep Neural Network (DNN) for CE, termed Iterative Sequential DNN (ISDNN), inspired by recent developments in data detection algorithms. ISDNN is a DNN based on the projected gradient descent algorithm for CE problems, with the iterations of the algorithm transformed into a DNN using the deep unfolding method. Furthermore, we introduce the structured channel ISDNN (S-ISDNN), extending ISDNN to incorporate side information such as directions of signals and antenna array configurations for enhanced CE. Simulation results highlight that ISDNN significantly outperforms another DNN-based CE method (DetNet) in terms of training time (13%), running time (4.6%), and accuracy (0.43 dB). Furthermore, S-ISDNN trains even faster than ISDNN, though its overall performance still requires further improvement.
Submitted 26 October, 2024;
originally announced October 2024.
-
Quantifying Context Bias in Domain Adaptation for Object Detection
Authors:
Hojun Son,
Arpan Kusari
Abstract:
Domain adaptation for object detection (DAOD) aims to transfer a trained model from a source to a target domain. Various DAOD methods exist, some of which minimize context bias between foreground-background associations in various domains. However, no prior work has studied context bias in DAOD by analyzing changes in background features during adaptation and how context bias is represented in different domains. Our research experiment highlights the potential usability of context bias in DAOD. We address the problem by varying activation values over different layers of trained models and by masking the background, both of which impact the number and quality of detections. We then use one synthetic dataset from CARLA and two different versions of real open-source data, Cityscapes and Cityscapes foggy, as separate domains to represent and quantify context bias. We utilize different metrics such as Maximum Mean Discrepancy (MMD) and Maximum Variance Discrepancy (MVD) to find the layer-specific conditional probability estimates of foreground given manipulated background regions for separate domains. We demonstrate through detailed analysis that understanding of the context bias can affect DAOD approach and foc
Submitted 22 September, 2024;
originally announced September 2024.
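Illustrative sketch (not taken from the paper): Maximum Mean Discrepancy, one of the metrics named in the abstract above, compares two sample sets through kernel averages. The code below computes the standard biased estimate of squared MMD with a Gaussian (RBF) kernel. The function names, the kernel bandwidth `sigma`, and the synthetic Gaussian samples are assumptions for illustration; how the paper applies MMD to layer-wise detector features is not reproduced here.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))  # clip tiny negatives from rounding

def mmd2_biased(x, y, sigma=1.0):
    """Biased estimate of squared MMD: mean k(x,x) + mean k(y,y) - 2 * mean k(x,y)."""
    return float(
        rbf_kernel(x, x, sigma).mean()
        + rbf_kernel(y, y, sigma).mean()
        - 2.0 * rbf_kernel(x, y, sigma).mean()
    )

# Hypothetical stand-ins for features from two domains.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))  # "source" samples
y = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution as x
z = rng.normal(3.0, 1.0, size=(200, 2))  # shifted distribution

same = mmd2_biased(x, y)
shifted = mmd2_biased(x, z)
```

In this toy setup `shifted` comes out larger than `same`, which is the behavior that makes the statistic usable as a measure of domain gap.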
-
FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering
Authors:
Yunji Seo,
Young Sun Choi,
Hyun Seung Son,
Youngjung Uh
Abstract:
3D Gaussian Splatting (3DGS) achieves fast and high-quality renderings by using numerous small Gaussians, which leads to significant memory consumption. This reliance on a large number of Gaussians restricts the application of 3DGS-based models on low-cost devices due to memory limitations. However, simply reducing the number of Gaussians to accommodate devices with less memory capacity leads to inferior quality compared to the quality that can be achieved on high-end hardware. To address this lack of scalability, we propose integrating a Flexible Level of Detail (FLoD) into 3DGS, allowing a scene to be rendered at varying levels of detail according to hardware capabilities. While existing 3DGSs with LoD focus on detailed reconstruction, our method provides reconstructions using a small number of Gaussians for reduced memory requirements, and a larger number of Gaussians for greater detail. Experiments demonstrate our various rendering options with tradeoffs between rendering quality and memory usage, thereby allowing real-time rendering across different memory constraints. Furthermore, we show that our method generalizes to different 3DGS frameworks, indicating its potential for integration into future state-of-the-art developments. Project page: https://3dgs-flod.github.io/flod.github.io/
Submitted 23 August, 2024;
originally announced August 2024.
-
Semi-Supervised Learning for Anomaly Detection in Blockchain-based Supply Chains
Authors:
Do Hai Son,
Bui Duc Manh,
Tran Viet Khoa,
Nguyen Linh Trung,
Dinh Thai Hoang,
Hoang Trong Minh,
Yibeltal Alem,
Le Quang Minh
Abstract:
Blockchain-based supply chain (BSC) systems have developed rapidly in recent years and can play an important role in our society in the future. In this study, we develop an anomaly detection model for BSC systems. Our proposed model can detect cyber-attacks at various levels, including the network layer, consensus layer, and beyond, by analyzing only the traffic data at the network layer. To do this, we first build a BSC system at our laboratory to perform experiments and collect datasets. We then propose a novel semi-supervised DAE-MLP (Deep AutoEncoder-Multilayer Perceptron) that combines the advantages of supervised and unsupervised learning to detect anomalies in BSC systems. The experimental results demonstrate the effectiveness of our model for anomaly detection within BSCs, achieving a detection accuracy of 96.5%. Moreover, DAE-MLP can effectively detect new attacks by improving the F1-score up to 33.1% after updating the MLP component.
Submitted 22 July, 2024;
originally announced July 2024.
-
Real-time Cyberattack Detection with Collaborative Learning for Blockchain Networks
Authors:
Tran Viet Khoa,
Do Hai Son,
Dinh Thai Hoang,
Nguyen Linh Trung,
Tran Thi Thuy Quynh,
Diep N. Nguyen,
Nguyen Viet Ha,
Eryk Dutkiewicz
Abstract:
With the ever-increasing popularity of blockchain applications, securing blockchain networks plays a critical role in these cyber systems. In this paper, we first study cyberattacks (e.g., flooding of transactions, brute pass) in blockchain networks and then propose an efficient collaborative cyberattack detection model to protect blockchain networks. Specifically, we deploy a blockchain network in our laboratory to build a new dataset including both normal and attack traffic data. The main aim of this dataset is to generate actual attack data from different nodes in the blockchain network that can be used to train and test blockchain attack detection models. We then propose a real-time collaborative learning model that enables nodes in the network to share learning knowledge without disclosing their private data, thereby significantly enhancing system performance for the whole network. The extensive simulation and real-time experimental results show that our proposed detection model can detect attacks in the blockchain network with an accuracy of up to 97%.
Submitted 4 July, 2024;
originally announced July 2024.
-
Locate&Edit: Energy-based Text Editing for Efficient, Flexible, and Faithful Controlled Text Generation
Authors:
Hye Ryung Son,
Jay-Yoon Lee
Abstract:
Recent approaches to controlled text generation (CTG) often involve manipulating the weights or logits of base language models (LMs) at decoding time. However, these methods are inapplicable to the latest black-box LMs and ineffective at preserving the core semantics of the base LM's original generations. In this work, we propose Locate&Edit (L&E), an efficient and flexible energy-based approach to CTG, which edits text outputs from a base LM using off-the-shelf energy models. Given text outputs from the base LM, L&E first locates spans that are most relevant to constraints (e.g., toxicity) utilizing energy models, and then edits these spans by replacing them with more suitable alternatives. Importantly, our method is compatible with black-box LMs, as it requires only the text outputs. Also, since L&E does not mandate a specific architecture for its component models, it can work with a diverse combination of available off-the-shelf models. Moreover, L&E preserves the base LM's original generations by selectively modifying constraint-related aspects of the texts and leaving others unchanged. These targeted edits also ensure that L&E operates efficiently. Our experiments confirm that L&E achieves superior semantic preservation of the base LM generations and speed, while simultaneously obtaining competitive or improved constraint satisfaction. Furthermore, we analyze how the granularity of energy distribution impacts CTG performance and find that fine-grained, regression-based energy models improve constraint satisfaction compared to conventional binary classifier energy models.
Submitted 30 June, 2024;
originally announced July 2024.
-
W2E (Workout to Earn): A Low Cost DApp based on ERC-20 and ERC-721 standards
Authors:
Do Hai Son,
Nguyen Danh Hao,
Tran Thi Thuy Quynh,
Le Quang Minh
Abstract:
Decentralized applications (DApps) have gained prominence with the advent of blockchain technology, particularly Ethereum, providing trust, transparency, and traceability. However, challenges such as rising transaction costs and block confirmation delays hinder their widespread adoption. In this paper, we present our DApp named W2E (Workout to Earn), a mobile DApp incentivizing exercise through tokens and NFT awards. This application leverages the well-known ERC-20 and ERC-721 token standards of Ethereum. Additionally, we deploy W2E into various Ethereum-based networks, including Ethereum testnets, Layer 2 networks, and private networks, to survey gas efficiency and execution time. Our findings highlight the importance of network selection for DApp deployment, offering insights for developers and businesses seeking efficient blockchain solutions. These results are not specific to W2E; they also apply to other ERC-20 and ERC-721-based DApps.
Submitted 17 June, 2024;
originally announced June 2024.
-
Graph Neural Network Training Systems: A Performance Comparison of Full-Graph and Mini-Batch
Authors:
Saurabh Bajaj,
Hojae Son,
Juelin Liu,
Hui Guan,
Marco Serafini
Abstract:
Graph Neural Networks (GNNs) have gained significant attention in recent years due to their ability to learn representations of graph-structured data. Two common methods for training GNNs are mini-batch training and full-graph training. Since these two methods require different training pipelines and systems optimizations, two separate classes of GNN training systems emerged, each tailored for one method. Works that introduce systems belonging to a particular category predominantly compare them with other systems within the same category, offering limited or no comparison with systems from the other category. Some prior work also justifies its focus on one specific training method by arguing that it achieves higher accuracy than the alternative. The literature, however, has incomplete and contradictory evidence in this regard.
In this paper, we provide a comprehensive empirical comparison of representative full-graph and mini-batch GNN training systems. We find that the mini-batch training systems consistently converge faster than the full-graph training ones across multiple datasets, GNN models, and system configurations. We also find that mini-batch training techniques converge to similar or often higher accuracy values than full-graph training ones, showing that mini-batch sampling is not necessarily detrimental to accuracy. Our work highlights the importance of comparing systems across different classes, using time-to-accuracy rather than epoch time for performance comparison, and selecting appropriate hyperparameters for each training method separately.
Submitted 20 December, 2024; v1 submitted 1 June, 2024;
originally announced June 2024.
-
Enhancing Social Media Post Popularity Prediction with Visual Content
Authors:
Dahyun Jeong,
Hyelim Son,
Yunjin Choi,
Keunwoo Kim
Abstract:
Our study presents a framework for predicting image-based social media content popularity that focuses on addressing complex image information and a hierarchical data structure. We utilize the Google Cloud Vision API to effectively extract key image and color information from users' postings, achieving 6.8% higher accuracy compared to using non-image covariates alone. For prediction, we explore a wide range of prediction models, including Linear Mixed Model, Support Vector Regression, Multi-layer Perceptron, Random Forest, and XGBoost, with linear regression as the benchmark. Our comparative study demonstrates that models that are capable of capturing the underlying nonlinear interactions between covariates outperform other methods.
Submitted 8 May, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
mmWave Wearable Antenna for Interaction with VR Devices
Authors:
Haksun Son,
Song Min Kim
Abstract:
The VR industry is one of the most promising industries for the near future, as it can provide a more immersive connection between people and the virtual world. Currently, VR devices interact with people using inconvenient controllers or cameras that perform poorly in dark environments. Interaction through millimeter-wave wearable devices has the potential to conveniently track human behavior regardless of the lighting conditions. In this study, a millimeter-wave wearable antenna was developed, opening up the possibility for more immersive interaction with VR devices. The antenna features a low loss tangent polyester fabric to minimize dielectric losses and a smooth coating to reduce losses due to rough surfaces. The antenna operates in the 24 GHz ISM band, with an S11 value of -29 dB at 24.15 GHz.
Submitted 19 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Channel Estimation in a Multi-Robot System Using SDR (original title: Uoc luong kenh truyen trong he thong da robot su dung SDR)
Authors:
Do Hai Son,
Nguyen Huu Hung,
Pham Duy Hung,
Tran Thi Thuy Quynh
Abstract:
This study focuses on developing an experimental system for estimating communication channels in a multi-robot mobile system using software-defined radio (SDR) devices. The system consists of two mobile robots programmed for two scenarios: one where the robot remains stationary and another where it follows a predefined trajectory. Communication within the system is conducted through orthogonal frequency-division multiplexing (OFDM) to mitigate the effects of multipath propagation in indoor environments. The system's performance is evaluated using the bit error rate (BER). Connections related to robot motion and communication are implemented using Raspberry Pi 3 and BladeRF x115, respectively. The least squares (LS) technique is employed to estimate the channel with a bit error rate of approximately 10^(-2).
Submitted 19 March, 2024;
originally announced March 2024.
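As an illustration of the least squares (LS) step mentioned in the abstract above, the following is a minimal sketch of per-subcarrier LS channel estimation on known pilot symbols. It is not the authors' implementation: the function name `ls_channel_estimate`, the BPSK pilots, the 64 subcarriers, and the noise level are all assumptions chosen for the example.

```python
import numpy as np

def ls_channel_estimate(rx_pilots, tx_pilots):
    """Per-subcarrier least-squares estimate: H_hat[k] = Y[k] / X[k]."""
    return rx_pilots / tx_pilots

# Hypothetical setup: 64 subcarriers, BPSK pilots, flat noise level (all assumed).
rng = np.random.default_rng(0)
n_subcarriers = 64
tx = (2 * rng.integers(0, 2, n_subcarriers) - 1).astype(complex)
h_true = (rng.standard_normal(n_subcarriers)
          + 1j * rng.standard_normal(n_subcarriers)) / np.sqrt(2)

# Noise-free case: the LS estimate recovers the channel exactly.
h_hat_clean = ls_channel_estimate(h_true * tx, tx)

# Noisy case: the estimation error equals the noise divided by the pilot.
noise = 0.01 * (rng.standard_normal(n_subcarriers)
                + 1j * rng.standard_normal(n_subcarriers))
h_hat_noisy = ls_channel_estimate(h_true * tx + noise, tx)
max_error = np.max(np.abs(h_hat_noisy - h_true))
```

Because the pilots here have unit magnitude, the LS error per subcarrier has the same magnitude as the noise sample on that subcarrier, which is why LS is simple but sensitive to noise.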
-
GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects
Authors:
Sungphill Moon,
Hyeontae Son,
Dongcheol Hur,
Sangwook Kim
Abstract:
Despite the progress of learning-based methods for 6D object pose estimation, the trade-off between accuracy and scalability for novel objects still exists. Specifically, previous methods for novel objects do not make good use of the target object's 3D shape information since they focus on generalization by processing the shape indirectly, making them less effective. We present GenFlow, an approach that enables both accuracy and generalization to novel objects with the guidance of the target object's shape. Our method predicts optical flow between the rendered image and the observed image and refines the 6D pose iteratively. It boosts the performance by a constraint of the 3D shape and the generalizable geometric knowledge learned from an end-to-end differentiable system. We further improve our model by designing a cascade network architecture to exploit the multi-scale correlations and coarse-to-fine refinement. GenFlow ranked first on the unseen object pose estimation benchmarks in both the RGB and RGB-D cases. It also achieves performance competitive with existing state-of-the-art methods for the seen object pose estimation without any fine-tuning.
Submitted 18 March, 2024;
originally announced March 2024.
-
RANDAO-based RNG: Last Revealer Attacks in Ethereum 2.0 Randomness and a Potential Solution
Authors:
Do Hai Son,
Tran Thi Thuy Quynh,
Le Quang Minh
Abstract:
Ethereum 2.0 is a major upgrade to improve its scalability, throughput, and security. In this version, RANDAO is the scheme to randomly select the users who propose, confirm blocks, and get rewards. However, a vulnerability, referred to as the 'Last Revealer Attack' (LRA), compromises the randomness of this scheme by introducing bias to the Random Number Generator (RNG) process. This study first revisits and clarifies this vulnerability. After that, we propose a Shamir's Secret Sharing (SSS)-based RANDAO scheme to mitigate the LRA. Our analysis shows that the proposed method can prevent the LRA under favorable network conditions.
Submitted 14 March, 2024;
originally announced March 2024.
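For readers unfamiliar with Shamir's Secret Sharing, which the abstract above builds on, here is a minimal textbook sketch of splitting and recovering a secret over a prime field. It shows only the generic (threshold, n) primitive, not the paper's RANDAO scheme. The field prime, the function names, and the use of a seeded `random.Random` (for reproducibility only; a real deployment would need a cryptographically secure source such as the `secrets` module) are assumptions made for this example.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime, used as the field modulus (assumed choice)

def split_secret(secret, n_shares, threshold, rng):
    """Evaluate a random degree-(threshold-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+)
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# Hypothetical usage: 5 shares, any 3 of which recover the secret.
shares = split_secret(123456789, n_shares=5, threshold=3, rng=random.Random(0))
recovered = recover_secret(shares[:3])
```

Any subset of at least `threshold` shares yields the same secret. The relevance to a last-revealer setting is that, once enough participants hold shares, a single party withholding its contribution need not prevent reconstruction.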
-
Towards Comprehensive Vietnamese Retrieval-Augmented Generation and Large Language Models
Authors:
Nguyen Quang Duc,
Le Hai Son,
Nguyen Duc Nhan,
Nguyen Dich Nhat Minh,
Le Thanh Huong,
Dinh Viet Sang
Abstract:
This paper presents our contributions towards advancing the state of Vietnamese language understanding and generation through the development and dissemination of open datasets and pre-trained models for Vietnamese Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs).
Submitted 5 March, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
FedUV: Uniformity and Variance for Heterogeneous Federated Learning
Authors:
Ha Min Son,
Moon-Hyun Kim,
Tai-Myoung Chung,
Chao Huang,
Xin Liu
Abstract:
Federated learning is a promising framework to train neural networks with widely distributed data. However, performance degrades heavily with heterogeneously distributed data. Recent work has shown this is due to the final layer of the network being most prone to local bias, with some works finding success by freezing the final layer as an orthogonal classifier. We investigate the training dynamics of the classifier by applying SVD to the weights, motivated by the observation that freezing weights results in constant singular values. We find that there are differences when training in IID and non-IID settings. Based on this finding, we introduce two regularization terms for local training to continuously emulate IID settings: (1) variance in the dimension-wise probability distribution of the classifier and (2) hyperspherical uniformity of representations of the encoder. These regularizations encourage local models to act as if they were in an IID setting regardless of the local data distribution, thus offsetting proneness to bias while being flexible to the data. In extensive experiments in both label-shift and feature-shift settings, we verify that our method achieves the highest performance by a large margin, especially in highly non-IID cases, in addition to being scalable to larger models and datasets.
Submitted 1 March, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
A Causality-Aware Pattern Mining Scheme for Group Activity Recognition in a Pervasive Sensor Space
Authors:
Hyunju Kim,
Heesuk Son,
Dongman Lee
Abstract:
Human activity recognition (HAR) is a key challenge in pervasive computing and its solutions have been presented based on various disciplines. Specifically, for HAR in a smart space without privacy and accessibility issues, data streams generated by deployed pervasive sensors are leveraged. In this paper, we focus on group activities, in which a group of users perform a collaborative task without user identification, and propose an efficient group activity recognition scheme that extracts causality patterns from pervasive sensor event sequences generated by a group of users, achieving recognition accuracy as good as the state-of-the-art graphical model. To filter out irrelevant noise events from a given data stream, a set of rules is leveraged to highlight causally related events. Then, a pattern-tree algorithm extracts frequent causal patterns by means of a growing tree structure. Based on the extracted patterns, a weighted sum-based pattern matching algorithm computes the likelihoods of stored group activities for the given test event sequence by means of matched event pattern counts for group activity recognition. We evaluate the proposed scheme using the data collected from our testbed and CASAS datasets where users perform their tasks on a daily basis and validate its effectiveness in a real environment. Experimental results show that the proposed scheme achieves higher recognition accuracy than the existing schemes with only a small amount of runtime overhead.
Submitted 1 December, 2023;
originally announced December 2023.
-
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens
Authors:
Bence Mark Halpern,
Wen-Chin Huang,
Lester Phillip Violeta,
R. J. J. H. van Son,
Tomoki Toda
Abstract:
In healthy-to-pathological voice conversion (H2P-VC), healthy speech is converted into pathological speech while preserving the identity. The paper improves on a previous two-stage approach to H2P-VC where (1) speech is created first with the appropriate severity, (2) then the speaker identity of the voice is converted while preserving the severity of the voice. Specifically, we propose improvements to (2) by using phonetic posteriorgrams (PPG) and global style tokens (GST). Furthermore, we present a new dataset that contains parallel recordings of pathological and healthy speakers with the same identity, which allows more precise evaluation. Listening tests by expert listeners show that the framework preserves the severity of the source sample while modelling the target speaker's voice. We also show that (a) pathology impacts x-vectors but not all speaker information is lost, (b) choosing source speakers based on severity labels alone is insufficient.
Submitted 4 October, 2023;
originally announced October 2023.
-
XFedHunter: An Explainable Federated Learning Framework for Advanced Persistent Threat Detection in SDN
Authors:
Huynh Thai Thi,
Ngo Duc Hoang Son,
Phan The Duy,
Nghi Hoang Khoa,
Khoa Ngo-Khanh,
Van-Hau Pham
Abstract:
Advanced Persistent Threat (APT) attacks are highly sophisticated and employ a multitude of advanced methods and techniques to target organizations and steal sensitive and confidential information. APT attacks consist of multiple stages and have a defined strategy, utilizing new and innovative techniques and technologies developed by hackers to evade security software monitoring. To effectively protect against APTs, detecting and predicting APT indicators with an explanation from Machine Learning (ML) prediction is crucial to reveal the characteristics of attackers lurking in the network system. Meanwhile, Federated Learning (FL) has emerged as a promising approach for building intelligent applications without compromising privacy. This is particularly important in cybersecurity, where sensitive data and high-quality labeling play a critical role in constructing effective machine learning models for detecting cyber threats. Therefore, this work proposes XFedHunter, an explainable federated learning framework for APT detection in Software-Defined Networking (SDN) leveraging local cyber threat knowledge from many training collaborators. In XFedHunter, a Graph Neural Network (GNN) and a Deep Learning model are used to effectively reveal malicious events among the large number of normal ones in the network system. The experimental results on NF-ToN-IoT and DARPA TCE3 datasets indicate that our framework can enhance the trust and accountability of ML-based systems utilized for cybersecurity purposes without privacy leakage.
Submitted 15 September, 2023;
originally announced September 2023.
-
Collaborative Learning Framework to Detect Attacks in Transactions and Smart Contracts
Authors:
Tran Viet Khoa,
Do Hai Son,
Chi-Hieu Nguyen,
Dinh Thai Hoang,
Diep N. Nguyen,
Tran Thi Thuy Quynh,
Trong-Minh Hoang,
Nguyen Viet Ha,
Eryk Dutkiewicz,
Abu Alsheikh,
Nguyen Linh Trung
Abstract:
With the escalating prevalence of malicious activities exploiting vulnerabilities in blockchain systems, there is an urgent requirement for robust attack detection mechanisms. To address this challenge, this paper presents a novel collaborative learning framework designed to detect attacks in blockchain transactions and smart contracts by analyzing transaction features. Our framework exhibits the capability to classify various types of blockchain attacks, including intricate attacks at the machine code level (e.g., injecting malicious codes to withdraw coins from users unlawfully), which typically necessitate significant time and security expertise to detect. To achieve that, the proposed framework incorporates a unique tool that transforms transaction features into visual representations, facilitating efficient analysis and classification of low-level machine codes. Furthermore, we propose an advanced collaborative learning model to enable real-time detection of diverse attack types at distributed mining nodes. Our model can efficiently detect attacks in smart contracts and transactions for blockchain systems without the need to gather all data from mining nodes into a centralized server. In order to evaluate the performance of our proposed framework, we deploy a pilot system based on a private Ethereum network and conduct multiple attack scenarios to generate a novel dataset. To the best of our knowledge, our dataset is the most comprehensive and diverse collection of transactions and smart contracts synthesized in a laboratory for cyberattack detection in blockchain systems. Our framework achieves a detection accuracy of approximately 94% through extensive simulations and 91% in real-time experiments with a throughput of over 2,150 transactions per second.
Submitted 10 August, 2024; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Impact Analysis of Antenna Array Geometry on Performance of Semi-blind Structured Channel Estimation for massive MIMO-OFDM systems
Authors:
Do Hai Son,
Tran Thi Thuy Quynh
Abstract:
Channel estimation is always implemented in communication systems to overcome the effect of interference and noise. In wireless communications in particular, this task is more challenging because system performance must be improved while saving resources. This paper focuses on investigating the impact of geometries of antenna arrays on the performance of structured channel estimation in massive MIMO-OFDM systems. We use the Cramér-Rao Bound to analyze errors in two methods, i.e., training-based and semi-blind-based channel estimations. The simulation results show that the latter gets significantly better performance than the former. Besides, the system with Uniform Cylindrical Array outperforms the traditional Uniform Linear Array one in both estimation methods.
Submitted 16 May, 2023;
originally announced May 2023.
-
Object-Centric Multi-Task Learning for Human Instances
Authors:
Hyeongseok Son,
Sangil Jung,
Solae Lee,
Seongeun Kim,
Seung-In Park,
ByungIn Yoo
Abstract:
Humans are one of the most essential classes in visual recognition tasks such as detection, segmentation, and pose estimation. Although much effort has been put into individual tasks, multi-task learning for these three tasks has rarely been studied. In this paper, we explore a compact multi-task network architecture that maximally shares the parameters of the multiple tasks via object-centric learning. To this end, we propose a novel query design to encode the human instance information effectively, called human-centric query (HCQ). HCQ also enables the query to learn explicit and structural information about humans, such as keypoints. In addition, we use HCQ directly in the prediction heads of the target tasks and also interweave HCQ with the deformable attention in Transformer decoders to exploit a well-learned object-centric representation. Experimental results show that the proposed multi-task network achieves accuracy comparable to state-of-the-art task-specific models in human detection, segmentation, and pose estimation tasks, while incurring lower computational cost.
Submitted 12 March, 2023;
originally announced March 2023.
-
InFusionSurf: Refining Neural RGB-D Surface Reconstruction Using Per-Frame Intrinsic Refinement and TSDF Fusion Prior Learning
Authors:
Seunghwan Lee,
Gwanmo Park,
Hyewon Son,
Jiwon Ryu,
Han Joo Chae
Abstract:
We introduce InFusionSurf, an innovative enhancement for neural radiance field (NeRF) frameworks in 3D surface reconstruction using RGB-D video frames. Building upon previous methods that have employed feature encoding to improve optimization speed, we further improve the reconstruction quality with minimal impact on optimization time by refining depth information. InFusionSurf addresses camera motion-induced blurs in each depth frame through a per-frame intrinsic refinement scheme. It incorporates the truncated signed distance field (TSDF) Fusion, a classical real-time 3D surface reconstruction method, as a pretraining tool for the feature grid, enhancing reconstruction details and training speed. Comparative quantitative and qualitative analyses show that InFusionSurf reconstructs scenes with high accuracy while maintaining optimization efficiency. The effectiveness of our intrinsic refinement and TSDF Fusion-based pretraining is further validated through an ablation study.
Submitted 6 October, 2024; v1 submitted 8 March, 2023;
originally announced March 2023.
-
FedCC: Robust Federated Learning against Model Poisoning Attacks
Authors:
Hyejun Jeong,
Hamin Son,
Seohu Lee,
Jayun Hyun,
Tai-Myoung Chung
Abstract:
Federated learning is a distributed framework designed to address privacy concerns. However, it introduces new attack surfaces, which are especially vulnerable when data is non-independently and identically distributed (non-IID). Existing approaches fail to effectively mitigate the malicious influence in this setting; previous approaches often tackle non-IID data and poisoning attacks separately. To address both challenges simultaneously, we present FedCC, a simple yet effective novel defense algorithm against model poisoning attacks. It leverages the Centered Kernel Alignment similarity of Penultimate Layer Representations for clustering, allowing the identification and filtration of malicious clients, even in non-IID data settings. The penultimate layer representations are meaningful since the later layers are more sensitive to local data distributions, which allows better detection of malicious clients. The sophisticated utilization of layer-wise Centered Kernel Alignment similarity allows attack mitigation while leveraging useful knowledge obtained. Our extensive experiments demonstrate the effectiveness of FedCC in mitigating both untargeted model poisoning and targeted backdoor attacks. Compared to existing outlier detection-based and first-order statistics-based methods, FedCC consistently reduces attack confidence to zero. Specifically, it reduces the average degradation of global performance by 65.5%. We believe that this new perspective on aggregation makes it a valuable contribution to the field of FL model security and privacy. The code will be made available upon acceptance.
Submitted 19 February, 2025; v1 submitted 4 December, 2022;
originally announced December 2022.
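To make the Centered Kernel Alignment (CKA) similarity named in the abstract above concrete, the following is a minimal sketch of the widely used linear CKA between two representation matrices. Whether FedCC uses the linear or a kernel variant is not stated in the abstract, so the linear form, the function name `linear_cka`, and the matrix shapes are assumptions for illustration only.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two (n_samples, n_features) representation matrices."""
    # Center each feature column so the similarity ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

# Hypothetical representations: 50 samples, 8 features (assumed shapes).
rng = np.random.default_rng(0)
a = rng.standard_normal((50, 8))

# A rotated and rescaled copy of `a` should be maximally similar to `a`.
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
b = 3.0 * a @ q
cka_same = linear_cka(a, b)

# An independent random matrix should score noticeably lower.
c = rng.standard_normal((50, 8))
cka_diff = linear_cka(a, c)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, which is one reason it is a common choice for comparing layer representations across independently trained models.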
-
Preregistered protocol for: Articulatory changes in speech following treatment for oral or oropharyngeal cancer: a systematic review
Authors:
Thomas B. Tienkamp,
Teja Rebernik,
Defne Abur,
Rob J. J. H. van Son,
Sebastiaan A. H. J. de Visscher,
Max J. H. Witjes,
Martijn Wieling
Abstract:
This document outlines a PROSPERO pre-registered protocol for a systematic review regarding articulatory changes in speech following oral or oropharyngeal cancer treatment. Treatment of tumours in the oral cavity may result in physiological changes that could lead to articulatory difficulties. The tongue becomes less mobile due to scar tissue and/or potential (postoperative) radiation therapy. Moreover, tissue loss may create a bypass for airflow or limit constriction possibilities. In order to gain a better understanding of the nature of the speech problems, information regarding the movement of the articulators is needed since perceptual or acoustic information provides only indirect evidence of articulatory changes. Therefore, this systematic review will review studies that directly measured the articulatory movements of the tongue, jaw, and lips following treatment for oral or oropharyngeal cancer.
Submitted 14 September, 2022;
originally announced September 2022.
-
Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents
Authors:
Nguyen Hong Son,
Hieu M. Vu,
Tuan-Anh D. Nguyen,
Minh-Tien Nguyen
Abstract:
This paper introduces a new information extraction model for business documents. Unlike prior studies, which rely only on span extraction or sequence labeling, the model takes advantage of both span extraction and sequence labeling. The combination allows the model to deal with long documents with sparse information (the small amount of extracted information). The model is trained end-to-end to jointly optimize the two tasks in a unified manner. Experimental results on four business datasets in English and Japanese show that the model achieves promising results and is significantly faster than the normal span-based extraction method. The code is also available.
Submitted 26 May, 2022;
originally announced May 2022.
-
Real-Time Video Deblurring via Lightweight Motion Compensation
Authors:
Hyeongseok Son,
Junyong Lee,
Sunghyun Cho,
Seungyong Lee
Abstract:
While motion compensation greatly improves video deblurring quality, separately performing motion compensation and video deblurring demands huge computational overhead. This paper proposes a real-time video deblurring framework consisting of a lightweight multi-task unit that supports both video deblurring and motion compensation in an efficient way. The multi-task unit is specifically designed to handle large portions of the two tasks using a single shared network, and consists of a multi-task detail network and simple networks for deblurring and motion compensation. The multi-task unit minimizes the cost of incorporating motion compensation into video deblurring and enables real-time deblurring. Moreover, by stacking multiple multi-task units, our framework provides flexible control between the cost and deblurring quality. We experimentally validate the state-of-the-art deblurring quality of our approach, which runs at a much faster speed compared to previous methods, and show practical real-time performance (30.99dB@30fps measured in the DVD dataset).
Submitted 13 September, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Enhanced Physics-Informed Neural Networks with Augmented Lagrangian Relaxation Method (AL-PINNs)
Authors:
Hwijae Son,
Sung Woong Cho,
Hyung Ju Hwang
Abstract:
Physics-Informed Neural Networks (PINNs) have become a prominent application of deep learning in scientific computation, as they are powerful approximators of solutions to nonlinear partial differential equations (PDEs). There have been numerous attempts to facilitate the training process of PINNs by adjusting the weight of each component of the loss function, called adaptive loss-balancing algorithms. In this paper, we propose an Augmented Lagrangian relaxation method for PINNs (AL-PINNs). We treat the initial and boundary conditions as constraints for the optimization problem of the PDE residual. By employing Augmented Lagrangian relaxation, the constrained optimization problem becomes a sequential max-min problem so that the learnable parameters $λ$ adaptively balance each loss component. Our theoretical analysis reveals that the sequence of minimizers of the proposed loss functions converges to an actual solution for the Helmholtz, viscous Burgers, and Klein--Gordon equations. We demonstrate through various numerical experiments that AL-PINNs yield a much smaller relative error compared with that of state-of-the-art adaptive loss-balancing algorithms.
Submitted 30 May, 2023; v1 submitted 29 April, 2022;
originally announced May 2022.
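As a reading aid for the abstract above, the following is a generic sketch of an augmented Lagrangian relaxation, not the paper's exact loss. The symbols $\mathcal{L}_{\mathrm{res}}$ (PDE residual loss), $\mathcal{C}_i$ (initial and boundary condition violations), $θ$ (network parameters), and $β$ (penalty weight) are assumed names that do not appear in the abstract; only $λ$ does.

```latex
% Constrained problem: minimize the PDE residual subject to the
% initial/boundary conditions being satisfied.
\min_{\theta} \; \mathcal{L}_{\mathrm{res}}(\theta)
\quad \text{subject to} \quad \mathcal{C}_i(\theta) = 0, \quad i = 1, \dots, m

% Augmented Lagrangian: multiplier term plus quadratic penalty.
\mathcal{L}_{\beta}(\theta, \lambda)
  = \mathcal{L}_{\mathrm{res}}(\theta)
  + \sum_{i=1}^{m} \lambda_i \, \mathcal{C}_i(\theta)
  + \frac{\beta}{2} \sum_{i=1}^{m} \mathcal{C}_i(\theta)^2

% Relaxed max-min problem; in the classical method of multipliers,
% \lambda_i is updated by \lambda_i \leftarrow \lambda_i + \beta \, \mathcal{C}_i(\theta).
\max_{\lambda} \; \min_{\theta} \; \mathcal{L}_{\beta}(\theta, \lambda)
```

In this generic form the multipliers $λ_i$ grow wherever a constraint is still violated, which is one way to read the abstract's statement that $λ$ adaptively balances the loss components.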
-
Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation
Authors:
Shumpei Inoue,
Tsungwei Liu,
Nguyen Hong Son,
Minh-Tien Nguyen
Abstract:
This paper introduces a model for incomplete utterance restoration (IUR) called JET (Joint learning token Extraction and Text generation). Different from prior studies that only work on extraction or abstraction datasets, we design a simple but effective model, working for both scenarios of IUR. Our design simulates the nature of IUR, where omitted tokens from the context contribute to restoration. From this, we construct a Picker that identifies the omitted tokens. To support the picker, we design two label creation methods (soft and hard labels), which can work in cases of no annotation data for the omitted tokens. The restoration is done by using a Generator with the help of the Picker on joint learning. Promising results on four benchmark datasets in extraction and abstraction scenarios show that our model is better than the pretrained T5 and non-generative language model methods in both rich and limited training data settings. The code is available at https://github.com/shumpei19/JET.
Submitted 28 July, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Teaching for large-scale Reproducibility Verification
Authors:
Lars Vilhuber,
Hyuk Harry Son,
Meredith Welch,
David N. Wasser,
Michael Darisse
Abstract:
We describe a unique environment in which undergraduate students from various STEM and social science disciplines are trained in data provenance and reproducible methods, and then apply that knowledge to real, conditionally accepted manuscripts and associated replication packages. We describe in detail the recruitment, training, and regular activities. While the activity is not part of a regular curriculum, the skills and knowledge taught through explicit training of reproducible methods and principles, and reinforced through repeated application in a real-life workflow, contribute to the education of these undergraduate students, and prepare them for post-graduation jobs and further studies.
Submitted 31 March, 2022;
originally announced April 2022.
-
An Effective Framework of Private Ethereum Blockchain Networks for Smart Grid
Authors:
Do Hai Son,
Tran Thi Thuy Quynh,
Tran Viet Khoa,
Dinh Thai Hoang,
Nguyen Linh Trung,
Nguyen Viet Ha,
Dusit Niyato,
Nguyen N. Diep,
Eryk Dutkiewicz
Abstract:
A smart grid is an important application in Industry 4.0 with a lot of new technologies and equipment working together. Hence, sensitive data stored in the smart grid is vulnerable to malicious modification and theft. This paper proposes a framework to build a smart grid based on a highly effective private Ethereum network. Our framework provides a real smart grid that includes modern hardware and a smart contract to secure data in the blockchain network. To obtain high throughput but a low uncle rate, the difficulty calculation method used in the mining process of the Ethereum consensus mechanism is modified to adapt to the practical smart grid setup. The performance in terms of throughput and latency is evaluated by simulation and verified by the real smart grid setup. The enhanced private Ethereum-based smart grid has significantly better performance than the public one. Moreover, this framework can be applied to any system used to store data in the Ethereum network.
Submitted 28 March, 2022;
originally announced March 2022.
-
Collaborative Learning for Cyberattack Detection in Blockchain Networks
Authors:
Tran Viet Khoa,
Do Hai Son,
Dinh Thai Hoang,
Nguyen Linh Trung,
Tran Thi Thuy Quynh,
Diep N. Nguyen,
Nguyen Viet Ha,
Eryk Dutkiewicz
Abstract:
This article aims to study intrusion attacks and then develop a novel cyberattack detection framework to detect cyberattacks at the network layer (e.g., Brute Password and Flooding of Transactions) of blockchain networks. Specifically, we first design and implement a blockchain network in our laboratory. This blockchain network will serve two purposes, i.e., to generate the real traffic data (including both normal data and attack data) for our learning models and to implement real-time experiments to evaluate the performance of our proposed intrusion detection framework. To the best of our knowledge, this is the first dataset that is synthesized in a laboratory for cyberattacks in a blockchain network. We then propose a novel collaborative learning model that allows efficient deployment in the blockchain network to detect attacks. The main idea of the proposed learning model is to enable blockchain nodes to actively collect data, learn the knowledge from data using the Deep Belief Network, and then share the knowledge learned from its data with other blockchain nodes in the network. In this way, we not only leverage the knowledge from all the nodes in the network but also avoid gathering all raw data for training at a centralized node, as conventional centralized learning solutions require. Such a framework can also avoid the risk of exposing local data's privacy as well as excessive network overhead/congestion. Both intensive simulations and real-time experiments clearly show that our proposed intrusion detection framework can achieve an accuracy of up to 98.6% in detecting attacks.
Submitted 6 May, 2024; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers
Authors:
Han Joo Chae,
Seunghwan Lee,
Hyewon Son,
Seungyeob Han,
Taebin Lim
Abstract:
We introduce AiD Regen, a novel system that generates 3D wound models combining 2D semantic segmentation with 3D reconstruction so that they can be printed via 3D bio-printers during the surgery to treat diabetic foot ulcers (DFUs). AiD Regen seamlessly binds the full pipeline, which includes RGB-D image capturing, semantic segmentation, boundary-guided point-cloud processing, 3D model reconstruction, and 3D printable G-code generation, into a single system that can be used out of the box. We developed a multi-stage data preprocessing method to handle small and unbalanced DFU image datasets. AiD Regen's human-in-the-loop machine learning interface enables clinicians to not only create 3D regenerative patches with just a few touch interactions but also customize and confirm wound boundaries. As evidenced by our experiments, our model outperforms prior wound segmentation models and our reconstruction algorithm is capable of generating 3D wound models with compelling accuracy. We further conducted a case study on a real DFU patient and demonstrated the effectiveness of AiD Regen in treating DFU wounds.
Submitted 7 March, 2022;
originally announced March 2022.
-
Exploring Discontinuity for Video Frame Interpolation
Authors:
Sangjin Lee,
Hyeongmin Lee,
Chajin Shin,
Hanbin Son,
Sangyoun Lee
Abstract:
Video frame interpolation (VFI) is the task that synthesizes the intermediate frame given two consecutive frames. Most of the previous studies have focused on appropriate frame warping operations and refinement modules for the warped frames. These studies have been conducted on natural videos containing only continuous motions. However, many practical videos contain various unnatural objects with discontinuous motions such as logos, user interfaces and subtitles. We propose three techniques to make the existing deep learning-based VFI architectures robust to these elements. First is a novel data augmentation strategy called figure-text mixing (FTM) which can make the models learn discontinuous motions during training stage without any extra dataset. Second, we propose a simple but effective module that predicts a map called discontinuity map (D-map), which densely distinguishes between areas of continuous and discontinuous motions. Lastly, we propose loss functions to give supervisions of the discontinuous motion areas which can be applied along with FTM and D-map. We additionally collect a special test benchmark called Graphical Discontinuous Motion (GDM) dataset consisting of some mobile games and chatting videos. Applied to the various state-of-the-art VFI networks, our method significantly improves the interpolation qualities on the videos from not only GDM dataset, but also the existing benchmarks containing only continuous motions such as Vimeo90K, UCF101, and DAVIS.
Submitted 23 March, 2023; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Compare Where It Matters: Using Layer-Wise Regularization To Improve Federated Learning on Heterogeneous Data
Authors:
Ha Min Son,
Moon Hyun Kim,
Tai-Myoung Chung
Abstract:
Federated Learning is a widely adopted method to train neural networks over distributed data. One main limitation is the performance degradation that occurs when data is heterogeneously distributed. While many works have attempted to address this problem, these methods under-perform because they are founded on a limited understanding of neural networks. In this work, we verify that only certain important layers in a neural network require regularization for effective training. We additionally verify that Centered Kernel Alignment (CKA) most accurately calculates similarity between layers of neural networks trained on different data. By applying CKA-based regularization to important layers during training, we significantly improve performance in heterogeneous settings. We present FedCKA: a simple framework that out-performs previous state-of-the-art methods on various deep learning tasks while also improving efficiency and scalability.
Submitted 1 December, 2021;
originally announced December 2021.
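To illustrate the similarity measure named in the abstract above, here is a minimal sketch of linear Centered Kernel Alignment between two activation matrices (rows are examples, columns are features). It assumes the linear-kernel variant and is written in plain Python for readability; the function names are hypothetical, and the paper's own implementation and choice of kernel may differ.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b, summed over all column pairs."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F).

    x and y must have the same number of rows (examples); the number of
    columns (features) may differ. Raises ZeroDivisionError if either
    matrix is constant across examples after centering.
    """
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )
    return numerator / denominator


# Usage: activations for 4 examples with 2 features each.
layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# Same representation with features swapped and scaled by 3; linear CKA is
# invariant to both, so the score is 1.0 up to floating-point error.
layer_b = [[3.0 * b, 3.0 * a] for a, b in layer_a]
score = linear_cka(layer_a, layer_b)
```

A CKA-based regularizer in the spirit of the abstract would compare the activations of a selected layer across models on the same inputs; how that term enters the training loss is not specified here.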
-
Iterative Filter Adaptive Network for Single Image Defocus Deblurring
Authors:
Junyong Lee,
Hyeongseok Son,
Jaesung Rim,
Sunghyun Cho,
Seungyong Lee
Abstract:
We propose a novel end-to-end learning-based approach for single image defocus deblurring. The proposed approach is equipped with a novel Iterative Filter Adaptive Network (IFAN) that is specifically designed to handle spatially-varying and large defocus blur. For adaptively handling spatially-varying blur, IFAN predicts pixel-wise deblurring filters, which are applied to defocused features of an input image to generate deblurred features. For effectively managing large blur, IFAN models deblurring filters as stacks of small-sized separable filters. Predicted separable deblurring filters are applied to defocused features using a novel Iterative Adaptive Convolution (IAC) layer. We also propose a training scheme based on defocus disparity estimation and reblurring, which significantly boosts the deblurring quality. We demonstrate that our method achieves state-of-the-art performance both quantitatively and qualitatively on real-world images.
Submitted 28 March, 2022; v1 submitted 31 August, 2021;
originally announced August 2021.
-
Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes
Authors:
Hyeongseok Son,
Junyong Lee,
Jonghyeop Lee,
Sunghyun Cho,
Seungyong Lee
Abstract:
For the success of video deblurring, it is essential to utilize information from neighboring frames. Most state-of-the-art video deblurring methods adopt motion compensation between video frames to aggregate information from multiple frames that can help deblur a target frame. However, the motion compensation methods adopted by previous deblurring methods are not blur-invariant, and consequently, their accuracy is limited for blurry frames with different blur amounts. To alleviate this problem, we propose two novel approaches to deblur videos by effectively aggregating information from multiple video frames. First, we present blur-invariant motion estimation learning to improve motion estimation accuracy between blurry frames. Second, for motion compensation, instead of aligning frames by warping with estimated motions, we use a pixel volume that contains candidate sharp pixels to resolve motion estimation errors. We combine these two processes to propose an effective recurrent video deblurring network that fully exploits deblurred previous frames. Experiments show that our method achieves the state-of-the-art performance both quantitatively and qualitatively compared to recent methods that use deep learning.
Submitted 23 August, 2021;
originally announced August 2021.
-
Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
Authors:
Hyeongseok Son,
Junyong Lee,
Sunghyun Cho,
Seungyong Lee
Abstract:
This paper proposes a novel deep learning approach for single image defocus deblurring based on inverse kernels. In a defocused image, the blur shapes are similar among pixels although the blur sizes can spatially vary. To utilize the property with inverse kernels, we exploit the observation that when only the size of a defocus blur changes while keeping the shape, the shape of the corresponding inverse kernel remains the same and only the scale changes. Based on the observation, we propose a kernel-sharing parallel atrous convolutional (KPAC) block specifically designed by incorporating the property of inverse kernels for single image defocus deblurring. To effectively simulate the invariant shapes of inverse kernels with different scales, KPAC shares the same convolutional weights among multiple atrous convolution layers. To efficiently simulate the varying scales of inverse kernels, KPAC consists of only a few atrous convolution layers with different dilations and learns per-pixel scale attentions to aggregate the outputs of the layers. KPAC also utilizes the shape attention to combine the outputs of multiple convolution filters in each atrous convolution layer, to deal with defocus blur with a slightly varying shape. We demonstrate that our approach achieves state-of-the-art performance with a much smaller number of parameters than previous methods.
Submitted 20 August, 2021;
originally announced August 2021.
-
Personalized Federated Learning with Clustering: Non-IID Heart Rate Variability Data Application
Authors:
Joo Hun Yoo,
Ha Min Son,
Hyejun Jeong,
Eun-Hye Jang,
Ah Young Kim,
Han Young Yu,
Hong Jin Jeon,
Tai-Myoung Chung
Abstract:
While machine learning techniques are being applied to various fields for their exceptional ability to find complex relations in large datasets, the strengthening of regulations on data ownership and privacy is causing increasing difficulty in their application to medical data. In light of this, Federated Learning has recently been proposed as a solution to train on private data without breach of confidentiality. This conservation of privacy is particularly appealing in the field of healthcare, where patient data is highly confidential. However, many studies have shown that its assumption of Independent and Identically Distributed data is unrealistic for medical data. In this paper, we propose Personalized Federated Cluster Models, a hierarchical clustering-based FL process, to predict Major Depressive Disorder severity from Heart Rate Variability. By allowing clients to receive a more personalized model, we address problems caused by non-IID data, showing an accuracy increase in severity prediction. This increase in performance may be sufficient to use Personalized Federated Cluster Models in many existing Federated Learning scenarios.
Submitted 10 August, 2021; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Lagrangian dual framework for conservative neural network solutions of kinetic equations
Authors:
Hyung Ju Hwang,
Hwijae Son
Abstract:
In this paper, we propose a novel conservative formulation for solving kinetic equations via neural networks. More precisely, we formulate the learning problem as a constrained optimization problem with constraints that represent the physical conservation laws. The constraints are relaxed toward the residual loss function by the Lagrangian duality. By imposing physical conservation properties of the solution as constraints of the learning problem, we demonstrate far more accurate approximations of the solutions in terms of errors and the conservation laws, for the kinetic Fokker-Planck equation and the homogeneous Boltzmann equation.
Submitted 23 June, 2021;
originally announced June 2021.