Leveraging Depth Maps and Attention Mechanisms for Enhanced Image Inpainting

Authors: Jin Hyun Park, Harine Choi, Praewa Pitiphat

Abstract: Existing deep learning-based image inpainting methods typically rely on convolutional networks with RGB images to reconstruct images. However, relying exclusively on RGB images may neglect important depth information, which plays a critical role in understanding the spatial and structural context of a scene. Just as human vision leverages stereo cues to perceive depth, incorporating depth maps into the inpainting process can enhance the model's ability to reconstruct images with greater accuracy and contextual awareness. In this paper, we propose a novel approach that incorporates both RGB and depth images for enhanced image inpainting. Our models employ a dual encoder architecture, where one encoder processes the RGB image and the other handles the depth image. The encoded features from both encoders are then fused in the decoder using an attention mechanism, effectively integrating the RGB and depth representations. We use two different masking strategies, line and square, to test the robustness of the model under different types of occlusions. To further analyze the effectiveness of our approach, we use Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations to examine the regions of interest the model focuses on during inpainting. We show that incorporating depth information alongside the RGB image significantly improves the reconstruction quality.
Through both qualitative and quantitative comparisons, we demonstrate that the depth-integrated model outperforms the baseline, with attention mechanisms further enhancing inpainting performance, as evidenced by multiple evaluation metrics and visualization.

Submitted 8 May, 2025; v1 submitted 29 April, 2025; originally announced May 2025.
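
The abstract above names two masking strategies, line and square, without giving their exact geometry. A minimal sketch of how such occlusion masks might be generated is shown below; the horizontal-band interpretation of "line", the function names, and all sizes are assumptions for illustration, not the authors' implementation.

```python
import random


def square_mask(height, width, size, rng):
    """Return a height x width grid (1 = masked) with one size x size block masked."""
    top = rng.randrange(0, height - size + 1)
    left = rng.randrange(0, width - size + 1)
    mask = [[0] * width for _ in range(height)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            mask[r][c] = 1
    return mask


def line_mask(height, width, thickness, rng):
    """Return a grid with one horizontal band of `thickness` rows masked (assumed shape)."""
    top = rng.randrange(0, height - thickness + 1)
    mask = [[0] * width for _ in range(height)]
    for r in range(top, top + thickness):
        mask[r] = [1] * width
    return mask


def apply_mask(image, mask, fill=0):
    """Replace masked pixels with `fill`, leaving the rest of the image untouched."""
    return [
        [fill if m else px for px, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


rng = random.Random(0)                      # fixed seed so the example is repeatable
image = [[255] * 8 for _ in range(8)]       # toy 8x8 single-channel image
sq = square_mask(8, 8, 3, rng)
ln = line_mask(8, 8, 2, rng)
occluded = apply_mask(image, sq)
```

In a setup like the one described, the same mask would presumably be applied to the RGB input before it reaches its encoder; whether the depth map is masked as well is not stated in the abstract.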

arXiv:2504.11622 [pdf, other]

Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms' "Typo" Correction

Authors: Seyyed Ali Ayati, Jin Hyun Park, Yichen Cai, Marcus Botacin

Abstract: The large integration of microphones into devices increases the opportunities for Acoustic Side-Channel Attacks (ASCAs), as these can be used to capture keystrokes' audio signals that might reveal sensitive information. However, the current State-Of-The-Art (SOTA) models for ASCAs, including Convolutional Neural Networks (CNNs) and hybrid models, such as CoAtNet, still exhibit limited robustness under realistic noisy conditions. Solving this problem requires either: (i) an increased model's capacity to infer contextual information from longer sequences, allowing the model to learn that an initially noisily typed word is the same as a futurely collected non-noisy word, or (ii) an approach to fix misidentified information from the contexts, as one does not type random words, but the ones that best fit the conversation context. In this paper, we demonstrate that both strategies are viable and complementary solutions for making ASCAs practical. We observed that no existing solution leverages advanced transformer architectures' power for these tasks and propose that: (i) Visual Transformers (VTs) are the candidate solutions for capturing long-term contextual information and (ii) transformer-powered Large Language Models (LLMs) are the candidate solutions to fix the "typos" (mispredictions) the model might make. Thus, we here present the first-of-its-kind approach that integrates VTs and LLMs for ASCAs. We first show that VTs achieve SOTA performance in classifying keystrokes when compared to the previous CNN benchmark.
Second, we demonstrate that LLMs can mitigate the impact of real-world noise. Evaluations on the natural sentences revealed that: (i) incorporating LLMs (e.g., GPT-4o) in our ASCA pipeline boosts the performance of error-correction tasks; and (ii) the comparable performance can be attained by a lightweight, fine-tuned smaller LLM (67 times smaller than GPT-4o), using...

Submitted 15 April, 2025; originally announced April 2025.

Comments: Length: 13 pages Figures: 5 figures Tables: 7 tables Keywords: Acoustic side-channel attacks, machine learning, Visual Transformers, Large Language Models (LLMs), security Conference: Accepted at the 19th USENIX WOOT Conference on Offensive Technologies (WOOT '25). Licensing: This paper is submitted under the CC BY Creative Commons Attribution license. arXiv admin note: text overlap with arXiv:2502.09782

arXiv:2502.09782

Improving Acoustic Side-Channel Attacks on Keyboards Using Transformers and Large Language Models

Authors: Jin Hyun Park, Seyyed Ali Ayati, Yichen Cai

Abstract: The increasing prevalence of microphones in everyday devices and the growing reliance on online services have amplified the risk of acoustic side-channel attacks (ASCAs) targeting keyboards. This study explores deep learning techniques, specifically vision transformers (VTs) and large language models (LLMs), to enhance the effectiveness and applicability of such attacks. We present substantial improvements over prior research, with the CoAtNet model achieving state-of-the-art performance. Our CoAtNet shows a 5.0% improvement for keystrokes recorded via smartphone (Phone) and 5.9% for those recorded via Zoom compared to previous benchmarks. We also evaluate transformer architectures and language models, with the best VT model matching CoAtNet's performance. A key advancement is the introduction of a noise mitigation method for real-world scenarios. By using LLMs for contextual understanding, we detect and correct erroneous keystrokes in noisy environments, enhancing ASCA performance. Additionally, fine-tuned lightweight language models with Low-Rank Adaptation (LoRA) deliver comparable performance to heavyweight models with 67X more parameters. This integration of VTs and LLMs improves the practical applicability of ASCA mitigation, marking the first use of these technologies to address ASCAs and error correction in real-world scenarios.

Submitted 18 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

Comments: We would like to withdraw our paper due to a significant error in the experimental methodology, which impacts the validity of our results. The error specifically affects the analysis presented in Section 4, where an incorrect dataset preprocessing step led to misleading conclusions

arXiv:2411.15490 [pdf, other]

Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

Authors: Junhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul Ye

Abstract: Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to an irreversible disability of the patient. Since diffusion weighted imaging (DWI) using the magnetic resonance image (MRI) plays a crucial role in the detection of AIS, automated prediction of AIS from DWI has been a research topic of clinical importance. While text radiology reports contain the most relevant clinical information from the image findings, the difficulty of mapping across different modalities has limited the factuality of conventional direct DWI-to-report generation methods. Here, we propose paired image-domain retrieval and text-domain augmentation (PIRTA), a cross-modal retrieval-augmented generation (RAG) framework for providing clinician-interpretative AIS radiology reports with improved factuality. PIRTA mitigates the need for learning cross-modal mapping, which poses difficulty in image-to-text generation, by casting the cross-modal mapping problem as an in-domain retrieval of similar DWI images that have paired ground-truth text radiology reports. By exploiting the retrieved radiology reports to augment the report generation process of the query image, we show by experiments with extensive in-house and public datasets that PIRTA can accurately retrieve relevant reports from 3D DWI images. This approach enables the generation of radiology reports with significantly higher accuracy compared to direct image-to-text generation using state-of-the-art multimodal language models.

Submitted 23 November, 2024; originally announced November 2024.
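
The retrieval step described in the abstract above (find similar images in the image domain, then reuse their paired reports as generation context) can be sketched roughly as follows. This is only an illustration under stated assumptions: the embeddings, the cosine-similarity ranking, the function names, and the placeholder report strings are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_reports(query_embedding, database, k=2):
    """Return the reports paired with the k stored images most similar to the query.

    `database` is a list of (image_embedding, report_text) pairs.
    """
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query_embedding, item[0]),
        reverse=True,
    )
    return [report for _, report in ranked[:k]]


def build_context(retrieved_reports):
    """Join retrieved reports into one text block to pass to a text generator."""
    return "\n".join(f"Reference report {i + 1}: {text}"
                     for i, text in enumerate(retrieved_reports))


# Toy 2-D embeddings and placeholder report strings (hypothetical data).
database = [
    ([1.0, 0.0], "report A"),
    ([0.0, 1.0], "report B"),
    ([0.9, 0.1], "report C"),
]
top = retrieve_reports([1.0, 0.0], database, k=2)
context = build_context(top)
```

The point of the design, as the abstract frames it, is that similarity is computed only between images, so no image-to-text mapping has to be learned for the retrieval itself.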

arXiv:2409.19536 [pdf, other]

Joint Trajectory Replanning for Mars Ascent Vehicle under Propulsion System Faults: A Suboptimal Learning-Based Warm-Start Approach

Authors: Kun Li, Guangtao Ran, Yanning Guo, Ju H. Park, Yao Zhang

Abstract: During the Mars ascent vehicle (MAV) launch missions, when encountering a thrust drop type of propulsion system fault problem, the general trajectory replanning methods relying on step-by-step judgments may fail to make timely decisions, potentially leading to mission failure. This paper proposes a suboptimal joint trajectory replanning (SJTR) method, which formulates the joint optimization problem of target orbit and flight trajectory after a fault within a convex optimization framework. By incorporating penalty coefficients for terminal constraints, the optimization solution adheres to the orbit redecision principle, thereby avoiding complex decision-making processes and resulting in a concise and rapid solution to the replanning problem. A learning-based warm-start scheme is proposed in conjunction with the designed SJTR method. Offline, a deep neural network (DNN) is trained using a dataset generated by the SJTR method. Online, the DNN provides initial guesses for the time optimization variables based on the current fault situation, enhancing the solving efficiency and reliability of the algorithm. Numerical simulations of the MAV flight scenario under the thrust drop faults are performed, and Monte Carlo experiments and case studies across all orbit types demonstrate the effectiveness of the proposed method.

Submitted 28 September, 2024; originally announced September 2024.

arXiv:2311.13906 [pdf, other]

Threat-Based Resource Allocation Strategy for Target Tracking in a Cognitive Radar Network

Authors: JiYe Lee, J. H Park

Abstract: Cognitive radar is developed to utilize the feedback of its operating environment obtained from a beam to make resource allocation decisions by solving optimization problems. Previous works focused on target tracking accuracy by designing an evaluation metric for an optimization problem. However, in a real combat situation, not only the tracking performance of the target but also its operational perspective should be considered. In this study, the usage of threats in the allocation of radar resource is proposed for a cognitive radar framework. Resource allocation regarding radar dwell time is considered to reflect the operational importance of target effects. The dwell time allocation problem is solved using a Second-Order Cone Program (SOCP). Numerical simulations are performed to verify the effectiveness of the proposed framework.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2110.02509 [pdf, other]

Design and Implementation of 5.8GHz RF Wireless PowerTransfer System

Authors: Je Hyeon Park, Nguyen Minh Tran, Sa Il Hwang, Dong In Kim, Kae Won Choi

Abstract: In this paper, we present a 5.8 GHz radio-frequency (RF) wireless power transfer (WPT) system that consists of 64 transmit antennas and 16 receive antennas. Unlike the inductive or resonant coupling-based near-field WPT, RF WPT has a great advantage in powering low-power internet of things (IoT) devices with its capability of long-range wireless power transfer. We also propose a beam scanning algorithm that can effectively transfer the power no matter whether the receiver is located in the radiative near-field zone or far-field zone. The proposed beam scanning algorithm is verified with a real-life WPT testbed implemented by ourselves. By experiments, we confirm that the implemented 5.8 GHz RF WPT system is able to transfer 3.67 mW at a distance of 25 meters with the proposed beam scanning algorithm. Moreover, the results show that the proposed algorithm can effectively cover radiative near-field region differently from the conventional scanning schemes which are designed under the assumption of the far-field WPT.

Submitted 6 October, 2021; originally announced October 2021.
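
To illustrate the general idea of beam scanning with power feedback mentioned in the abstract above, the sketch below scans candidate focal points for a small phased array and keeps the one that yields the highest received power. It is a simplified stand-in, not the paper's algorithm: the 8-element linear array, the free-space spherical-wave model, the candidate grid, and every function name are assumptions made for the example.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0           # m/s
FREQ_HZ = 5.8e9                          # carrier frequency from the abstract
WAVELENGTH = SPEED_OF_LIGHT / FREQ_HZ
K = 2 * math.pi / WAVELENGTH             # wavenumber


def array_positions(n_elements, spacing):
    """(x, y) positions of a linear array along the x-axis, centred on the origin."""
    offset = (n_elements - 1) / 2
    return [((i - offset) * spacing, 0.0) for i in range(n_elements)]


def focus_phases(elements, focus):
    """Transmit phases that make every element's wave arrive in phase at `focus`."""
    return [K * math.dist(e, focus) for e in elements]


def received_power(elements, phases, receiver):
    """Relative power at `receiver`: squared magnitude of the summed spherical waves."""
    field = sum(
        cmath.exp(1j * (p - K * math.dist(e, receiver))) / math.dist(e, receiver)
        for e, p in zip(elements, phases)
    )
    return abs(field) ** 2


def beam_scan(elements, candidates, measure_power):
    """Try each candidate focal point; keep the one with the highest reported power."""
    return max(candidates, key=lambda pt: measure_power(focus_phases(elements, pt)))


elements = array_positions(8, WAVELENGTH / 2)     # assumed half-wavelength spacing
receiver = (0.5, 1.0)                             # metres; unknown to the scanner
candidates = [(x / 4, y / 4) for x in range(-4, 5) for y in range(1, 9)]
best = beam_scan(
    elements, candidates, lambda ph: received_power(elements, ph, receiver)
)
```

Focusing on a point rather than steering to an angle is what lets a scan of this kind cover the radiative near field: the per-element phases follow the exact element-to-point distances instead of the plane-wave approximation used in far-field schemes.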

arXiv:2106.11805 [pdf, other]

doi 10.1109/JIOT.2022.3179691

Reconfigurable Intelligent Surface-Aided Wireless Power Transfer Systems: Analysis and Implementation

Authors: Nguyen Minh Tran, Muhammad Miftahul Amri, Je Hyeon Park, Dong In Kim, Kae Won Choi

Abstract: Reconfigurable intelligent surface (RIS) is a promising technology for RF wireless power transfer (WPT) as it is capable of beamforming and beam focusing without using active and power-hungry components. In this paper, we propose a multi-tile RIS beam scanning (MTBS) algorithm for powering up internet-of-things (IoT) devices. Considering the hardware limitations of the IoT devices, the proposed algorithm requires only power information to enable the beam focusing capability of the RIS. Specifically, we first divide the RIS into smaller RIS tiles. Then, all RIS tiles and the phased array transmitter are iteratively scanned and optimized to maximize the receive power. We elaborately analyze the proposed algorithm and build a simulator to verify it. Furthermore, we have built a real-life testbed of RIS-aided WPT systems to validate the algorithm. The experimental results show that the proposed MTBS algorithm can properly control the transmission phase of the transmitter and the reflection phase of the RIS to focus the power at the receiver. Consequently, after executing the algorithm, about 20 dB improvement of the receive power is achieved compared to the case that all unit cells of the RIS are in OFF state. By experiments, we confirm that the RIS with the MTBS algorithm can greatly enhance the power transfer efficiency.

Submitted 13 March, 2022; v1 submitted 12 June, 2021; originally announced June 2021.

Comments: This work has been submitted to the IEEE for possible publication
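
The abstract above describes dividing the RIS into tiles and scanning them iteratively using only power measurements. A minimal sketch of that kind of power-only, tile-by-tile search is given below. It is an assumption-laden illustration rather than the MTBS algorithm itself: it scans only the tile phases (not the transmitter), uses a toy single-coefficient channel per tile, and the 2-bit phase set and all names are hypothetical.

```python
import cmath
import math


def received_power(tile_phases, tile_channels):
    """Relative receive power when each tile applies its phase to its cascaded channel."""
    field = sum(h * cmath.exp(1j * p) for h, p in zip(tile_channels, tile_phases))
    return abs(field) ** 2


def tile_scan(n_tiles, phase_options, measure_power, sweeps=2):
    """Coordinate-wise scan: for each tile in turn, keep the phase with the highest power.

    Only `measure_power` feedback is used; the channel itself is never observed.
    """
    phases = [phase_options[0]] * n_tiles
    for _ in range(sweeps):
        for t in range(n_tiles):
            phases[t] = max(
                phase_options,
                key=lambda p: measure_power(phases[:t] + [p] + phases[t + 1:]),
            )
    return phases


# Toy cascaded channel coefficients, one per tile (hypothetical values).
channels = [cmath.exp(1j * a) for a in (0.3, 2.1, 4.0, 5.5)]
options = [i * math.pi / 2 for i in range(4)]     # assumed 2-bit phase resolution


def measure(phases):
    """Stand-in for the power reading a receiver would report back."""
    return received_power(phases, channels)


before = measure([0.0] * 4)
after = measure(tile_scan(4, options, measure))
```

Because each step keeps the current setting among its options, the measured power never decreases from one tile update to the next, which is the property that makes a feedback-only scan of this kind workable on receivers that can report nothing but power.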

arXiv:2006.13360 [pdf, other]

Evaluation of Sampling Methods for Robotic Sediment Sampling Systems

Authors: Jun Han Bae, Wonse Jo, Jee Hwan Park, Richard M. Voyles, Sara K. McMillan, Byung-Cheol Min

Abstract: Analysis of sediments from rivers, lakes, reservoirs, wetlands and other constructed surface water impoundments is an important tool to characterize the function and health of these systems, but is generally carried out manually. This is costly and can be hazardous and difficult for humans due to inaccessibility, contamination, or availability of required equipment. Robotic sampling systems can ease these burdens, but little work has examined the efficiency of such sampling means and no prior work has investigated the quality of the resulting samples. This paper presents an experimental study that evaluates and optimizes sediment sampling patterns applied to a robot sediment sampling system that allows collection of minimally-disturbed sediment cores from natural and man-made water bodies for various sediment types. To meet this need, we developed and tested a robotic sampling platform in the laboratory to test functionality under a range of sediment types and operating conditions. Specifically, we focused on three patterns by which a cylindrical coring device was driven into the sediment (linear, helical, and zig-zag) for three sediment types (coarse sand, medium sand, and silt). The results show that the optimal sampling pattern varies depending on the type of sediment and can be optimized based on the sampling objective. We examined two sampling objectives: maximizing the mass of minimally disturbed sediment and minimizing the power per mass of sample.
This study provides valuable data to aid in the selection of optimal sediment coring methods for various applications and builds a solid foundation for future field testing under a range of environmental conditions.

Submitted 23 June, 2020; originally announced June 2020.

Showing 1–9 of 9 results for author: Park, J H