Showing 1–50 of 608 results for author: Kim, J

  1. arXiv:2507.08128  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

    Authors: Arushi Goel, Sreyan Ghosh, Jaehyeon Kim, Sonal Kumar, Zhifeng Kong, Sang-gil Lee, Chao-Han Huck Yang, Ramani Duraiswami, Dinesh Manocha, Rafael Valle, Bryan Catanzaro

    Abstract: We present Audio Flamingo 3 (AF3), a fully open state-of-the-art (SOTA) large audio-language model that advances reasoning and understanding across speech, sound, and music. AF3 introduces: (i) AF-Whisper, a unified audio encoder trained using a novel strategy for joint representation learning across all 3 modalities of speech, sound, and music; (ii) flexible, on-demand thinking, allowing the mode…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Code, Datasets and Models: https://research.nvidia.com/labs/adlr/AF3/

  2. arXiv:2507.00832  [pdf]

    eess.IV cs.AI cs.CV

    Automated anatomy-based post-processing reduces false positives and improved interpretability of deep learning intracranial aneurysm detection

    Authors: Jisoo Kim, Chu-Hsuan Lin, Alberto Ceballos-Arroyo, Ping Liu, Huaizu Jiang, Shrikanth Yadav, Qi Wan, Lei Qin, Geoffrey S Young

    Abstract: Introduction: Deep learning (DL) models can help detect intracranial aneurysms on CTA, but high false positive (FP) rates remain a barrier to clinical translation, despite improvement in model architectures and strategies like detection threshold tuning. We employed an automated, anatomy-based, heuristic-learning hybrid artery-vein segmentation post-processing method to further reduce FPs. Methods…

    Submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2506.23102  [pdf, ps, other]

    eess.IV cs.CV

    MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation

    Authors: Sunggu Kyung, Jinyoung Seo, Hyunseok Lim, Dongyeong Kim, Hyungbin Park, Jimin Sung, Jihyun Kim, Wooyoung Jo, Yoojin Nam, Namkug Kim

    Abstract: The recent release of RadGenome-Chest CT has significantly advanced CT-based report generation. However, existing methods primarily focus on global features, making it challenging to capture region-specific details, which may cause certain abnormalities to go unnoticed. To address this, we propose MedRegion-CT, a region-focused Multi-Modal Large Language Model (MLLM) framework, featuring three key…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 14 pages, 5 figures, submitted to ICCV 2025

  4. arXiv:2506.21796  [pdf, ps, other]

    eess.SP cs.AI

    Demonstrating Interoperable Channel State Feedback Compression with Machine Learning

    Authors: Dani Korpi, Rachel Wang, Jerry Wang, Abdelrahman Ibrahim, Carl Nuzman, Runxin Wang, Kursat Rasim Mestav, Dustin Zhang, Iraj Saniee, Shawn Winston, Gordana Pavlovic, Wei Ding, William J. Hillery, Chenxi Hao, Ram Thirunagari, Jung Chang, Jeehyun Kim, Bartek Kozicki, Dragan Samardzija, Taesang Yoo, Andreas Maeder, Tingfang Ji, Harish Viswanathan

    Abstract: Neural network-based compression and decompression of channel state feedback has been one of the most widely studied applications of machine learning (ML) in wireless networks. Various simulation-based studies have shown that ML-based feedback compression can result in reduced overhead and more accurate channel information. However, to the best of our knowledge, there are no real-life proofs of co…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  5. arXiv:2506.16741  [pdf, ps, other]

    eess.AS cs.AI

    RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching

    Authors: Hyun Joon Park, Jeongmin Liu, Jin Sob Kim, Jeong Yeol Yang, Sung Won Han, Eunwoo Song

    Abstract: We introduce RapFlow-TTS, a rapid and high-fidelity TTS acoustic model that leverages velocity consistency constraints in flow matching (FM) training. Although ordinary differential equation (ODE)-based TTS generation achieves natural-quality speech, it typically requires a large number of generation steps, resulting in a trade-off between quality and inference speed. To address this challenge, Ra…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025

  6. arXiv:2506.15258  [pdf, ps, other]

    eess.IV cs.CV

    Privacy-Preserving Chest X-ray Classification in Latent Space with Homomorphically Encrypted Neural Inference

    Authors: Jonghun Kim, Gyeongdeok Jo, Sinyoung Ra, Hyunjin Park

    Abstract: Medical imaging data contain sensitive patient information requiring strong privacy protection. Many analytical setups require data to be sent to a server for inference purposes. Homomorphic encryption (HE) provides a solution by allowing computations to be performed on encrypted data without revealing the original information. However, HE inference is computationally expensive, particularly for l…

    Submitted 19 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: 11 pages, 5 figures

  7. arXiv:2506.12199  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ViSAGe: Video-to-Spatial Audio Generation

    Authors: Jaeyeon Kim, Heeseung Yun, Gunhee Kim

    Abstract: Spatial audio is essential for enhancing the immersiveness of audio-visual experiences, yet its production typically demands complex recording systems and specialized expertise. In this work, we address a novel problem of generating first-order ambisonics, a widely used spatial audio format, directly from silent videos. To support this task, we introduce YT-Ambigen, a dataset comprising 102K 5-sec…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: ICLR 2025. Project page: https://jaeyeonkim99.github.io/visage/

  8. arXiv:2506.10747  [pdf, ps, other]

    eess.AS

    FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition

    Authors: Jongsuk Kim, Jaemyung Yu, Minchan Kwon, Junmo Kim

    Abstract: Large-scale ASR models have achieved remarkable gains in accuracy and robustness. However, fairness issues remain largely unaddressed despite their critical importance in real-world applications. In this work, we introduce FairASR, a system that mitigates demographic bias by learning representations that are uninformative about group membership, enabling fair generalization across demographic grou…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  9. arXiv:2506.09487  [pdf, ps, other]

    cs.SD cs.AI cs.LG cs.LO eess.AS

    BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation

    Authors: Taesoo Park, Mungwi Jeong, Mingyu Park, Narae Kim, Junyoung Kim, Mujung Kim, Jisang Yoo, Hoyun Lee, Sanghoon Kim, Soonchul Kwon

    Abstract: This paper presents a tutorial-style survey and implementation guide of BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity and long-term audio generation. Built upon the original BemaGAN architecture, BemaGANv2 incorporates major architectural innovations by replacing traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which int…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures. Survey and tutorial paper. Currently under review at ICT Express as an extended version of our ICAIIC 2025 paper

    ACM Class: I.2.6; H.5.5; I.5.1

  10. arXiv:2506.03381  [pdf, ps, other]

    eess.SY cs.AI cs.LG

    Automated Traffic Incident Response Plans using Generative Artificial Intelligence: Part 1 -- Building the Incident Response Benchmark

    Authors: Artur Grigorev, Khaled Saleh, Jiwon Kim, Adriana-Simona Mihaita

    Abstract: Traffic incidents remain a critical public safety concern worldwide, with Australia recording 1,300 road fatalities in 2024, which is the highest toll in 12 years. Similarly, the United States reports approximately 6 million crashes annually, raising significant challenges in terms of a fast response time and operational management. Traditional response protocols rely on human decision-making, whic…

    Submitted 3 June, 2025; originally announced June 2025.

  11. arXiv:2506.03020  [pdf, ps, other]

    eess.AS

    InfiniteAudio: Infinite-Length Audio Generation with Consistency

    Authors: Chaeyoung Jung, Hojoon Ki, Ji-Hoon Kim, Junmo Kim, Joon Son Chung

    Abstract: This paper presents InfiniteAudio, a simple yet effective strategy for generating infinite-length audio using diffusion-based text-to-audio methods. Current approaches face memory constraints because the output size increases with input length, making long duration generation challenging. A common workaround is to concatenate short audio segments, but this often leads to inconsistencies due to the…

    Submitted 3 June, 2025; originally announced June 2025.

  12. arXiv:2506.02981  [pdf, ps, other]

    cs.CV eess.IV

    Astrophotography turbulence mitigation via generative models

    Authors: Joonyeoup Kim, Yu Yuan, Xingguang Zhang, Xijun Wang, Stanley Chan

    Abstract: Photography is the cornerstone of modern astronomical and space research. However, most astronomical images captured by ground-based telescopes suffer from atmospheric turbulence, resulting in degraded imaging quality. While multi-frame strategies like lucky imaging can mitigate some effects, they involve intensive data acquisition and complex manual processing. In this paper, we propose AstroDiff…

    Submitted 3 June, 2025; originally announced June 2025.

  13. arXiv:2505.24160  [pdf, ps, other]

    eess.IV cs.CV

    Beyond the LUMIR challenge: The pathway to foundational registration models

    Authors: Junyu Chen, Shuwen Wei, Joel Honkamaa, Pekka Marttinen, Hang Zhang, Min Liu, Yichao Zhou, Zuopeng Tan, Zhuoyuan Wang, Yi Wang, Hongchao Zhou, Shunbo Hu, Yi Zhang, Qian Tao, Lukas Förner, Thomas Wendler, Bailiang Jian, Benedikt Wiestler, Tim Hable, Jin Kim, Dan Ruan, Frederic Madesta, Thilo Sentker, Wiebke Heyer, Lianrui Zuo, et al. (11 additional authors not shown)

    Abstract: Medical image challenges have played a transformative role in advancing the field, catalyzing algorithmic innovation and establishing new performance standards across diverse clinical applications. Image registration, a foundational task in neuroimaging pipelines, has similarly benefited from the Learn2Reg initiative. Building on this foundation, we introduce the Large-scale Unsupervised Brain MRI…

    Submitted 29 May, 2025; originally announced May 2025.

  14. arXiv:2505.22027  [pdf, other]

    cs.SD cs.AI eess.AS

    Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles

    Authors: Miika Toikkanen, June-Woo Kim

    Abstract: Respiratory sound datasets are limited in size and quality, making high performance difficult to achieve. Ensemble models help but inevitably increase compute cost at inference time. Soft label training distills knowledge efficiently with extra cost only at training. In this study, we explore soft labels for respiratory sound classification as an architecture-agnostic approach to distill an ensemb…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025

  15. arXiv:2505.20899  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

    Authors: Jeongsoo Choi, Jaehun Kim, Joon Son Chung

    Abstract: This paper introduces a cross-lingual dubbing system that translates speech from one language to another while preserving key characteristics such as duration, speaker identity, and speaking speed. Despite the strong translation quality of existing speech translation approaches, they often overlook the transfer of speech patterns, leading to mismatches with source speech and limiting their suitabi…

    Submitted 27 May, 2025; originally announced May 2025.

  16. arXiv:2505.19595  [pdf, ps, other]

    eess.AS cs.SD

    Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

    Authors: Jeongsoo Choi, Zhikang Niu, Ji-Hoon Kim, Chunhui Wang, Joon Son Chung, Xie Chen

    Abstract: The goal of this paper is to optimize the training process of diffusion-based text-to-speech models. While recent studies have achieved remarkable advancements, their training demands substantial time and computational costs, largely due to the implicit guidance of diffusion models in learning complex intermediate representations. To address this, we propose A-DMA, an effective strategy for Accele…

    Submitted 30 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Interspeech 2025

  17. arXiv:2505.19401  [pdf, ps, other]

    eess.AS

    Stack Less, Repeat More: A Block Reusing Approach for Progressive Speech Enhancement

    Authors: Jangyeon Kim, Ui-Hyeop Shin, Jaehyun Ko, Hyung-Min Park

    Abstract: This paper presents an efficient speech enhancement (SE) approach that reuses a processing block repeatedly instead of conventional stacking. Rather than increasing the number of blocks for learning deep latent representations, repeating a single block leads to progressive refinement while reducing parameter redundancy. We also minimize domain transformation by keeping an encoder and decoder shall…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025

  18. arXiv:2505.19384  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor

    Authors: Seokgi Lee, Jungjun Kim

    Abstract: We present the gradual style adaptor TTS (GSA-TTS) with a novel style encoder that gradually encodes speaking styles from an acoustic reference for zero-shot speech synthesis. GSA first captures the local style of each semantic sound unit. Then the local styles are combined by self-attention to obtain a global style condition. This semantic and hierarchical encoding strategy provides a robust and…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: 7 pages, 3 figures

  19. arXiv:2505.18162  [pdf]

    eess.SP cs.LG

    Accelerating Battery Material Optimization through iterative Machine Learning

    Authors: Seon-Hwa Lee, Insoo Ye, Changhwan Lee, Jieun Kim, Geunho Choi, Sang-Cheol Nam, Inchul Park

    Abstract: The performance of battery materials is determined by their composition and the processing conditions employed during commercial-scale fabrication, where raw materials undergo complex processing steps with various additives to yield final products. As the complexity of these parameters expands with the development of industry, conventional one-factor-at-a-time (OFAT) experiment becomes old fashion…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 25 pages, 5 figures

  20. arXiv:2505.11788  [pdf, ps, other]

    cs.DC cs.IT cs.LG cs.NI eess.SP

    Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

    Authors: Seungeun Oh, Jinhyuk Kim, Jihong Park, Seung-Woo Ko, Jinho Choi, Tony Q. S. Quek, Seong-Lyun Kim

    Abstract: To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requ…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 14 pages, 10 figures, 2 tables; This work has been submitted to the IEEE for possible publication

  21. arXiv:2505.11688  [pdf, ps, other]

    math.OC eess.SY

    On the Sharp Input-Output Analysis of Nonlinear Systems under Adversarial Attacks

    Authors: Jihun Kim, Yuchen Fang, Javad Lavaei

    Abstract: This paper is concerned with learning the input-output mapping of general nonlinear dynamical systems. While the existing literature focuses on Gaussian inputs and benign disturbances, we significantly broaden the scope of admissible control inputs and allow correlated, nonzero-mean, adversarial disturbances. With our reformulation as a linear combination of basis functions, we prove that the…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 28 pages, 2 figures

    MSC Class: 93B15; 93B30; 93C10

  22. arXiv:2505.10988  [pdf]

    cs.AI eess.SY

    DRL-Based Injection Molding Process Parameter Optimization for Adaptive and Profitable Production

    Authors: Joon-Young Kim, Jecheon Yu, Heekyu Kim, Seunghwa Ryu

    Abstract: Plastic injection molding remains essential to modern manufacturing. However, optimizing process parameters to balance product quality and profitability under dynamic environmental and economic conditions remains a persistent challenge. This study presents a novel deep reinforcement learning (DRL)-based framework for real-time process optimization in injection molding, integrating product quality…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 50 pages, 10 figures

  23. arXiv:2505.09508  [pdf, other]

    eess.SP

    Wearable Tracking of Eye and Body Movements During Breaching Training: Towards Real-Time Blast Injury Monitoring

    Authors: Jeremy P. Kemmerer, James R. Williamson, Joseph Kim, Elizabeth Halford, Hrishikesh M. Rao, Christopher J. Smalt

    Abstract: Repeated exposure to blast overpressure in occupational settings has been associated with changes in cognitive and psychological health, as well as deficits in neurosensory subsystems. In this work, we describe a wearable system to simultaneously monitor physiology and blast exposure levels and demonstrate how this system can identify individualized exposure levels corresponding to acute physiolog…

    Submitted 14 May, 2025; originally announced May 2025.

  24. arXiv:2505.07365  [pdf, ps, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge

    Authors: Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Gunhee Kim, Jun Du, Rafael Valle, Bryan Catanzaro

    Abstract: We present Task 5 of the DCASE 2025 Challenge: an Audio Question Answering (AQA) benchmark spanning multiple domains of sound understanding. This task defines three QA subsets (Bioacoustics, Temporal Soundscapes, and Complex QA) to test audio-language models on interactive question-answering over diverse acoustic scenes. We describe the dataset composition (from marine mammal calls to soundscapes…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Preprint. DCASE 2025 Audio QA Challenge: https://dcase.community/challenge2025/task-audio-question-answering

  25. arXiv:2505.04432  [pdf, other]

    eess.SP

    SwinLSTM Autoencoder for Temporal-Spatial-Frequency Domain CSI Compression in Massive MIMO Systems

    Authors: Aakash Saini, Yunchou Xing, Jee Hyun Kim, Amir Ahmadian Tehrani, Wolfgang Gerstacker

    Abstract: This study presents a parameter-light, low-complexity artificial intelligence/machine learning (AI/ML) model that enhances channel state information (CSI) feedback in wireless systems by jointly exploiting temporal, spatial, and frequency (TSF) domain correlations. While traditional frameworks use autoencoders for CSI compression at the user equipment (UE) and reconstruction at the network (NW) si…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 7 pages, 5 figures

  26. arXiv:2505.04105  [pdf]

    eess.IV cs.CV

    MAISY: Motion-Aware Image SYnthesis for Medical Image Motion Correction

    Authors: Andrew Zhang, Hao Wang, Shuchang Ye, Michael Fulham, Jinman Kim

    Abstract: Patient motion during medical image acquisition causes blurring, ghosting, and distorts organs, which makes image interpretation challenging. Current state-of-the-art algorithms using Generative Adversarial Network (GAN)-based methods with their ability to learn the mappings between corrupted images and their ground truth via Structural Similarity Index Measure (SSIM) loss effectively generate mot…

    Submitted 8 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  27. arXiv:2505.02951  [pdf, other]

    cs.IT eess.SP

    Multi-Antenna Users in Cell-Free Massive MIMO: Stream Allocation and Necessity of Downlink Pilots

    Authors: Eren Berk Kama, Junbeom Kim, Emil Björnson

    Abstract: We consider a cell-free massive multiple-input multiple-output (MIMO) system with multiple antennas on the users and access points (APs). In previous works, the downlink spectral efficiency (SE) has been evaluated using the hardening bound that requires no downlink pilots. This approach works well for single-antenna users. In this paper, we show that much higher SEs can be achieved if downlink pil…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 13 pages, 9 figures. arXiv admin note: text overlap with arXiv:2404.18516

  28. arXiv:2505.00481  [pdf, other]

    eess.SY

    Stabilization by Controllers Having Integer Coefficients

    Authors: Joowon Lee, Donggil Lee, Junsoo Kim

    Abstract: The system property of "having integer coefficients," that is, a transfer function has an integer monic polynomial as its denominator, is significant in the field of encrypted control as it is required for a dynamic controller to be realized over encrypted data. This paper shows that there always exists a controller with integer coefficients stabilizing a given discrete-time linear time-invarian…

    Submitted 1 May, 2025; originally announced May 2025.

  29. arXiv:2504.19247  [pdf, other]

    cs.RO eess.SY

    Efficient COLREGs-Compliant Collision Avoidance using Turning Circle-based Control Barrier Function

    Authors: Changyu Lee, Jinwook Park, Jinwhan Kim

    Abstract: This paper proposes a computationally efficient collision avoidance algorithm using turning circle-based control barrier functions (CBFs) that comply with international regulations for preventing collisions at sea (COLREGs). Conventional CBFs often lack explicit consideration of turning capabilities and avoidance direction, which are key elements in developing a COLREGs-compliant collision avoidan…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: This work has been submitted to an IEEE journal for possible publication

  30. Documentation on Encrypted Dynamic Control Simulation Code using Ring-LWE based Cryptosystems

    Authors: Yeongjun Jang, Joowon Lee, Junsoo Kim

    Abstract: Encrypted controllers offer secure computation by employing modern cryptosystems to execute control operations directly over encrypted data without decryption. However, incorporating cryptosystems into dynamic controllers significantly increases the computational load. This paper aims to provide an accessible guideline for running encrypted controllers using an open-source library Lattigo, which s…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 6 pages

    Journal ref: Journal of The Society of Instrument and Control Engineers, vol. 64, no. 4, pp. 248-254, 2025

  31. arXiv:2504.09655  [pdf]

    eess.IV cs.CV

    OmniMamba4D: Spatio-temporal Mamba for longitudinal CT lesion segmentation

    Authors: Justin Namuk Kim, Yiqiao Liu, Rajath Soans, Keith Persson, Sarah Halek, Michal Tomaszewski, Jianda Yuan, Gregory Goldmacher, Antong Chen

    Abstract: Accurate segmentation of longitudinal CT scans is important for monitoring tumor progression and evaluating treatment responses. However, existing 3D segmentation models solely focus on spatial information. To address this gap, we propose OmniMamba4D, a novel segmentation model designed for 4D medical images (3D images over time). OmniMamba4D utilizes a spatio-temporal tetra-orientated Mamba block…

    Submitted 24 April, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

    Comments: Accepted at IEEE International Symposium on Biomedical Imaging (ISBI) 2025

  32. arXiv:2504.09248  [pdf, ps, other]

    eess.SY cs.CR

    Asymptotic stabilization under homomorphic encryption: A re-encryption free method

    Authors: Shuai Feng, Qian Ma, Junsoo Kim, Shengyuan Xu

    Abstract: In this paper, we propose methods to encrypt a pre-given dynamic controller with homomorphic encryption, without re-encrypting the control inputs. We first present a preliminary result showing that the coefficients in a pre-given dynamic controller can be scaled up into integers by the zooming-in factor in dynamic quantization, without utilizing re-encryption. However, a sufficiently small zoomi…

    Submitted 12 April, 2025; originally announced April 2025.

  33. arXiv:2504.00244  [pdf, other]

    math.OC eess.SY

    System Identification from Partial Observations under Adversarial Attacks

    Authors: Jihun Kim, Javad Lavaei

    Abstract: This paper is concerned with the partially observed linear system identification, where the goal is to obtain reasonably accurate estimation of the balanced truncation of the true system up to the order $k$ from output measurements. We consider the challenging case of system identification under adversarial attacks, where the probability of having an attack at each time is $Θ(1/k)$ while the value…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 9 pages, 2 figures

    MSC Class: 93B15; 93B30; 93C05

  34. arXiv:2503.22829  [pdf]

    eess.IV cs.AI cs.CV cs.LG

    Nonhuman Primate Brain Tissue Segmentation Using a Transfer Learning Approach

    Authors: Zhen Lin, Hongyu Yuan, Richard Barcus, Qing Lyu, Sucheta Chakravarty, Megan E. Lipford, Carol A. Shively, Suzanne Craft, Mohammad Kawas, Jeongchul Kim, Christopher T. Whitlow

    Abstract: Non-human primates (NHPs) serve as critical models for understanding human brain function and neurological disorders due to their close evolutionary relationship with humans. Accurate brain tissue segmentation in NHPs is critical for understanding neurological disorders, but challenging due to the scarcity of annotated NHP brain MRI datasets, the small size of the NHP brain, the limited resolution…

    Submitted 1 April, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  35. arXiv:2503.20280  [pdf, other]

    cs.RO eess.SY

    Turning Circle-based Control Barrier Function for Efficient Collision Avoidance of Nonholonomic Vehicles

    Authors: Changyu Lee, Kiyong Park, Jinwhan Kim

    Abstract: This paper presents a new control barrier function (CBF) designed to improve the efficiency of collision avoidance for nonholonomic vehicles. Traditional CBFs typically rely on the shortest Euclidean distance to obstacles, overlooking the limited heading change ability of nonholonomic vehicles. This often leads to abrupt maneuvers and excessive speed reductions, which is not desirable and reduces…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: This work has been submitted to an IEEE journal for possible publication

  36. arXiv:2503.18642  [pdf, other]

    eess.IV cs.CV

    Rethinking Glaucoma Calibration: Voting-Based Binocular and Metadata Integration

    Authors: Taejin Jeong, Joohyeok Kim, Jaehoon Joo, Yeonwoo Jung, Hyeonmin Kim, Seong Jae Hwang

    Abstract: Glaucoma is an incurable ophthalmic disease that damages the optic nerve, leads to vision loss, and ranks among the leading causes of blindness worldwide. Diagnosing glaucoma typically involves fundus photography, optical coherence tomography (OCT), and visual field testing. However, the high cost of OCT often leads to reliance on fundus photography and visual field testing, both of which exhibit…

    Submitted 24 March, 2025; originally announced March 2025.

  37. arXiv:2503.16956  [pdf, other]

    eess.AS cs.AI cs.CV cs.SD

    From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech

    Authors: Ji-Hoon Kim, Jeongsoo Choi, Jaehun Kim, Chaeyoung Jung, Joon Son Chung

    Abstract: The objective of this study is to generate high-quality speech from silent talking face videos, a task also known as video-to-speech synthesis. A significant challenge in video-to-speech synthesis lies in the substantial modality gap between silent video and multi-faceted speech. In this paper, we propose a novel video-to-speech system that effectively bridges this modality gap, significantly enha…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: CVPR 2025, demo page: https://mm.kaist.ac.kr/projects/faces2voices/

  38. arXiv:2503.09385  [pdf, other]

    cs.SE cs.RO eess.SY

    PCLA: A Framework for Testing Autonomous Agents in the CARLA Simulator

    Authors: Masoud Jamshidiyan Tehrani, Jinhan Kim, Paolo Tonella

    Abstract: Recent research on testing autonomous driving agents has grown significantly, especially in simulation environments. The CARLA simulator is often the preferred choice, and the autonomous agents from the CARLA Leaderboard challenge are regarded as the best-performing agents within this environment. However, researchers who test these agents, rather than training their own ones from scratch, often f…

    Submitted 13 March, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: This work will be published at the FSE 2025 demonstration track

  39. arXiv:2503.05848  [pdf, other]

    cs.RO eess.SY

    Merry-Go-Round: Safe Control of Decentralized Multi-Robot Systems with Deadlock Prevention

    Authors: Wonjong Lee, Joonyeol Sim, Joonkyung Kim, Siwon Jo, Wenhao Luo, Changjoo Nam

    Abstract: We propose a hybrid approach for decentralized multi-robot navigation that ensures both safety and deadlock prevention. Building on a standard control formulation, we add a lightweight deadlock prevention mechanism by forming temporary "roundabouts" (circular reference paths). Each robot relies only on local, peer-to-peer communication and a controller for base collision avoidance; a roundabout is…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 7 pages, 7 Figures

  40. arXiv:2503.05366  [pdf, other]

    eess.SY

    A Risk-aware Bi-level Bidding Strategy for Virtual Power Plant with Power-to-Hydrogen System

    Authors: Jaehyun Yoo, Jip Kim

    Abstract: This paper presents a risk-aware bi-level bidding strategy for Virtual Power Plant (VPP) that integrates Power-to-Hydrogen (P2H) system, addressing the challenges posed by renewable energy variability and market volatility. By incorporating Conditional Value at Risk (CVaR) within the bi-level optimization framework, the proposed strategy enables VPPs to mitigate financial risks associated with unc…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 5 pages, 5 figures, 2025 PES General Meeting

  41. arXiv:2503.05361  [pdf, other]

    eess.SY

    Community Energy Management System for Fast Frequency Response: A Hierarchical Control Approach

    Authors: Joonsung Jung, Hyunjoong Kim, Hyunghwan Shin, Jip Kim

    Abstract: The increase in renewable energy sources (RES) has reduced power system inertia, making frequency stabilization more challenging and highlighting the need for fast frequency response (FFR) resources. While building energy management systems (BEMS) equipped with distributed energy resources (DERs) can provide FFR, individual BEMS alone cannot fully meet demand. To address this, we propose a communi…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 5 pages, 7 figures, submitted to PES General Meeting 2025

    MSC Class: 90C05; 90C90

    ACM Class: I.2.8; C.3; G.1.6

  42. arXiv:2503.03983  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

    Authors: Sreyan Ghosh, Zhifeng Kong, Sonal Kumar, S Sakshi, Jaehyeon Kim, Wei Ping, Rafael Valle, Dinesh Manocha, Bryan Catanzaro

    Abstract: Understanding and reasoning over non-speech sounds and music are crucial for both humans and AI agents to interact effectively with their environments. In this paper, we introduce Audio Flamingo 2 (AF2), an Audio-Language Model (ALM) with advanced audio understanding and reasoning capabilities. AF2 leverages (i) a custom CLAP model, (ii) synthetic Audio QA data for fine-grained audio reasoning, an…

    Submitted 5 March, 2025; originally announced March 2025.

  43. arXiv:2502.16459  [pdf]

    eess.IV cs.AI cs.CV

    Deep learning approaches to surgical video segmentation and object detection: A Scoping Review

    Authors: Devanish N. Kamtam, Joseph B. Shrager, Satya Deepya Malla, Nicole Lin, Juan J. Cardona, Jake J. Kim, Clarence Hu

    Abstract: Introduction: Computer vision (CV) has had a transformative impact in biomedical fields such as radiology, dermatology, and pathology. Its real-world adoption in surgical applications, however, remains limited. We review the current state-of-the-art performance of deep learning (DL)-based CV models for segmentation and object detection of anatomical structures in videos obtained during surgical pr…

    Submitted 23 February, 2025; originally announced February 2025.

    Comments: 38 pages, 2 figures

  44. arXiv:2502.13986  [pdf, other]

    eess.IV

    Structure-from-Sherds++: Robust Incremental 3D Reassembly of Axially Symmetric Pots from Unordered and Mixed Fragment Collections

    Authors: Seong Jong Yoo, Sisung Liu, Muhammad Zeeshan Arshad, Jinhyeok Kim, Young Min Kim, Yiannis Aloimonos, Cornelia Fermuller, Kyungdon Joo, Jinwook Kim, Je Hyeong Hong

    Abstract: Reassembling multiple axially symmetric pots from fragmentary sherds is crucial for cultural heritage preservation, yet it poses significant challenges due to thin and sharp fracture surfaces that generate numerous false positive matches and hinder large-scale puzzle solving. Existing global approaches, which optimize all potential fragment pairs simultaneously or data-driven models, are prone to…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 24 pages

  45. arXiv:2502.10283  [pdf, other]

    cs.CR eess.SY

    Anomaly Detection with LWE Encrypted Control

    Authors: Rijad Alisic, Junsoo Kim, Henrik Sandberg

    Abstract: Detecting attacks using encrypted signals is challenging since encryption hides its information content. We present a novel mechanism for anomaly detection over Learning with Errors (LWE) encrypted signals without using decryption, secure channels, nor complex communication schemes. Instead, the detector exploits the homomorphic property of LWE encryption to perform hypothesis tests on transformat…

    Submitted 14 February, 2025; originally announced February 2025.

  46. arXiv:2502.09283  [pdf, other]

    eess.SP

    Rate-Splitting Multiple Access for 6G: Prototypes, Experimental Results and Link/System level Simulations

    Authors: Sundar Aditya, Yong Jin Daniel Kim, David Vargas, David Redgate, Onur Dizdar, Neil Bhushan, Xinze Lyu, Sibo Zhang, Stephen Wang, Bruno Clerckx

    Abstract: Rate-Splitting Multiple Access (RSMA) is a powerful and versatile physical layer multiple access technique that generalizes and has better interference management capabilities than 5G-based Space Division Multiple Access (SDMA). It is also a rapidly maturing technology, all of which makes it a natural successor to SDMA in 6G. In this article, we describe RSMA's suitability for 6G by presenting: i)…

    Submitted 17 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: Submitted to the IEEE Communications Standards Magazine December 2025 Special Issue on "Wireless Technologies for 6G and Beyond: Applications, Implementations, and Standardization"

  47. arXiv:2502.08675  [pdf, other]

    eess.IV

    Improving Lesion Segmentation in Medical Images by Global and Regional Feature Compensation

    Authors: Chuhan Wang, Zhenghao Chen, Jean Y. H. Yang, Jinman Kim

    Abstract: Automated lesion segmentation of medical images has made tremendous improvements in recent years due to deep learning advancements. However, accurately capturing fine-grained global and regional feature representations remains a challenge. Many existing methods obtain suboptimal performance on complex lesion segmentation due to information loss during typical downsampling operations and the insuff…

    Submitted 11 February, 2025; originally announced February 2025.

  48. arXiv:2502.05330  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge

    Authors: Muhammad Imran, Jonathan R. Krebs, Vishal Balaji Sivaraman, Teng Zhang, Amarjeet Kumar, Walker R. Ueland, Michael J. Fassler, Jinlong Huang, Xiao Sun, Lisheng Wang, Pengcheng Shi, Maximilian Rokuss, Michael Baumgartner, Yannick Kirchhof, Klaus H. Maier-Hein, Fabian Isensee, Shuolin Liu, Bing Han, Bong Thanh Nguyen, Dong-jin Shin, Park Ji-Woo, Mathew Choi, Kwang-Hyun Uhm, Sung-Jea Ko, Chanwoong Lee, et al. (38 additional authors not shown)

    Abstract: Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently…

    Submitted 7 February, 2025; originally announced February 2025.

  49. arXiv:2502.03505  [pdf, other]

    eess.IV cs.AI cs.LG

    Enhancing Free-hand 3D Photoacoustic and Ultrasound Reconstruction using Deep Learning

    Authors: SiYeoul Lee, SeonHo Kim, Minkyung Seo, SeongKyu Park, Salehin Imrus, Kambaluru Ashok, DongEon Lee, Chunsu Park, SeonYeong Lee, Jiye Kim, Jae-Heung Yoo, MinWoo Kim

    Abstract: This study introduces a motion-based learning network with a global-local self-attention module (MoGLo-Net) to enhance 3D reconstruction in handheld photoacoustic and ultrasound (PAUS) imaging. Standard PAUS imaging is often limited by a narrow field of view and the inability to effectively visualize complex 3D structures. The 3D freehand technique, which aligns sequential 2D images for 3D reconst…

    Submitted 5 February, 2025; originally announced February 2025.

  50. arXiv:2502.01092  [pdf, other]

    cs.RO cs.CV eess.SY

    Enhancing Feature Tracking Reliability for Visual Navigation using Real-Time Safety Filter

    Authors: Dabin Kim, Inkyu Jang, Youngsoo Han, Sunwoo Hwang, H. Jin Kim

    Abstract: Vision sensors are extensively used for localizing a robot's pose, particularly in environments where global localization tools such as GPS or motion capture systems are unavailable. In many visual navigation systems, localization is achieved by detecting and tracking visual features or landmarks, which provide information about the sensor's relative pose. For reliable feature tracking and accurat…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: 7 pages, 6 figures, Accepted to 2025 IEEE International Conference on Robotics & Automation (ICRA 2025)