
Showing 1–50 of 494 results for author: Kim, S

Searching in archive eess.
  1. arXiv:2507.04140  [pdf, ps, other]

    cs.RO eess.SY

    Learning Humanoid Arm Motion via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning

    Authors: Ho Jae Lee, Se Hwan Jeon, Sangbae Kim

    Abstract: Humans naturally swing their arms during locomotion to regulate whole-body dynamics, reduce angular momentum, and help maintain balance. Inspired by this principle, we present a limb-level multi-agent reinforcement learning (RL) framework that enables coordinated whole-body control of humanoid robots through emergent arm motion. Our approach employs separate actor-critic structures for the arms an…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: 8 pages, 10 figures

  2. arXiv:2507.02897  [pdf, ps, other]

    cs.LG cs.CV eess.SY physics.plasm-ph

    Regulation Compliant AI for Fusion: Real-Time Image Analysis-Based Control of Divertor Detachment in Tokamaks

    Authors: Nathaniel Chen, Cheolsik Byun, Azarakash Jalalvand, Sangkyeun Kim, Andrew Rothstein, Filippo Scotti, Steve Allen, David Eldon, Keith Erickson, Egemen Kolemen

    Abstract: While artificial intelligence (AI) has been promising for fusion control, its inherent black-box nature will make compliant implementation in regulatory environments a challenge. This study implements and validates a real-time AI enabled linear and interpretable control system for successful divertor detachment control with the DIII-D lower divertor camera. Using D2 gas, we demonstrate feedback di…

    Submitted 21 June, 2025; originally announced July 2025.

  3. arXiv:2507.01038  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Cross-Attention Message-Passing Transformers for Code-Agnostic Decoding in 6G Networks

    Authors: Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Yongjune Kim, Jong-Seon No

    Abstract: Channel coding for 6G networks is expected to support a wide range of requirements arising from heterogeneous communication scenarios. These demands challenge traditional code-specific decoders, which lack the flexibility and scalability required for next-generation systems. To tackle this problem, we propose an AI-native foundation model for unified and code-agnostic decoding based on the transfo…

    Submitted 22 June, 2025; originally announced July 2025.

  4. arXiv:2506.21765  [pdf, ps, other]

    eess.IV cs.CV

    TUS-REC2024: A Challenge to Reconstruct 3D Freehand Ultrasound Without External Tracker

    Authors: Qi Li, Shaheer U. Saeed, Yuliang Huang, Mingyuan Luo, Zhongnuo Yan, Jiongquan Chen, Xin Yang, Dong Ni, Nektarios Winter, Phuc Nguyen, Lucas Steinberger, Caelan Haney, Yuan Zhao, Mingjie Jiang, Bowen Ren, SiYeoul Lee, Seonho Kim, MinKyung Seo, MinWoo Kim, Yimeng Dou, Zhiwei Zhang, Yin Li, Tomy Varghese, Dean C. Barratt, Matthew J. Clarkson, et al. (2 additional authors not shown)

    Abstract: Trackerless freehand ultrasound reconstruction aims to reconstruct 3D volumes from sequences of 2D ultrasound images without relying on external tracking systems, offering a low-cost, portable, and widely deployable alternative for volumetric imaging. However, it presents significant challenges, including accurate inter-frame motion estimation, minimisation of drift accumulation over long sequence…

    Submitted 26 June, 2025; originally announced June 2025.

  5. arXiv:2506.20152  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Loss-Aware Automatic Selection of Structured Pruning Criteria for Deep Neural Network Acceleration

    Authors: Deepak Ghimire, Kilho Lee, Seong-heum Kim

    Abstract: Structured pruning is a well-established technique for compressing neural networks, making it suitable for deployment in resource-limited edge devices. This paper presents an efficient Loss-Aware Automatic Selection of Structured Pruning Criteria (LAASP) for slimming and accelerating deep neural networks. The majority of pruning methodologies employ a sequential process consisting of three stages:…

    Submitted 25 June, 2025; originally announced June 2025.

    Journal ref: Image Vision Comput. 136 (2023) 104745

  6. arXiv:2506.16741  [pdf, ps, other]

    eess.AS cs.AI

    RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching

    Authors: Hyun Joon Park, Jeongmin Liu, Jin Sob Kim, Jeong Yeol Yang, Sung Won Han, Eunwoo Song

    Abstract: We introduce RapFlow-TTS, a rapid and high-fidelity TTS acoustic model that leverages velocity consistency constraints in flow matching (FM) training. Although ordinary differential equation (ODE)-based TTS generation achieves natural-quality speech, it typically requires a large number of generation steps, resulting in a trade-off between quality and inference speed. To address this challenge, Ra…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted on Interspeech 2025

  7. arXiv:2506.16738  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization

    Authors: Daejin Jo, Jeeyoung Yun, Byungseok Roh, Sungwoong Kim

    Abstract: With the rapid progress of speech language models (SLMs), discrete speech tokens have emerged as a core interface between speech and text, enabling unified modeling across modalities. Recent speech tokenization approaches aim to isolate semantic information from low-level acoustics to better align with language models. In particular, previous methods use SSL teachers such as HuBERT to extract sema…

    Submitted 20 June, 2025; originally announced June 2025.

  8. arXiv:2506.16231  [pdf, ps, other]

    eess.AS cs.SD

    EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training

    Authors: Doyeop Kwak, Youngjoon Jang, Seongyu Kim, Joon Son Chung

    Abstract: Speech signals in real-world environments are frequently affected by various distortions such as additive noise, reverberation, and bandwidth limitation, which may appear individually or in combination. Traditional speech enhancement methods typically rely on either masking, which focuses on suppressing non-speech components while preserving observable structure, or mapping, which seeks to recover…

    Submitted 19 June, 2025; originally announced June 2025.

  9. arXiv:2506.09487  [pdf, ps, other]

    cs.SD cs.AI cs.LG cs.LO eess.AS

    BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation

    Authors: Taesoo Park, Mungwi Jeong, Mingyu Park, Narae Kim, Junyoung Kim, Mujung Kim, Jisang Yoo, Hoyun Lee, Sanghoon Kim, Soonchul Kwon

    Abstract: This paper presents a tutorial-style survey and implementation guide of BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity and long-term audio generation. Built upon the original BemaGAN architecture, BemaGANv2 incorporates major architectural innovations by replacing traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which int…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures. Survey and tutorial paper. Currently under review at ICT Express as an extended version of our ICAIIC 2025 paper

    ACM Class: I.2.6; H.5.5; I.5.1

  10. arXiv:2506.01789  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.CV eess.AS

    Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability

    Authors: Genta Indra Winata, David Anugraha, Emmy Liu, Alham Fikri Aji, Shou-Yi Hung, Aditya Parashar, Patrick Amadeus Irawan, Ruochen Zhang, Zheng-Xin Yong, Jan Christian Blaise Cruz, Niklas Muennighoff, Seungone Kim, Hanyang Zhao, Sudipta Kar, Kezia Erina Suryoraharjo, M. Farid Adilazuarda, En-Shiun Annie Lee, Ayu Purwarianti, Derry Tanti Wijaya, Monojit Choudhury

    Abstract: High-quality datasets are fundamental to training and evaluating machine learning models, yet their creation, especially with accurate human annotations, remains a significant challenge. Many dataset paper submissions lack originality, diversity, or rigorous quality control, and these shortcomings are often overlooked during peer review. Submissions also frequently omit essential details about datas…

    Submitted 3 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: Preprint

  11. arXiv:2506.01382  [pdf, ps, other]

    eess.SP

    Enabling Scalable Distributed Beamforming via Networked LEO Satellites Towards 6G

    Authors: Yuchen Zhang, Seungnyun Kim, Tareq Y. Al-Naffouri

    Abstract: In this paper, we propose scalable distributed beamforming schemes over low Earth orbit (LEO) satellite networks that rely solely on statistical channel state information for downlink orthogonal frequency division multiplexing systems. We begin by introducing the system model and presenting a pragmatic yet effective analog beamformer and user-scheduling design. We then derive a closed-form lower b…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: This paper has been submitted to IEEE journal

  12. arXiv:2505.23834  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses

    Authors: Seung Gyu Jeong, Seong Eun Kim

    Abstract: Lung sound classification is vital for early diagnosis of respiratory diseases. However, biomedical signals often exhibit inter-patient variability even among patients with the same symptoms, requiring a learning approach that considers individual differences. We propose a Patient-Aware Feature Alignment (PAFA) framework with two novel losses, Patient Cohesion-Separation Loss (PCSL) and Global Pat…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted INTERSPEECH 2025

  13. arXiv:2505.23132  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Patient Domain Supervised Contrastive Learning for Lung Sound Classification Using Mobile Phone

    Authors: Seung Gyu Jeong, Seong Eun Kim

    Abstract: Auscultation is crucial for diagnosing lung diseases. The COVID-19 pandemic has revealed the limitations of traditional, in-person lung sound assessments. To overcome these issues, advancements in digital stethoscopes and artificial intelligence (AI) have led to the development of new diagnostic methods. In this context, our study aims to use smartphone microphones to record and analyze lung sound…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: ITS-CSCC 2024

  14. arXiv:2505.22489  [pdf, other]

    eess.IV cs.CV cs.GR

    Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from Demographics

    Authors: Siyeop Yoon, Sifan Song, Pengfei Jin, Matthew Tivnan, Yujin Oh, Sekeun Kim, Dufan Wu, Xiang Li, Quanzheng Li

    Abstract: We propose a cascaded 3D diffusion model framework to synthesize high-fidelity 3D PET/CT volumes directly from demographic variables, addressing the growing need for realistic digital twins in oncologic imaging, virtual trials, and AI-driven data augmentation. Unlike deterministic phantoms, which rely on predefined anatomical and metabolic templates, our method employs a two-stage generative proce…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: MICCAI2025 Submitted version

  15. arXiv:2505.20868  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech

    Authors: Nam-Gyu Kim, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee

    Abstract: Recent advances in expressive text-to-speech (TTS) have introduced diverse methods based on style embedding extracted from reference speech. However, synthesizing high-quality expressive speech remains challenging. We propose Spotlight-TTS, which exclusively emphasizes style via voiced-aware style extraction and style direction adjustment. Voiced-aware style extraction focuses on voiced regions hi…

    Submitted 29 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: Proceedings of Interspeech 2025

  16. arXiv:2505.19693  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification

    Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee

    Abstract: Speech emotion recognition predicts a speaker's emotional state from speech signals using discrete labels or continuous dimensions such as arousal, valence, and dominance (VAD). We propose EmoSphere-SER, a joint model that integrates spherical VAD region classification to guide VAD regression for improved emotion prediction. In our framework, VAD values are transformed into spherical coordinates t…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Proceedings of Interspeech 2025

  17. arXiv:2505.19687  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech

    Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee

    Abstract: Cross-speaker emotion transfer in speech synthesis relies on extracting speaker-independent emotion embeddings for accurate emotion modeling without retaining speaker traits. However, existing timbre compression methods fail to fully separate speaker and emotion characteristics, causing speaker leakage and degraded synthesis quality. To address this, we propose DiEmo-TTS, a self-supervised distill…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Proceedings of Interspeech 2025

  18. arXiv:2505.16212  [pdf, ps, other]

    cs.CL eess.AS

    Large Language Models based ASR Error Correction for Child Conversations

    Authors: Anfeng Xu, Tiantian Feng, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth Narayanan

    Abstract: Automatic Speech Recognition (ASR) has recently shown remarkable progress, but accurately transcribing children's speech remains a significant challenge. Recent developments in Large Language Models (LLMs) have shown promise in improving ASR transcriptions. However, their applications in child speech including conversational scenarios are underexplored. In this study, we explore the use of LLMs in…

    Submitted 24 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025

  19. arXiv:2505.12686  [pdf, other]

    cs.LG cs.SD eess.AS

    RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations

    Authors: Seungmin Kim, Sohee Park, Donghyun Kim, Jisu Lee, Daeseon Choi

    Abstract: With the advancement of AI-based speech synthesis technologies such as Deep Voice, there is an increasing risk of voice spoofing attacks, including voice phishing and fake news, through unauthorized use of others' voices. Existing defenses that inject adversarial perturbations directly into audio signals have limited effectiveness, as these perturbations can easily be neutralized by speech enhance…

    Submitted 19 May, 2025; originally announced May 2025.

  20. arXiv:2505.11788  [pdf, ps, other]

    cs.DC cs.IT cs.LG cs.NI eess.SP

    Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

    Authors: Seungeun Oh, Jinhyuk Kim, Jihong Park, Seung-Woo Ko, Jinho Choi, Tony Q. S. Quek, Seong-Lyun Kim

    Abstract: To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requ…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 14 pages, 10 figures, 2 tables; This work has been submitted to the IEEE for possible publication

  21. arXiv:2505.05710  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    HyperspectralMAE: The Hyperspectral Imagery Classification Model using Fourier-Encoded Dual-Branch Masked Autoencoder

    Authors: Wooyoung Jeong, Hyun Jae Park, Seonghun Jeong, Jong Wook Jang, Tae Hoon Lim, Dae Seoung Kim

    Abstract: Hyperspectral imagery provides rich spectral detail but poses unique challenges because of its high dimensionality in both spatial and spectral domains. We propose HyperspectralMAE, a Transformer-based foundation model for hyperspectral data that employs a dual masking strategy: during pre-training we randomly occlude 50% of spatial patches and 50% of spectral bands. This force…

    Submitted 8 May, 2025; originally announced May 2025.

  22. arXiv:2505.01617  [pdf, other]

    cs.RO eess.SY

    High Speed Robotic Table Tennis Swinging Using Lightweight Hardware with Model Predictive Control

    Authors: David Nguyen, Kendrick D. Cancio, Sangbae Kim

    Abstract: We present a robotic table tennis platform that achieves a variety of hit styles and ball-spins with high precision, power, and consistency. This is enabled by a custom lightweight, high-torque, low rotor inertia, five degree-of-freedom arm capable of high acceleration. To generate swing trajectories, we formulate an optimal control problem (OCP) that constrains the state of the paddle at the time…

    Submitted 2 May, 2025; originally announced May 2025.

  23. arXiv:2504.18539  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation

    Authors: Sungnyun Kim, Sungwoo Cho, Sangmin Bae, Kangwook Jang, Se-Young Yun

    Abstract: Audio-visual speech recognition (AVSR) incorporates auditory and visual modalities to improve recognition accuracy, particularly in noisy environments where audio-only speech systems are insufficient. While previous research has largely addressed audio disruptions, few studies have dealt with visual corruptions, e.g., lip occlusions or blurred videos, which are also detrimental. To address this re…

    Submitted 30 April, 2025; v1 submitted 23 January, 2025; originally announced April 2025.

    Comments: ICLR 2025; 22 pages, 6 figures, 14 tables

  24. FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning

    Authors: Ju Yeon Kang, Ji Won Yoon, Semin Kim, Min Hyun Han, Nam Soo Kim

    Abstract: Recently, fake audio detection has gained significant attention, as advancements in speech synthesis and voice conversion have increased the vulnerability of automatic speaker verification (ASV) systems to spoofing attacks. A key challenge in this task is generalizing models to detect unseen, out-of-distribution (OOD) attacks. Although existing approaches have shown promising results, they inheren…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted at ICASSP 2025

  25. An Addendum to NeBula: Towards Extending TEAM CoSTAR's Solution to Larger Scale Environments

    Authors: Ali Agha, Kyohei Otsu, Benjamin Morrell, David D. Fan, Sung-Kyun Kim, Muhammad Fadhil Ginting, Xianmei Lei, Jeffrey Edlund, Seyed Fakoorian, Amanda Bouman, Fernando Chavez, Taeyeon Kim, Gustavo J. Correa, Maira Saboia, Angel Santamaria-Navarro, Brett Lopez, Boseong Kim, Chanyoung Jung, Mamoru Sobue, Oriana Claudia Peltzer, Joshua Ott, Robert Trybula, Thomas Touma, Marcel Kaufmann, Tiago Stegun Vaquero, et al. (64 additional authors not shown)

    Abstract: This paper presents an appendix to the original NeBula autonomy solution developed by the TEAM CoSTAR (Collaborative SubTerranean Autonomous Robots), participating in the DARPA Subterranean Challenge. Specifically, this paper presents extensions to NeBula's hardware, software, and algorithmic components that focus on increasing the range and scale of the exploration environment. From the algorithm…

    Submitted 18 April, 2025; originally announced April 2025.

    Journal ref: IEEE Transactions on Field Robotics, vol. 1, pp. 476-526, 2024

  26. arXiv:2504.12512  [pdf, other]

    cs.RO eess.SY

    Practical Insights on Grasp Strategies for Mobile Manipulation in the Wild

    Authors: Isabella Huang, Richard Cheng, Sangwoon Kim, Dan Kruse, Carolyn Matl, Lukas Kaul, JC Hancock, Shanmuga Harikumar, Mark Tjersland, James Borders, Dan Helmick

    Abstract: Mobile manipulation robots are continuously advancing, with their grasping capabilities rapidly progressing. However, there are still significant gaps preventing state-of-the-art mobile manipulators from widespread real-world deployments, including their ability to reliably grasp items in unstructured environments. To help bridge this gap, we developed SHOPPER, a mobile manipulation robot platform…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 8 pages, 8 figures, submitted to IROS 2025

  27. arXiv:2504.12354  [pdf, other]

    eess.IV cs.AI

    WaterFlow: Learning Fast & Robust Watermarks using Stable Diffusion

    Authors: Vinay Shukla, Prachee Sharma, Ryan Rossi, Sungchul Kim, Tong Yu, Aditya Grover

    Abstract: The ability to embed watermarks in images is a fundamental problem of interest for computer vision, and is exacerbated by the rapid rise of generated imagery in recent times. Current state-of-the-art techniques suffer from computational and statistical challenges such as the slow execution speed for practical deployments. In addition, other works trade off fast watermarking speeds but suffer great…

    Submitted 17 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  28. arXiv:2504.03600  [pdf, other]

    eess.IV cs.AI cs.CV

    MedSAM2: Segment Anything in 3D Medical Images and Videos

    Authors: Jun Ma, Zongxin Yang, Sumin Kim, Bihui Chen, Mohammed Baharoon, Adibvafa Fallahpour, Reza Asakereh, Hongwei Lyu, Bo Wang

    Abstract: Medical image and video segmentation is a critical task for precision medicine, which has witnessed considerable progress in developing task or modality-specific and generalist models for 2D images. However, there have been limited studies on building general-purpose models for 3D images and videos with comprehensive user studies. Here, we present MedSAM2, a promptable segmentation foundation mode…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: https://medsam2.github.io/

  29. arXiv:2503.19228  [pdf, ps, other]

    eess.SY

    Bridging the Sim-to-real Gap: A Control Framework for Imitation Learning of Model Predictive Control

    Authors: Seungtaek Kim, Jonghyup Lee, Kyoungseok Han, Seibum B. Choi

    Abstract: To address the computational challenges of Model Predictive Control (MPC), recent research has studied using imitation learning to approximate the MPC to a computationally efficient Deep Neural Network (DNN). However, this introduces a common issue in learning-based control, the simulation-to-reality (sim-to-real) gap, and Domain Randomization (DR) has been widely used to mitigate this gap by intr…

    Submitted 3 July, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  30. arXiv:2503.18880  [pdf, other]

    cs.CV cs.SD eess.AS

    Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes

    Authors: Hyeonggon Ryu, Seongyu Kim, Joon Son Chung, Arda Senocak

    Abstract: We present a unified model capable of simultaneously grounding both spoken language and non-speech sounds within a visual scene, addressing key limitations in current audio-visual grounding models. Existing approaches are typically limited to handling either speech or non-speech sounds independently, or at best, together but sequentially without mixing. This limitation prevents them from capturing…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  31. arXiv:2503.18151  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Efficient Deep Learning Approaches for Processing Ultra-Widefield Retinal Imaging

    Authors: Siwon Kim, Wooyung Yun, Jeongbin Oh, Soomok Lee

    Abstract: Deep learning has emerged as the predominant solution for classifying medical images. We intend to apply these developments to the ultra-widefield (UWF) retinal imaging dataset. Since UWF images can accurately diagnose various retina diseases, it is very important to classify them accurately and prevent them with early treatment. However, processing images manually is time-consuming and labor-int…

    Submitted 23 March, 2025; originally announced March 2025.

  32. arXiv:2503.16366  [pdf, other]

    physics.optics eess.SP

    Dynamic Metasurface-Backed Luneburg Lens for Multiplexed Backscatter Communication

    Authors: Samuel Kim, Tim Sleasman, Avrami Rakovsky, Ra'id Awadallah, David B. Shrekenhamer

    Abstract: Backscatter communications is attractive for its low power requirements due to the lack of actively radiating components; however, commonly used devices are typically limited in range and functionality. Here, we design and demonstrate a flattened Luneburg lens combined with a spatially-tunable dynamic metasurface to create a low-power backscatter communicator. The Luneburg lens is a spherically-sy…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: 13 pages, 8 figures

  33. arXiv:2503.12806  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis

    Authors: Hadam Baek, Hannie Shin, Jiyoung Seo, Chanwoo Kim, Saerom Kim, Hyeongbok Kim, Sangpil Kim

    Abstract: Accurately modeling sound propagation with complex real-world environments is essential for Novel View Acoustic Synthesis (NVAS). While previous studies have leveraged visual perception to estimate spatial acoustics, the combined use of surface normal and structural details from 3D representations in acoustic modeling has been underexplored. Given their direct impact on sound wave reflections and…

    Submitted 17 March, 2025; originally announced March 2025.

  34. arXiv:2503.11026  [pdf, other]

    eess.AS cs.CV cs.LG cs.MM

    MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

    Authors: Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim, Se-Young Yun

    Abstract: Despite recent advances in text-to-speech (TTS) models, audio-visual to audio-visual (AV2AV) translation still faces a critical challenge: maintaining speaker consistency between the original and translated vocal and facial features. To address this issue, we propose a conditional flow matching (CFM) zero-shot audio-visual renderer that utilizes strong dual guidance from both audio and visual moda…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Preliminary work

  35. arXiv:2503.10349  [pdf, ps, other]

    cs.RO eess.SP

    Autonomous Robotic Radio Source Localization via a Novel Gaussian Mixture Filtering Approach

    Authors: Sukkeun Kim, Sangwoo Moon, Ivan Petrunin, Hyo-Sang Shin, Shehryar Khattak

    Abstract: This study proposes a new Gaussian Mixture Filter (GMF) to improve the estimation performance for the autonomous robotic radio signal source search and localization problem in unknown environments. The proposed filter is first tested with a benchmark numerical problem to validate the performance with other state-of-the-practice approaches such as Particle Filter (PF) and Particle Gaussian Mixture…

    Submitted 13 June, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  36. arXiv:2503.04966  [pdf, other]

    eess.IV cs.AI cs.CV

    Prediction of Frozen Region Growth in Kidney Cryoablation Intervention Using a 3D Flow-Matching Model

    Authors: Siyeop Yoon, Yujin Oh, Matthew Tivnan, Sifan Song, Pengfei Jin, Sekeun Kim, Hyun Jin Cho, Dufan Wu, Raul Uppot, Quanzheng Li

    Abstract: This study presents a 3D flow-matching model designed to predict the progression of the frozen region (iceball) during kidney cryoablation. Precise intraoperative guidance is critical in cryoablation to ensure complete tumor eradication while preserving adjacent healthy tissue. However, conventional methods, typically based on physics driven or diffusion based simulations, are computationally dema…

    Submitted 11 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: MICCAI 2025 submitted version (author list included)

  37. arXiv:2503.01075  [pdf, other]

    eess.IV cs.AI cs.CV

    Tackling Hallucination from Conditional Models for Medical Image Reconstruction with DynamicDPS

    Authors: Seunghoi Kim, Henry F. J. Tregidgo, Matteo Figini, Chen Jin, Sarang Joshi, Daniel C. Alexander

    Abstract: Hallucinations are spurious structures not present in the ground truth, posing a critical challenge in medical image reconstruction, especially for data-driven conditional models. We hypothesize that combining an unconditional diffusion model with data consistency, trained on a diverse dataset, can reduce these hallucinations. Based on this, we propose DynamicDPS, a diffusion-based framework that…

    Submitted 2 March, 2025; originally announced March 2025.

  38. arXiv:2502.13983  [pdf, other]

    eess.AS cs.AI

    Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders

    Authors: Seungbae Kim, Daeun Lee, Brielle Stark, Jinyoung Han

    Abstract: Individuals with language disorders often face significant communication challenges due to their limited language processing and comprehension abilities, which also affect their interactions with voice-assisted systems that mostly rely on Automatic Speech Recognition (ASR). Despite advancements in ASR that address disfluencies, there has been little attention on integrating non-verbal communicatio…

    Submitted 18 February, 2025; originally announced February 2025.

  39. Set-Based Position Ambiguity Reduction Method for Zonotope Shadow Matching in Urban Areas Using Estimated Multipath Errors

    Authors: Sanghyun Kim, Jiwon Seo

    Abstract: In urban areas, the quality of global navigation satellite system (GNSS) signals deteriorates, leading to reduced positioning accuracy. To address this issue, 3D-mapping-aided (3DMA) techniques, such as shadow matching and zonotope shadow matching (ZSM), have been proposed. However, these methods can introduce a problem known as multi-modal position ambiguity, making it challenging to select the e…

    Submitted 16 February, 2025; originally announced February 2025.

    Comments: Submitted to ION ITM 2025

  40. arXiv:2502.10447  [pdf, other]

    eess.AS cs.CL cs.LG

    MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition

    Authors: Sungnyun Kim, Kangwook Jang, Sangmin Bae, Sungwoo Cho, Se-Young Yun

    Abstract: Audio-visual speech recognition (AVSR) has become critical for enhancing speech recognition in noisy environments by integrating both auditory and visual modalities. However, existing AVSR systems struggle to scale up without compromising computational efficiency. In this study, we introduce MoHAVE (Mixture of Hierarchical Audio-Visual Experts), a novel robust AVSR framework designed to address th…

    Submitted 21 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Accepted to ICML 2025

  41. arXiv:2502.09929  [pdf, other]

    eess.SP

    Low-Complexity On-Grid Channel Estimation for Partially-Connected Hybrid XL-MIMO

    Authors: Sunho Kim, Wan Choi

    Abstract: This paper addresses the challenge of channel estimation in extremely large-scale multiple-input multiple-output (XL-MIMO) systems, pivotal for the advancement of 6G communications. XL-MIMO systems, characterized by their vast antenna arrays, necessitate accurate channel state information (CSI) to leverage high spatial multiplexing and beamforming gains. However, conventional channel estimation me…

    Submitted 14 February, 2025; originally announced February 2025.

  42. arXiv:2502.07208  [pdf]

    eess.AS cs.SD

    Towards Understanding of Frequency Dependence on Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Yong-Hwa Park

    Abstract: In this work, various analysis methods are conducted on frequency-dependent methods on SED to further delve into their detailed characteristics and behaviors on SED. While SED has been rapidly advancing through the adoption of various deep learning techniques from other pattern recognition fields, these techniques are often not suitable for SED. To address this issue, two frequency-dependent SED m…

    Submitted 10 February, 2025; originally announced February 2025.

  43. arXiv:2502.05330  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge

    Authors: Muhammad Imran, Jonathan R. Krebs, Vishal Balaji Sivaraman, Teng Zhang, Amarjeet Kumar, Walker R. Ueland, Michael J. Fassler, Jinlong Huang, Xiao Sun, Lisheng Wang, Pengcheng Shi, Maximilian Rokuss, Michael Baumgartner, Yannick Kirchhof, Klaus H. Maier-Hein, Fabian Isensee, Shuolin Liu, Bing Han, Bong Thanh Nguyen, Dong-jin Shin, Park Ji-Woo, Mathew Choi, Kwang-Hyun Uhm, Sung-Jea Ko, Chanwoong Lee, et al. (38 additional authors not shown)

    Abstract: Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently…

    Submitted 7 February, 2025; originally announced February 2025.

  44. arXiv:2502.03505  [pdf, other]

    eess.IV cs.AI cs.LG

    Enhancing Free-hand 3D Photoacoustic and Ultrasound Reconstruction using Deep Learning

    Authors: SiYeoul Lee, SeonHo Kim, Minkyung Seo, SeongKyu Park, Salehin Imrus, Kambaluru Ashok, DongEon Lee, Chunsu Park, SeonYeong Lee, Jiye Kim, Jae-Heung Yoo, MinWoo Kim

    Abstract: This study introduces a motion-based learning network with a global-local self-attention module (MoGLo-Net) to enhance 3D reconstruction in handheld photoacoustic and ultrasound (PAUS) imaging. Standard PAUS imaging is often limited by a narrow field of view and the inability to effectively visualize complex 3D structures. The 3D freehand technique, which aligns sequential 2D images for 3D reconst…

    Submitted 5 February, 2025; originally announced February 2025.

  45. arXiv:2502.00619  [pdf, other]

    eess.IV cs.AI cs.CV

    Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective

    Authors: Yujin Oh, Pengfei Jin, Sangjoon Park, Sekeun Kim, Siyeop Yoon, Kyungsang Kim, Jin Sung Kim, Xiang Li, Quanzheng Li

    Abstract: Ensuring fairness in medical image segmentation is critical due to biases in imbalanced clinical data acquisition caused by demographic attributes (e.g., age, sex, race) and clinical factors (e.g., disease severity). To address these challenges, we introduce Distribution-aware Mixture of Experts (dMoE), inspired by optimal control theory. We provide a comprehensive analysis of its underlying mecha…

    Submitted 27 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: ICML 2025 spotlight, see https://openreview.net/forum?id=BUONdewsBa

  46. arXiv:2501.18921  [pdf, other]

    eess.IV cs.CV

    Full-scale Representation Guided Network for Retinal Vessel Segmentation

    Authors: Sunyong Seo, Huisu Yoon, Semin Kim, Jongha Lee

    Abstract: The U-Net architecture and its variants have remained state-of-the-art (SOTA) for retinal vessel segmentation over the past decade. In this study, we introduce a Full Scale Guided Network (FSG-Net), where the feature representation network with modernized convolution blocks extracts full-scale information and the guided convolution block refines that information. Attention-guided filter is introdu…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: 10 pages, 7 figures

  47. arXiv:2501.14790  [pdf, other]

    q-bio.NC cs.AI cs.SD eess.AS

    Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding

    Authors: Ji-Ha Park, Seo-Hyun Lee, Soowon Kim, Seong-Whan Lee

    Abstract: Decoding text, speech, or images from human neural signals holds promising potential both as neuroprosthesis for patients and as innovative communication tools for general users. Although neural signals contain various information on speech intentions, movements, and phonetic details, generating informative outputs from them remains challenging, with mostly focusing on decoding short intentions or…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: 5 pages, 5 figures, 1 table, Name of Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing

  48. arXiv:2501.14171  [pdf, other]

    eess.IV cs.CV

    Fully Guided Neural Schrödinger bridge for Brain MR image synthesis

    Authors: Hanyeol Yang, Sunggyu Kim, Yongseon Yoo, Jong-min Lee

    Abstract: Multi-modal brain MRI provides essential complementary information for clinical diagnosis. However, acquiring all modalities is often challenging due to time and cost constraints. To address this, various methods have been proposed to generate missing modalities from available ones. Traditional approaches can be broadly categorized into two main types: paired and unpaired methods. While paired met…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: 9 pages, 4 figures

  49. arXiv:2501.11225  [pdf, other]

    cond-mat.mtrl-sci cs.CV eess.IV

    CNN-based TEM image denoising from first principles

    Authors: Jinwoong Chae, Sungwook Hong, Sungkyu Kim, Sungroh Yoon, Gunn Kim

    Abstract: Transmission electron microscope (TEM) images are often corrupted by noise, hindering their interpretation. To address this issue, we propose a deep learning-based approach using simulated images. Using density functional theory calculations with a set of pseudo-atomic orbital basis sets, we generate highly accurate ground truth images. We introduce four types of noise into these simulations to cr…

    Submitted 19 January, 2025; originally announced January 2025.

    Comments: 10 pages and 4 figures

  50. arXiv:2501.04926  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching

    Authors: Jun-Hak Yun, Seung-Bin Kim, Seong-Whan Lee

    Abstract: Audio super-resolution is challenging owing to its ill-posed nature. Recently, the application of diffusion models in audio super-resolution has shown promising results in alleviating this challenge. However, diffusion-based models have limitations, primarily the necessity for numerous sampling steps, which causes significantly increased latency when synthesizing high-quality audio samples. In thi…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025