Showing 1–50 of 292 results for author: Choi, J

Searching in archive eess.
  1. arXiv:2507.03246  [pdf, ps, other]

    eess.SP

    Enhancing Satellite Quantum Key Distribution with Dual Band Reconfigurable Intelligent Surfaces

    Authors: Muhammad Khalil, Ke Wang, Jinho Choi

    Abstract: This paper presents a novel system architecture for hybrid satellite communications, integrating quantum key distribution (QKD) and classical radio frequency (RF) data transmission using a dual-band reconfigurable intelligent surface (RIS). The motivation is to address the growing need for global, secure, and reliable communications by leveraging the security of quantum optical links and the robus…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 11

  2. arXiv:2506.19451  [pdf, ps, other]

    eess.SP cs.LG

    Low-Complexity Semantic Packet Aggregation for Token Communication via Lookahead Search

    Authors: Seunghun Lee, Jihong Park, Jinho Choi, Hyuncheol Park

    Abstract: Tokens are fundamental processing units of generative AI (GenAI) and large language models (LLMs), and token communication (TC) is essential for enabling remote AI-generated content (AIGC) and wireless LLM applications. Unlike traditional bits, each of which is independently treated, the semantics of each token depends on its surrounding context tokens. This inter-token dependency makes TC vulnerab…

    Submitted 24 June, 2025; originally announced June 2025.

  3. arXiv:2506.15745  [pdf, ps, other]

    eess.IV cs.LG

    InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding

    Authors: Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang

    Abstract: Modern multimodal large language models (MLLMs) can reason over hour-long video, yet their key-value (KV) cache grows linearly with time--quickly exceeding the fixed memory of phones, AR glasses, and edge robots. Prior compression schemes either assume the whole video and user query are available offline or must first build the full cache, so memory still scales with stream length. InfiniPot-V is…

    Submitted 17 June, 2025; originally announced June 2025.

  4. arXiv:2506.14657  [pdf, ps, other]

    eess.AS cs.AR

    ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors

    Authors: Jongin Choi, Jina Park, Woojoo Lee, Jae-Jin Lee, Massoud Pedram

    Abstract: Multi-channel keyword spotting (KWS) has become crucial for voice-based applications in edge environments. However, its substantial computational and energy requirements pose significant challenges. We introduce ASAP-FE (Agile Sparsity-Aware Parallelized-Feature Extractor), a hardware-oriented front-end designed to address these challenges. Our framework incorporates three key innovations: (1) Hal…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 7 pages, 11 figures, ISLPED 2025

  5. arXiv:2506.00832  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models

    Authors: Kyowoon Lee, Artyom Stitsyuk, Gunu Jho, Inchul Hwang, Jaesik Choi

    Abstract: Recent advances in Text-to-Speech (TTS) have significantly improved speech naturalness, increasing the demand for precise prosody control and mispronunciation correction. Existing approaches for prosody manipulation often depend on specialized modules or additional training, limiting their capacity for post-hoc adjustments. Similarly, traditional mispronunciation correction relies on grapheme-to-p…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025

  6. arXiv:2505.20899  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

    Authors: Jeongsoo Choi, Jaehun Kim, Joon Son Chung

    Abstract: This paper introduces a cross-lingual dubbing system that translates speech from one language to another while preserving key characteristics such as duration, speaker identity, and speaking speed. Despite the strong translation quality of existing speech translation approaches, they often overlook the transfer of speech patterns, leading to mismatches with source speech and limiting their suitabi…

    Submitted 27 May, 2025; originally announced May 2025.

  7. arXiv:2505.20794  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion

    Authors: Joon-Seung Choi, Dong-Min Byun, Hyung-Seok Oh, Seong-Whan Lee

    Abstract: Controlling singing style is crucial for achieving an expressive and natural singing voice. Among the various style factors, vibrato plays a key role in conveying emotions and enhancing musical depth. However, modeling vibrato remains challenging due to its dynamic nature, making it difficult to control in singing voice conversion. To address this, we propose VibE-SVC, a controllable singing voice…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Proceedings of Interspeech 2025

  8. arXiv:2505.19595  [pdf, ps, other]

    eess.AS cs.SD

    Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

    Authors: Jeongsoo Choi, Zhikang Niu, Ji-Hoon Kim, Chunhui Wang, Joon Son Chung, Xie Chen

    Abstract: The goal of this paper is to optimize the training process of diffusion-based text-to-speech models. While recent studies have achieved remarkable advancements, their training demands substantial time and computational costs, largely due to the implicit guidance of diffusion models in learning complex intermediate representations. To address this, we propose A-DMA, an effective strategy for Accele…

    Submitted 30 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Interspeech 2025

  9. arXiv:2505.11788  [pdf, ps, other]

    cs.DC cs.IT cs.LG cs.NI eess.SP

    Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

    Authors: Seungeun Oh, Jinhyuk Kim, Jihong Park, Seung-Woo Ko, Jinho Choi, Tony Q. S. Quek, Seong-Lyun Kim

    Abstract: To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requ…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 14 pages, 10 figures, 2 tables; This work has been submitted to the IEEE for possible publication

  10. arXiv:2505.07547  [pdf, other]

    eess.SP

    Space-Time Beamforming for LEO Satellite Communications

    Authors: Jungbin Yim, Jinseok Choi, Jeonghun Park, Ian P. Roberts, Namyoon Lee

    Abstract: Inter-beam interference poses a significant challenge in low Earth orbit (LEO) satellite communications due to dense satellite constellations. To address this issue, we introduce spacetime beamforming, a novel paradigm that leverages the spacetime channel vector, uniquely determined by the angle of arrival (AoA) and relative Doppler shift, to optimize beamforming between a moving satellite transmi…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 13 pages, 10 figures

  11. arXiv:2505.02293  [pdf, ps, other]

    cs.RO cs.MA eess.SY

    Resolving Conflicting Constraints in Multi-Agent Reinforcement Learning with Layered Safety

    Authors: Jason J. Choi, Jasmine Jerry Aloor, Jingqi Li, Maria G. Mendoza, Hamsa Balakrishnan, Claire J. Tomlin

    Abstract: Preventing collisions in multi-robot navigation is crucial for deployment. This requirement hinders the use of learning-based approaches, such as multi-agent reinforcement learning (MARL), on their own due to their lack of safety guarantees. Traditional control methods, such as reachability and control barrier functions, can provide rigorous safety guarantees when interactions are limited only to…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Accepted for publication at the 2025 Robotics: Science and Systems Conference. 18 pages, 8 figures

  12. arXiv:2504.19591  [pdf, ps, other]

    eess.SP

    Semantic Packet Aggregation for Token Communication via Genetic Beam Search

    Authors: Seunghun Lee, Jihong Park, Jinho Choi, Hyuncheol Park

    Abstract: Token communication (TC) is poised to play a pivotal role in emerging language-driven applications such as AI-generated content (AIGC) and wireless large language models (LLMs). However, token loss caused by channel noise can severely degrade task performance. To address this, in this article, we focus on the problem of semantics-aware packetization and develop a novel algorithm, termed semantic packet…

    Submitted 28 April, 2025; originally announced April 2025.

  13. arXiv:2504.17080  [pdf, other]

    cs.RO eess.SY

    Geometric Formulation of Unified Force-Impedance Control on SE(3) for Robotic Manipulators

    Authors: Joohwan Seo, Nikhil Potu Surya Prakash, Soomi Lee, Arvind Kruthiventy, Megan Teng, Jongeun Choi, Roberto Horowitz

    Abstract: In this paper, we present an impedance control framework on the SE(3) manifold, which enables force tracking while guaranteeing passivity. Building upon the unified force-impedance control (UFIC) and our previous work on geometric impedance control (GIC), we develop the geometric unified force impedance control (GUFIC) to account for the SE(3) manifold structure in the controller formulation using…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Submitted to the Conference on Decision and Control (CDC) 2025

  14. arXiv:2504.12870  [pdf, other]

    eess.AS

    CST-former: Multidimensional Attention-based Transformer for Sound Event Localization and Detection in Real Scenes

    Authors: Yusun Shul, Dayun Choi, Jung-Woo Choi

    Abstract: Sound event localization and detection (SELD) is a task for the classification of sound events and the identification of direction of arrival (DoA) utilizing multichannel acoustic signals. For effective classification and localization, a channel-spectro-temporal transformer (CST-former) was suggested. CST-former employs multidimensional attention mechanisms across the spatial, spectral, and tempor…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 12 pages, 10 figures, Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  15. Trainable Adaptive Score Normalization for Automatic Speaker Verification

    Authors: Jeong-Hwan Choi, Ju-Seok Seong, Ye-Rin Jeoung, Joon-Hyuk Chang

    Abstract: Adaptive S-norm (AS-norm) calibrates automatic speaker verification (ASV) scores by normalizing them using the scores of impostors that are similar to the input speaker. However, AS-norm does not involve any learning process, limiting its ability to provide appropriate regularization strength for various evaluation utterances. To address this limitation, we propose a trainable AS-norm (TAS-norm…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: Accepted at ICASSP'25

  16. arXiv:2504.03233  [pdf, other]

    eess.SY

    Data-Driven Hamiltonian for Direct Construction of Safe Set from Trajectory Data

    Authors: Jason J. Choi, Christopher A. Strong, Koushil Sreenath, Namhoon Cho, Claire J. Tomlin

    Abstract: In continuous-time optimal control, evaluating the Hamiltonian requires solving a constrained optimization problem using the system's dynamics model. Hamilton-Jacobi reachability analysis for safety verification has demonstrated practical utility only when efficient evaluation of the Hamiltonian over a large state-time grid is possible. In this study, we introduce the concept of a data-driven Hami…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: This is the extended version of the article submitted to IEEE CDC 2025. This work has been submitted to the IEEE for possible publication

  17. arXiv:2504.02386  [pdf, other]

    cs.CV eess.AS

    VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

    Authors: Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng, Joon Son Chung, Tae-Hyun Oh, David Harwath

    Abstract: We present VoiceCraft-Dub, a novel approach for automated video dubbing that synthesizes high-quality speech from text and facial cues. This task has broad applications in filmmaking, multimedia creation, and assisting voice-impaired individuals. Building on the success of Neural Codec Language Models (NCLMs) for speech synthesis, our method extends their capabilities by incorporating video featur…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: https://voicecraft-dub.github.io/

  18. Robust Transmission Design for Active RIS-Aided Systems

    Authors: Jinho Yang, Hyeongtaek Lee, Junil Choi

    Abstract: Different from conventional passive reconfigurable intelligent surfaces (RISs), incident signals and thermal noise can be amplified at active RISs. By exploiting the amplifying capability of active RISs, noticeable performance improvement can be expected when precise channel state information (CSI) is available. Since obtaining perfect CSI related to an RIS is difficult in practice, a robust trans…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 6 pages, 4 figures, accepted to IEEE Transactions on Vehicular Technology

  19. arXiv:2503.23734  [pdf, ps, other]

    eess.SP

    Semantic Packet Aggregation and Repeated Transmission for Text-to-Image Generation

    Authors: Seunghun Lee, Jihong Park, Jinho Choi, Hyuncheol Park

    Abstract: Text-based communication is expected to be prevalent in 6G applications such as wireless AI-generated content (AIGC). Motivated by this, this paper addresses the challenges of transmitting text prompts over erasure channels for a text-to-image AIGC task by developing the semantic segmentation and repeated transmission (SMART) algorithm. SMART groups words in text prompts into packets, prioritizing…

    Submitted 31 March, 2025; originally announced March 2025.

  20. arXiv:2503.22143  [pdf]

    eess.SP cs.AI cs.CV cs.LG

    A Self-Supervised Learning of a Foundation Model for Analog Layout Design Automation

    Authors: Sungyu Jeong, Won Joon Choi, Junung Choi, Anik Biswas, Byungsub Kim

    Abstract: We propose a UNet-based foundation model and its self-supervised learning method to address two key challenges: 1) lack of qualified annotated analog layout data, and 2) excessive variety in analog layout design tasks. For self-supervised learning, we propose random patch sampling and random masking techniques to automatically obtain enough training data from a small unannotated layout dataset. Th…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 8 pages, 11 figures

  21. arXiv:2503.16956  [pdf, other]

    eess.AS cs.AI cs.CV cs.SD

    From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech

    Authors: Ji-Hoon Kim, Jeongsoo Choi, Jaehun Kim, Chaeyoung Jung, Joon Son Chung

    Abstract: The objective of this study is to generate high-quality speech from silent talking face videos, a task also known as video-to-speech synthesis. A significant challenge in video-to-speech synthesis lies in the substantial modality gap between silent video and multi-faceted speech. In this paper, we propose a novel video-to-speech system that effectively bridges this modality gap, significantly enha…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: CVPR 2025, demo page: https://mm.kaist.ac.kr/projects/faces2voices/

  22. arXiv:2503.12891  [pdf]

    eess.SY

    PD-Skygroundhook Controller for Semi-Active Suspension System Using Magnetorheological Fluid Dampers

    Authors: Hansol Lim, Jee Won Lee, Seung-Bok Choi, Jongseong Brad Choi

    Abstract: This paper presents a Proportional-Derivative (PD) Skygroundhook controller for magnetorheological (MR) dampers in semi-active suspensions. Traditional skyhook, Groundhook, and hybrid Skygroundhook controllers are well-known for their ability to reduce body and wheel vibrations; however, each approach has limitations in handling a broad frequency spectrum and often relies on abrupt switching. By a…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: This work has been submitted to the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) for possible publication

  23. arXiv:2503.11026  [pdf, other]

    eess.AS cs.CV cs.LG cs.MM

    MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

    Authors: Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim, Se-Young Yun

    Abstract: Despite recent advances in text-to-speech (TTS) models, audio-visual to audio-visual (AV2AV) translation still faces a critical challenge: maintaining speaker consistency between the original and translated vocal and facial features. To address this issue, we propose a conditional flow matching (CFM) zero-shot audio-visual renderer that utilizes strong dual guidance from both audio and visual moda…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Preliminary work

  24. arXiv:2503.10022  [pdf, other]

    eess.SP

    A New Interpretation of the Time-Interleaved ADC Mismatch Problem: A Tracking-Based Hybrid Calibration Approach

    Authors: Jiwon Sung, Jinseok Choi

    Abstract: Time-interleaved ADCs (TI-ADCs) achieve high sampling rates by interleaving multiple sub-ADCs in parallel. Mismatch errors between the sub-ADCs, however, can significantly degrade the signal quality, which is a main performance bottleneck. This paper presents a hybrid calibration approach by interpreting the mismatch problem as a tracking problem, and uses the extended Kalman filter for online est…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 5 pages

  25. arXiv:2503.09829  [pdf, other]

    cs.RO cs.LG eess.SY

    SE(3)-Equivariant Robot Learning and Control: A Tutorial Survey

    Authors: Joohwan Seo, Soochul Yoo, Junwoo Chang, Hyunseok An, Hyunwoo Ryu, Soomi Lee, Arvind Kruthiventy, Jongeun Choi, Roberto Horowitz

    Abstract: Recent advances in deep learning and Transformers have driven major breakthroughs in robotics by employing techniques such as imitation learning, reinforcement learning, and LLM-based multimodal perception and decision-making. However, conventional deep learning and Transformer models often struggle to process data with inherent symmetries and invariances, typically relying on large datasets or ex…

    Submitted 23 April, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted to International Journal of Control, Automation and Systems (IJCAS)

  26. arXiv:2502.18196  [pdf, ps, other]

    cs.IT eess.SP

    Machine Learning for Future Wireless Communications: Channel Prediction Perspectives

    Authors: Hwanjin Kim, Junil Choi, David J. Love

    Abstract: Precise channel state knowledge is crucial in future wireless communication systems, which drives the need for accurate channel prediction without additional pilot overhead. While machine-learning (ML) methods for channel prediction show potential, existing approaches have limitations in their capability to adapt to environmental changes due to their extensive training requirements. In this paper,…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 7 pages, 3 figures, 2 tables, submitted to IEEE Communications Magazine

  27. arXiv:2502.03300  [pdf, other]

    eess.SP

    ScNeuGM: Scalable Neural Graph Modeling for Coloring-Based Contention and Interference Management in Wi-Fi 7

    Authors: Zhouyou Gu, Jihong Park, Jinho Choi

    Abstract: Carrier-sense multiple access with collision avoidance in Wi-Fi often leads to contention and interference, thereby increasing packet losses. These challenges have traditionally been modeled as a graph, with stations (STAs) represented as vertices and contention or interference as edges. Graph coloring assigns orthogonal transmission slots to STAs, managing contention and interference, e.g., using…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: This work has been submitted to an IEEE journal for possible publication

  28. arXiv:2502.03117  [pdf, ps, other]

    cs.IT eess.SP

    Meta-Learning-Based People Counting and Localization Models Employing CSI from Commodity WiFi NICs

    Authors: Jihoon Cha, Hwanjin Kim, Junil Choi

    Abstract: In this paper, we consider people counting and localization systems exploiting channel state information (CSI) measured from commodity WiFi network interface cards (NICs). While CSI has useful information of amplitude and phase to describe signal propagation in a measurement environment of interest, CSI measurement suffers from offsets due to various uncertainties. Moreover, an uncontrollable exte…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: 13 pages, 15 figures, submitted to IEEE Internet of Things Journal (IoTJ)

  29. arXiv:2501.11926  [pdf, other]

    cs.IT eess.SP

    Multi-Modal Variable-Rate CSI Reconstruction for FDD Massive MIMO Systems

    Authors: Yunseo Nam, Jiwook Choi

    Abstract: In frequency division duplex (FDD) systems, acquiring channel state information (CSI) at the base station (BS) traditionally relies on limited feedback from mobile terminals (MTs). However, the accuracy of channel reconstruction from feedback CSI is inherently constrained by the rate-distortion trade-off. To overcome this limitation, we propose a multi-modal channel reconstruction framework that l…

    Submitted 7 March, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

  30. arXiv:2501.11307  [pdf, other]

    eess.SP

    SIG-SDP: Sparse Interference Graph-Aided Semidefinite Programming for Large-Scale Wireless Time-Sensitive Networking

    Authors: Zhouyou Gu, Jihong Park, Branka Vucetic, Jinho Choi

    Abstract: Wireless time-sensitive networking (WTSN) is essential for Industrial Internet of Things. We address the problem of minimizing time slots needed for WTSN transmissions while ensuring reliability subject to interference constraints -- an NP-hard task. Existing semidefinite programming (SDP) methods can relax and solve the problem but suffer from high polynomial complexity. We propose a sparse inter…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: This work has been submitted to an IEEE journal for possible publication

  31. arXiv:2501.02273  [pdf, other]

    eess.SP cs.IT

    Blind Training for Channel-Adaptive Digital Semantic Communications

    Authors: Yongjeong Oh, Joohyuk Park, Jinho Choi, Jihong Park, Yo-Seb Jeon

    Abstract: Semantic encoders and decoders for digital semantic communication (SC) often struggle to adapt to variations in unpredictable channel environments and diverse system designs. To address these challenges, this paper proposes a novel framework for training semantic encoders and decoders to enable channel-adaptive digital SC. The core idea is to use binary symmetric channel (BSC) as a universal repre…

    Submitted 19 March, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

  32. arXiv:2501.01001  [pdf, other]

    eess.SP

    Scalable Beamforming Design for Multi-RIS-Aided MU-MIMO Systems with Imperfect CSIT

    Authors: Mintaek Oh, Jinseok Choi

    Abstract: A reconfigurable intelligent surface (RIS) has emerged as a promising solution for enhancing the capabilities of wireless communications. This paper presents a scalable beamforming design for maximizing the spectral efficiency (SE) of multi-RIS-aided communications through joint optimization of the precoder and RIS phase shifts in multi-user multiple-input multiple-output (MU-MIMO) systems under i…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: 13 pages

  33. arXiv:2412.19110  [pdf, other]

    cs.IT eess.SP

    A Selective Secure Precoding Framework for MU-MIMO Rate-Splitting Multiple Access Networks Under Limited CSIT

    Authors: Sangmin Lee, Seokjun Park, Jeonghun Park, Jinseok Choi

    Abstract: In this paper, we propose a robust and adaptable secure precoding framework designed to encapsulate an intricate scenario where legitimate users have different information security: secure private or normal public information. Leveraging rate-splitting multiple access (RSMA), we formulate the sum secrecy spectral efficiency (SE) maximization problem in downlink multi-user multiple-input multiple-ou…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: 13 pages, 10 figures

  34. arXiv:2412.12590  [pdf, ps, other]

    cs.IT eess.SP

    Integrated Sensing and Communications in Downlink FDD MIMO without CSI Feedback

    Authors: Namhyun Kim, Juntaek Han, Jinseok Choi, Ahmed Alkhateeb, Chan-Byoung Chae, Jeonghun Park

    Abstract: In this paper, we propose a precoding framework for frequency division duplex (FDD) integrated sensing and communication (ISAC) systems with multiple-input multiple-output (MIMO). Specifically, we aim to maximize ergodic sum spectral efficiency (SE) while satisfying a sensing beam pattern constraint defined by the mean squared error (MSE). Our method reconstructs downlink (DL) channel state inform…

    Submitted 10 June, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: Submitted to the IEEE for possible publication

  35. arXiv:2411.19486  [pdf, ps, other]

    cs.CV cs.SD eess.AS

    V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow

    Authors: Jeongsoo Choi, Ji-Hoon Kim, Jinyu Li, Joon Son Chung, Shujie Liu

    Abstract: In this paper, we introduce V2SFlow, a novel Video-to-Speech (V2S) framework designed to generate natural and intelligible speech directly from silent talking face videos. While recent V2S systems have shown promising results on constrained datasets with limited speakers and vocabularies, their performance often degrades on real-world, unconstrained datasets due to the inherent variability and com…

    Submitted 30 May, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

    Comments: ICASSP 2025

  36. arXiv:2411.12888  [pdf, other]

    cs.IT eess.SP

    An Experimental Multi-Band Channel Characterization in the Upper Mid-Band

    Authors: Roberto Bomfin, Ahmad Bazzi, Hao Guo, Hyeongtaek Lee, Marco Mezzavilla, Sundeep Rangan, Junil Choi, Marwa Chafii

    Abstract: The following paper provides a multi-band channel measurement analysis on the frequency range (FR)3. This study focuses on the FR3 low frequencies 6.5 GHz and 8.75 GHz with a setup tailored to the context of integrated sensing and communication (ISAC), where the data are collected with and without the presence of a target. A method based on multiple signal classification (MUSIC) is used to refine…

    Submitted 19 November, 2024; originally announced November 2024.

  37. arXiv:2411.00830  [pdf, other]

    eess.IV cs.AI cs.CV

    Unsupervised Training of a Dynamic Context-Aware Deep Denoising Framework for Low-Dose Fluoroscopic Imaging

    Authors: Sun-Young Jeon, Sen Wang, Adam S. Wang, Garry E. Gold, Jang-Hwan Choi

    Abstract: Fluoroscopy is critical for real-time X-ray visualization in medical imaging. However, low-dose images are compromised by noise, potentially affecting diagnostic accuracy. Noise reduction is crucial for maintaining image quality, especially given such challenges as motion artifacts and the limited availability of clean data in medical imaging. To address these issues, we propose an unsupervised tr…

    Submitted 29 October, 2024; originally announced November 2024.

    Comments: 15 pages, 10 figures

  38. arXiv:2410.22640  [pdf, other]

    eess.SP

    Channel-Coded Precoding for Multi-User MISO Systems

    Authors: Ly V. Nguyen, Junil Choi, Bjorn Ottersten, A. Lee Swindlehurst

    Abstract: Precoding is a critical and long-standing technique in multi-user communication systems. However, the majority of existing precoding methods do not consider channel coding in their designs. In this paper, we consider the precoding problem in multi-user multiple-input single-output (MISO) systems, incorporating channel coding into the design. By leveraging the error-correcting capability of channel…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: 13 pages, 11 figures

  39. arXiv:2410.13839  [pdf, other]

    cs.SD cs.AI eess.AS

    Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

    Authors: Tan Dat Nguyen, Ji-Hoon Kim, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung

    Abstract: The goal of this paper is to accelerate codec-based speech synthesis systems with minimum sacrifice to speech quality. We propose an enhanced inference method that allows for flexible trade-offs between speed and quality during inference without requiring additional training. Our core idea is to predict multiple tokens per inference step of the AR module using multiple prediction heads, resulting…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Submitted to IEEE ICASSP 2025

  40. arXiv:2410.11254  [pdf, other]

    eess.SP

    Adaptive Power Allocation in Spaceborne Assisted NOMA Systems for Integrated Terrestrial Communications

    Authors: M Khalil, Ke Wang, Jinho Choi

    Abstract: This study introduces an innovative approach for adaptive power allocation in Non-Orthogonal Multiple Access (NOMA) systems, enhanced by the integration of spaceborne and terrestrial signals through a Reconfigurable Intelligent Surface (RIS). We develop an adaptive mechanism to adjust the power distribution between spaceborne and terrestrial signals according to variations in environmental conditi…

    Submitted 15 October, 2024; originally announced October 2024.

  41. arXiv:2410.05883  [pdf, other]

    eess.SP math.OC

    Improved PCRLB for radar tracking in clutter with geometry-dependent target measurement uncertainty and application to radar trajectory control

    Authors: Yifang Shi, Yu Zhang, Linjiao Fu, Dongliang Peng, Qiang Lu, Jee Woong Choi, Alfonso Farina

    Abstract: In realistic radar tracking, target measurement uncertainty (TMU) in terms of both detection probability and measurement error covariance is significantly affected by the target-to-radar (T2R) geometry. However, existing posterior Cramer-Rao Lower Bounds (PCRLBs) rarely investigate the fundamental impact of T2R geometry on target measurement uncertainty and eventually on mean square error (MSE) of…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 15 pages, 12 figures

    ACM Class: F.2.1

  42. arXiv:2409.18465  [pdf, ps, other]

    cs.IT eess.SP

    RIS-Enabled Cellular Systems Operated by Different Service Providers

    Authors: Hyeongtaek Lee, Junil Choi

    Abstract: In realistic cellular communication systems, multiple service providers will operate within different frequency ranges. Each serving cell, which is managed by a distinct service provider, is designed individually due to the orthogonal frequencies. However, when a reconfigurable intelligent surface (RIS) is deployed for a certain cell, the RIS still incurs reflective channels for the overall system…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: 6 pages, 4 figures, accepted to IEEE Transactions on Vehicular Technology

  43. arXiv:2409.17048  [pdf, other]

    cs.LG cs.NI eess.SP

    Predictive Covert Communication Against Multi-UAV Surveillance Using Graph Koopman Autoencoder

    Authors: Sivaram Krishnan, Jihong Park, Gregory Sherman, Benjamin Campbell, Jinho Choi

    Abstract: Low Probability of Detection (LPD) communication aims to obscure the presence of radio frequency (RF) signals to evade surveillance. In the context of mobile surveillance utilizing unmanned aerial vehicles (UAVs), achieving LPD communication presents significant challenges due to the UAVs' rapid and continuous movements, which are characterized by unknown nonlinear dynamics. Therefore, accurately…

    Submitted 25 September, 2024; originally announced September 2024.

  44. arXiv:2409.16301  [pdf, other]

    cs.RO cs.LG eess.SY

    Gait Switching and Enhanced Stabilization of Walking Robots with Deep Learning-based Reachability: A Case Study on Two-link Walker

    Authors: Xingpeng Xia, Jason J. Choi, Ayush Agrawal, Koushil Sreenath, Claire J. Tomlin, Somil Bansal

    Abstract: Learning-based approaches have recently shown notable success in legged locomotion. However, these approaches often lack accountability, necessitating empirical tests to determine their effectiveness. In this work, we are interested in designing a learning-based locomotion controller whose stability can be examined and guaranteed. This can be achieved by verifying regions of attraction (RoAs) of l…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: The first two authors contributed equally. This work is supported in part by the NSF Grant CMMI-1944722, the NSF CAREER Program under award 2240163, the NASA ULI on Safe Aviation Autonomy, and the DARPA Assured Autonomy and Assured Neuro Symbolic Learning and Reasoning (ANSR) programs. The work of Jason J. Choi received the support of a fellowship from Kwanjeong Educational Foundation, Korea

  45. arXiv:2409.16296  [pdf]

    cs.CV cs.GR eess.IV

    LiDAR-3DGS: LiDAR Reinforced 3D Gaussian Splatting for Multimodal Radiance Field Rendering

    Authors: Hansol Lim, Hanbeom Chang, Jongseong Brad Choi, Chul Min Yeum

    Abstract: In this paper, we explore the capabilities of multimodal inputs to 3D Gaussian Splatting (3DGS) based Radiance Field Rendering. We present LiDAR-3DGS, a novel method of reinforcing 3DGS inputs with LiDAR generated point clouds to significantly improve the accuracy and detail of 3D models. We demonstrate a systematic approach of LiDAR reinforcement to 3DGS to enable capturing of important features…

    Submitted 9 September, 2024; originally announced September 2024.

  46. arXiv:2409.15760  [pdf, other]

    cs.SD eess.AS

    NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers

    Authors: Nohil Park, Heeseung Kim, Che Hyun Lee, Jooyoung Choi, Jiheum Yeom, Sungroh Yoon

    Abstract: We present NanoVoice, a personalized text-to-speech model that efficiently constructs voice adapters for multiple speakers simultaneously. NanoVoice introduces a batch-wise speaker adaptation technique capable of fine-tuning multiple references in parallel, significantly reducing training time. Beyond building separate adapters for each speaker, we also propose a parameter sharing technique that r…

    Submitted 20 December, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025, Demo Page: https://nanovoice.github.io/

  47. arXiv:2409.15759  [pdf, other]

    cs.SD eess.AS

    VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance

    Authors: Jiheum Yeom, Heeseung Kim, Jooyoung Choi, Che Hyun Lee, Nohil Park, Sungroh Yoon

    Abstract: When applying parameter-efficient finetuning via LoRA onto speaker adaptive text-to-speech models, adaptation performance may decline compared to full-finetuned counterparts, especially for out-of-domain speakers. Here, we propose VoiceGuider, a parameter-efficient speaker adaptive text-to-speech system reinforced with autoguidance to enhance the speaker adaptation performance, reducing the gap ag…

    Submitted 20 December, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025, Demo Page: https://voiceguider.github.io/

  48. arXiv:2409.12416  [pdf, other]

    eess.AS cs.SD eess.SP

    Speech-Declipping Transformer with Complex Spectrogram and Learnerble Temporal Features

    Authors: Younghoo Kwon, Jung-Woo Choi

    Abstract: We present a transformer-based speech-declipping model that effectively recovers clipped signals across a wide range of input signal-to-distortion ratios (SDRs). While recent time-domain deep neural network (DNN)-based declippers have outperformed traditional handcrafted and spectrogram-based DNN approaches, they still struggle with low-SDR inputs. To address this, we incorporate a transformer-bas…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: 5 pages, 2 figures, submitted to ICASSP 2024

  49. arXiv:2409.12415  [pdf, other]

    eess.AS cs.AI cs.SD

    Multichannel-to-Multichannel Target Sound Extraction Using Direction and Timestamp Clues

    Authors: Dayun Choi, Jung-Woo Choi

    Abstract: We propose a multichannel-to-multichannel target sound extraction (M2M-TSE) framework for separating multichannel target signals from a multichannel mixture of sound sources. Target sound extraction (TSE) isolates a specific target signal using user-provided clues, typically focusing on single-channel extraction with class labels or temporal activation maps. However, to preserve and utilize spatia…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures

  50. arXiv:2409.12413  [pdf, other]

    eess.AS cs.SD

    DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification

    Authors: Dongheon Lee, Jung-Woo Choi

    Abstract: This paper presents a framework for universal sound separation and polyphonic audio classification, addressing the challenges of separating and classifying individual sound sources in a multichannel mixture. The proposed framework, DeFT-Mamba, utilizes the dense frequency-time attentive network (DeFTAN) combined with Mamba to extract sound objects, capturing the local time-frequency relations thro…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: 5 pages, 2 figures