Showing 1–50 of 549 results for author: Lee, J

Searching in archive eess.
  1. arXiv:2507.06481 [pdf, ps, other]

    cs.SD eess.AS

    IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer

    Authors: Changheon Han, Yuseop Sim, Hoin Jung, Jiho Lee, Hojun Lee, Yun Seok Kang, Sucheol Woo, Garam Kim, Hyung Wook Park, Martin Byung-Guk Jun

    Abstract: Acoustic signals from industrial machines offer valuable insights for anomaly detection, predictive maintenance, and operational efficiency enhancement. However, existing task-specific, supervised learning methods often scale poorly and fail to generalize across diverse industrial scenarios, whose acoustic characteristics are distinct from general audio. Furthermore, the scarcity of accessible, la…

    Submitted 8 July, 2025; originally announced July 2025.

  2. arXiv:2507.04667 [pdf, ps, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    What's Making That Sound Right Now? Video-centric Audio-Visual Localization

    Authors: Hahyeon Choi, Junhoo Lee, Nojun Kwak

    Abstract: Audio-Visual Localization (AVL) aims to identify sound-emitting sources within a visual scene. However, existing studies focus on image-level audio-visual associations, failing to capture temporal dynamics. Moreover, they assume simplified scenarios where sound sources are always visible and involve only a single object. To address these limitations, we propose AVATAR, a video-centric AVL benchmar…

    Submitted 8 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: Published at ICCV 2025. Project page: https://hahyeon610.github.io/Video-centric_Audio_Visual_Localization/

  3. arXiv:2507.04140 [pdf, ps, other]

    cs.RO eess.SY

    Learning Humanoid Arm Motion via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning

    Authors: Ho Jae Lee, Se Hwan Jeon, Sangbae Kim

    Abstract: Humans naturally swing their arms during locomotion to regulate whole-body dynamics, reduce angular momentum, and help maintain balance. Inspired by this principle, we present a limb-level multi-agent reinforcement learning (RL) framework that enables coordinated whole-body control of humanoid robots through emergent arm motion. Our approach employs separate actor-critic structures for the arms an…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: 8 pages, 10 figures

  4. arXiv:2507.03937 [pdf]

    eess.IV cs.AI cs.CV

    EdgeSRIE: A hybrid deep learning framework for real-time speckle reduction and image enhancement on portable ultrasound systems

    Authors: Hyunwoo Cho, Jongsoo Lee, Jinbum Kang, Yangmo Yoo

    Abstract: Speckle patterns in ultrasound images often obscure anatomical details, leading to diagnostic uncertainty. Recently, various deep learning (DL)-based techniques have been introduced to effectively suppress speckle; however, their high computational costs pose challenges for low-resource devices, such as portable ultrasound systems. To address this issue, EdgeSRIE, which is a lightweight hybrid DL…

    Submitted 5 July, 2025; originally announced July 2025.

  5. arXiv:2507.03149 [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    On the Relationship between Accent Strength and Articulatory Features

    Authors: Kevin Huang, Sean Foley, Jihwan Lee, Yoonjeong Lee, Dani Byrd, Shrikanth Narayanan

    Abstract: This paper explores the relationship between accent strength and articulatory features inferred from acoustic speech. To quantify accent strength, we compare phonetic transcriptions with transcriptions based on dictionary-based references, computing phoneme-level difference as a measure of accent strength. The proposed framework leverages recent self-supervised learning articulatory inversion tech…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Accepted for Interspeech2025

  6. arXiv:2507.01173 [pdf]

    eess.SY

    An Adaptive Estimation Approach based on Fisher Information to Overcome the Challenges of LFP Battery SOC Estimation

    Authors: Junzhe Shi, Shida Jiang, Shengyu Tao, Jaewong Lee, Manashita Borah, Scott Moura

    Abstract: Robust and real-time State of Charge (SOC) estimation is essential for Lithium Iron Phosphate (LFP) batteries, which are widely used in electric vehicles (EVs) and energy storage systems due to their safety and longevity. However, the flat Open Circuit Voltage (OCV)-SOC curve makes this task particularly challenging. This challenge is complicated by hysteresis effects and real-world conditions such as…

    Submitted 1 July, 2025; originally announced July 2025.

  7. arXiv:2506.22467 [pdf]

    eess.SP cs.CV

    SegmentAnyMuscle: A universal muscle segmentation model across different locations in MRI

    Authors: Roy Colglazier, Jisoo Lee, Haoyu Dong, Hanxue Gu, Yaqian Chen, Joseph Cao, Zafer Yildiz, Zhonghao Liu, Nicholas Konz, Jichen Yang, Jikai Zhang, Yuwen Chen, Lin Li, Adrian Camarena, Maciej A. Mazurowski

    Abstract: The quantity and quality of muscles are increasingly recognized as important predictors of health outcomes. While MRI offers a valuable modality for such assessments, obtaining precise quantitative measurements of musculature remains challenging. This study aimed to develop a publicly available model for muscle segmentation in MRIs and demonstrate its applicability across various anatomical locati…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 24 pages, 6 figures

  8. arXiv:2506.21174 [pdf]

    eess.AS cs.LG

    Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4

    Authors: Jongyeon Park, Joonhee Lee, Do-Hyeon Lim, Hong Kook Kim, Hyeongcheol Geum, Jeong Eun Lim

    Abstract: This technical report presents submission systems for Task 4 of the DCASE 2025 Challenge. This model incorporates additional audio features (spectral roll-off and chroma features) into the embedding feature extracted from the mel-spectral feature to improve the classification capabilities of an audio-tagging model in the spatial semantic segmentation of sound scenes (S5) system. This approach is…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: DCASE 2025 challenge Task4, 5 pages

  9. arXiv:2506.20598 [pdf]

    cs.AI eess.SY

    Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges

    Authors: Alexander D. Kalian, Jaewook Lee, Stefan P. Johannesson, Lennart Otte, Christer Hogstrand, Miao Guo

    Abstract: The global demand for sustainable protein sources has accelerated the need for intelligent tools that can rapidly process and synthesise domain-specific scientific knowledge. In this study, we present a proof-of-concept multi-agent Artificial Intelligence (AI) framework designed to support sustainable protein production research, with an initial focus on microbial protein sources. Our Retrieval-Au…

    Submitted 25 June, 2025; originally announced June 2025.

  10. arXiv:2506.19446 [pdf, ps, other]

    cs.SD eess.AS

    Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation

    Authors: Jaejun Lee, Kyogu Lee

    Abstract: In this paper, we propose Vo-Ve, a novel voice-vector embedding that captures speaker identity. Unlike conventional speaker embeddings, Vo-Ve is explainable, as it contains the probabilities of explicit voice attribute classes. Through extensive analysis, we demonstrate that Vo-Ve not only evaluates speaker similarity competitively with conventional techniques but also provides an interpretable ex…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Interspeech 2025

  11. arXiv:2506.16572 [pdf, ps, other]

    eess.IV cs.CV

    DiffO: Single-step Diffusion for Image Compression at Ultra-Low Bitrates

    Authors: Chanung Park, Joo Chan Lee, Jong Hwan Ko

    Abstract: Although image compression is fundamental to visual data processing and has inspired numerous standard and learned codecs, these methods still suffer severe quality degradation at extremely low bits per pixel. While recent diffusion-based models have provided enhanced generative performance at low bitrates, they still yield limited perceptual quality and prohibitive decoding latency due to multiple de…

    Submitted 19 June, 2025; originally announced June 2025.

  12. arXiv:2506.14657 [pdf, ps, other]

    eess.AS cs.AR

    ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors

    Authors: Jongin Choi, Jina Park, Woojoo Lee, Jae-Jin Lee, Massoud Pedram

    Abstract: Multi-channel keyword spotting (KWS) has become crucial for voice-based applications in edge environments. However, its substantial computational and energy requirements pose significant challenges. We introduce ASAP-FE (Agile Sparsity-Aware Parallelized-Feature Extractor), a hardware-oriented front-end designed to address these challenges. Our framework incorporates three key innovations: (1) Hal…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 7 pages, 11 figures, ISLPED 2025

  13. arXiv:2506.10265 [pdf, ps, other]

    eess.SP cs.CV cs.HC

    Ground Reaction Force Estimation via Time-aware Knowledge Distillation

    Authors: Eun Som Jeon, Sinjini Mitra, Jisoo Lee, Omik M. Save, Ankita Shukla, Hyunglae Lee, Pavan Turaga

    Abstract: Human gait analysis with wearable sensors has been widely used in various applications, such as daily life healthcare, rehabilitation, physical therapy, and clinical diagnostics and monitoring. In particular, ground reaction force (GRF) provides critical information about how the body interacts with the ground during locomotion. Although instrumented treadmills have been widely used as the gold st…

    Submitted 11 June, 2025; originally announced June 2025.

    Journal ref: IEEE Internet of Things Journal, 2025

  14. arXiv:2506.06348 [pdf, other]

    eess.SP cs.LG

    Multi-Platform Methane Plume Detection via Model and Domain Adaptation

    Authors: Vassiliki Mancoridis, Brian Bue, Jake H. Lee, Andrew K. Thorpe, Daniel Cusworth, Alana Ayasse, Philip G. Brodrick, Riley Duren

    Abstract: Prioritizing methane for near-term climate action is crucial due to its significant impact on global warming. Previous work used columnwise matched filter products from the airborne AVIRIS-NG imaging spectrometer to detect methane plume sources; convolutional neural networks (CNNs) discerned anthropogenic methane plumes from false positive enhancements. However, as an increasing number of remote s…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 12 pages 8 figures. In review

  15. arXiv:2506.02863 [pdf, ps, other]

    eess.AS cs.AI cs.SD

    CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech

    Authors: Helin Wang, Jiarui Hai, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Laureano Moro Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhiali, Najim Dehak

    Abstract: Recent advancements in generative artificial intelligence have significantly transformed the field of style-captioned text-to-speech synthesis (CapTTS). However, adapting CapTTS to real-world applications remains challenging due to the lack of standardized, comprehensive datasets and limited research on downstream tasks built upon CapTTS. To address these gaps, we introduce CapSpeech, a new benchm…

    Submitted 3 June, 2025; originally announced June 2025.

  16. arXiv:2506.01460 [pdf, ps, other]

    cs.SD eess.AS

    Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement

    Authors: Seungu Han, Sungho Lee, Juheon Lee, Kyogu Lee

    Abstract: Deep generative models have recently been employed for speech enhancement to generate perceptually valid clean speech on large-scale datasets. Several diffusion models have been proposed, and more recently, a tractable Schrödinger Bridge has been introduced to transport between the clean and noisy speech distributions. However, these models often suffer from an iterative reverse process and requir…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  17. arXiv:2505.23317 [pdf, ps, other]

    eess.SY cs.CV

    CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection

    Authors: Woojin Shin, Donghwa Kang, Byeongyun Park, Brent Byunghoon Kang, Jinkyu Lee, Hyeongboo Baek

    Abstract: Detection Transformers (DETR) are increasingly adopted in autonomous vehicle (AV) perception systems due to their superior accuracy over convolutional networks. However, concurrently executing multiple DETR tasks presents significant challenges in meeting firm real-time deadlines (R1) and high accuracy requirements (R2), particularly for safety-critical objects, while navigating the inherent laten…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 12 pages

  18. arXiv:2505.15914 [pdf, ps, other]

    cs.SD eess.AS

    A Novel Deep Learning Framework for Efficient Multichannel Acoustic Feedback Control

    Authors: Yuan-Kuei Wu, Juan Azcarreta, Kashyap Patel, Buye Xu, Jung-Suk Lee, Sanha Lee, Ashutosh Pandey

    Abstract: This study presents a deep-learning framework for controlling multichannel acoustic feedback in audio devices. Traditional digital signal processing methods struggle with convergence when dealing with highly correlated noise such as feedback. We introduce a Convolutional Recurrent Network that efficiently combines spatial and temporal processing, significantly enhancing speech enhancement capabili…

    Submitted 29 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  19. arXiv:2505.14648 [pdf, ps, other]

    cs.SD eess.AS

    Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits

    Authors: Tiantian Feng, Jihwan Lee, Anfeng Xu, Yoonjeong Lee, Thanathai Lertpetchpun, Xuan Shi, Helin Wang, Thomas Thebaud, Laureano Moro-Velazquez, Dani Byrd, Najim Dehak, Shrikanth Narayanan

    Abstract: We introduce Vox-Profile, a comprehensive benchmark to characterize rich speaker and speech traits using speech foundation models. Unlike existing works that focus on a single dimension of speaker traits, Vox-Profile provides holistic and multi-dimensional profiles that reflect both static speaker traits (e.g., age, sex, accent) and dynamic speech properties (e.g., emotion, speech flow). This benc…

    Submitted 20 May, 2025; originally announced May 2025.

  20. arXiv:2505.13814 [pdf, ps, other]

    eess.AS cs.AI cs.SD

    Articulatory Feature Prediction from Surface EMG during Speech Production

    Authors: Jihwan Lee, Kevin Huang, Kleanthis Avramidis, Simon Pistrosch, Monica Gonzalez-Machorro, Yoonjeong Lee, Björn Schuller, Louis Goldstein, Shrikanth Narayanan

    Abstract: We present a model for predicting articulatory features from surface electromyography (EMG) signals during speech production. The proposed model integrates convolutional layers and a Transformer block, followed by separate predictors for articulatory features. Our approach achieves a high prediction correlation of approximately 0.9 for most articulatory features. Furthermore, we demonstrate that t…

    Submitted 28 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted for Interspeech2025

  21. arXiv:2505.12686 [pdf, other]

    cs.LG cs.SD eess.AS

    RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations

    Authors: Seungmin Kim, Sohee Park, Donghyun Kim, Jisu Lee, Daeseon Choi

    Abstract: With the advancement of AI-based speech synthesis technologies such as Deep Voice, there is an increasing risk of voice spoofing attacks, including voice phishing and fake news, through unauthorized use of others' voices. Existing defenses that inject adversarial perturbations directly into audio signals have limited effectiveness, as these perturbations can easily be neutralized by speech enhance…

    Submitted 19 May, 2025; originally announced May 2025.

  22. arXiv:2505.04174 [pdf, other]

    cs.LG cs.AI cs.NI eess.SP

    On-Device LLM for Context-Aware Wi-Fi Roaming

    Authors: Ju-Hyung Lee, Yanqing Lu, Klaus Doppler

    Abstract: Roaming in Wireless LAN (Wi-Fi) is a critical yet challenging task for maintaining seamless connectivity in dynamic mobile environments. Conventional threshold-based or heuristic schemes often fail, leading to either sticky or excessive handovers. We introduce the first cross-layer use of an on-device large language model (LLM): high-level reasoning in the application layer that issues real-time a…

    Submitted 20 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  23. arXiv:2505.01750 [pdf, ps, other]

    eess.AS cs.SD

    FLOWER: Flow-Based Estimated Gaussian Guidance for General Speech Restoration

    Authors: Da-Hee Yang, Jaeuk Lee, Joon-Hyuk Chang

    Abstract: We introduce FLOWER, a novel conditioning method designed for speech restoration that integrates Gaussian guidance into generative frameworks. By transforming clean speech into a predefined prior distribution (e.g., Gaussian distribution) using a normalizing flow network, FLOWER extracts critical information to guide generative models. This guidance is incorporated into each block of the generativ…

    Submitted 3 May, 2025; originally announced May 2025.

  24. arXiv:2505.01079 [pdf, other]

    cs.CV eess.IV

    Improving Editability in Image Generation with Layer-wise Memory

    Authors: Daneul Kim, Jaeah Lee, Jaesik Park

    Abstract: Most real-world image editing tasks require multiple sequential edits to achieve desired results. Current editing approaches, primarily designed for single-object modifications, struggle with sequential editing, especially in maintaining previous edits while naturally integrating new objects into the existing content. These limitations significantly hinder complex editing scenarios where multi…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: CVPR 2025. Project page : https://carpedkm.github.io/projects/improving_edit/index.html

  25. arXiv:2505.00481 [pdf, other]

    eess.SY

    Stabilization by Controllers Having Integer Coefficients

    Authors: Joowon Lee, Donggil Lee, Junsoo Kim

    Abstract: The system property of "having integer coefficients," that is, a transfer function has an integer monic polynomial as its denominator, is significant in the field of encrypted control as it is required for a dynamic controller to be realized over encrypted data. This paper shows that there always exists a controller with integer coefficients stabilizing a given discrete-time linear time-invarian…

    Submitted 1 May, 2025; originally announced May 2025.

  26. arXiv:2505.00133 [pdf, other]

    eess.IV cs.CV

    Efficient and robust 3D blind harmonization for large domain gaps

    Authors: Hwihun Jeong, Hayeon Lee, Se Young Chun, Jongho Lee

    Abstract: Blind harmonization has emerged as a promising technique for MR image harmonization to achieve scale-invariant representations, requiring only target domain data (i.e., no source domain data necessary). However, existing methods face limitations such as inter-slice heterogeneity in 3D, moderate image quality, and limited performance for a large domain gap. To address these challenges, we introduce…

    Submitted 30 April, 2025; originally announced May 2025.

  27. An Addendum to NeBula: Towards Extending TEAM CoSTAR's Solution to Larger Scale Environments

    Authors: Ali Agha, Kyohei Otsu, Benjamin Morrell, David D. Fan, Sung-Kyun Kim, Muhammad Fadhil Ginting, Xianmei Lei, Jeffrey Edlund, Seyed Fakoorian, Amanda Bouman, Fernando Chavez, Taeyeon Kim, Gustavo J. Correa, Maira Saboia, Angel Santamaria-Navarro, Brett Lopez, Boseong Kim, Chanyoung Jung, Mamoru Sobue, Oriana Claudia Peltzer, Joshua Ott, Robert Trybula, Thomas Touma, Marcel Kaufmann, Tiago Stegun Vaquero, et al. (64 additional authors not shown)

    Abstract: This paper presents an appendix to the original NeBula autonomy solution developed by the TEAM CoSTAR (Collaborative SubTerranean Autonomous Robots), participating in the DARPA Subterranean Challenge. Specifically, this paper presents extensions to NeBula's hardware, software, and algorithmic components that focus on increasing the range and scale of the exploration environment. From the algorithm…

    Submitted 18 April, 2025; originally announced April 2025.

    Journal ref: IEEE Transactions on Field Robotics, vol. 1, pp. 476-526, 2024

  28. Documentation on Encrypted Dynamic Control Simulation Code using Ring-LWE based Cryptosystems

    Authors: Yeongjun Jang, Joowon Lee, Junsoo Kim

    Abstract: Encrypted controllers offer secure computation by employing modern cryptosystems to execute control operations directly over encrypted data without decryption. However, incorporating cryptosystems into dynamic controllers significantly increases the computational load. This paper aims to provide an accessible guideline for running encrypted controllers using the open-source library Lattigo, which s…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 6 pages

    Journal ref: Journal of The Society of Instrument and Control Engineers, vol. 64, no. 4, pp. 248-254, 2025

  29. arXiv:2504.10686 [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  30. arXiv:2504.09132 [pdf, ps, other]

    cs.LG eess.SP

    Self-Supervised Autoencoder Network for Robust Heart Rate Extraction from Noisy Photoplethysmogram: Applying Blind Source Separation to Biosignal Analysis

    Authors: Matthew B. Webster, Dongheon Lee, Joonnyong Lee

    Abstract: Biosignals can be viewed as mixtures measuring particular physiological events, and blind source separation (BSS) aims to extract underlying source signals from mixtures. This paper proposes a self-supervised multi-encoder autoencoder (MEAE) to separate heartbeat-related source signals from photoplethysmogram (PPG), enhancing heart rate (HR) detection in noisy PPG data. The MEAE is trained on PPG…

    Submitted 4 June, 2025; v1 submitted 12 April, 2025; originally announced April 2025.

    Comments: 12 pages, 5 figures, 1 table, preprint

    MSC Class: I.2.6

  31. arXiv:2504.04066 [pdf, other]

    eess.IV cs.CV

    Performance Analysis of Deep Learning Models for Femur Segmentation in MRI Scan

    Authors: Mengyuan Liu, Yixiao Chen, Anning Tian, Xinmeng Wu, Mozhi Shen, Tianchou Gong, Jeongkyu Lee

    Abstract: Convolutional neural networks like U-Net excel in medical image segmentation, while attention mechanisms and KAN enhance feature extraction. Meta's SAM 2 uses Vision Transformers for prompt-based segmentation without fine-tuning. However, biases in these models impact generalization with limited data. In this study, we systematically evaluate and compare the performance of three CNN-based models,…

    Submitted 5 April, 2025; originally announced April 2025.

  32. arXiv:2504.03229 [pdf, other]

    eess.SY

    A Robust Method for Fault Detection and Severity Estimation in Mechanical Vibration Data

    Authors: Youngjae Jeon, Eunho Heo, Jinmo Lee, Taewon Uhm, Dongjin Lee

    Abstract: This paper proposes a robust method for fault detection and severity estimation in multivariate time-series data to enhance predictive maintenance of mechanical systems. We use the Temporal Graph Convolutional Network (T-GCN) model to capture both spatial and temporal dependencies among variables. This enables accurate future state predictions under varying operational conditions. To address the c…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 8 pages, 9 figures

    Journal ref: 2025 IEEE International Conference on Prognostics and Health Management (ICPHM)

  33. arXiv:2504.00447 [pdf, other]

    cs.RO eess.SY

    Egocentric Conformal Prediction for Safe and Efficient Navigation in Dynamic Cluttered Environments

    Authors: Jaeuk Shin, Jungjin Lee, Insoon Yang

    Abstract: Conformal prediction (CP) has emerged as a powerful tool in robotics and control, thanks to its ability to calibrate complex, data-driven models with formal guarantees. However, in robot navigation tasks, existing CP-based methods often decouple prediction from control, evaluating models without considering whether prediction errors actually compromise safety. Consequently, ego-vehicles may become…

    Submitted 1 April, 2025; originally announced April 2025.

  34. arXiv:2504.00137 [pdf, other]

    eess.SY

    Performance analysis of metasurface-based spatial multimode transmission for 6G wireless communications

    Authors: Ju Yong Lee, Seung-Won Keum, Sang Min Oh, Dang-Oh Kim, Dong-Ho Cho

    Abstract: In sixth-generation (6G) wireless communication technology, it is important to utilize space resources efficiently. Recently, holographic multiple-input multiple-output (HMIMO) and metasurface technology have attracted attention as technologies that maximize space utilization for 6G mobile communications. However, studies on HMIMO communications are still at an initial stage and their fundamental limits a…

    Submitted 31 March, 2025; originally announced April 2025.

  35. arXiv:2503.23439 [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Speculative End-Turn Detector for Efficient Speech Chatbot Assistant

    Authors: Hyunjong Ok, Suho Yoo, Jaeho Lee

    Abstract: Spoken dialogue systems powered by large language models have demonstrated remarkable abilities in understanding human speech and generating appropriate spoken responses. However, these systems struggle with end-turn detection (ETD), the ability to distinguish between user turn completion and hesitation. This limitation often leads to premature or delayed responses, disrupting the flow of spoken…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: Preprint

  36. arXiv:2503.23108 [pdf, other]

    eess.AS cs.LG cs.SD

    SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System

    Authors: Hyeongju Kim, Jinhyeok Yang, Yechan Yu, Seunghun Ji, Jacob Morton, Frederik Bous, Joon Byun, Juheon Lee

    Abstract: We present a novel text-to-speech (TTS) system, namely SupertonicTTS, for improved scalability and efficiency in speech synthesis. SupertonicTTS comprises three components: a speech autoencoder for continuous latent representation, a text-to-latent module leveraging flow-matching for text-to-latent mapping, and an utterance-level duration predictor. To enable a lightweight architecture, we employ…

    Submitted 16 May, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

    Comments: 21 pages, preprint

  37. arXiv:2503.22026 [pdf, other]

    cs.CV eess.IV

    Multispectral Demosaicing via Dual Cameras

    Authors: SaiKiran Tedla, Junyong Lee, Beixuan Yang, Mahmoud Afifi, Michael S. Brown

    Abstract: Multispectral (MS) images capture detailed scene information across a wide range of spectral bands, making them invaluable for applications requiring rich spectral data. Integrating MS imaging into multi-camera devices, such as smartphones, has the potential to enhance both spectral applications and RGB image quality. A critical step in processing MS data is demosaicing, which reconstructs color i…

    Submitted 8 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  38. arXiv:2503.21057 [pdf, other]

    eess.SY

    Validation and Calibration of Energy Models with Real Vehicle Data from Chassis Dynamometer Experiments

    Authors: Joy Carpio, Sulaiman Almatrudi, Nour Khoudari, Zhe Fu, Kenneth Butts, Jonathan Lee, Benjamin Seibold, Alexandre Bayen

    Abstract: Accurate estimation of vehicle fuel consumption typically requires detailed modeling of complex internal powertrain dynamics, often resulting in computationally intensive simulations. However, many transportation applications, such as traffic flow modeling, optimization, and control, require simplified models that are fast, interpretable, and easy to implement, while still maintaining fidelity to ph…

    Submitted 26 March, 2025; originally announced March 2025.

  39. arXiv:2503.19228 [pdf, ps, other]

    eess.SY

    Bridging the Sim-to-real Gap: A Control Framework for Imitation Learning of Model Predictive Control

    Authors: Seungtaek Kim, Jonghyup Lee, Kyoungseok Han, Seibum B. Choi

    Abstract: To address the computational challenges of Model Predictive Control (MPC), recent research has studied using imitation learning to approximate the MPC with a computationally efficient Deep Neural Network (DNN). However, this introduces a common issue in learning-based control, the simulation-to-reality (sim-to-real) gap, and Domain Randomization (DR) has been widely used to mitigate this gap by intr…

    Submitted 3 July, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  40. arXiv:2503.16853 [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models

    Authors: Suho Yoo, Hyunjong Ok, Jaeho Lee

    Abstract: Language models pretrained on text-only corpora often struggle with tasks that require auditory commonsense knowledge. Previous work addresses this problem by augmenting the language model to retrieve knowledge from external audio databases. This approach has several limitations, such as the potential lack of relevant audio in databases and the high costs associated with constructing the databases…

    Submitted 8 June, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

    Comments: 12 pages, 5 figures, ACL Findings 2025

  41. arXiv:2503.15498 [pdf, other]

    cs.HC cs.AI cs.MA cs.MM cs.SD eess.AS

    Revival: Collaborative Artistic Creation through Human-AI Interactions in Musical Creativity

    Authors: Keon Ju M. Lee, Philippe Pasquier, Jun Yuri

    Abstract: Revival is an innovative live audiovisual performance and music improvisation by our artist collective K-Phi-A, blending human and AI musicianship to create electronic music with audio-reactive visuals. The performance features real-time co-creative improvisation between a percussionist, an electronic music artist, and AI musical agents. Trained on works by deceased composers and the collective's…

    Submitted 19 January, 2025; originally announced March 2025.

    Comments: Keon Ju M. Lee, Philippe Pasquier and Jun Yuri. 2024. In Proceedings of the Creativity and Generative AI NIPS (Neural Information Processing Systems) Workshop

  42. arXiv:2503.12891  [pdf]

    eess.SY

    PD-Skygroundhook Controller for Semi-Active Suspension System Using Magnetorheological Fluid Dampers

    Authors: Hansol Lim, Jee Won Lee, Seung-Bok Choi, Jongseong Brad Choi

    Abstract: This paper presents a Proportional-Derivative (PD) Skygroundhook controller for magnetorheological (MR) dampers in semi-active suspensions. Traditional Skyhook, Groundhook, and hybrid Skygroundhook controllers are well known for their ability to reduce body and wheel vibrations; however, each approach has limitations in handling a broad frequency spectrum and often relies on abrupt switching. By a…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: This work has been submitted to the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) for possible publication

  43. arXiv:2503.09906  [pdf, other]

    eess.AS cs.SD

    ValSub: Subsampling Validation Data to Mitigate Forgetting during ASR Personalization

    Authors: Haaris Mehmood, Karthikeyan Saravanan, Pablo Peso Parada, David Tuckey, Mete Ozay, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Automatic Speech Recognition (ASR) is widely used within consumer devices such as mobile phones. Recently, personalization, or on-device model fine-tuning, has shown that adapting ASR models to a target user's speech improves their performance on rare words and accented speech. Despite these gains, fine-tuning on user data (the target domain) risks causing the personalized model to forget knowledge about…

    Submitted 7 April, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted at ICASSP 2025

  44. arXiv:2503.06743  [pdf, ps, other]

    eess.IV cs.CV

    GlaGAN: A Generative Unsupervised Model for High-Precision Segmentation of Retinal Main Vessels toward Early Detection of Glaucoma

    Authors: Cheng Huang, Weizheng Xie, Tsengdar J. Lee, Jui-Kai Wang, Karanjit Kooner, Ning Zhang, Jia Zhang

    Abstract: Structural changes in the main retinal blood vessels are critical biomarkers for glaucoma onset and progression. Identifying these vessels is essential for vascular modeling yet highly challenging. This paper introduces GlaGAN, an unsupervised generative AI model for segmenting main blood vessels in Optical Coherence Tomography Angiography (OCTA) images. The process begins with the Space Colonizat…

    Submitted 7 July, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

  45. arXiv:2503.02274  [pdf, other]

    eess.SY

    Rethinking Static Line Rating for Economic and Efficient Power Operation in South Korea

    Authors: Junseon Park, Junhyun Lee, Hyeongon Park

    Abstract: In South Korea, the power grid is currently operated based on the static line rating (SLR) method, in which transmission line capacity is determined from extreme weather conditions. However, with global warming, there is concern that summer temperatures may exceed the SLR criteria, posing safety risks. On the other hand, the conservative estimates used for winter conditions limit…

    Submitted 3 March, 2025; originally announced March 2025.

  46. arXiv:2503.00790  [pdf]

    cs.SD cs.ET eess.AS

    Acoustic Anomaly Detection on UAM Propeller Defect with Acoustic dataset for Crack of drone Propeller (ADCP)

    Authors: Juho Lee, Donghyun Yoon, Gumoon Jeong, Hyeoncheol Kim

    Abstract: The imminent commercialization of Urban Air Mobility (UAM) requires stable, AI-based maintenance systems to ensure the safety of both passengers and pedestrians. This paper presents a methodology for non-destructively detecting cracks in UAM propellers using drone propeller sound datasets. Normal operating sounds were recorded, and abnormal sounds (categorized as ripped and broken) were differentiated by varying the micr…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 25 pages

  47. arXiv:2502.17726  [pdf, other]

    cs.SD cs.AI cs.DL cs.IR eess.AS

    The GigaMIDI Dataset with Features for Expressive Music Performance Detection

    Authors: Keon Ju Maverick Lee, Jeff Ens, Sara Adkins, Pedro Sarmento, Mathieu Barthet, Philippe Pasquier

    Abstract: The Musical Instrument Digital Interface (MIDI), introduced in 1983, revolutionized music production by allowing computers and instruments to communicate efficiently. MIDI files encode musical instructions compactly, facilitating convenient music sharing. They benefit Music Information Retrieval (MIR), aiding in research on music understanding, computational musicology, and generative music. The G…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: Published in Transactions of the International Society for Music Information Retrieval (TISMIR), 8(1), 1-19

  48. arXiv:2502.17470  [pdf, other]

    eess.SP cs.AI

    MC2SleepNet: Multi-modal Cross-masking with Contrastive Learning for Sleep Stage Classification

    Authors: Younghoon Na, Hyun Keun Ahn, Hyun-Kyung Lee, Yoongeol Lee, Seung Hun Oh, Hongkwon Kim, Jeong-Gun Lee

    Abstract: Sleep profoundly affects our health, and sleep deficiency or disorders can cause physical and mental problems. Despite significant findings from previous studies, challenges persist in optimizing deep learning models, especially in multi-modal learning for high-accuracy sleep stage classification. Our research introduces MC2SleepNet (Multi-modal Cross-masking with Contrastive learning for Sleep st…

    Submitted 26 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

  49. arXiv:2502.15602  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation

    Authors: Yoonjin Chung, Pilsun Eu, Junwon Lee, Keunwoo Choi, Juhan Nam, Ben Sangbae Chon

    Abstract: Although widely adopted for evaluating generated audio signals, the Fréchet Audio Distance (FAD) suffers from significant limitations, including reliance on Gaussian assumptions, sensitivity to sample size, and high computational complexity. As an alternative, we introduce the Kernel Audio Distance (KAD), a novel, distribution-free, unbiased, and computationally efficient metric based on Max…

    Submitted 9 March, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

  50. arXiv:2502.10836  [pdf, other]

    eess.SP

    Blind Massive MIMO for Dense IoT Networks

    Authors: Jeongjae Lee, Songnam Hong

    Abstract: In this paper, we investigate the downlink communication challenges in heavy-load Internet-of-Things (IoT) networks supported by frequency-division-duplexing (FDD) millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems. The excessive overhead required for obtaining channel state information at the transmitter (CSIT) is essential to achieve high spectral efficiency through c…

    Submitted 15 February, 2025; originally announced February 2025.

    Comments: Submitted to IEEE Internet of Things Journal