Showing 1–50 of 91 results for author: Choi, S

Searching in archive eess.

  1. arXiv:2506.08846  [pdf, ps, other]

    cs.CY cs.CL cs.SD eess.AS

    Addressing Pitfalls in Auditing Practices of Automatic Speech Recognition Technologies: A Case Study of People with Aphasia

    Authors: Katelyn Xiaoying Mei, Anna Seo Gyeong Choi, Hilke Schellmann, Mona Sloane, Allison Koenecke

    Abstract: Automatic Speech Recognition (ASR) has transformed daily tasks from video transcription to workplace hiring. ASR systems' growing use warrants robust and standardized auditing approaches to ensure automated transcriptions of high and equitable quality. This is especially critical for people with speech and language disorders (such as aphasia) who may disproportionately depend on ASR systems to nav…

    Submitted 11 July, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  2. arXiv:2506.06311  [pdf, ps, other]

    eess.SP cs.LG

    A Novel Shape-Aware Topological Representation for GPR Data with DNN Integration

    Authors: Meiyan Kang, Shizuo Kaji, Sang-Yun Lee, Taegon Kim, Hee-Hwan Ryu, Suyoung Choi

    Abstract: Ground Penetrating Radar (GPR) is a widely used Non-Destructive Testing (NDT) technique for subsurface exploration, particularly in infrastructure inspection and maintenance. However, conventional interpretation methods are often limited by noise sensitivity and a lack of structural awareness. This study presents a novel framework that enhances the detection of underground utilities, especially pi…

    Submitted 10 July, 2025; v1 submitted 26 May, 2025; originally announced June 2025.

    Comments: 15 pages, 6 figures

  3. arXiv:2506.01129  [pdf, ps, other]

    cs.SD eess.AS

    Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis

    Authors: Anna Seo Gyeong Choi, Alexander Richardson, Ryan Partlan, Sunny Tang, Sunghye Cho

    Abstract: This study compares three acoustic feature extraction toolkits (OpenSMILE, Praat, and Librosa) applied to clinical speech data from individuals with schizophrenia spectrum disorders (SSD) and healthy controls (HC). By standardizing extraction parameters across the toolkits, we analyzed speech samples from 77 SSD and 87 HC participants and found significant toolkit-dependent variations. While F0 pe…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  4. arXiv:2504.11555  [pdf, other]

    math.OC cs.LG eess.SY stat.ML

    Sub-optimality of the Separation Principle for Quadratic Control from Bilinear Observations

    Authors: Yahya Sattar, Sunmook Choi, Yassir Jedra, Maryam Fazel, Sarah Dean

    Abstract: We consider the problem of controlling a linear dynamical system from bilinear observations with minimal quadratic cost. Despite the similarity of this problem to standard linear quadratic Gaussian (LQG) control, we show that when the observation model is bilinear, neither does the Separation Principle hold, nor is the optimal controller affine in the estimated state. Moreover, the cost-to-go is n…

    Submitted 15 April, 2025; originally announced April 2025.

  5. arXiv:2503.19228  [pdf, ps, other]

    eess.SY

    Bridging the Sim-to-real Gap: A Control Framework for Imitation Learning of Model Predictive Control

    Authors: Seungtaek Kim, Jonghyup Lee, Kyoungseok Han, Seibum B. Choi

    Abstract: To address the computational challenges of Model Predictive Control (MPC), recent research has studied using imitation learning to approximate the MPC to a computationally efficient Deep Neural Network (DNN). However, this introduces a common issue in learning-based control, the simulation-to-reality (sim-to-real) gap, and Domain Randomization (DR) has been widely used to mitigate this gap by intr…

    Submitted 3 July, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  6. arXiv:2503.12891  [pdf]

    eess.SY

    PD-Skygroundhook Controller for Semi-Active Suspension System Using Magnetorheological Fluid Dampers

    Authors: Hansol Lim, Jee Won Lee, Seung-Bok Choi, Jongseong Brad Choi

    Abstract: This paper presents a Proportional-Derivative (PD) Skygroundhook controller for magnetorheological (MR) dampers in semi-active suspensions. Traditional skyhook, Groundhook, and hybrid Skygroundhook controllers are well-known for their ability to reduce body and wheel vibrations; however, each approach has limitations in handling a broad frequency spectrum and often relies on abrupt switching. By a…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: This work has been submitted to the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) for possible publication

  7. Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition

    Authors: Anna Seo Gyeong Choi, Jonghyeon Park, Myungwoo Oh

    Abstract: Recent advancements in machine learning have significantly improved speech recognition, but recognizing speech from non-fluent or accented speakers remains a challenge. Previous efforts, relying on rule-based pronunciation patterns, have struggled to fully capture non-native errors. We propose two data-driven approaches using speech corpora to automatically detect mispronunciation patterns. By ali…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: Accepted to ICASSP 2025

  8. arXiv:2501.01094  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical Captions

    Authors: Suhwan Choi, Kyu Won Kim, Myungjoo Kang

    Abstract: We introduce Multimodal Matching based on Valence and Arousal (MMVA), a tri-modal encoder framework designed to capture emotional content across images, music, and musical captions. To support this framework, we expand the Image-Music-Emotion-Matching-Net (IMEMNet) dataset, creating IMEMNet-C which includes 24,756 images and 25,944 music clips with corresponding musical captions. We employ multimo…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Paper accepted in Artificial Intelligence for Music workshop at AAAI 2025

  9. arXiv:2411.16692  [pdf, other]

    eess.SP cs.LG eess.SY

    Leveraging Conversational Generative AI for Anomaly Detection in Digital Substations

    Authors: Aydin Zaboli, Seong Lok Choi, Junho Hong

    Abstract: This study addresses critical challenges of cybersecurity in digital substations by proposing an innovative task-oriented dialogue (ToD) system for anomaly detection (AD) in multicast messages, specifically, generic object oriented substation event (GOOSE) and sampled value (SV) datasets. Leveraging generative artificial intelligence (GenAI) technology, the proposed framework demonstrates superior…

    Submitted 9 November, 2024; originally announced November 2024.

    Comments: 5 pages, 4 figures, Submitted to 2025 IEEE Power and Energy Society General Meeting (PESGM 2025), Austin, TX

  10. arXiv:2411.15490  [pdf, other]

    cs.CV cs.LG eess.IV

    Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

    Authors: Junhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul Ye

    Abstract: Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to an irreversible disability of the patient. Since diffusion weighted imaging (DWI) using the magnetic resonance image (MRI) plays a crucial role in the detection of AIS, automated prediction of AIS from DWI has been a research topic of clinical importance. While text radiology reports contai…

    Submitted 23 November, 2024; originally announced November 2024.

  11. Scalable Wavelength Arbitration for Microring-based DWDM Transceivers

    Authors: Sunjin Choi, Vladimir Stojanović

    Abstract: This paper introduces the concept of autonomous microring arbitration, or wavelength arbitration, to address the challenge of multi-microring initialization in microring-based Dense-Wavelength-Division-Multiplexed (DWDM) transceivers. This arbitration is inherently policy-driven, defining critical system characteristics such as the spectral ordering of microrings. Furthermore, to facilitate large-…

    Submitted 7 February, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

  12. arXiv:2410.15609  [pdf, other]

    cs.CL cs.SD eess.AS

    Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding

    Authors: Yeonjoon Jung, Jaeseong Lee, Seungtaek Choi, Dohyeon Lee, Minsoo Kim, Seung-won Hwang

    Abstract: Recently, pre-trained language models (PLMs) have been increasingly adopted in spoken language understanding (SLU). However, automatic speech recognition (ASR) systems frequently produce inaccurate transcriptions, leading to noisy inputs for SLU models, which can significantly degrade their performance. To address this, our objective is to train SLU models to withstand ASR errors by exposing them…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: 9 pages, 3 figures

  13. arXiv:2410.13839  [pdf, other]

    cs.SD cs.AI eess.AS

    Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

    Authors: Tan Dat Nguyen, Ji-Hoon Kim, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung

    Abstract: The goal of this paper is to accelerate codec-based speech synthesis systems with minimum sacrifice to speech quality. We propose an enhanced inference method that allows for flexible trade-offs between speed and quality during inference without requiring additional training. Our core idea is to predict multiple tokens per inference step of the AR module using multiple prediction heads, resulting…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Submitted to IEEE ICASSP 2025

  14. arXiv:2410.03192  [pdf, other]

    eess.AS cs.AI cs.SD

    MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech

    Authors: Taejun Bak, Youngsik Eom, SeungJae Choi, Young-Sun Joo

    Abstract: Text-to-speech (TTS) systems that scale up the amount of training data have achieved significant improvements in zero-shot speech synthesis. However, these systems have certain limitations: they require a large amount of training data, which increases costs, and often overlook prosody similarity. To address these issues, we propose MultiVerse, a zero-shot multi-task TTS system that is able to perf…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024 Findings

  15. arXiv:2409.11910  [pdf, other]

    eess.IV cs.CV

    Tumor aware recurrent inter-patient deformable image registration of computed tomography scans with lung cancer

    Authors: Jue Jiang, Chloe Min Seo Choi, Maria Thor, Joseph O. Deasy, Harini Veeraraghavan

    Abstract: Background: Voxel-based analysis (VBA) for population level radiotherapy (RT) outcomes modeling requires topology preserving inter-patient deformable image registration (DIR) that preserves tumors on moving images while avoiding unrealistic deformations due to tumors occurring on fixed images. Purpose: We developed a tumor-aware recurrent registration (TRACER) deep learning (DL) method and evaluat…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: Minor revision under the journal of Medical Physics

    Journal ref: Medical Physics 2024

  16. arXiv:2409.06603  [pdf, other]

    cs.CV eess.IV

    A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising

    Authors: Kai Guo, Seungwon Choi, Jongseong Choi, Lae-Hoon Kim

    Abstract: State-of-the-art (SOTA) video denoising methods employ multi-frame simultaneous denoising mechanisms, resulting in significant delays (e.g., 16 frames), making them impractical for real-time cameras. To overcome this limitation, we propose a multi-fusion gated recurrent Transformer network (GRTN) that achieves SOTA denoising performance with only a single-frame delay. Specifically, the spatial den…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 5 pages, 5 figures

  17. arXiv:2409.03143  [pdf, other]

    cs.GR eess.IV physics.optics

    Large Étendue 3D Holographic Display with Content-adaptive Dynamic Fourier Modulation

    Authors: Brian Chao, Manu Gopakumar, Suyeon Choi, Jonghyun Kim, Liang Shi, Gordon Wetzstein

    Abstract: Emerging holographic display technology offers unique capabilities for next-generation virtual reality systems. Current holographic near-eye displays, however, only support a small étendue, which results in a direct tradeoff between achievable field of view and eyebox size. Étendue expansion has recently been explored, but existing approaches are either fundamentally limited in the image quality t…

    Submitted 23 November, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 12 pages, 7 figures, to be published in SIGGRAPH Asia 2024. Project website: https://bchao1.github.io/holo_dfm/

  18. arXiv:2407.09434  [pdf, other]

    cs.LG cs.AI cs.CE eess.SY

    Foundation Models for the Electric Power Grid

    Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belyi, Ricardo J. Bessa, Bishnu Prasad Bhattarai, et al. (2 additional authors not shown)

    Abstract: Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transi…

    Submitted 12 November, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Major equal contributors: H.F.H., T.B., B.G., L.S.A.M., A.P., A.V., J.W.; Significant equal contributors: J.B., A.B.M., S.C., I.F., B.H., R.J., K.K., V.M., F.M., M.D.M., O.R., H.S., L.X., E.S.Y., A.Z.; Other equal contributors: A.J.B., R.J.B., B.P.B., J.S., S.S; Lead contact: H.F.H

  19. arXiv:2406.15725  [pdf, other]

    eess.AS cs.SD

    Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

    Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

    Abstract: To tackle sound event detection (SED), we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial dilat…

    Submitted 19 September, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 4 technical report, DCASE 2024 Workshop accepted

  20. arXiv:2406.05472  [pdf, other]

    cs.CR eess.SY

    A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications

    Authors: Aydin Zaboli, Seong Lok Choi, Tai-Jin Song, Junho Hong

    Abstract: Cybersecurity breaches in digital substations can pose significant challenges to the stability and reliability of power system operations. To address these challenges, defense and mitigation techniques are required. Identifying and detecting anomalies in information and communication technology (ICT) is crucial to ensure secure device interactions within digital substations. This paper proposes a…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures, Submitted to IEEE Transactions on Information Forensics and Security

  21. arXiv:2405.01591  [pdf, other]

    cs.CL cs.AI eess.IV

    Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model

    Authors: Seonhee Cho, Choonghan Kim, Jiho Lee, Chetan Chilkunda, Sujin Choi, Joo Heung Yoon

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: Under review

  22. arXiv:2402.17127  [pdf, other]

    cs.SD eess.AS

    Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

    Authors: Taein Kang, Soyul Han, Sunmook Choi, Jaejin Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, Il-Youp Kwak

    Abstract: Conventional spoofing detection systems have heavily relied on the use of handcrafted features derived from speech data. However, a notable shift has recently emerged towards the direct utilization of raw speech waveforms, as demonstrated by methods like SincNet filters. This shift underscores the demand for more sophisticated audio sample features. Moreover, the success of deep learning models, p…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 5 pages

    MSC Class: 00A71; ACM Class: I.2.6

  23. arXiv:2401.13498  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting

    Authors: Hounsu Kim, Soonbeom Choi, Juhan Nam

    Abstract: Synthesizing performing guitar sound is a highly challenging task due to the polyphony and high variability in expression. Recently, deep generative models have shown promising results in synthesizing expressive polyphonic instrument sounds from music scores, often using a generic MIDI input. In this work, we propose an expressive acoustic guitar sound synthesis model with a customized input repre…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  24. arXiv:2401.12473  [pdf, other]

    eess.AS cs.SD

    Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

    Authors: Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

    Abstract: We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based attractor (TDA) calculation module that can deal with an unknown number of speakers, and 3) triple-path processing blocks that can model inter-speaker relations…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, accepted by ICASSP 2024

  25. arXiv:2311.18287  [pdf, other]

    eess.IV cs.CV cs.GR

    Dispersed Structured Light for Hyperspectral 3D Imaging

    Authors: Suhyun Shin, Seokjun Choi, Felix Heide, Seung-Hwan Baek

    Abstract: Hyperspectral 3D imaging aims to acquire both depth and spectral information of a scene. However, existing methods are either prohibitively expensive and bulky or compromise on spectral and depth accuracy. In this work, we present Dispersed Structured Light (DSL), a cost-effective and compact method for accurate hyperspectral 3D imaging. DSL modifies a traditional projector-camera system by placin…

    Submitted 25 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  26. arXiv:2311.05462  [pdf, other]

    cs.CR eess.SY

    ChatGPT and Other Large Language Models for Cybersecurity of Smart Grid Applications

    Authors: Aydin Zaboli, Seong Lok Choi, Tai-Jin Song, Junho Hong

    Abstract: Cybersecurity breaches targeting electrical substations constitute a significant threat to the integrity of the power grid, necessitating comprehensive defense and mitigation strategies. Any anomaly in information and communication technology (ICT) should be detected for secure communications between devices in digital substations. This paper proposes large language models (LLM), e.g., ChatGPT, fo…

    Submitted 25 February, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures, Accepted, 2024 IEEE Power & Energy Society General Meeting (PESGM), Seattle, WA, USA

  27. arXiv:2311.00332  [pdf, other]

    q-bio.TO cs.CV eess.IV

    SDF4CHD: Generative Modeling of Cardiac Anatomies with Congenital Heart Defects

    Authors: Fanwei Kong, Sascha Stocker, Perry S. Choi, Michael Ma, Daniel B. Ennis, Alison Marsden

    Abstract: Congenital heart disease (CHD) encompasses a spectrum of cardiovascular structural abnormalities, often requiring customized treatment plans for individual patients. Computational modeling and analysis of these unique cardiac anatomies can improve diagnosis and treatment planning and may ultimately lead to improved outcomes. Deep learning (DL) methods have demonstrated the potential to enable effi…

    Submitted 8 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  28. arXiv:2310.10633  [pdf, other]

    physics.optics eess.IV

    Telescope imaging beyond the Rayleigh limit in extremely low SNR

    Authors: Hyunsoo Choi, Seungman Choi, Peter Menart, Angshuman Deka, Zubin Jacob

    Abstract: The Rayleigh limit and low Signal-to-Noise Ratio (SNR) scenarios pose significant limitations to optical imaging systems used in remote sensing, infrared thermal imaging, and space domain awareness. In this study, we introduce a Stochastic Sub-Rayleigh Imaging (SSRI) algorithm to localize point objects and estimate their positions, brightnesses, and number in low SNR conditions, even below the Ray…

    Submitted 17 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 18 pages, 5 figures

  29. arXiv:2310.06364  [pdf, other]

    cs.SD cs.AI eess.AS

    Noisy-ArcMix: Additive Noisy Angular Margin Loss Combined With Mixup Anomalous Sound Detection

    Authors: Soonhyeon Choi, Jung-Woo Choi

    Abstract: Unsupervised anomalous sound detection (ASD) aims to identify anomalous sounds by learning the features of normal operational sounds and sensing their deviations. Recent approaches have focused on the self-supervised task utilizing the classification of normal data, and advanced models have shown that securing representation space for anomalous data is important through representation learning yie…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  30. arXiv:2309.07937  [pdf, other]

    eess.AS cs.LG cs.SD

    Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

    Authors: Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe

    Abstract: We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation. VoxtLM integrates text vocabulary with discrete speech tokens from self-supervised speech features and uses special tokens to enable multitask learning. Compared to a single-task model, VoxtLM exhibits a significant improvement in speech syn…

    Submitted 24 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

  31. arXiv:2308.02133  [pdf, other]

    eess.SP

    NeuralEQ: Neural-Network-Based Equalizer for High-Speed Wireline Communication

    Authors: Hanseok Kim, Jae Hyung Ju, Hyun Seok Choi, Hyeri Roh, Woo-Seok Choi

    Abstract: With the growing demand for high-bandwidth applications like video streaming and cloud services, the data transfer rates required for wireline communication keep increasing, making the channel loss a major obstacle in achieving low bit error rate (BER). Equalization techniques such as feed-forward equalizer (FFE) and decision feedback equalizer (DFE) are commonly used to compensate for channel lo…

    Submitted 4 August, 2023; originally announced August 2023.

  32. arXiv:2306.14310  [pdf, other]

    cs.CL cs.SD eess.AS

    Addressing Cold Start Problem for End-to-end Automatic Speech Scoring

    Authors: Jungbae Park, Seungtaek Choi

    Abstract: Integrating automatic speech scoring/assessment systems has become a critical aspect of second-language speaking education. With self-supervised learning advancements, end-to-end speech scoring approaches have exhibited promising results. However, this study highlights the significant decrease in the performance of speech scoring systems in new question contexts, thereby identifying this as a cold…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: Accepted at Interspeech 2023, 4 pages, 1 page for reference

  33. arXiv:2306.06340  [pdf, other]

    eess.SP cs.LG q-bio.QM

    ECGBERT: Understanding Hidden Language of ECGs with Self-Supervised Representation Learning

    Authors: Seokmin Choi, Sajad Mousavi, Phillip Si, Haben G. Yhdego, Fatemeh Khadem, Fatemeh Afghah

    Abstract: In the medical field, current ECG signal analysis approaches rely on supervised deep neural networks trained for specific tasks that require substantial amounts of labeled data. However, our paper introduces ECGBERT, a self-supervised representation learning approach that unlocks the underlying language of ECGs. By unsupervised pre-training of the model, we mitigate challenges posed by the lack of…

    Submitted 10 June, 2023; originally announced June 2023.

  34. An empirical study on speech restoration guided by self supervised speech representation

    Authors: Jaeuk Byun, Youna Ji, Soo Whan Chung, Soyeon Choe, Min Seok Choi

    Abstract: Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech represen…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: To be presented at ICASSP 2023

  35. arXiv:2305.08878  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning to Learn Unlearned Feature for Brain Tumor Segmentation

    Authors: Seungyub Han, Yeongmo Kim, Seokhyeon Ha, Jungwoo Lee, Seunghong Choi

    Abstract: We propose a fine-tuning algorithm for brain tumor segmentation that needs only a few data samples and helps networks not to forget the original tasks. Our approach is based on active learning and meta-learning. One of the difficulties in medical image segmentation is the lack of datasets with proper annotations, because it requires doctors to tag reliable annotation and there are many variants of…

    Submitted 13 May, 2023; originally announced May 2023.

    Comments: Medical Imaging Meets NeurIPS 2018

  36. arXiv:2304.08707  [pdf, other]

    eess.AS cs.SD

    Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains an information highway to flow an over-complete input representation through multiple FSB-LSTM modules. Each FSB-LSTM module consists of a full-band bl…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: in ICASSP 2023

  37. arXiv:2304.02389  [pdf, other]

    eess.IV cs.CV cs.LG

    DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images

    Authors: Bo Qian, Hao Chen, Xiangning Wang, Haoxuan Che, Gitaek Kwon, Jaeyoung Kim, Sungjin Choi, Seoyoung Shin, Felix Krause, Markus Unterdechler, Junlin Hou, Rui Feng, Yihao Li, Mostafa El Habib Daho, Qiang Wu, Ping Zhang, Xiaokang Yang, Yiyu Cai, Weiping Jia, Huating Li, Bin Sheng

    Abstract: Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and s…

    Submitted 5 April, 2023; originally announced April 2023.

  38. arXiv:2304.00471  [pdf, other]

    cs.SD cs.CV cs.GR cs.LG eess.AS

    A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

    Authors: Bo-Kyeong Kim, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim, Sungsu Lim

    Abstract: Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite remarkable results of modern talking-face generation models, they often entail high computational burdens, which limi…

    Submitted 28 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: MLSys Workshop on On-Device Intelligence, 2023; Demo: https://huggingface.co/spaces/nota-ai/compressed_wav2lip

  39. arXiv:2303.16511  [pdf, other]

    eess.AS

    Joint unsupervised and supervised learning for context-aware language identification

    Authors: Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim

    Abstract: Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task only. However, we need additional text labels to train the model to recognize speech, and acquiring the text labels is costly. In order to overcome this probl…

    Submitted 14 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  40. arXiv:2303.07592  [pdf, other]

    eess.AS cs.SD

    Lightweight feature encoder for wake-up word detection based on self-supervised speech representation

    Authors: Hyungjun Lim, Younggwan Kim, Kiho Yeom, Eunjoo Seo, Hoodong Lee, Stanley Jungkyu Choi, Honglak Lee

    Abstract: Self-supervised learning method that provides generalized speech representations has recently received increasing attention. Wav2vec 2.0 is the most famous example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is challenging to use it directly for wake-up word detection on mobile devices due to its expensive computational cost. In this work…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  41. arXiv:2303.01105  [pdf, other]

    eess.IV cs.CV cs.LG

    Evidence-empowered Transfer Learning for Alzheimer's Disease

    Authors: Kai Tzu-iunn Ong, Hana Kim, Minjin Kim, Jinseong Jang, Beomseok Sohn, Yoon Seong Choi, Dosik Hwang, Seong Jae Hwang, Jinyoung Yeo

    Abstract: Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification. However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we pres…

    Submitted 17 April, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2023. The authorship was changed from co-first authors to a single first author, which was authorized by the adviser/corresponding author Jinyoung Yeo (Apr 18th, 2023)

  42. arXiv:2211.15948  [pdf, other]

    cs.SD eess.AS

    Neural Vocoder Feature Estimation for Dry Singing Voice Separation

    Authors: Jaekwon Im, Soonbeom Choi, Sangeon Yong, Juhan Nam

    Abstract: Singing voice separation (SVS) is a task that separates singing voice audio from its mixture with instrumental audio. Previous SVS studies have mainly employed the spectrogram masking method which requires a large dimensionality in predicting the binary masks. In addition, they focused on extracting a vocal stem that retains the wet sound with the reverberation effect. This result may hinder the r…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 6 pages, 4 figures

    Journal ref: 14th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2022

  43. arXiv:2211.12433  [pdf, other]

    cs.SD eess.AS

    TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI)…

    Submitted 4 August, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: In IEEE/ACM Transactions on Audio, Speech, and Language Processing. A sound demo is available at https://zqwang7.github.io/demos/TF-GridNet-demo/index.html, and the code is available at https://github.com/espnet/espnet/pull/5395

  44. arXiv:2210.09135  [pdf, other]

    cs.CV eess.IV

    Gated Recurrent Unit for Video Denoising

    Authors: Kai Guo, Seungwon Choi, Jongseong Choi

    Abstract: Current video denoising methods perform temporal fusion by designing convolutional neural networks (CNN) or combine spatial denoising with temporal fusion into basic recurrent neural networks (RNNs). However, there have not yet been works which adapt gated recurrent unit (GRU) mechanisms for video denoising. In this letter, we propose a new video denoising model based on GRU, namely GRU-VD. First,…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: 5 pages, 5 figures

    MSC Class: 62H35; 68U10 ACM Class: I.4.4

  45. arXiv:2209.03952  [pdf, other]

    cs.SD eess.AS

    TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-frequency (T-F) domain, for monaural talker-independent speaker separation in anechoic conditions. The model stacks several multi-path blocks, each consisting of an intra-frame spectral module, a sub-band temporal module, and a full-band self-attention module, to leverage local and global spectro-temporal inf…

    Submitted 15 March, 2023; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: in IEEE ICASSP 2023

  46. Optimal Parking Planning for Shared Autonomous Vehicles

    Authors: Seongjin Choi, Jinwoo Lee

    Abstract: Parking is a crucial element of the driving experience in urban transportation systems. Especially in the coming era of Shared Autonomous Vehicles (SAVs), parking operations in urban transportation networks will inevitably change. Parking stations will serve as storage places for unused vehicles and depots that control the level-of-service of SAVs. This study presents an Analytical Parking Plannin…

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: 27 pages, 9 figures, 9 tables

  47. arXiv:2206.12059  [pdf]

    eess.AS cs.SD

    Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

    Authors: Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

    Abstract: Performance of sound event localization and detection (SELD) in real scenes is limited by small size of SELD dataset, due to difficulty in obtaining sufficient amount of realistic multi-channel audio data recordings with accurate label. We used two main strategies to solve problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions:…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task3

  48. arXiv:2206.11645  [pdf, ps, other]

    eess.AS

    Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Seung-Deok Choi, Yong-Hwa Park

    Abstract: While many deep learning methods on other domains have been applied to sound event detection (SED), differences between original domains of the methods and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) for input, thorough comprehension on these two dimensions is essential for application of methods from other domains on SED. Prev…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task4

  49. arXiv:2206.03612  [pdf, other]

    cs.CV cs.AI cs.LG eess.SP

    Predictive Modeling of Charge Levels for Battery Electric Vehicles using CNN EfficientNet and IGTD Algorithm

    Authors: Seongwoo Choi, Chongzhou Fang, David Haddad, Minsung Kim

    Abstract: Convolutional Neural Networks (CNN) have been a good solution for understanding a vast image dataset. As the increased number of battery-equipped electric vehicles is flourishing globally, there has been much research on understanding which charge levels electric vehicle drivers would choose to charge their vehicles to get to their destination without any prevention. We implemented deep learning a…

    Submitted 7 June, 2022; originally announced June 2022.

  50. Federated Learning Enables Big Data for Rare Cancer Boundary Detection

    Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer, et al. (254 additional authors not shown)

    Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train acc…

    Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS