Skip to main content

Showing 1–50 of 186 results for author: Kim, K

Searching in archive eess. Search in all archives.
.
  1. arXiv:2506.21174  [pdf

    eess.AS cs.LG

    Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4

    Authors: Jongyeon Park, Joonhee Lee, Do-Hyeon Lim, Hong Kook Kim, Hyeongcheol Geum, Jeong Eun Lim

    Abstract: This technical report presents submission systems for Task 4 of the DCASE 2025 Challenge. This model incorporates additional audio features (spectral roll-off and chroma features) into the embedding feature extracted from the mel-spectral feature to im-prove the classification capabilities of an audio-tagging model in the spatial semantic segmentation of sound scenes (S5) system. This approach is… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: DCASE 2025 challenge Task4, 5 pages

  2. arXiv:2506.01947  [pdf, ps, other

    eess.IV cs.CV

    RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report

    Authors: Marcos V. Conde, Radu Timofte, Radu Berdan, Beril Besbinar, Daisuke Iso, Pengzhou Ji, Xiong Dun, Zeying Fan, Chen Wu, Zhansheng Wang, Pengbo Zhang, Jiazi Huang, Qinglin Liu, Wei Yu, Shengping Zhang, Xiangyang Ji, Kyungsik Kim, Minkyung Kim, Hwalmin Lee, Hekun Ma, Huan Zheng, Yanyan Wei, Zhao Zhang, Jing Fang, Meilin Gao , et al. (8 additional authors not shown)

    Abstract: Numerous low-level vision tasks operate in the RAW domain due to its linear properties, bit depth, and sensor designs. Despite this, RAW image datasets are scarce and more expensive to collect than the already large and public sRGB datasets. For this reason, many approaches try to generate realistic RAW images using sensor information and sRGB images. This paper covers the second challenge on RAW… ▽ More

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

  3. arXiv:2505.21607  [pdf

    physics.optics eess.SP

    A Comb-based Colorless Coherent WDM Transmitter

    Authors: Di Che, Brian Stern, Kwangwoong Kim, Cagri Ozdilek, Timofey Shpakovsky, John D. Jost, Maxim Karpov

    Abstract: We propose a comb-based WDM transmitter capable of modulating independent signals to comb lines without demultiplexing them and prove its concept and potential scalability in a WDM transmitter consisting of a Kerr microcomb and a silicon I/Q modulator array.

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Published in OFC'2025, Postdeadline Th4B.4 (updated version)

  4. Constant-Sum High-Order Barrier Functions for Safety Between Parallel Boundaries

    Authors: Kwang Hak Kim, Mamadou Diagne, Miroslav Krstić

    Abstract: This paper takes a step towards addressing the difficulty of constructing Control Barrier Functions (CBFs) for parallel safety boundaries. A single CBF for both boundaries has been reported to be difficult to validate for safety, and we identify why this challenge is inherent. To overcome this, the proposed method constructs separate CBFs for each boundary. We begin by presenting results for the r… ▽ More

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Submitted to IEEE L-CSS and 2025 Conference on Decision and Control (CDC)

  5. arXiv:2504.18157  [pdf, other

    eess.AS cs.SD

    DOSE : Drum One-Shot Extraction from Music Mixture

    Authors: Suntae Hwang, Seonghyeon Kang, Kyungsu Kim, Semin Ahn, Kyogu Lee

    Abstract: Drum one-shot samples are crucial for music production, particularly in sound design and electronic music. This paper introduces Drum One-Shot Extraction, a task in which the goal is to extract drum one-shots that are present in the music mixture. To facilitate this, we propose the Random Mixture One-shot Dataset (RMOD), comprising large-scale, randomly arranged music mixtures paired with correspo… ▽ More

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Published in IEEE ICASSP 2025

  6. arXiv:2503.19735  [pdf

    eess.IV cs.CV

    InterSliceBoost: Identifying Tissue Layers in Three-dimensional Ultrasound Images for Chronic Lower Back Pain (cLBP) Assessment

    Authors: Zixue Zeng, Matthew Cartier, Xiaoyan Zhao, Pengyu Chen, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M Cormack, Allison C. Bean, Ryan P. Nussbaum, Maya Maurer, Emily Landis-Walkenhorst, Kang Kim, Ajay D. Wasan, Jiantao Pu

    Abstract: Available studies on chronic lower back pain (cLBP) typically focus on one or a few specific tissues rather than conducting a comprehensive layer-by-layer analysis. Since three-dimensional (3-D) images often contain hundreds of slices, manual annotation of these anatomical structures is both time-consuming and error-prone. We aim to develop and validate a novel approach called InterSliceBoost to e… ▽ More

    Submitted 25 March, 2025; originally announced March 2025.

  7. arXiv:2502.17528  [pdf, other

    eess.SP

    Temperature Compensation Method of Six-Axis Force/Torque Sensor Using Gated Recurrent Unit

    Authors: Hyun-Bin Kim, Seokju Lee, Byeong-Il Ham, Kyung-Soo Kim

    Abstract: This study aims to enhance the accuracy of a six-axis force/torque sensor compared to existing approaches that utilize Multi-Layer Perceptron (MLP) and the Least Square Method. The sensor used in this study is based on a photo-coupler and operates with infrared light, making it susceptible to dark current effects, which cause drift due to temperature variations. Additionally, the sensor is compact… ▽ More

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 8 pages, 9 figures

  8. arXiv:2502.00619  [pdf, other

    eess.IV cs.AI cs.CV

    Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective

    Authors: Yujin Oh, Pengfei Jin, Sangjoon Park, Sekeun Kim, Siyeop Yoon, Kyungsang Kim, Jin Sung Kim, Xiang Li, Quanzheng Li

    Abstract: Ensuring fairness in medical image segmentation is critical due to biases in imbalanced clinical data acquisition caused by demographic attributes (e.g., age, sex, race) and clinical factors (e.g., disease severity). To address these challenges, we introduce Distribution-aware Mixture of Experts (dMoE), inspired by optimal control theory. We provide a comprehensive analysis of its underlying mecha… ▽ More

    Submitted 27 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: ICML 2025 spotlight, see https://openreview.net/forum?id=BUONdewsBa

  9. arXiv:2501.05085  [pdf, other

    eess.IV cs.CV cs.LG

    End-to-End Deep Learning for Interior Tomography with Low-Dose X-ray CT

    Authors: Yoseob Han, Dufan Wu, Kyungsang Kim, Quanzheng Li

    Abstract: Objective: There exist several X-ray computed tomography (CT) scanning strategies to reduce a radiation dose, such as (1) sparse-view CT, (2) low-dose CT, and (3) region-of-interest (ROI) CT (called interior tomography). To further reduce the dose, the sparse-view and/or low-dose CT settings can be applied together with interior tomography. Interior tomography has various advantages in terms of re… ▽ More

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: Published by Physics in Medicine & Biology (2022.5)

  10. arXiv:2501.01094  [pdf, other

    cs.SD cs.AI cs.MM eess.AS

    MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical Captions

    Authors: Suhwan Choi, Kyu Won Kim, Myungjoo Kang

    Abstract: We introduce Multimodal Matching based on Valence and Arousal (MMVA), a tri-modal encoder framework designed to capture emotional content across images, music, and musical captions. To support this framework, we expand the Image-Music-Emotion-Matching-Net (IMEMNet) dataset, creating IMEMNet-C which includes 24,756 images and 25,944 music clips with corresponding musical captions. We employ multimo… ▽ More

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Paper accepted in Artificial Intelligence for Music workshop at AAAI 2025

  11. arXiv:2412.19763  [pdf, other

    eess.SP

    Multi-population Differential Evolution for RSS based Cooperative Localization in Wireless Sensor Networks with Limited Communication Range

    Authors: Lismer Andres Caceres Najarro, Iickho Song, Muhammad Salman, Kiseon Kim

    Abstract: This paper presents a novel approach to deal with the cooperative localization problem in wireless sensor networks based on received signal strength measurements. In cooperative scenarios, the cost function of the localization problem becomes increasingly nonlinear and nonconvex due to the heightened interaction between sensor nodes, making the estimation of the positions of the target nodes more… ▽ More

    Submitted 27 December, 2024; originally announced December 2024.

  12. arXiv:2412.09109  [pdf, other

    eess.SY cs.CE math.OC

    evS2CP: Real-time Simultaneous Speed and Charging Planner for Connected Electric Vehicles

    Authors: Minwoo Gwon, Jiwon Kim, Seungjun Yoo, Kwang-Ki K. Kim

    Abstract: This paper presents evS2CP, an optimization-based framework for simultaneous speed and charging planning designed for connected electric vehicles (EVs). With EVs emerging as competitive alternatives to internal combustion engine vehicles, overcoming challenges such as limited charging infrastructure is crucial. evS2CP addresses these issues by minimizing the travel time, charging time, and energy… ▽ More

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: 15 pages, 9 figures, 2 tables

    MSC Class: 93C85; 49M37; 65K05; 90C29; 68T40; 70B15 ACM Class: G.1.6; G.1.2; H.5.2; G.4; H.4.3; J.6

  13. arXiv:2412.08895  [pdf, other

    eess.SP stat.AP stat.CO

    Fully Bayesian Wideband Direction-of-Arrival Estimation and Detection via RJMCMC

    Authors: Kyurae Kim, Philip T. Clemson, James P. Reilly, Jason F. Ralph, Simon Maskell

    Abstract: We propose a fully Bayesian approach to wideband, or broadband, direction-of-arrival (DoA) estimation and signal detection. Unlike previous works in wideband DoA estimation and detection, where the signals were modeled in the time-frequency domain, we directly model the time-domain representation and treat the non-causal part of the source signal as latent variables. Furthermore, our Bayesian mode… ▽ More

    Submitted 11 December, 2024; originally announced December 2024.

  14. arXiv:2412.00150  [pdf, other

    cs.CV eess.IV

    Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise

    Authors: Yeonguk Yu, Minhwan Ko, Sungho Shin, Kangmin Kim, Kyoobin Lee

    Abstract: Deep neural networks have demonstrated remarkable performance in various vision tasks, but their success heavily depends on the quality of the training data. Noisy labels are a critical issue in medical datasets and can significantly degrade model performance. Previous clean sample selection methods have not utilized the well pre-trained features of vision foundation models (VFMs) and assumed that… ▽ More

    Submitted 29 November, 2024; originally announced December 2024.

    Comments: Accepted at NeurIPS 2024

  15. arXiv:2411.00274  [pdf, other

    cs.CV cs.LG eess.IV

    Adaptive Residual Transformation for Enhanced Feature-Based OOD Detection in SAR Imagery

    Authors: Kyung-hwan Lee, Kyung-tae Kim

    Abstract: Recent advances in deep learning architectures have enabled efficient and accurate classification of pre-trained targets in Synthetic Aperture Radar (SAR) images. Nevertheless, the presence of unknown targets in real battlefield scenarios is unavoidable, resulting in misclassification and reducing the accuracy of the classifier. Over the past decades, various feature-based out-of-distribution (OOD… ▽ More

    Submitted 31 October, 2024; originally announced November 2024.

  16. arXiv:2410.06493  [pdf, other

    cs.RO cs.AI eess.SY math.OC

    BiC-MPPI: Goal-Pursuing, Sampling-Based Bidirectional Rollout Clustering Path Integral for Trajectory Optimization

    Authors: Minchan Jung, Kwangki Kim

    Abstract: This paper introduces the Bidirectional Clustered MPPI (BiC-MPPI) algorithm, a novel trajectory optimization method aimed at enhancing goal-directed guidance within the Model Predictive Path Integral (MPPI) framework. BiC-MPPI incorporates bidirectional dynamics approximations and a new guide cost mechanism, improving both trajectory planning and goal-reaching performance. By leveraging forward an… ▽ More

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 7 pages, 1 figures

    MSC Class: 68T40; 13P25 ACM Class: I.2.9; I.2.8; G.1.6; G.4

  17. arXiv:2410.00184  [pdf, other

    eess.IV cs.CV cs.LG

    Volumetric Conditional Score-based Residual Diffusion Model for PET/MR Denoising

    Authors: Siyeop Yoon, Rui Hu, Yuang Wang, Matthew Tivnan, Young-don Son, Dufan Wu, Xiang Li, Kyungsang Kim, Quanzheng Li

    Abstract: PET imaging is a powerful modality offering quantitative assessments of molecular and physiological processes. The necessity for PET denoising arises from the intrinsic high noise levels in PET imaging, which can significantly hinder the accurate interpretation and quantitative analysis of the scans. With advances in deep learning techniques, diffusion model-based PET denoising techniques have sho… ▽ More

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Accepted to MICCAI 2024

  18. arXiv:2410.00046  [pdf, other

    eess.IV cs.CV cs.LG

    Mixture of Multicenter Experts in Multimodal Generative AI for Advanced Radiotherapy Target Delineation

    Authors: Yujin Oh, Sangjoon Park, Xiang Li, Wang Yi, Jonathan Paly, Jason Efstathiou, Annie Chan, Jun Won Kim, Hwa Kyung Byun, Ik Jae Lee, Jaeho Cho, Chan Woo Wee, Peng Shu, Peilong Wang, Nathan Yu, Jason Holmes, Jong Chul Ye, Quanzheng Li, Wei Liu, Woong Sub Koom, Jin Sung Kim, Kyungsang Kim

    Abstract: Clinical experts employ diverse philosophies and strategies in patient care, influenced by regional patient populations. However, existing medical artificial intelligence (AI) models are often trained on data distributions that disproportionately reflect highly prevalent patterns, reinforcing biases and overlooking the diverse expertise of clinicians. To overcome this limitation, we introduce the… ▽ More

    Submitted 26 October, 2024; v1 submitted 27 September, 2024; originally announced October 2024.

    Comments: 39 pages

  19. arXiv:2409.19834  [pdf, ps, other

    eess.SY

    Utilizing Priors in Sampling-based Cost Minimization

    Authors: Yuan-Yao Lou, Jonathan Spencer, Kwang Taik Kim, Mung Chiang

    Abstract: We consider an autonomous vehicle (AV) agent performing a long-term cost-minimization problem in the elapsed time $T$ over sequences of states $s_{1:T}$ and actions $a_{1:T}$ for some fixed, known (though potentially learned) cost function $C(s_t,a_t)$, approximate system dynamics $P$, and distribution over initial states $d_0$. The goal is to minimize the expected cost-to-go of the driving trajec… ▽ More

    Submitted 29 September, 2024; originally announced September 2024.

  20. Semantic Communication Enabled 6G-NTN Framework: A Novel Denoising and Gateway Hop Integration Mechanism

    Authors: Loc X. Nguyen, Sheikh Salman Hassan, Yan Kyaw Tun, Kitae Kim, Zhu Han, Choong Seon Hong

    Abstract: The sixth-generation (6G) non-terrestrial networks (NTNs) are crucial for real-time monitoring in critical applications like disaster relief. However, limited bandwidth, latency, rain attenuation, long propagation delays, and co-channel interference pose challenges to efficient satellite communication. Therefore, semantic communication (SC) has emerged as a promising solution to improve transmissi… ▽ More

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: 13 pages, 8 figures, 2 tables

    Journal ref: in IEEE Transactions on Wireless Communications, Jun. 2025

  21. arXiv:2409.13281  [pdf, other

    eess.SY

    Wireless Interconnection Network (WINE) for Post-Exascale High-Performance Computing

    Authors: Hong Ki Kim, Yong Hun Jang, Hee Soo Kim, Won Young Kang, Young-Chai Ko, Sang Hyun Lee

    Abstract: Interconnection networks, or `interconnects,' play a crucial role in administering the communication among computing units of high-performance computing (HPC) systems. Efficient provisioning of interconnects minimizes the processing delay wherein computing units await information sharing between each other, thereby enhancing the overall computation efficiency. Ideally, interconnects are designed w… ▽ More

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: 20 pages, 5 figures, to be published in IEEE Wireless Communications Magazine

  22. arXiv:2409.00078  [pdf, other

    eess.SP cs.LG cs.NI

    SGP-RI: A Real-Time-Trainable and Decentralized IoT Indoor Localization Model Based on Sparse Gaussian Process with Reduced-Dimensional Inputs

    Authors: Zhe Tang, Sihao Li, Zichen Huang, Guandong Yang, Kyeong Soo Kim, Jeremy S. Smith

    Abstract: Internet of Things (IoT) devices are deployed in the filed, there is an enormous amount of untapped potential in local computing on those IoT devices. Harnessing this potential for indoor localization, therefore, becomes an exciting research area. Conventionally, the training and deployment of indoor localization models are based on centralized servers with substantial computational resources. Thi… ▽ More

    Submitted 24 August, 2024; originally announced September 2024.

    Comments: 10 pages, 4 figures, under review for journal publication

  23. arXiv:2408.12860  [pdf, other

    cs.NI eess.SP

    Active STAR-RIS Empowered Edge System for Enhanced Energy Efficiency and Task Management

    Authors: Pyae Sone Aung, Kitae Kim, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

    Abstract: The proliferation of data-intensive and low-latency applications has driven the development of multi-access edge computing (MEC) as a viable solution to meet the increasing demands for high-performance computing and storage capabilities at the network edge. Despite the benefits of MEC, challenges such as obstructions cause non-line-of-sight (NLoS) communication to persist. Reconfigurable intellige… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 13 pages, 10 figures

  24. arXiv:2408.05961  [pdf, other

    stat.ME eess.SP

    Cross-Spectral Analysis of Bivariate Graph Signals

    Authors: Kyusoon Kim, Hee-Seok Oh

    Abstract: With the advancements in technology and monitoring tools, we often encounter multivariate graph signals, which can be seen as the realizations of multivariate graph processes, and revealing the relationship between their constituent quantities is one of the important problems. To address this issue, we propose a cross-spectral analysis tool for bivariate graph signals. The main goal of this study… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  25. arXiv:2407.19862  [pdf, other

    cs.SD eess.AS

    Wavespace: A Highly Explorable Wavetable Generator

    Authors: Hazounne Lee, Kihong Kim, Sungho Lee, Kyogu Lee

    Abstract: Wavetable synthesis generates quasi-periodic waveforms of musical tones by interpolating a list of waveforms called wavetable. As generative models that utilize latent representations offer various methods in waveform generation for musical applications, studies in wavetable generation with invertible architecture have also arisen recently. While they are promising, it is still challenging to gene… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  26. arXiv:2407.09434  [pdf, other

    cs.LG cs.AI cs.CE eess.SY

    Foundation Models for the Electric Power Grid

    Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belyi, Ricardo J. Bessa, Bishnu Prasad Bhattarai , et al. (2 additional authors not shown)

    Abstract: Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transi… ▽ More

    Submitted 12 November, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Major equal contributors: H.F.H., T.B., B.G., L.S.A.M., A.P., A.V., J.W.; Significant equal contributors: J.B., A.B.M., S.C., I.F., B.H., R.J., K.K., V.M., F.M., M.D.M., O.R., H.S., L.X., E.S.Y., A.Z.; Other equal contributors: A.J.B., R.J.B., B.P.B., J.S., S.S; Lead contact: H.F.H

  27. arXiv:2407.09030  [pdf, other

    eess.IV cs.CV

    CAMP: Continuous and Adaptive Learning Model in Pathology

    Authors: Anh Tien Nguyen, Keunho Byeon, Kyungeun Kim, Boram Song, Seoung Wan Chae, Jin Tae Kwak

    Abstract: There exist numerous diagnostic tasks in pathology. Conventional computational pathology formulates and tackles them as independent and individual image classification problems, thereby resulting in computational inefficiency and high costs. To address the challenges, we propose a generic, unified, and universal framework, called a continuous and adaptive learning model in pathology (CAMP), for pa… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Under review

  28. arXiv:2407.08503  [pdf, other

    eess.IV cs.CV

    DIOR-ViT: Differential Ordinal Learning Vision Transformer for Cancer Classification in Pathology Images

    Authors: Ju Cheon Lee, Keunho Byeon, Boram Song, Kyungeun Kim, Jin Tae Kwak

    Abstract: In computational pathology, cancer grading has been mainly studied as a categorical classification problem, which does not utilize the ordering nature of cancer grades such as the higher the grade is, the worse the cancer is. To incorporate the ordering relationship among cancer grades, we introduce a differential ordinal learning problem in which we define and learn the degree of difference in th… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  29. arXiv:2406.12721  [pdf

    eess.AS cs.SD

    Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

    Authors: Sang Won Son, Jongyeon Park, Hong Kook Kim, Sulaiman Vesal, Jeong Eun Lim

    Abstract: In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de… ▽ More

    Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 challenge Task4, 4 pages

  30. arXiv:2406.11248  [pdf

    eess.AS cs.AI cs.SD

    Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9

    Authors: Do Hyun Lee, Yoonah Song, Hong Kook Kim

    Abstract: We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for c… ▽ More

    Submitted 26 November, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 9, 4 pages

  31. arXiv:2406.09345  [pdf, other

    cs.CL cs.SD eess.AS

    DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

    Authors: Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu

    Abstract: The integration of pre-trained text-based large language models (LLM) with speech input has enabled instruction-following capabilities for diverse speech tasks. This integration requires the use of a speech encoder, a speech adapter, and an LLM, trained on diverse tasks. We propose the use of discrete speech units (DSU), rather than continuous-valued speech encoder outputs, that are converted to t… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  32. arXiv:2406.02000  [pdf, other

    cs.NI eess.SP

    Advancing Ultra-Reliable 6G: Transformer and Semantic Localization Empowered Robust Beamforming in Millimeter-Wave Communications

    Authors: Avi Deb Raha, Kitae Kim, Apurba Adhikary, Mrityunjoy Gain, Zhu Han, Choong Seon Hong

    Abstract: Advancements in 6G wireless technology have elevated the importance of beamforming, especially for attaining ultra-high data rates via millimeter-wave (mmWave) frequency deployment. Although promising, mmWave bands require substantial beam training to achieve precise beamforming. While initial deep learning models that use RGB camera images demonstrated promise in reducing beam training overhead,… ▽ More

    Submitted 30 July, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  33. arXiv:2405.19771  [pdf, other

    cs.NI eess.SP

    Data Service Maximization in Space-Air-Ground Integrated 6G Networks

    Authors: Nway Nway Ei, Kitae Kim, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

    Abstract: Integrating terrestrial and non-terrestrial networks has emerged as a promising paradigm to fulfill the constantly growing demand for connectivity, low transmission delay, and quality of services (QoS). This integration brings together the strengths of the reliability of terrestrial networks, broad coverage and service continuity of non-terrestrial networks like low earth orbit satellites (LEOSats… ▽ More

    Submitted 19 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  34. arXiv:2405.05787  [pdf, other

    cs.RO cs.CV eess.SY

    Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study

    Authors: Tianpeng Zhang, Sekeun Kim, Jerome Charton, Haitong Ma, Kyungsang Kim, Na Li, Quanzheng Li

    Abstract: The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) ta… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  35. arXiv:2405.03905  [pdf, other

    cs.AR cs.CV cs.SD eess.AS

    DeltaKWS: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

    Authors: Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper introduces DeltaKWS, to the best of our knowledge, the first $Δ$RNN-enabled fine-grained temporal sparsity-aware KWS IC for voice-controlled devices. The 65 nm prototype chip features a number of techniques to enhance performance, area, and power efficiencies, specifically: 1) a bio-inspired delta-gated recurrent neural network ($Δ$RNN) classifier leveraging temporal similarities betwee… ▽ More

    Submitted 26 November, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted for publication in the IEEE Transactions on Circuits and Systems for Artificial Intelligence (TCASAI)

  36. arXiv:2404.07021  [pdf, other

    eess.SP

    A 4x32Gb/s 1.8pJ/bit Collaborative Baud-Rate CDR with Background Eye-Climbing Algorithm and Low-Power Global Clock Distribution

    Authors: Jihee Kim, Jia Park, Jiwon Shin, Hanseok Kim, Kahyun Kim, Haengbeom Shin, Ha-Jung Park, Woo-Seok Choi

    Abstract: This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is capable of compensating for the freq… ▽ More

    Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  37. arXiv:2404.04096  [pdf, other

    cs.IT eess.SP

    Machine Learning-Aided Cooperative Localization under Dense Urban Environment

    Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

    Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions includin… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  38. arXiv:2404.01517  [pdf, other

    cs.LG eess.SP

    Addressing Heterogeneity in Federated Load Forecasting with Personalization Layers

    Authors: Shourya Bose, Yu Zhang, Kibaek Kim

    Abstract: The advent of smart meters has enabled pervasive collection of energy consumption data for training short-term load forecasting models. In response to privacy concerns, federated learning (FL) has been proposed as a privacy-preserving approach for training, but the quality of trained models degrades as client data becomes heterogeneous. In this paper we propose the use of personalization layers fo… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  39. arXiv:2404.01464  [pdf, other

    eess.IV cs.AI cs.CV cs.LG

    Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images

    Authors: JungEun Kim, Hangyul Yoon, Geondo Park, Kyungsu Kim, Eunho Yang

    Abstract: 4D medical images, which represent 3D images with temporal information, are crucial in clinical practice for capturing dynamic changes and monitoring long-term disease progression. However, acquiring 4D medical images poses challenges due to factors such as radiation exposure and imaging duration, necessitating a balance between achieving high temporal resolution and minimizing adverse effects. Gi… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  40. arXiv:2402.17790  [pdf, other

    eess.SP cs.LG

    EEG classifier cross-task transfer to avoid training sessions in robot-assisted rehabilitation

    Authors: Niklas Kueper, Su Kyoung Kim, Elsa Andrea Kirchner

    Abstract: Background: For an individualized support of patients during rehabilitation, learning of individual machine learning models from the human electroencephalogram (EEG) is required. Our approach allows labeled training data to be recorded without the need for a specific training session. For this, the planned exoskeleton-assisted rehabilitation enables bilateral mirror therapy, in which movement inte… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 11 pages, 6 figures, 1 table

    MSC Class: 68

  41. arXiv:2401.15313  [pdf, other

    cs.RO cs.CV eess.SY math.OC

    Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization

    Authors: Kihoon Shin, Hyunjae Sim, Seungwon Nam, Yonghee Kim, Jae Hu, Kwang-Ki K. Kim

    Abstract: In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose es… ▽ More

    Submitted 4 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: 20 pages, 21 figures

    MSC Class: 93C85; 93E11; 93E24; 90C26; 93E10; 62M20;

  42. arXiv:2401.08962  [pdf, other

    cs.HC cs.LG cs.SD eess.AS

    DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition

    Authors: Hyunju Kim, Geon Kim, Taehoon Lee, Kisoo Kim, Dongman Lee

    Abstract: With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrain… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

  43. arXiv:2401.08835  [pdf, other

    cs.CL eess.AS

    Improving ASR Contextual Biasing with Guided Attention

    Authors: Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe

    Abstract: In this paper, we propose a Guided Attention (GA) auxiliary training loss, which improves the effectiveness and robustness of automatic speech recognition (ASR) contextual biasing without introducing additional parameters. A common challenge in previous literature is that the word error rate (WER) reduction brought by contextual biasing diminishes as the number of bias phrases increases. To addres… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  44. arXiv:2312.14939  [pdf, other

    q-bio.NC cs.CV cs.LG eess.IV

    Large-scale Graph Representation Learning of Dynamic Brain Connectome with Transformers

    Authors: Byung-Hoon Kim, Jungwon Choi, EungGu Yun, Kyungsang Kim, Xiang Li, Juho Lee

    Abstract: Graph Transformers have recently been successful in various graph representation learning tasks, providing a number of advantages over message-passing Graph Neural Networks. Utilizing Graph Transformers for learning the representation of the brain functional connectivity network is also gaining interest. However, studies to date have underlooked the temporal dynamics of functional connectivity, wh… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 Temporal Graph Learning Workshop

  45. arXiv:2312.09895  [pdf, other

    cs.CL cs.SD eess.AS

    Generative Context-aware Fine-tuning of Self-supervised Speech Models

    Authors: Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu

    Abstract: When performing tasks like automatic speech recognition or spoken language understanding for a given utterance, access to preceding text or audio provides contextual information can improve performance. Considering the recent advances in generative large language models (LLM), we hypothesize that an LLM could generate useful context information using the preceding text. With appropriate prompts, L… ▽ More

    Submitted 15 December, 2023; originally announced December 2023.

  46. arXiv:2312.01004  [pdf, other

    eess.SY

    Learning-based Ecological Adaptive Cruise Control of Autonomous Electric Vehicles: A Comparison of ADP, DQN and DDPG Approaches

    Authors: Sunwoo Kim, Kwang-Ki K. Kim

    Abstract: This paper presents model-based and model-free learning methods for economic and ecological adaptive cruise control (Eco-ACC) of connected and autonomous electric vehicles. For model-based optimal control of Eco-ACC, we considered longitudinal vehicle dynamics and a quasi-steady-state powertrain model including the physical limits of a commercial electric vehicle. We used adaptive dynamic programm… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

    MSC Class: 93E20; 68T20; 49M37; 90-08

  47. arXiv:2311.10224  [pdf, other

    eess.IV cs.CV cs.LG

    CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images

    Authors: Syed Farhan Abbas, Nguyen Thanh Duc, Yoonguu Song, Kyungwon Kim, Ekta Srivastava, Boreom Lee

    Abstract: Due to the lack of automated methods, to diagnose cerebrovascular disease, time-of-flight magnetic resonance angiography (TOF-MRA) is assessed visually, making it time-consuming. The commonly used encoder-decoder architectures for cerebrovascular segmentation utilize redundant features, eventually leading to the extraction of low-level features multiple times. Additionally, convolutional neural ne… ▽ More

    Submitted 19 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  48. Deep Video Inpainting Guided by Audio-Visual Self-Supervision

    Authors: Kyuyeon Kim, Junsik Jung, Woo Jae Kim, Sung-Eui Yoon

    Abstract: Humans can easily imagine a scene from auditory information based on their prior knowledge of audio-visual events. In this paper, we mimic this innate human ability in deep learning models to improve the quality of video inpainting. To implement the prior knowledge, we first train the audio-visual network, which learns the correspondence between auditory and visual information. Then, the audio-vis… ▽ More

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at ICASSP 2022

  49. arXiv:2310.02467  [pdf

    physics.optics eess.SP physics.app-ph

    Dual-Polarization Phase Retrieval Receiver in Silicon Photonics

    Authors: Brian Stern, Hanzi Huang, Haoshuo Chen, Kwangwoong Kim, Mohamad Hossein Idjadi

    Abstract: We demonstrate a silicon photonic dual-polarization phase retrieval receiver. The receiver recovers phase from intensity-only measurements without a local oscillator or transmitted carrier. We design silicon waveguides providing long delays and microring resonators with large dispersion to enable symbol-to-symbol interference and dispersive projection in the phase retrieval algorithm. We retrieve… ▽ More

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 11 pages, 7 figures

  50. arXiv:2309.13539  [pdf, other

    eess.IV

    MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography

    Authors: Sekeun Kim, Pengfei Jin, Cheng Chen, Kyungsang Kim, Zhiliang Lyu, Hui Ren, Sunghwan Kim, Zhengliang Liu, Aoxiao Zhong, Tianming Liu, Xiang Li, Quanzheng Li

    Abstract: Despite achieving impressive results in general-purpose semantic segmentation with strong generalization on natural images, the Segment Anything Model (SAM) has shown less precision and stability in medical image segmentation. In particular, the original SAM architecture is designed for 2D natural images and is therefore not support to handle three-dimensional information, which is particularly im… ▽ More

    Submitted 6 November, 2024; v1 submitted 23 September, 2023; originally announced September 2023.