Showing 1–45 of 45 results for author: Lim, H

  1. arXiv:2507.00185  [pdf]

    eess.IV cs.AI cs.CV

    Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)

    Authors: Yang Zhou, Chrystie Wan Ning Quek, Jun Zhou, Yan Wang, Yang Bai, Yuhe Ke, Jie Yao, Laura Gutierrez, Zhen Ling Teo, Darren Shu Jeng Ting, Brian T. Soetikno, Christopher S. Nielsen, Tobias Elze, Zengxiang Li, Linh Le Dinh, Lionel Tim-Ee Cheng, Tran Nguyen Tuan Anh, Chee Leong Cheng, Tien Yin Wong, Nan Liu, Iain Beehuat Tan, Tony Kiat Hon Lim, Rick Siow Mong Goh, Yong Liu, Daniel Shu Wei Ting

    Abstract: Current artificial intelligence models for medical imaging are predominantly single modality and single disease. Attempts to create multimodal and multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundatio…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: 42 pages, 3 composite figures, 4 tables

  2. arXiv:2506.23102  [pdf, ps, other]

    eess.IV cs.CV

    MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation

    Authors: Sunggu Kyung, Jinyoung Seo, Hyunseok Lim, Dongyeong Kim, Hyungbin Park, Jimin Sung, Jihyun Kim, Wooyoung Jo, Yoojin Nam, Namkug Kim

    Abstract: The recent release of RadGenome-Chest CT has significantly advanced CT-based report generation. However, existing methods primarily focus on global features, making it challenging to capture region-specific details, which may cause certain abnormalities to go unnoticed. To address this, we propose MedRegion-CT, a region-focused Multi-Modal Large Language Model (MLLM) framework, featuring three key…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 14 pages, 5 figures, submitted to ICCV 2025

  3. arXiv:2506.06732  [pdf, ps, other]

    eess.AS cs.AI eess.SP

    Neural Spectral Band Generation for Audio Coding

    Authors: Woongjib Choi, Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang

    Abstract: Spectral band replication (SBR) enables bit-efficient coding by generating high-frequency bands from the low-frequency ones. However, it only utilizes coarse spectral features upon a subband-wise signal replication, limiting adaptability to diverse acoustic signals. In this paper, we explore the efficacy of a deep neural network (DNN)-based generative approach for coding the high-frequency bands,…

    Submitted 28 July, 2025; v1 submitted 7 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  4. arXiv:2505.05710  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    HyperspectralMAE: The Hyperspectral Imagery Classification Model using Fourier-Encoded Dual-Branch Masked Autoencoder

    Authors: Wooyoung Jeong, Hyun Jae Park, Seonghun Jeong, Jong Wook Jang, Tae Hoon Lim, Dae Seoung Kim

    Abstract: Hyperspectral imagery provides rich spectral detail but poses unique challenges because of its high dimensionality in both spatial and spectral domains. We propose HyperspectralMAE, a Transformer-based foundation model for hyperspectral data that employs a dual masking strategy: during pre-training we randomly occlude 50% of spatial patches and 50% of spectral bands. This force…

    Submitted 8 May, 2025; originally announced May 2025.

  5. arXiv:2503.12891  [pdf]

    eess.SY

    PD-Skygroundhook Controller for Semi-Active Suspension System Using Magnetorheological Fluid Dampers

    Authors: Hansol Lim, Jee Won Lee, Seung-Bok Choi, Jongseong Brad Choi

    Abstract: This paper presents a Proportional-Derivative (PD) Skygroundhook controller for magnetorheological (MR) dampers in semi-active suspensions. Traditional skyhook, Groundhook, and hybrid Skygroundhook controllers are well-known for their ability to reduce body and wheel vibrations; however, each approach has limitations in handling a broad frequency spectrum and often relies on abrupt switching. By a…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: This work has been submitted to the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) for possible publication

  6. arXiv:2503.07940  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes

    Authors: Minkyun Seo, Hyungtae Lim, Kanghee Lee, Luca Carlone, Jaesik Park

    Abstract: Recent advances in deep learning-based point cloud registration have improved generalization, yet most methods still require retraining or manual parameter tuning for each new environment. In this paper, we identify three key factors limiting generalization: (a) reliance on environment-specific voxel size and search radius, (b) poor out-of-domain robustness of learning-based keypoint detectors, an…

    Submitted 6 August, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: 20 pages, 14 figures. Accepted as a highlight paper at ICCV 2025

  7. arXiv:2501.18911  [pdf, ps, other]

    cs.IT eess.SP

    Integrated Communication and Binary State Detection Under Unequal Error Constraints

    Authors: Daewon Seo, Sung Hoon Lim

    Abstract: This work considers a problem of integrated sensing and communication (ISAC) in which the goal of sensing is to detect a binary state. Unlike most approaches that minimize the total detection error probability, in our work, we disaggregate the error probability into false alarm and missed detection probabilities and investigate their information-theoretic three-way tradeoff including communication…

    Submitted 31 January, 2025; originally announced January 2025.

  8. arXiv:2410.21256  [pdf, other]

    cs.AI cs.CV eess.IV

    Multi-modal AI for comprehensive breast cancer prognostication

    Authors: Jan Witowski, Ken G. Zeng, Joseph Cappadona, Jailan Elayoubi, Khalil Choucair, Elena Diana Chiru, Nancy Chan, Young-Joon Kang, Frederick Howard, Irina Ostrovnaya, Carlos Fernandez-Granda, Freya Schnabel, Zoe Steinsnyder, Ugur Ozerdem, Kangning Liu, Waleed Abdulsattar, Yu Zong, Lina Daoud, Rafic Beydoun, Anas Saad, Nitya Thakore, Mohammad Sadic, Frank Yeung, Elisa Liu, Theodore Hill , et al. (26 additional authors not shown)

    Abstract: Treatment selection in breast cancer is guided by molecular subtypes and clinical characteristics. However, current tools including genomic assays lack the accuracy required for optimal clinical decision-making. We developed a novel artificial intelligence (AI)-based approach that integrates digital pathology images with clinical data, providing a more robust and effective method for predicting th…

    Submitted 2 March, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

  9. arXiv:2410.14683  [pdf, other]

    q-bio.NC cs.AI cs.CV eess.IV

    Brain-Aware Readout Layers in GNNs: Advancing Alzheimer's early Detection and Neuroimaging

    Authors: Jiwon Youn, Dong Woo Kang, Hyun Kook Lim, Mansu Kim

    Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder characterized by progressive memory and cognitive decline, affecting millions worldwide. Diagnosing AD is challenging due to its heterogeneous nature and variable progression. This study introduces a novel brain-aware readout layer (BA readout layer) for Graph Neural Networks (GNNs), designed to improve interpretability and predictive accura…

    Submitted 3 October, 2024; originally announced October 2024.

  10. arXiv:2409.19709  [pdf, other]

    cs.RO eess.SY

    Obstacle-Aware Quadrupedal Locomotion With Resilient Multi-Modal Reinforcement Learning

    Authors: I Made Aswin Nahrendra, Byeongho Yu, Minho Oh, Dongkyu Lee, Seunghyun Lee, Hyeonwoo Lee, Hyungtae Lim, Hyun Myung

    Abstract: Quadrupedal robots hold promising potential for applications in navigating cluttered environments with resilience akin to their animal counterparts. However, their floating base configuration makes them vulnerable to real-world uncertainties, yielding substantial challenges in their locomotion control. Deep reinforcement learning has become one of the plausible alternatives for realizing a robust…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: Under review. Project site is available at https://dreamwaqpp.github.io

  11. arXiv:2409.16296  [pdf]

    cs.CV cs.GR eess.IV

    LiDAR-3DGS: LiDAR Reinforced 3D Gaussian Splatting for Multimodal Radiance Field Rendering

    Authors: Hansol Lim, Hanbeom Chang, Jongseong Brad Choi, Chul Min Yeum

    Abstract: In this paper, we explore the capabilities of multimodal inputs to 3D Gaussian Splatting (3DGS) based Radiance Field Rendering. We present LiDAR-3DGS, a novel method of reinforcing 3DGS inputs with LiDAR generated point clouds to significantly improve the accuracy and detail of 3D models. We demonstrate a systematic approach of LiDAR reinforcement to 3DGS to enable capturing of important features…

    Submitted 9 September, 2024; originally announced September 2024.

  12. arXiv:2407.06918  [pdf]

    physics.optics eess.IV eess.SP

    Identity-enabled CDMA LiDAR for massively parallel ranging with a single-element receiver

    Authors: Yixiu Shen, Zi Heng Lim, Guangya Zhou

    Abstract: Light detection and ranging (LiDAR) have emerged as a crucial tool for high-resolution 3D imaging, particularly in autonomous vehicles, remote sensing, and augmented reality. However, the increasing demand for faster acquisition speed and higher resolution in LiDAR systems has highlighted the limitations of traditional mechanical scanning methods. This study introduces a novel wavelength-multiplex…

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  13. arXiv:2401.12987  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation

    Authors: Taeyang Yun, Hyunkuk Lim, Jeonghwan Lee, Min Song

    Abstract: Emotion Recognition in Conversation (ERC) plays a crucial role in enabling dialogue systems to effectively respond to user requests. The emotions in a conversation can be identified by the representations from various modalities, such as audio, visual, and text. However, due to the weak contribution of non-verbal modalities to recognize emotions, multimodal ERC has always been considered a challen…

    Submitted 31 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: NAACL 2024 main conference

  14. arXiv:2401.12499  [pdf, ps, other]

    cs.IT eess.SP

    On the Fundamental Tradeoff of Joint Communication and Quickest Change Detection with State-Independent Data Channels

    Authors: Daewon Seo, Sung Hoon Lim

    Abstract: In this work, we take the initiative in studying the information-theoretic tradeoff between communication and quickest change detection (QCD) under an integrated sensing and communication setting. We formally establish a joint communication and sensing problem for the quickest change detection. We assume a broadcast channel with a transmitter, a communication receiver, and a QCD detector in which…

    Submitted 9 October, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  15. arXiv:2309.02730  [pdf, other]

    eess.AS cs.AI cs.SD

    Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

    Authors: Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser

    Abstract: While many recent any-to-any voice conversion models succeed in transferring some target speech's style information to the converted speech, they still lack the ability to faithfully reproduce the speaking style of the target speaker. In this work, we propose a novel method to extract rich style information from target utterances and to efficiently transfer it to source speech content without requ…

    Submitted 14 December, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures, 2 tables

  16. arXiv:2307.16706  [pdf, ps, other]

    eess.SY cs.AI

    Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes

    Authors: Donghwan Lee, Han-Dong Lim, Do Wan Kim

    Abstract: The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where individual agents have access only to their own rewards, lacking insights into the rewards of other agents. Moreover, each agent has the ability to share its parame…

    Submitted 13 June, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

  17. arXiv:2304.00432  [pdf, other]

    eess.SY

    Multi-Agent Reachability Calibration with Conformal Prediction

    Authors: Anish Muthali, Haotian Shen, Sampada Deglurkar, Michael H. Lim, Rebecca Roelofs, Aleksandra Faust, Claire Tomlin

    Abstract: We investigate methods to provide safety assurances for autonomous agents that incorporate predictions of other, uncontrolled agents' behavior into their own trajectory planning. Given a learning-based forecasting model that predicts agents' trajectories, we introduce a method for providing probabilistic assurances on the model's prediction error with calibrated confidence intervals. Through quant…

    Submitted 13 December, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

  18. arXiv:2303.07592  [pdf, other]

    eess.AS cs.SD

    Lightweight feature encoder for wake-up word detection based on self-supervised speech representation

    Authors: Hyungjun Lim, Younggwan Kim, Kiho Yeom, Eunjoo Seo, Hoodong Lee, Stanley Jungkyu Choi, Honglak Lee

    Abstract: Self-supervised learning method that provides generalized speech representations has recently received increasing attention. Wav2vec 2.0 is the most famous example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is challenging to use it directly for wake-up word detection on mobile devices due to its expensive computational cost. In this work…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  19. arXiv:2210.11923  [pdf, other]

    cs.CR eess.SY

    RollBack: A New Time-Agnostic Replay Attack Against the Automotive Remote Keyless Entry Systems

    Authors: Levente Csikor, Hoon Wei Lim, Jun Wen Wong, Soundarya Ramesh, Rohini Poolat Parameswarath, Mun Choon Chan

    Abstract: Today's RKE systems implement disposable rolling codes, making every key fob button press unique, effectively preventing simple replay attacks. However, a prior attack called RollJam was proven to break all rolling code-based systems in general. By a careful sequence of signal jamming, capturing, and replaying, an attacker can become aware of the subsequent valid unlock signal that has not been us…

    Submitted 14 September, 2022; originally announced October 2022.

    Comments: 24 pages, 5 figures Under submission to a journal

    Journal ref: ACM Transactions on Cyber-Physical Systems, 2024

  20. arXiv:2210.05015  [pdf, other]

    cs.AI cs.RO eess.SY stat.ML

    Optimality Guarantees for Particle Belief Approximation of POMDPs

    Authors: Michael H. Lim, Tyler J. Becker, Mykel J. Kochenderfer, Claire J. Tomlin, Zachary N. Sunberg

    Abstract: Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood w…

    Submitted 19 October, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Journal ref: Journal of Artificial Intelligence Research, 77, 1591-1636 (2023)

  21. arXiv:2204.07629  [pdf]

    q-bio.PE eess.SY

    Navigation between initial and desired community states using shortcuts

    Authors: Benjamin W. Blonder, Michael H. Lim, Zachary Sunberg, Claire Tomlin

    Abstract: Ecological management problems often involve navigating from an initial to a desired community state. We ask whether navigation is possible without brute-force additions and deletions of species, using actions of varying costs: adding/deleting a small number of individuals of a species, changing the environment, and waiting. Navigation can yield direct paths (single sequence of actions) or shortcu…

    Submitted 2 December, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

  22. Residual Aligner Network

    Authors: Jian-Qing Zheng, Ziyang Wang, Baoru Huang, Ngee Han Lim, Bartlomiej W. Papiez

    Abstract: Image registration is important for medical imaging, the estimation of the spatial transformation between different images. Many previous studies have used learning-based methods for coarse-to-fine registration to efficiently perform 3D image registration. The coarse-to-fine approach, however, is limited when dealing with the different motions of nearby objects. Here we propose a novel Motion-Awar…

    Submitted 7 March, 2022; originally announced March 2022.

  23. arXiv:2112.09456  [pdf, other]

    cs.AI cs.LG cs.RO eess.SY

    Compositional Learning-based Planning for Vision POMDPs

    Authors: Sampada Deglurkar, Michael H. Lim, Johnathan Tucker, Zachary N. Sunberg, Aleksandra Faust, Claire J. Tomlin

    Abstract: The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle high-dimensional image observations prevalent in real world applications, and often require lengthy online training that requires interaction with the environment. In thi…

    Submitted 2 December, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  24. arXiv:2110.07716  [pdf]

    cs.CV eess.IV

    Adversarial Scene Reconstruction and Object Detection System for Assisting Autonomous Vehicle

    Authors: Md Foysal Haque, Hay-Youn Lim, Dae-Seong Kang

    Abstract: In the current computer vision era classifying scenes through video surveillance systems is a crucial task. Artificial Intelligence (AI) Video Surveillance technologies have been advanced remarkably while artificial intelligence and deep learning ascended into the system. Adopting the superior compounds of deep learning visual classification methods achieved enormous accuracy in classifying visual…

    Submitted 13 October, 2021; originally announced October 2021.

  25. arXiv:2109.07578  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    Multi-Task Learning with Sequence-Conditioned Transporter Networks

    Authors: Michael H. Lim, Andy Zeng, Brian Ichter, Maryam Bandari, Erwin Coumans, Claire Tomlin, Stefan Schaal, Aleksandra Faust

    Abstract: Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new suite of be…

    Submitted 15 September, 2021; originally announced September 2021.

  26. arXiv:2104.06780  [pdf, other]

    cs.CV eess.IV

    Towards a Better Understanding of VR Sickness: Physical Symptom Prediction for VR Contents

    Authors: Hak Gu Kim, Sangmin Lee, Seongyeop Kim, Heoun-taek Lim, Yong Man Ro

    Abstract: We address the black-box issue of VR sickness assessment (VRSA) by evaluating the level of physical symptoms of VR sickness. For the VR contents inducing the similar VR sickness level, the physical symptoms can vary depending on the characteristics of the contents. Most of existing VRSA methods focused on assessing the overall VR sickness score. To make better understanding of VR sickness, it is r…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: AAAI 2021

  27. arXiv:2103.05158  [pdf]

    cs.CV cs.AI eess.IV

    Deep Learning-based High-precision Depth Map Estimation from Missing Viewpoints for 360 Degree Digital Holography

    Authors: Hakdong Kim, Heonyeong Lim, Minkyu Jee, Yurim Lee, Jisoo Jeong, Kyudam Choi, MinSung Yoon, Cheongwon Kim

    Abstract: In this paper, we propose a novel, convolutional neural network model to extract highly precise depth maps from missing viewpoints, especially well applicable to generate holographic 3D contents. The depth map is an essential element for phase extraction which is required for synthesis of computer-generated hologram (CGH). The proposed model called the HDD Net uses MSE for the better performance o…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: 12 pages, 10 figures, 5 tables

  28. arXiv:2102.09785  [pdf, ps, other]

    eess.SP cs.IT cs.LG

    Deep Learning-based Beam Tracking for Millimeter-wave Communications under Mobility

    Authors: Sun Hong Lim, Sunwoo Kim, Byonghyo Shim, Jun Won Choi

    Abstract: In this paper, we propose a deep learning-based beam tracking method for millimeter-wave (mmWave) communications. Beam tracking is employed for transmitting the known symbols using the sounding beams and tracking time-varying channels to maintain a reliable communication link. When the pose of a user equipment (UE) device varies rapidly, the mmWave channels also tend to vary fast, which hinders sea…

    Submitted 1 December, 2022; v1 submitted 19 February, 2021; originally announced February 2021.

    Comments: 23 pages, 8 figures

  29. arXiv:2101.10248  [pdf, other]

    eess.IV cs.CV

    D-Net: Siamese based Network with Mutual Attention for Volume Alignment

    Authors: Jian-Qing Zheng, Ngee Han Lim, Bartlomiej W. Papiez

    Abstract: Alignment of contrast and non-contrast-enhanced imaging is essential for the quantification of changes in several biomedical applications. In particular, the extraction of cartilage shape from contrast-enhanced Computed Tomography (CT) of tibiae requires accurate alignment of the bone, currently performed manually. Existing deep learning-based methods for alignment require a common template or are…

    Submitted 25 January, 2021; originally announced January 2021.

    Comments: this uploaded manuscript is another version of which published in: International Workshop on Shape in Medical Imaging, Springer, 2020, pp. 73-84

    Journal ref: in: International Workshop on Shape in Medical Imaging, Springer, 2020, pp. 73-84

  30. arXiv:2012.10140  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

    Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

    Abstract: This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algori…

    Submitted 1 April, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

  31. arXiv:2010.11910  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

    Authors: Sungkyun Chang, Donmoon Lee, Jeongsoo Park, Hyungui Lim, Kyogu Lee, Karam Ko, Yoonchang Han

    Abstract: Most of existing audio fingerprinting systems have limitations to be used for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint with a fast maximum inner-product search. To this end, we present a contrastive learning framework that derives from the segment-level search objective. Each u…

    Submitted 10 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: ICASSP 2021 (accepted)

  32. arXiv:2010.02477  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments

    Authors: Youngmoon Jung, Yeunju Choi, Hyungjun Lim, Hoirin Kim

    Abstract: Speaker verification (SV) has recently attracted considerable research interest due to the growing popularity of virtual assistants. At the same time, there is an increasing requirement for an SV system: it should be robust to short speech segments, especially in noisy and reverberant environments. In this paper, we consider one more important requirement for practical applications: the system sho…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 19 pages, 10 figures, 13 tables

    Journal ref: in IEEE Access, vol. 8, pp. 175448-175466, 2020

  33. arXiv:2008.01405  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV

    MSDPN: Monocular Depth Prediction with Partial Laser Observation using Multi-stage Neural Networks

    Authors: Hyungtae Lim, Hyeonjae Gil, Hyun Myung

    Abstract: In this study, a deep-learning-based multi-stage network architecture called Multi-Stage Depth Prediction Network (MSDPN) is proposed to predict a dense depth map using a 2D LiDAR and a monocular camera. Our proposed network consists of a multi-stage encoder-decoder architecture and Cross Stage Feature Aggregation (CSFA). The proposed multi-stage encoder-decoder architecture alleviates the partial…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: 8 pages, 8 figures, IEEE/RSJ Intelligent Robots and Systems

    ACM Class: I.2.9

  34. arXiv:2007.11747  [pdf, other]

    eess.AS cs.LG cs.SD

    Sequential Routing Framework: Fully Capsule Network-based Speech Recognition

    Authors: Kyungmin Lee, Hyunwhan Joe, Hyeontaek Lim, Kwangyoun Kim, Sungsoo Kim, Chang Woo Han, Hong-Gee Kim

    Abstract: Capsule networks (CapsNets) have recently gotten attention as a novel neural architecture. This paper presents the sequential routing framework which we believe is the first method to adapt a CapsNet-only structure to sequence-to-sequence recognition. Input sequences are capsulized then sliced by a window size. Each slice is classified to a label at the corresponding time through iterative routing…

    Submitted 1 April, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: 42 pages, 8 figures (totally 11 figures), submitted to Computer Speech and Language (Only line numbers were removed from the submitted version)

  35. Robust Precoder for Mitigating Inter-Symbol and Inter-Carrier Interferences in Coherent Optical FBMC/OQAM

    Authors: Khaled Abdulaziz Alaghbari, Heng-Siong Lim, Tawfig Eltaif

    Abstract: In this paper, a new precoder for a coherent optical filter bank multicarrier with offset quadrature amplitude modulation (FBMC/OQAM) system is proposed. The precoder is designed based on an iterative polynomial eigenvalue decomposition (PEVD) algorithm to jointly mitigate the inter-symbol interference (ISI) and inter-carrier interference (ICI). The PEVD algorithm is used to decompose a polynomial…

    Submitted 20 January, 2020; originally announced January 2020.

    Comments: 12 pages, 12 figures

    Journal ref: IEEE Photonics Journal, vol. 11, no. 4, pp. 1-15, Aug. 2019

  36. arXiv:1910.13122  [pdf]

    cs.CY cs.AI cs.HC cs.LG eess.SY

    Algorithmic decision-making in AVs: Understanding ethical and technical concerns for smart cities

    Authors: Hazel Si Min Lim, Araz Taeihagh

    Abstract: Autonomous Vehicles (AVs) are increasingly embraced around the world to advance smart mobility and more broadly, smart, and sustainable cities. Algorithms form the basis of decision-making in AVs, allowing them to perform driving tasks autonomously, efficiently, and more safely than human drivers and offering various economic, social, and environmental benefits. However, algorithmic decision-makin…

    Submitted 29 October, 2019; originally announced October 2019.

    Journal ref: Sustainability, 2019, 11(20), 5791

  37. arXiv:1910.04332  [pdf, other]

    cs.LG cs.RO eess.SY stat.ML

    Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

    Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

    Abstract: Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However…

    Submitted 5 June, 2023; v1 submitted 9 October, 2019; originally announced October 2019.

  38. arXiv:1910.00341  [pdf, other]

    eess.AS cs.IR cs.LG cs.SD stat.ML

    Additional Shared Decoder on Siamese Multi-view Encoders for Learning Acoustic Word Embeddings

    Authors: Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim

    Abstract: Acoustic word embeddings --- fixed-dimensional vector representations of arbitrary-length words --- have attracted increasing interest in query-by-example spoken term detection. Recently, on the fact that the orthography of text labels partly reflects the phonetic similarity between the words' pronunciation, a multi-view approach has been introduced that jointly learns acoustic and text embeddings…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted at 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)

  39. arXiv:1907.11818  [pdf, other]

    eess.IV cs.CV cs.LG math.OC

    Momentum-Net: Fast and convergent iterative neural network for inverse problems

    Authors: Il Yong Chun, Zhengyu Huang, Hongki Lim, Jeffrey A. Fessler

    Abstract: Iterative neural networks (INN) are rapidly gaining attention for solving inverse problems in imaging, image processing, and computer vision. INNs combine regression NNs and an iterative model-based image reconstruction (MBIR) algorithm, often leading to both good generalization capability and outperforming reconstruction quality over existing MBIR optimization models. This paper proposes the firs…

    Submitted 20 June, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

    Comments: 28 pages, 13 figures, 3 algorithms, 4 tables, submitted revision to IEEE T-PAMI

    Journal ref: IEEE Trans. Pattern Anal. Mach. Intell., 45(5):4915-4931, Apr. 2023

  40. arXiv:1906.08333  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification

    Authors: Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim

    Abstract: In this paper, we propose a new pooling method called spatial pyramid encoding (SPE) to generate speaker embeddings for text-independent speaker verification. We first partition the output feature maps from a deep residual network (ResNet) into increasingly fine sub-regions and extract speaker embeddings from each sub-region through a learnable dictionary encoding layer. These embeddings are conca…

    Submitted 19 June, 2019; originally announced June 2019.

    Comments: 5 pages, 2 figures, Interspeech 2019

    Journal ref: Proc. of Interspeech 2019, 2019, pp. 4030-4034

  41. arXiv:1906.02327  [pdf, other]

    eess.IV cs.LG physics.med-ph stat.ML

    Improved low-count quantitative PET reconstruction with an iterative neural network

    Authors: Hongki Lim, Il Yong Chun, Yuni K. Dewaraja, Jeffrey A. Fessler

    Abstract: Image reconstruction in low-count PET is particularly challenging because gammas from natural radioactivity in Lu-based crystals cause high random fractions that lower the measurement signal-to-noise ratio (SNR). In model-based image reconstruction (MBIR), using more iterations of an unregularized method may increase the noise, so incorporating regularization into the image reconstruction is desir…

    Submitted 25 May, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

  42. arXiv:1905.13536  [pdf, other]

    cs.CV cs.LG cs.PF eess.IV stat.ML

    Scaling Video Analytics on Constrained Edge Nodes

    Authors: Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G. Andersen, Michael Kaminsky, Subramanya R. Dulloor

    Abstract: As video camera deployments continue to grow, the need to process large volumes of real-time data strains wide area network infrastructure. When per-camera bandwidth is limited, it is infeasible for applications such as traffic monitoring and pedestrian tracking to offload high-quality video streams to a datacenter. This paper presents FilterForward, a new edge-to-cloud system that enables datacen…

    Submitted 24 May, 2019; originally announced May 2019.

    Comments: This paper is an extended version of a paper with the same title published in the 2nd SysML Conference, SysML '19 (Canel et al., 2019)

  43. arXiv:1904.07386  [pdf, other]

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages

  44. arXiv:1811.02736  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD eess.SP

    Learning acoustic word embeddings with phonetically associated triplet network

    Authors: Hyungjun Lim, Younggwan Kim, Youngmoon Jung, Myunghun Jung, Hoirin Kim

    Abstract: Previous research on acoustic word embeddings used in query-by-example spoken term detection has shown remarkable performance improvements when using a triplet network. However, the triplet network is trained using only limited information about acoustic similarity between words. In this paper, we propose a novel architecture, phonetically associated triplet network (PATN), which aims at incr…

    Submitted 27 November, 2018; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: 5 pages, 4 figures, submitted to ICASSP 2019

  45. arXiv:1712.01011  [pdf]

    cs.SD cs.LG eess.AS

    Chord Generation from Symbolic Melody Using BLSTM Networks

    Authors: Hyungui Lim, Seungyeon Rhyu, Kyogu Lee

    Abstract: Generating a chord progression from a monophonic melody is a challenging problem because a chord progression requires a series of layered notes played simultaneously. This paper presents a novel method of generating chord sequences from a symbolic melody using bidirectional long short-term memory (BLSTM) networks trained on a lead sheet database. To this end, a group of feature vectors composed of…

    Submitted 4 December, 2017; originally announced December 2017.

    Comments: 18th International Society for Music Information Retrieval Conference (ISMIR 2017)