Showing 1–50 of 51 results for author: Hwang, J

  1. arXiv:2507.06585  [pdf, ps, other]

    cs.IT eess.SP

    Hybrid Quantum Convolutional Neural Network-Aided Pilot Assignment in Cell-Free Massive MIMO Systems

    Authors: Doan Hieu Nguyen, Xuan Tung Nguyen, Seon-Geun Jeong, Trinh Van Chien, Lajos Hanzo, Won Joo Hwang

    Abstract: A sophisticated hybrid quantum convolutional neural network (HQCNN) is conceived for handling the pilot assignment task in cell-free massive MIMO systems, while maximizing the total ergodic sum throughput. The existing model-based solutions found in the literature are inefficient and/or computationally demanding. Similarly, conventional deep neural networks may struggle in the face of high-dimensi…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: 5 pages, 3 figures, and 2 tables. Accepted by IEEE TVT

  2. arXiv:2505.12233  [pdf, ps, other]

    eess.IV cs.CV

    PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning

    Authors: Yeonkyung Lee, Woojung Han, Youngjun Jun, Hyeonmin Kim, Jungkyung Cho, Seong Jae Hwang

    Abstract: Retinal foundation models have significantly advanced retinal image analysis by leveraging self-supervised learning to reduce dependence on labeled data while achieving strong generalization. Many recent approaches enhance retinal image understanding using report supervision, but obtaining clinical reports is often costly and challenging. In contrast, metadata (e.g., age, gender) is widely availab…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: MICCAI2025 early accept

  3. arXiv:2504.08008  [pdf]

    eess.SP

    Estimation of Solar Spectral Irradiance Using Meteorological Data and Analysis of Optimal Conditions for Solar Power Generation

    Authors: Jeonggyu Hwang

    Abstract: This study proposes an approximate model to estimate the solar radiation spectrum intensity in Seoul, Republic of Korea, for the year 2024, aiming to analyze optimal conditions related to energy generation. Since the solar radiation spectrum varies with atmospheric conditions, accurately predicting it typically requires complex spectral radiation models. However, such models entail high computatio…

    Submitted 9 April, 2025; originally announced April 2025.

  4. arXiv:2503.18642  [pdf, other]

    eess.IV cs.CV

    Rethinking Glaucoma Calibration: Voting-Based Binocular and Metadata Integration

    Authors: Taejin Jeong, Joohyeok Kim, Jaehoon Joo, Yeonwoo Jung, Hyeonmin Kim, Seong Jae Hwang

    Abstract: Glaucoma is an incurable ophthalmic disease that damages the optic nerve, leads to vision loss, and ranks among the leading causes of blindness worldwide. Diagnosing glaucoma typically involves fundus photography, optical coherence tomography (OCT), and visual field testing. However, the high cost of OCT often leads to reliance on fundus photography and visual field testing, both of which exhibit…

    Submitted 24 March, 2025; originally announced March 2025.

  5. arXiv:2503.10055  [pdf, other]

    cs.CV eess.IV

    Fourier Decomposition for Explicit Representation of 3D Point Cloud Attributes

    Authors: Donghyun Kim, Hyunah Ko, Chanyoung Kim, Seong Jae Hwang

    Abstract: While 3D point clouds are widely utilized across various vision applications, their irregular and sparse nature makes them challenging to handle. In response, numerous encoding approaches have been proposed to capture the rich semantic information of point clouds. Yet, a critical limitation persists: a lack of consideration for colored point clouds, which are more capable 3D representations as they…

    Submitted 13 March, 2025; originally announced March 2025.

  6. arXiv:2503.01907  [pdf, other]

    cs.CV eess.IV

    Technical Report for ReID-SAM on SkiTB Visual Tracking Challenge 2025

    Authors: Kunjun Li, Cheng-Yen Yang, Hsiang-Wei Huang, Jenq-Neng Hwang

    Abstract: This report introduces ReID-SAM, a novel model developed for the SkiTB Challenge that addresses the complexities of tracking skier appearance. Our approach integrates the SAMURAI tracker with a person re-identification (Re-ID) module and advanced post-processing techniques to enhance accuracy in challenging skiing scenarios. We employ an OSNet-based Re-ID model to minimize identity switches and ut…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: Technical report for 2nd solution of SkiTB Visual Tracking Challenge (WACV 2025)

  7. arXiv:2502.07243  [pdf, other]

    cs.SD cs.AI eess.AS

    Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement

    Authors: Xueyao Zhang, Xiaohui Zhang, Kainan Peng, Zhenyu Tang, Vimal Manohar, Yingru Liu, Jeff Hwang, Dangna Li, Yuhao Wang, Julian Chan, Yuan Huang, Zhizheng Wu, Mingbo Ma

    Abstract: The imitation of voice, targeted on specific speech attributes such as timbre and speaking style, is crucial in speech generation. However, existing methods rely heavily on annotated data, and struggle with effectively disentangling timbre and style, leading to challenges in achieving controllable generation, especially in zero-shot scenarios. To address these issues, we propose Vevo, a versatile…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted by ICLR 2025

  8. arXiv:2501.03045  [pdf, other]

    eess.AS cs.AI

    Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments

    Authors: Hanbin Bae, Byungjun Kang, Jiwon Kim, Jaeyong Hwang, Hosang Sung, Hoon-Young Cho

    Abstract: This study emphasizes the significance of exploring distance-based source separation (DSS) in outdoor environments. Unlike existing studies that primarily focus on indoor settings, the proposed model is designed to capture the unique characteristics of outdoor audio sources. It incorporates advanced techniques, including a two-stage conformer block, a linear relation-aware self-attention (RSA), an…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025. © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component

  9. arXiv:2410.06542  [pdf, other]

    eess.IV cs.CV

    MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging

    Authors: Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur, et al. (6 additional authors not shown)

    Abstract: In this work, we present MedImageInsight, an open-source medical imaging embedding model. MedImageInsight is trained on medical images with associated text and labels across a diverse collection of domains, including X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography. Rigorous evaluations demonstrate MedImageInsight's ability to achieve state-of-the-ar…

    Submitted 9 October, 2024; originally announced October 2024.

  10. arXiv:2409.16552  [pdf]

    q-bio.NC eess.SY

    Device for detection of activity-dependent changes in neural spheroids at MHz and GHz frequencies

    Authors: Saeed Omidi, Gianluca Fabi, Xiaopeng Wang, James C. M. Hwang, Yevgeny Berdichevsky

    Abstract: Intracellular processes triggered by neural activity include changes in ionic concentrations, protein release, and synaptic vesicle cycling. These processes play significant roles in neurological disorders. The beneficial effects of brain stimulation may also be mediated through intracellular changes. There is a lack of label-free techniques for monitoring activity-dependent intracellular changes.…

    Submitted 24 September, 2024; originally announced September 2024.

  11. arXiv:2407.13930  [pdf, other]

    cs.CV cs.AI eess.SP

    RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark

    Authors: Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Chih-Lung Lin, Jenq-Neng Hwang

    Abstract: Traditional methods for human localization and pose estimation (HPE), which mainly rely on RGB images as an input modality, confront substantial limitations in real-world applications due to privacy concerns. In contrast, radar-based HPE methods emerge as a promising alternative, characterized by distinctive attributes such as through-wall recognition and privacy-preserving, rendering the method m…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  12. arXiv:2407.07517  [pdf, other]

    eess.IV cs.CV

    Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction

    Authors: Yumin Kim, Gayoon Choi, Seong Jae Hwang

    Abstract: Reducing scan time in Positron Emission Tomography (PET) imaging while maintaining high-quality images is crucial for minimizing patient discomfort and radiation exposure. Due to the limited size of datasets and distribution discrepancy across scanners in medical imaging, fine-tuning in a parameter-efficient and effective manner is on the rise. Motivated by the potential of Parameter-Efficient Fin…

    Submitted 10 July, 2024; originally announced July 2024.

  13. arXiv:2407.05059  [pdf, other]

    eess.IV cs.CV

    Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model

    Authors: Kyobin Choo, Youngjun Jun, Mijin Yun, Seong Jae Hwang

    Abstract: In neuroimaging, brain CT is generally a more cost-effective and accessible imaging option than MRI. Nevertheless, CT exhibits inferior soft-tissue contrast and higher noise levels, yielding less precise structural clarity. In response, leveraging more readily available CT to construct its counterpart MRI, namely, medical image-to-image translation (I2I), serves as a promising solution. Part…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 13 pages, 7 figures, Early accepted at Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024

    ACM Class: I.4.5; I.4.9; J.3

  14. arXiv:2406.02560  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Less Peaky and More Accurate CTC Forced Alignment by Label Priors

    Authors: Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur

    Abstract: Connectionist temporal classification (CTC) models are known to have peaky output distributions. Such behavior is not a problem for automatic speech recognition (ASR), but it can cause inaccurate forced alignments (FA), especially at finer granularity, e.g., phoneme level. This paper aims at alleviating the peaky behavior of CTC and improving its suitability for forced alignment generation, by leve…

    Submitted 18 July, 2024; v1 submitted 22 April, 2024; originally announced June 2024.

    Comments: Accepted by ICASSP 2024. Github repo: https://github.com/huangruizhe/audio/tree/aligner_label_priors

  15. arXiv:2406.02557  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    EVAN: Evolutional Video Streaming Adaptation via Neural Representation

    Authors: Mufan Liu, Le Yang, Yiling Xu, Ye-kui Wang, Jenq-Neng Hwang

    Abstract: Adaptive bitrate (ABR) using conventional codecs cannot further modify the bitrate once a decision has been made, exhibiting limited adaptation capability. This may result in either overly conservative or overly aggressive bitrate selection, which could cause either inefficient utilization of the network bandwidth or frequent re-buffering, respectively. Neural representation for video (NeRV), whic…

    Submitted 15 April, 2024; originally announced June 2024.

    Comments: accepted by ICME (conference)

  16. arXiv:2401.10285  [pdf]

    eess.SP cs.LG q-bio.NC

    Analyzing Brain Activity During Learning Tasks with EEG and Machine Learning

    Authors: Ryan Cho, Mobasshira Zaman, Kyu Taek Cho, Jaejin Hwang

    Abstract: This study aimed to analyze brain activity during various STEM activities, exploring the feasibility of classifying between different tasks. EEG brain data from twenty subjects engaged in five cognitive tasks were collected and segmented into 4-second clips. Power spectral densities of brain frequency waves were then analyzed. Testing different k-intervals with XGBoost, Random Forest, and Bagging…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 20 pages, 7 figures

  17. arXiv:2401.07889  [pdf]

    cs.LG cs.AI eess.SP

    Machine Learning Techniques to Identify Hand Gestures amidst Forearm Muscle Signals

    Authors: Ryan Cho, Sunil Patel, Kyu Taek Cho, Jaejin Hwang

    Abstract: This study investigated the use of forearm EMG data for distinguishing eight hand gestures, employing the Neural Network and Random Forest algorithms on data from ten participants. The Neural Network achieved 97 percent accuracy with 1000-millisecond windows, while the Random Forest achieved 85 percent accuracy with 200-millisecond windows. Larger window sizes improved gesture classification due t…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 21 pages, 7 figures

  18. arXiv:2311.10261  [pdf, other]

    cs.CV eess.SP

    Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving

    Authors: Yizhou Wang, Jen-Hao Cheng, Jui-Te Huang, Sheng-Yao Kuan, Qiqian Fu, Chiming Ni, Shengyu Hao, Gaoang Wang, Guanbin Xing, Hui Liu, Jenq-Neng Hwang

    Abstract: Sensor fusion is crucial for an accurate and robust perception system on autonomous vehicles. Most existing datasets and perception solutions focus on fusing cameras and LiDAR. However, the collaboration between camera and radar is significantly under-exploited. The incorporation of rich semantic information from the camera, and reliable 3D information from the radar can potentially achieve an eff…

    Submitted 16 November, 2023; originally announced November 2023.

  19. arXiv:2310.17864  [pdf, other]

    eess.AS cs.SD

    TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

    Authors: Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

    Abstract: TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's devel…

    Submitted 26 October, 2023; originally announced October 2023.

  20. arXiv:2309.06770  [pdf, other]

    eess.IV eess.SY

    Deep Learning-based Synthetic High-Resolution In-Depth Imaging Using an Attachable Dual-element Endoscopic Ultrasound Probe

    Authors: Hah Min Lew, Jae Seong Kim, Moon Hwan Lee, Jaegeun Park, Sangyeon Youn, Hee Man Kim, Jihun Kim, Jae Youn Hwang

    Abstract: Endoscopic ultrasound (EUS) imaging has a trade-off between resolution and penetration depth. By considering the in-vivo characteristics of human organs, it is necessary to provide clinicians with appropriate hardware specifications for precise diagnosis. Recently, super-resolution (SR) ultrasound imaging studies, including the SR task in deep learning fields, have been reported for enhancing ultr…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 10 pages, 9 figures

  21. arXiv:2309.05287  [pdf, other]

    cs.SD cs.AI eess.AS

    Addressing Feature Imbalance in Sound Source Separation

    Authors: Jaechang Kim, Jeongyeon Hwang, Soheun Yi, Jaewoong Cho, Jungseul Ok

    Abstract: Neural networks often suffer from a feature preference problem, where they tend to overly rely on specific features to solve a task while disregarding other features, even if those neglected features are essential for the task. Feature preference problems have primarily been investigated in classification tasks. However, we observe that feature preference occurs in high-dimensional regression tasks,…

    Submitted 4 October, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

  22. arXiv:2306.07489  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling

    Authors: Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Although text-to-speech (TTS) systems have significantly improved, most TTS systems still have limitations in synthesizing speech with appropriate phrasing. For natural speech synthesis, it is important to synthesize the speech with a phrasing structure that groups words into phrases based on semantic information. In this paper, we propose PauseSpeech, a speech synthesis system with a pre-trained…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 13 pages, 4 figures, 3 tables, under review

  23. arXiv:2306.06814  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

    Authors: Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Recently, denoising diffusion models have demonstrated remarkable performance among generative models in various domains. However, in the speech domain, the application of diffusion models for synthesizing time-varying audio faces limitations in terms of complexity and controllability, as speech synthesis requires very high-dimensional samples with long-term acoustic features. To alleviate the cha…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: 11 pages, 5 figures, 5 tables, under review

  24. arXiv:2305.13831  [pdf, other]

    cs.SD cs.CL eess.AS

    ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

    Authors: Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

    Abstract: Emotional Text-To-Speech (TTS) is an important task in the development of systems (e.g., human-like dialogue agents) that require natural and emotional speech. Existing approaches, however, only aim to produce emotional TTS for seen speakers during training, without consideration of the generalization to unseen speakers. In this paper, we propose ZET-Speech, a zero-shot adaptive emotion-controllab…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  25. arXiv:2305.05356  [pdf, other]

    cs.CV cs.MM eess.IV

    Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching

    Authors: Shuting Xia, Tingyu Fan, Yiling Xu, Jenq-Neng Hwang, Zhu Li

    Abstract: 3D dynamic point cloud (DPC) compression relies on mining its temporal context, which faces significant challenges due to DPC's sparsity and non-uniform structure. Existing methods are limited in capturing sufficient temporal dependencies. Therefore, this paper proposes a learning-based DPC compression framework via a hierarchical block-matching-based inter-prediction module to compensate and compre…

    Submitted 16 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: 9 pages for the main body, 3 pages for the supplemental after References

  26. arXiv:2303.14936  [pdf, ps, other]

    math.OC eess.SY

    TALOS: A toolbox for spacecraft conceptual design

    Authors: Victor Gandarillas, John T. Hwang

    Abstract: We present the Toolbox for Analysis and Large-scale Optimization of Spacecraft (TALOS), a framework designed for applying large-scale multidisciplinary design optimization (MDO) to spacecraft design problems. The framework is built using the Computational System Design Language (CSDL), with abstractions for users to describe systems at a high level. CSDL is a compiled, embedded domain-specific lan…

    Submitted 27 March, 2023; originally announced March 2023.

  27. arXiv:2303.01105  [pdf, other]

    eess.IV cs.CV cs.LG

    Evidence-empowered Transfer Learning for Alzheimer's Disease

    Authors: Kai Tzu-iunn Ong, Hana Kim, Minjin Kim, Jinseong Jang, Beomseok Sohn, Yoon Seong Choi, Dosik Hwang, Seong Jae Hwang, Jinyoung Yeo

    Abstract: Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification. However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we pres…

    Submitted 17 April, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2023. The authorship was changed from co-first authors to a single first author, which was authorized by the adviser/corresponding author Jinyoung Yeo (Apr 18th, 2023)

  28. arXiv:2211.09383  [pdf, other]

    eess.AS cs.AI cs.SD

    Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models

    Authors: Minki Kang, Dongchan Min, Sung Ju Hwang

    Abstract: There has been significant progress in Text-To-Speech (TTS) synthesis technology in recent years, thanks to the advancement in neural generative modeling. However, existing methods on any-speaker adaptive TTS have achieved unsatisfactory performance, due to their suboptimal accuracy in mimicking the target speakers' styles. In this work, we present Grad-StyleSpeech, which is an any-speaker adapt…

    Submitted 13 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023

  29. arXiv:2208.10922  [pdf, other]

    cs.CV cs.LG eess.AS eess.IV

    StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation

    Authors: Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang

    Abstract: We propose StyleTalker, a novel audio-driven talking head generation model that can synthesize a video of a talking person from a single reference image with accurately audio-synced lip shapes, realistic head poses, and eye blinks. Specifically, by leveraging a pretrained image generator and an image encoder, we estimate the latent codes of the talking head video that faithfully reflects the given…

    Submitted 15 March, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

  30. arXiv:2204.02403  [pdf, other]

    eess.IV cs.CV

    Explainable Deep Learning Algorithm for Distinguishing Incomplete Kawasaki Disease by Coronary Artery Lesions on Echocardiographic Imaging

    Authors: Haeyun Lee, Yongsoon Eun, Jae Youn Hwang, Lucy Youngmin Eun

    Abstract: Background and Objective: Incomplete Kawasaki disease (KD) has often been misdiagnosed due to a lack of the clinical manifestations of classic KD. However, it is associated with a markedly higher prevalence of coronary artery lesions. Identifying coronary artery lesions by echocardiography is important for the timely diagnosis of and favorable outcomes in KD. Moreover, similar to KD, coronavirus d…

    Submitted 5 April, 2022; originally announced April 2022.

  31. arXiv:2111.08988  [pdf, other]

    cs.GR cs.LG eess.IV eess.SP

    LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks

    Authors: Berivan Isik, Philip A. Chou, Sung Jin Hwang, Nick Johnston, George Toderici

    Abstract: We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordinate-based, or implicit, neural network. Inputs to…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: 30 pages, 29 figures

  32. arXiv:2110.15018  [pdf, other]

    eess.AS cs.SD

    TorchAudio: Building Blocks for Audio and Speech Processing

    Authors: Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi

    Abstract: This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications for researchers and engineers by providing off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically dif…

    Submitted 16 February, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  33. arXiv:2107.09049  [pdf]

    eess.IV cs.AI cs.LG

    Deep Open Snake Tracker for Vessel Tracing

    Authors: Li Chen, Wenjin Liu, Niranjan Balu, Mahmud Mossa-Basha, Thomas S. Hatsukami, Jenq-Neng Hwang, Chun Yuan

    Abstract: Vessel tracing by modeling vascular structures in 3D medical images with centerlines and radii can provide useful information for vascular health. Existing algorithms have been developed but there are certain persistent problems such as incomplete or inaccurate vessel tracing, especially in complicated vascular beds like the intracranial arteries. We propose here a deep learning based open curve a…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: MICCAI 2021

  34. arXiv:2106.03153  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

    Authors: Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

    Abstract: With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech with only a few audio samples from the given speaker, that are also short in length. However, existing methods either require to fine-tune the model or achieve low adaptation quality witho…

    Submitted 16 June, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: Accepted by ICML 2021

  35. arXiv:2102.13147  [pdf, other]

    cs.CV cs.LG eess.IV

    Multi-Domain Learning by Meta-Learning: Taking Optimal Steps in Multi-Domain Loss Landscapes by Inner-Loop Learning

    Authors: Anthony Sicilia, Xingchen Zhao, Davneet Minhas, Erin O'Connor, Howard Aizenstein, William Klunk, Dana Tudorascu, Seong Jae Hwang

    Abstract: We consider a model-agnostic solution to the problem of Multi-Domain Learning (MDL) for multi-modal applications. Many existing MDL techniques are model-dependent solutions which explicitly require nontrivial architectural changes to construct domain-specific modules. Thus, properly applying these MDL techniques for new problems with well-established models, e.g. U-Net for semantic segmentation, m…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: IEEE International Symposium on Biomedical Imaging 2021

  36. RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization

    Authors: Yizhou Wang, Zhongyu Jiang, Yudong Li, Jenq-Neng Hwang, Guanbin Xing, Hui Liu

    Abstract: Various autonomous or assisted driving strategies have been facilitated through the accurate and reliable perception of the environment around a vehicle. Among the commonly used sensors, radar has usually been considered as a robust and cost-effective solution even in adverse driving scenarios, e.g., weak/strong lighting or bad weather. Instead of considering to fuse the unreliable information fro…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: IEEE Journal of Selected Topics in Signal Processing Special Issue on Recent Advances in Automotive Radar Signal Processing. arXiv admin note: text overlap with arXiv:2003.01816

  37. arXiv:2009.06943  [pdf, other]

    eess.IV cs.CV

    AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Xiaotong Luo, Liang Chen, Jiangtao Zhang, Maitreya Suin, et al. (60 additional authors not shown)

    Abstract: This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter co…

    Submitted 15 September, 2020; originally announced September 2020.

  38. arXiv:2009.05407  [pdf, other]

    eess.SP cs.LG

    TRIER: Template-Guided Neural Networks for Robust and Interpretable Sleep Stage Identification from EEG Recordings

    Authors: Taeheon Lee, Jeonghwan Hwang, Honggu Lee

    Abstract: Neural networks often obtain sub-optimal representations during training, which degrade robustness as well as classification performances. This is a severe problem in applying deep learning to bio-medical domains, since models are vulnerable to being harmed by irregularities and scarcities in data. In this study, we propose a pre-training technique that handles this challenge in sleep staging task…

    Submitted 9 September, 2020; originally announced September 2020.

    Comments: 10 pages, 5 figures, written in CIKM format

  39. arXiv:2008.12912  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-Attention Based Ultra Lightweight Image Super-Resolution

    Authors: Abdul Muqeet, Jiwon Hwang, Subin Yang, Jung Heum Kang, Yongwoo Kim, Sung-Ho Bae

    Abstract: Lightweight image super-resolution (SR) networks have the utmost significance for real-world applications. There are several deep learning based SR methods with remarkable performance, but their memory and computational cost are hindrances in practical usage. To tackle this problem, we propose a Multi-Attentive Feature Fusion Super-Resolution Network (MAFFSRN). MAFFSRN consists of proposed feature…

    Submitted 21 September, 2020; v1 submitted 29 August, 2020; originally announced August 2020.

    Comments: ECCVW AIM2020

  40. arXiv:2007.14472  [pdf]

    cs.CV cs.LG eess.IV stat.ML

    Automated Intracranial Artery Labeling using a Graph Neural Network and Hierarchical Refinement

    Authors: Li Chen, Thomas Hatsukami, Jenq-Neng Hwang, Chun Yuan

    Abstract: Automatically labeling intracranial arteries (ICA) with their anatomical names is beneficial for feature extraction and detailed analysis of intracranial vascular structures. There are significant variations in the ICA due to natural and pathological causes, making it challenging for automated labeling. However, the existing public dataset for evaluation of anatomical labeling is limited. We const…

    Submitted 11 July, 2020; originally announced July 2020.

    Comments: MICCAI 2020

  41. arXiv:2007.03034  [pdf, other]

    cs.IT eess.IV

    Nonlinear Transform Coding

    Authors: Johannes Ballé, Philip A. Chou, David Minnen, Saurabh Singh, Nick Johnston, Eirikur Agustsson, Sung Jin Hwang, George Toderici

    Abstract: We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate--distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate--distortion performance of NTC with the…

    Submitted 23 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: 17 pages, 14 figures. Accepted for publication in IEEE Journal of Selected Topics in Signal Processing

  42. arXiv:2006.13304  [pdf, ps, other]

    physics.geo-ph cs.LG eess.SP stat.ML

    Connectivity-informed Drainage Network Generation using Deep Convolution Generative Adversarial Networks

    Authors: Sung Eun Kim, Yongwon Seo, Junshik Hwang, Hongkyu Yoon, Jonghyun Lee

    Abstract: Stochastic network modeling is often limited by high computational costs to generate a large number of networks enough for meaningful statistical evaluation. In this study, Deep Convolutional Generative Adversarial Networks (DCGANs) were applied to quickly reproduce drainage networks from the already generated network samples without repetitive long modeling of the stochastic network model, Gibb's…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: 16 pages; 9 figures; Python and Matlab scripts used in this paper can be found in https://github.com/saint-kim/RiverDCGANs

  43. arXiv:2004.11819  [pdf]

    cs.CV cs.LG eess.IV stat.ML

    Domain Adaptive Transfer Attack (DATA)-based Segmentation Networks for Building Extraction from Aerial Images

    Authors: Younghwan Na, Jun Hee Kim, Kyungsu Lee, Juhum Park, Jae Youn Hwang, Jihwan P. Choi

    Abstract: Semantic segmentation models based on convolutional neural networks (CNNs) have gained much attention in relation to remote sensing and have achieved remarkable performance for the extraction of buildings from high-resolution aerial images. However, the issue of limited generalization for unseen images remains. When there is a domain gap between the training and test datasets, CNN-based segmentati…

    Submitted 29 April, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

    Comments: 11 pages, 12 figures

  44. arXiv:2004.02863  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

    Authors: Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim

    Abstract: In practical settings, a speaker recognition system needs to identify a speaker given a short utterance, while the enrollment utterance may be relatively long. However, existing speaker recognition models perform poorly with such short utterances. To solve this problem, we introduce a meta-learning framework for imbalance length pairs. Specifically, we use a Prototypical Networks and train it with…

    Submitted 10 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to Interspeech 2020. The codes are available at https://github.com/seongmin-kye/meta-SR

  45. arXiv:2003.01816  [pdf, other]

    cs.CV eess.SP

    RODNet: Radar Object Detection Using Cross-Modal Supervision

    Authors: Yizhou Wang, Zhongyu Jiang, Xiangyu Gao, Jenq-Neng Hwang, Guanbin Xing, Hui Liu

    Abstract: Radar is usually more robust than the camera in severe driving scenarios, e.g., weak/strong lighting and bad weather. However, unlike RGB images captured by a camera, the semantic information from the radar signals is noticeably difficult to extract. In this paper, we propose a deep radar object detection network (RODNet), to effectively detect objects purely from the carefully processed radar fre…

    Submitted 8 February, 2021; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: Accepted by WACV 2021, 10 pages, 9 figures, 3 tables. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021

  46. arXiv:1911.12990

    cs.CV cs.LG eess.IV

    Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bit-wise Regularization

    Authors: Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang

    Abstract: Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged as one of the key ingredients to reduce the size of neural networks for their deployments to resource-limited devices. In order to overcome the nature of transforming continuous activations and weights to discrete ones, recent study called Relaxed Quantization (RQ) [Louizos et al. 2019] s…

    Submitted 7 September, 2021; v1 submitted 29 November, 2019; originally announced November 2019.

    Comments: New submission with another link

  47. Eye in the Sky: Drone-Based Object Tracking and 3D Localization

    Authors: Haotian Zhang, Gaoang Wang, Zhichao Lei, Jenq-Neng Hwang

    Abstract: Drones, or general UAVs, equipped with a single camera have been widely deployed to a broad range of applications, such as aerial photography, fast goods delivery and most importantly, surveillance. Despite the great progress achieved in computer vision algorithms, these algorithms are not usually optimized for dealing with images or video sequences acquired by drones, due to various challenges su…

    Submitted 18 October, 2019; originally announced October 2019.

    Comments: Accepted to ACMMM2019

  48. arXiv:1910.06962  [pdf, other]

    cs.CV cs.LG eess.IV

    SegSort: Segmentation by Discriminative Sorting of Segments

    Authors: Jyh-Jing Hwang, Stella X. Yu, Jianbo Shi, Maxwell D. Collins, Tien-Ju Yang, Xiao Zhang, Liang-Chieh Chen

    Abstract: Almost all existing deep learning approaches for semantic segmentation tackle this task as a pixel-wise classification problem. Yet humans understand a scene not in terms of pixels, but by decomposing it into perceptual groups and structures that are the basic building blocks of recognition. This motivates us to propose an end-to-end pixel-wise metric learning approach that mimics this process. In…

    Submitted 30 October, 2019; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: In ICCV 2019. Webpage & Code: https://jyhjinghwang.github.io/projects/segsort.html

  49. arXiv:1909.02087  [pdf]

    eess.IV

    Automated Artery Localization and Vessel Wall Segmentation of Magnetic Resonance Vessel Wall Images using Tracklet Refinement and Polar Conversion

    Authors: Li Chen, Jie Sun, Gador Canton, Niranjan Balu, Xihai Zhao, Rui Li, Thomas S. Hatsukami, Jenq-Neng Hwang, Chun Yuan

    Abstract: Quantitative analysis of vessel wall structures by automated vessel wall segmentation provides useful imaging biomarkers in evaluating atherosclerotic lesions and plaque progression time-efficiently. To quantify vessel wall features, drawing lumen and outer wall contours of the artery of interest is required. To alleviate manual labor in contour drawing, some computer-assisted tools exist, but man…

    Submitted 4 September, 2019; originally announced September 2019.

  50. arXiv:1802.01436  [pdf, other]

    eess.IV cs.IT

    Variational image compression with a scale hyperprior

    Authors: Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, Nick Johnston

    Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unl…

    Submitted 1 May, 2018; v1 submitted 31 January, 2018; originally announced February 2018.

    Comments: accepted as a conference contribution to International Conference on Learning Representations 2018