Showing 1–31 of 31 results for author: Bao, H

Searching in archive eess.

  1. arXiv:2501.03045  [pdf, other]

    eess.AS cs.AI

    Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments

    Authors: Hanbin Bae, Byungjun Kang, Jiwon Kim, Jaeyong Hwang, Hosang Sung, Hoon-Young Cho

    Abstract: This study emphasizes the significance of exploring distance-based source separation (DSS) in outdoor environments. Unlike existing studies that primarily focus on indoor settings, the proposed model is designed to capture the unique characteristics of outdoor audio sources. It incorporates advanced techniques, including a two-stage conformer block, a linear relation-aware self-attention (RSA), an…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025. © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component

  2. arXiv:2410.21276  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  3. arXiv:2410.05920  [pdf, other]

    cs.SD cs.AI eess.AS

    FINALLY: fast and universal speech enhancement with studio-like quality

    Authors: Nicholas Babaev, Kirill Tamogashev, Azat Saginbaev, Ivan Shchekotov, Hanbin Bae, Hosang Sung, WonJun Lee, Hoon-Young Cho, Pavel Andreev

    Abstract: In this paper, we address the challenge of speech enhancement in real-world recordings, which often contain various forms of distortion, such as background noise, reverberation, and microphone artifacts. We revisit the use of Generative Adversarial Networks (GANs) for speech enhancement and theoretically show that GANs are naturally inclined to seek the point of maximum density within the conditio…

    Submitted 31 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024

  4. arXiv:2409.18705  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    Speech Boosting: Low-Latency Live Speech Enhancement for TWS Earbuds

    Authors: Hanbin Bae, Pavel Andreev, Azat Saginbaev, Nicholas Babaev, Won-Jun Lee, Hosang Sung, Hoon-Young Cho

    Abstract: This paper introduces a speech enhancement solution tailored for true wireless stereo (TWS) earbuds on-device usage. The solution was specifically designed to support conversations in noisy environments, with active noise cancellation (ANC) activated. The primary challenges for speech enhancement models in this context arise from computational complexity that limits on-device usage and latency tha…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2405.09643  [pdf]

    physics.soc-ph eess.SP

    Energy Consumption of Plant Factory with Artificial Light: Challenges and Opportunities

    Authors: Wenyi Cai, Kunlang Bu, Lingyan Zha, Jingjin Zhang, Dayi Lai, Hua Bao

    Abstract: Plant factory with artificial light (PFAL) is a promising technology for relieving the food crisis, especially in urban areas or arid regions endowed with abundant resources. However, lighting and HVAC (heating, ventilation, and air conditioning) systems of PFAL have led to much greater energy consumption than open-field and greenhouse farming, limiting the application of PFAL to a wider extent. R…

    Submitted 11 May, 2024; originally announced May 2024.

  6. arXiv:2402.00028  [pdf, other]

    cs.GR cs.CV eess.IV

    Neural Rendering and Its Hardware Acceleration: A Review

    Authors: Xinkai Yan, Jieting Xu, Yuchi Huo, Hujun Bao

    Abstract: Neural rendering is a new image and video generation method based on deep learning. It combines the deep learning model with the physical knowledge of computer graphics, to obtain a controllable and realistic scene model, and realize the control of scene attributes such as lighting, camera parameters, posture and so on. On the one hand, neural rendering can not only make full use of the advantages…

    Submitted 6 January, 2024; originally announced February 2024.

  7. arXiv:2310.10088  [pdf, other]

    eess.IV cs.CV cs.LG

    PUCA: Patch-Unshuffle and Channel Attention for Enhanced Self-Supervised Image Denoising

    Authors: Hyemi Jang, Junsung Park, Dahuin Jung, Jaihyun Lew, Ho Bae, Sungroh Yoon

    Abstract: Although supervised image denoising networks have shown remarkable performance on synthesized noisy images, they often fail in practice due to the difference between real and synthesized noise. Since clean-noisy image pairs from the real world are extremely costly to gather, self-supervised learning, which utilizes noisy input itself as a target, has been studied. To prevent a self-supervised deno…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  8. Energy Efficient Operation of Adaptive Massive MIMO 5G HetNets

    Authors: Siddarth Marwaha, Eduard A. Jorswieck, Mostafa Jassim, Thomas Kuerner, David Lopez Perez, Xilnli Geng, Harvey Bao

    Abstract: For energy efficient operation of the massive multiple-input multiple-output (MIMO) networks, various aspects of energy efficiency maximization have been addressed, where a careful selection of number of active antennas has shown significant gains. Moreover, switching-off physical resource blocks (PRBs) and carrier shutdown saves energy in low load scenarios. However, the joint optimization of spe…

    Submitted 10 October, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Journal ref: IEEE Transactions on Wireless Communications, 13 Nov, 2023

  9. arXiv:2211.16653  [pdf]

    cs.LG cs.AI eess.SP

    Correlation recurrent units: A novel neural architecture for improving the predictive performance of time-series data

    Authors: Sunghyun Sim, Dohee Kim, Hyerim Bae

    Abstract: The time-series forecasting (TSF) problem is a traditional problem in the field of artificial intelligence. Models such as Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and GRU (Gated Recurrent Units) have contributed to improving the predictive accuracy of TSF. Furthermore, model structures have been proposed to combine time-series decomposition methods, such as seasonal-trend dec…

    Submitted 28 August, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

  10. arXiv:2206.13404  [pdf, other]

    eess.AS cs.AI cs.SD

    Avocodo: Generative Adversarial Network for Artifact-free Vocoder

    Authors: Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo

    Abstract: Neural vocoders based on the generative adversarial neural network (GAN) have been widely used due to their fast inference speed and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform multi-scale analysis that evaluates downsampled speech wavef…

    Submitted 3 January, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in the 37th AAAI conference on artificial intelligence (AAAI 2023)

  11. arXiv:2206.11321  [pdf]

    cs.SE eess.SY

    An Application of a Modified Beta Factor Method for the Analysis of Software Common Cause Failures

    Authors: Tate Shorthill, Han Bao, Edward Chen, Heng Ban

    Abstract: This paper presents an approach for modeling software common cause failures (CCFs) within digital instrumentation and control (I&C) systems. CCFs consist of a concurrent failure between two or more components due to a shared failure cause and coupling mechanism. This work emphasizes the importance of identifying software-centric attributes related to the coupling mechanisms necessary for simultane…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: 12 pages, 3 Figures, 7 Tables, presented at the Probabilistic Safety Assessment & Management conference in 2022. arXiv admin note: text overlap with arXiv:2204.03717

  12. arXiv:2204.05753  [pdf, other]

    eess.AS cs.AI cs.SD

    Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch

    Authors: Hanbin Bae, Young-Sun Joo

    Abstract: The recently developed pitch-controllable text-to-speech (TTS) model, i.e. FastPitch, was conditioned for the pitch contours. However, the quality of the synthesized speech degraded considerably for pitch values that deviated significantly from the average pitch; i.e. the ability to control pitch was limited. To address this issue, we propose two algorithms to improve the robustness of FastPitch.…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  13. arXiv:2204.03717  [pdf]

    eess.SY

    Quantitative Evaluation of Common Cause Failures in High Safety-significant Safety-related Digital Instrumentation and Control Systems in Nuclear Power Plants

    Authors: Han Bao, Hongbin Zhang, Tate Shorthill, Edward Chen, Svetlana Lawrence

    Abstract: Digital instrumentation and control (DI&C) systems at nuclear power plants (NPPs) have many advantages over analog systems. They are proven to be more reliable, cheaper, and easier to maintain given obsolescence of analog components. However, they also pose new engineering and technical challenges, such as possibility of common cause failures (CCFs) unique to digital systems. This paper proposes a…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: 41 pages, 25 figures, 13 tables. The manuscript has been submitted to Reliability Engineering & System Safety

  14. arXiv:2203.03583  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Korean Tokenization for Beam Search Rescoring in Speech Recognition

    Authors: Kyuhong Shim, Hyewon Bae, Wonyong Sung

    Abstract: The performance of automatic speech recognition (ASR) models can be greatly improved by proper beam-search decoding with external language model (LM). There has been an increasing interest in Korean speech recognition, but not many studies have been focused on the decoding procedure. In this paper, we propose a Korean tokenization method for neural network-based LM used for Korean ASR. Although th…

    Submitted 28 March, 2022; v1 submitted 22 February, 2022; originally announced March 2022.

    Comments: Submitted to INTERSPEECH 2022

  15. arXiv:2112.09287  [pdf]

    eess.SY

    An Integrated Risk Assessment Process of Safety-Related Digital I&C Systems in Nuclear Power Plants

    Authors: Hongbin Zhang, Han Bao, Tate Shorthill, Edward Quinn

    Abstract: Upgrading the existing analog instrumentation and control (I&C) systems to state-of-the-art digital I&C (DI&C) systems will greatly benefit existing light-water reactors (LWRs). However, the issue of software common cause failure (CCF) remains an obstacle in terms of qualification for digital technologies. Existing analyses of CCFs in I&C systems mainly focus on hardware failures. With the applicatio…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: 26 pages. This paper is under review of the Journal of Nuclear Technology

  16. arXiv:2110.04466  [pdf, other]

    cs.IT cs.LG eess.SY

    ProductAE: Towards Training Larger Channel Codes based on Neural Product Codes

    Authors: Mohammad Vahid Jamali, Hamid Saber, Homayoon Hatami, Jung Hyun Bae

    Abstract: There have been significant research activities in recent years to automate the design of channel encoders and decoders via deep learning. Due to the dimensionality challenge in channel coding, it is prohibitively complex to design and train relatively large neural channel codes via deep learning techniques. Consequently, most of the results in the literature are limited to relatively short codes hav…

    Submitted 10 September, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  17. arXiv:2109.10273  [pdf, ps, other]

    cs.IT eess.SP

    Secrecy Offloading Rate Maximization for Multi-Access Mobile Edge Computing Networks

    Authors: Mingxiong Zhao, Huiqi Bao, Li Yin, Jianping Yao, Tony Q. S. Quek

    Abstract: This letter considers a multi-access mobile edge computing (MEC) network consisting of multiple users, multiple base stations, and a malicious eavesdropper. Specifically, the users adopt the partial offloading strategy by partitioning the computation task into several parts. One is executed locally and the others are securely offloaded to multiple MEC servers integrated into the base stations by l…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: Double-column, 5 pages, 3 figures, accepted for publication in IEEE Communications Letters

  18. arXiv:2106.15205  [pdf, other]

    eess.AS cs.SD

    N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement

    Authors: Gyeong-Hoon Lee, Tae-Woo Kim, Hanbin Bae, Min-Ji Lee, Young-Ik Kim, Hoon-Young Cho

    Abstract: Recently, end-to-end Korean singing voice systems have been designed to generate realistic singing voices. However, these systems still suffer from a lack of robustness in terms of pronunciation accuracy. In this paper, we propose N-Singer, a non-autoregressive Korean singing voice system, to synthesize accurate and pronounced Korean singing voices in parallel. N-Singer consists of a Transformer-b…

    Submitted 21 February, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  19. arXiv:2106.15123  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

    Authors: Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho

    Abstract: Methods for modeling and controlling prosody with acoustic features have been proposed for neural text-to-speech (TTS) models. Prosodic speech can be generated by conditioning acoustic features. However, synthesized speech with a large pitch-shift scale suffers from audio quality degradation, and speaker characteristics deformation. To address this problem, we propose a feed-forward Transformer ba…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  20. Joint Time and Power Allocation for 5G NR Unlicensed Systems

    Authors: Haizhou Bao, Yiming Huo, Xiaodai Dong, Chuanhe Huang

    Abstract: The fifth-generation (5G) and beyond networks are designed to efficiently utilize the spectrum resources to meet various quality of service (QoS) requirements. The unlicensed frequency bands used by WiFi are mainly deployed for indoor applications and are not always fully occupied. The cellular industry has been working to enable cellular and WiFi coexistence. In particular, 5G New Radio in unlice…

    Submitted 22 April, 2021; originally announced April 2021.

    Comments: 15 pages, 6 figures, to appear in IEEE Transactions on Wireless Communications

  21. arXiv:2103.13952  [pdf, other]

    cs.RO eess.SY

    Estimation of Closest In-Path Vehicle (CIPV) by Low-Channel LiDAR and Camera Sensor Fusion for Autonomous Vehicle

    Authors: Hyunjin Bae, Gu Lee, Jaeseung Yang, Gwanjun Shin, Yongseob Lim, Gyeungho Choi

    Abstract: In autonomous driving, using a variety of sensors to recognize preceding vehicles in middle and long distance is helpful for improving driving performance and developing various functions. However, if only LiDAR or camera is used in the recognition stage, it is difficult to obtain necessary data due to the limitations of each sensor. In this paper, we proposed a method of converting the tracking d…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: 13 pages, 19 figures, submitted to MDPI Sensors

  22. arXiv:2103.03049  [pdf, other]

    eess.AS cs.LG cs.SD

    A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music

    Authors: Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, Hoon-Young Cho

    Abstract: Recently, it has become easier to obtain speech data from various media such as the internet or YouTube, but directly utilizing them to train a neural text-to-speech (TTS) model is difficult. The proportion of clean speech is insufficient and the remainder includes background music. Even with the global style token (GST). Therefore, we propose the following method to successfully train an end-to-e…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: Accepted at ICASSP 2021

  23. A Design of Cooperative Overtaking Based on Complex Lane Detection and Collision Risk Estimation

    Authors: Junlan Chen, Ke Wang, Huanhuan Bao, Tao Chen

    Abstract: Cooperative overtaking is believed to have the capability of improving road safety and traffic efficiency by means of the real-time information exchange between traffic participants, including road infrastructures, nearby vehicles and others. In this paper, we focused on the critical issues of modeling, computation, and analysis of cooperative overtaking and made it play a key role in the road…

    Submitted 11 August, 2020; originally announced August 2020.

    Journal ref: IEEE Access, 2019, 7: 87951-87959

  24. arXiv:2007.15281  [pdf, other]

    eess.AS cs.SD

    Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning

    Authors: Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho

    Abstract: This paper proposes a controllable end-to-end text-to-speech (TTS) system to control the speaking speed (speed-controllable TTS; SCTTS) of synthesized speech with sentence-level speaking-rate value as an additional input. The speaking-rate value, the ratio of the number of input phonemes to the length of input speech, is adopted in the proposed system to control the speaking speed. Furthermore, th…

    Submitted 13 August, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Accepted to INTERSPEECH 2020

  25. arXiv:2006.13360  [pdf, other]

    cs.RO eess.SY

    Evaluation of Sampling Methods for Robotic Sediment Sampling Systems

    Authors: Jun Han Bae, Wonse Jo, Jee Hwan Park, Richard M. Voyles, Sara K. McMillan, Byung-Cheol Min

    Abstract: Analysis of sediments from rivers, lakes, reservoirs, wetlands and other constructed surface water impoundments is an important tool to characterize the function and health of these systems, but is generally carried out manually. This is costly and can be hazardous and difficult for humans due to inaccessibility, contamination, or availability of required equipment. Robotic sampling systems can ea…

    Submitted 23 June, 2020; originally announced June 2020.

  26. arXiv:2005.02348  [pdf]

    eess.SY

    A Redundancy-Guided Approach for the Hazard Analysis of Digital Instrumentation and Control Systems in Advanced Nuclear Power Plants

    Authors: Tate Shorthill, Han Bao, Hongbin Zhang, Heng Ban

    Abstract: Digital instrumentation and control (I&C) upgrades are a vital research area for nuclear industry. Despite their performance benefits, deployment of digital I&C in nuclear power plants (NPPs) has been limited. Digital I&C systems exhibit complex failure modes including common cause failures (CCFs) which can be difficult to identify. This paper describes the development of a redundancy-guided appli…

    Submitted 5 May, 2020; originally announced May 2020.

  27. arXiv:2004.14774  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV stat.ML

    IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

    Authors: Qi She, Fan Feng, Qi Liu, Rosa H. M. Chan, Xinyue Hao, Chuanlin Lan, Qihan Yang, Vincenzo Lomonaco, German I. Parisi, Heechul Bae, Eoin Brophy, Baoquan Chen, Gabriele Graffieti, Vidit Goel, Hyonyoung Han, Sathursan Kanagarajah, Somesh Kumar, Siew-Kei Lam, Tin Lun Lam, Liang Ma, Davide Maltoni, Lorenzo Pellegrini, Duvindu Piyasena, Shiliang Pu, Debdoot Sheet, et al. (11 additional authors not shown)

    Abstract: This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, w…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: 9 pages, 11 figures, 3 tables, accepted into IEEE Robotics and Automation Magazine. arXiv admin note: text overlap with arXiv:1911.06487

  28. arXiv:1911.02761  [pdf]

    eess.IV cs.CV cs.LG

    Investigations of the Influences of a CNN's Receptive Field on Segmentation of Subnuclei of Bilateral Amygdalae

    Authors: Han Bao

    Abstract: Segmentation of objects with various sizes is relatively less explored in medical imaging, and has been very challenging in computer vision tasks in general. We hypothesize that the receptive field of a deep model corresponds closely to the size of object to be segmented, which could critically influence the segmentation accuracy of objects with varied sizes. In this study, we employed "AmygNet",…

    Submitted 7 November, 2019; originally announced November 2019.

    Comments: 16 pages, 10 figures, ADEIJ journal

  29. arXiv:1905.00274  [pdf]

    eess.SP

    An efficient coding algorithm for general Framed Pulse Width Modulations

    Authors: Soon-Won Kwon, Hyeon-Min Bae

    Abstract: This paper introduces a new coding algorithm for Framed Pulse Width Modulation (FPWM). The proposed algorithm requires 93% fewer look-up tables (LUTs) than the previous FPWM coding algorithm and increases the bitrate by 25%. The proposed algorithm is compatible with general FPWM with various frame lengths and pulse width resolutions. Theoretical bitrates and the sizes of LUT required for coding vari…

    Submitted 1 May, 2019; originally announced May 2019.

  30. arXiv:1905.00273  [pdf]

    eess.SP

    A fully-digital semi-rotational frequency detection algorithm for bang-bang CDRs

    Authors: Soon-Won Kwon, Hanho Choi, Younho Jeon, Bongjin Kim, WooHyun Kwon, Homin Park, Kyeongha Kwon, Gain Kim, Hyeon-Min Bae

    Abstract: This work presents a new frequency acquisition method using semi-rotational frequency detection (SRFD) algorithm for a reference-less clock and data recovery (CDR) in a serial-link receiver. The proposed SRFD algorithm classifies the bang-bang phase detector (BBPD) outputs to estimate the current phase state, and detects the frequency mismatch between the input data and the sampling clock. The VCO-…

    Submitted 1 May, 2019; originally announced May 2019.

  31. Generating Multi-Scroll Chua's Attractors via Simplified Piecewise-Linear Chua's Diode

    Authors: Ning Wang, Chengqing Li, Han Bao, Mo Chen, Bocheng Bao

    Abstract: High implementation complexity of multi-scroll circuit is a bottleneck problem in real chaos-based communication. Especially, in multi-scroll Chua's circuit, the simplified implementation of piecewise-linear resistors with multiple segments is difficult due to their intricate irregular breakpoints and slopes. To solve the challenge, this paper presents a systematic scheme for synthesizing a Chua's…

    Submitted 21 August, 2019; v1 submitted 27 October, 2018; originally announced October 2018.

    Comments: 14 pages, 15 figures

    MSC Class: 37G35

    Journal ref: IEEE Transactions on Circuits and Systems I: Regular Papers, 2019