Showing 1–6 of 6 results for author: Shih, S

Searching in archive eess.
  1. arXiv:2506.22362  [pdf, ps, other]

    eess.AS cs.LG

    DiffSoundStream: Efficient Speech Tokenization via Diffusion Decoding

    Authors: Yang Yang, Yunpeng Li, George Sung, Shao-Fu Shih, Craig Dooley, Alessio Centazzo, Ramanan Rajeswaran

    Abstract: Token-based language modeling is a prominent approach for speech generation, where tokens are obtained by quantizing features from self-supervised learning (SSL) models and extracting codes from neural speech codecs, generally referred to as semantic tokens and acoustic tokens. These tokens are often modeled autoregressively, with the inference speed being constrained by the token rate. In this wo…

    Submitted 27 June, 2025; originally announced June 2025.

  2. arXiv:2401.08864  [pdf, other]

    eess.AS cs.LG cs.SD

    Binaural Angular Separation Network

    Authors: Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann

    Abstract: We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones. The model is trained with simulated room impulse responses (RIRs) using omni-directional microphones without needing to collect real RIRs. By relying on specific angular regions and multiple room simulations, the model utilizes consistent time diffe…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  3. arXiv:2303.07486  [pdf, other]

    eess.AS cs.LG cs.SD

    Guided Speech Enhancement Network

    Authors: Yang Yang, Shao-Fu Shih, Hakan Erdogan, Jamie Menjay Lin, Chehung Lee, Yunpeng Li, George Sung, Matthias Grundmann

    Abstract: High quality speech capture has been widely studied for both voice communication and human computer interface reasons. To improve the capture performance, we can often find multi-microphone speech enhancement techniques deployed on various devices. Multi-microphone speech enhancement problem is often decomposed into two decoupled steps: a beamformer that provides spatial filtering and a single-cha…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023

  4. arXiv:2206.07219  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    A Projection-Based K-space Transformer Network for Undersampled Radial MRI Reconstruction with Limited Training Subjects

    Authors: Chang Gao, Shu-Fu Shih, J. Paul Finn, Xiaodong Zhong

    Abstract: The recent development of deep learning combined with compressed sensing enables fast reconstruction of undersampled MR images and has achieved state-of-the-art performance for Cartesian k-space trajectories. However, non-Cartesian trajectories such as the radial trajectory need to be transformed onto a Cartesian grid in each iteration of the network training, slowing down the training process and…

    Submitted 25 July, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted at MICCAI 2022

  5. arXiv:2112.13156  [pdf, other]

    cs.SD cs.MM eess.AS

    Enabling Real-time On-chip Audio Super Resolution for Bone Conduction Microphones

    Authors: Yuang Li, Yuntao Wang, Xin Liu, Yuanchun Shi, Shao-fu Shih

    Abstract: Voice communication using the air conduction microphone in noisy environments suffers from the degradation of speech audibility. Bone conduction microphones (BCM) are robust against ambient noises but suffer from limited effective bandwidth due to their sensing mechanism. Although existing audio super resolution algorithms can recover the high frequency loss to achieve high-fidelity audio, they re…

    Submitted 24 December, 2021; originally announced December 2021.

  6. arXiv:1809.04214  [pdf, other]

    cs.CL cs.IR cs.LG cs.SD eess.AS

    Automatic, Personalized, and Flexible Playlist Generation using Reinforcement Learning

    Authors: Shun-Yao Shih, Heng-Yu Chi

    Abstract: Songs can be well arranged by professional music curators to form a riveting playlist that creates engaging listening experiences. However, it is time-consuming for curators to timely rearrange these playlists for fitting trends in future. By exploiting the techniques of deep learning and reinforcement learning, in this paper, we consider music playlist generation as a language modeling problem an…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: 7 pages, 4 figures, ISMIR 2018