Showing 1–13 of 13 results for author: Oikawa, Y

  1. arXiv:2411.07517

    eess.SP cs.SD eess.AS eess.IV physics.optics

    SoundSil-DS: Deep Denoising and Segmentation of Sound-field Images with Silhouettes

    Authors: Risako Tanigawa, Kenji Ishikawa, Noboru Harada, Yasuhiro Oikawa

    Abstract: Development of optical technology has enabled imaging of two-dimensional (2D) sound fields. This acousto-optic sensing enables understanding of the interaction between sound and objects such as reflection and diffraction. Moreover, it is expected to be used as an advanced measurement technology for sonars in self-driving vehicles and assistive robots. However, the low sound-pressure sensitivity of th…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: 13 pages, 12 figures, 5 tables. Accepted by WACV 2025

  2. arXiv:2211.08246

    cs.SD eess.AS eess.SP

    Online Phase Reconstruction via DNN-based Phase Differences Estimation

    Authors: Yoshiki Masuyama, Kohei Yatabe, Kento Nagatomo, Yasuhiro Oikawa

    Abstract: This paper presents a two-stage online phase reconstruction framework using causal deep neural networks (DNNs). Phase reconstruction is the task of recovering the phase of the short-time Fourier transform (STFT) coefficients only from the corresponding magnitude. However, phase is sensitive to waveform shifts and not easy to estimate from the magnitude even with a DNN. To overcome this problem, we propo…

    Submitted 12 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE/ACM Trans. Audio, Speech, and Language Processing

  3. arXiv:2202.08458

    eess.AS cs.SD

    Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head

    Authors: Kento Nagatomo, Masahiro Yasuda, Kohei Yatabe, Shoichiro Saito, Yasuhiro Oikawa

    Abstract: Sound event localization and detection (SELD) is a combined task of identifying the sound event and its direction. Deep neural networks (DNNs) are utilized to associate them with the sound signals observed by a microphone array. Although ambisonic microphones are popular in the literature of SELD, they might limit the range of applications due to their predetermined geometry. Some applications (in…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 5 pages, 6 figures, accepted to IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2022

  4. arXiv:2202.08028

    eess.AS cs.SD eess.SP

    APPLADE: Adjustable Plug-and-play Audio Declipper Combining DNN with Sparse Optimization

    Authors: Tomoro Tanaka, Kohei Yatabe, Masahiro Yasuda, Yasuhiro Oikawa

    Abstract: In this paper, we propose an audio declipping method that takes advantage of both sparse optimization and deep learning. Since sparsity-based audio declipping methods have been developed upon constrained optimization, they are adjustable and well-studied in theory. However, they always uniformly promote sparsity and ignore the individual properties of a signal. Deep neural network (DNN)-based met…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: 5 pages, 7 figures, accepted to IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2022

  5. arXiv:2007.13976

    cs.SD cs.CV eess.AS

    Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling

    Authors: Yoshiki Masuyama, Yoshiaki Bando, Kohei Yatabe, Yoko Sasaki, Masaki Onishi, Yasuhiro Oikawa

    Abstract: Detecting sound source objects within visual observation is important for autonomous robots to comprehend surrounding environments. Since sounding objects have a large variety with different appearances in our living environments, labeling all sounding objects is impossible in practice. This calls for self-supervised learning which does not require manual labeling. Most conventional self-superv…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: Accepted for publication in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  6. arXiv:2002.05843

    eess.AS cs.SD

    Real-time speech enhancement using equilibriated RNN

    Authors: Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    Abstract: We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. DNN has been widely used for estimating a time-frequency (T-F) mask which enhances a speech signal. One popular DNN structure for that is a recurrent neural network (RNN) owing to its capability of effectively modelling time-sequential data like speech. In particular, the long short-term mem…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: To appear in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

  7. arXiv:2002.05832

    eess.AS cs.SD

    Phase reconstruction based on recurrent phase unwrapping with deep neural networks

    Authors: Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    Abstract: Phase reconstruction, which estimates phase from a given amplitude spectrogram, is an active research field in acoustical signal processing with many applications including audio synthesis. To take advantage of rich knowledge from data, several studies presented deep neural network (DNN)-based phase reconstruction methods. However, the training of a DNN for phase reconstruction is not an easy tas…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: To appear at the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

  8. arXiv:1911.10764

    eess.AS cs.SD

    Invertible DNN-based nonlinear time-frequency transform for speech enhancement

    Authors: Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    Abstract: We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). The recent development of speech enhancement has been brought about by the use of DNNs. Ordinary DNN-based speech enhancement employs a T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask using a DNN. On the other hand, some methods ha…

    Submitted 13 February, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: To appear in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

  9. arXiv:1903.08876

    eess.AS cs.SD

    Data-driven design of perfect reconstruction filterbank for DNN-based sound source enhancement

    Authors: Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    Abstract: We propose a data-driven design method of perfect-reconstruction filterbank (PRFB) for sound-source enhancement (SSE) based on a deep neural network (DNN). DNNs have been used to estimate a time-frequency (T-F) mask in the short-time Fourier transform (STFT) domain. Their training is more stable when a simple cost function such as mean-squared error (MSE) is utilized compared to some advanced cost such…

    Submitted 21 March, 2019; originally announced March 2019.

    Comments: 5 pages, to appear in IEEE ICASSP 2019 (Paper Code: AASP-P8.8, Session: Spatial Audio, Audio Enhancement and Bandwidth Extension)

  10. arXiv:1903.05603

    eess.AS cs.SD eess.SP

    Low-rankness of Complex-valued Spectrogram and Its Application to Phase-aware Audio Processing

    Authors: Yoshiki Masuyama, Kohei Yatabe, Yasuhiro Oikawa

    Abstract: Low-rankness of amplitude spectrograms has been effectively utilized in audio signal processing methods including non-negative matrix factorization. However, such methods have a fundamental limitation owing to their amplitude-only treatment where the phase of the observed signal is utilized for resynthesizing the estimated signal. In order to address this limitation, we directly treat a complex-va…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 5 pages, to appear in IEEE ICASSP 2019 (Paper Code: AASP-P13.9, Session: Acoustic Scene Classification and Music Signal Analysis)

  11. arXiv:1903.05600

    eess.AS cs.SD eess.SP

    Phase-aware Harmonic/Percussive Source Separation via Convex Optimization

    Authors: Yoshiki Masuyama, Kohei Yatabe, Yasuhiro Oikawa

    Abstract: Decomposition of an audio mixture into harmonic and percussive components, namely harmonic/percussive source separation (HPSS), is a useful pre-processing tool for many audio applications. Popular approaches to HPSS exploit the distinctive source-specific structures of power spectrograms. However, such approaches consider only power spectrograms, and the phase remains intact for resynthesizing the…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 5 pages, to appear in IEEE ICASSP 2019 (Paper Code: AASP-P16.5, Session: Music Signal Analysis, Feedback and Echo Cancellation and Equalization)

  12. arXiv:1903.03971

    cs.SD cs.LG eess.AS

    Deep Griffin-Lim Iteration

    Authors: Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    Abstract: This paper presents a novel phase reconstruction method (only from a given amplitude spectrogram) by combining a signal-processing-based approach and a deep neural network (DNN). To retrieve a time-domain signal from its amplitude spectrogram, the corresponding phase is required. One of the popular phase reconstruction methods is the Griffin-Lim algorithm (GLA), which is based on the redundancy of…

    Submitted 10 March, 2019; originally announced March 2019.

    Comments: 5 pages, to appear in IEEE ICASSP 2019 (Paper Code: AASP-L3.1, Session: Source Separation and Speech Enhancement I)

  13. arXiv:1811.08783

    eess.SP cs.SD eess.AS

    Designing nearly tight window for improving time-frequency masking

    Authors: Tsubasa Kusano, Yoshiki Masuyama, Kohei Yatabe, Yasuhiro Oikawa

    Abstract: Many audio signal processing methods are formulated in the time-frequency (T-F) domain, which is obtained by the short-time Fourier transform (STFT). The properties of the STFT are fully characterized by the window function, the number of frequency channels, and the time shift. Thus, designing a better window is important for improving the performance of the processing, especially when a less redundant T-F repr…

    Submitted 4 February, 2019; v1 submitted 17 November, 2018; originally announced November 2018.