
Showing 1–5 of 5 results for author: Morishima, S

Searching in archive eess.
  1. arXiv:2412.08343 [pdf, other]

    cs.GR cs.SD eess.AS

    SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering

    Authors: Hiroki Nishizawa, Keitaro Tanaka, Asuka Hirata, Shugo Yamaguchi, Qi Feng, Masatoshi Hamanaka, Shigeo Morishima

    Abstract: Automatically generating realistic musical performance motion can greatly enhance digital media production, often involving collaboration between professionals and musicians. However, capturing the intricate body, hand, and finger movements required for accurate musical performances is challenging. Existing methods often fall short due to the complex mapping between audio and motion, typically req…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: 10 pages, 7 figures, 6 tables, WACV 2025

  2. arXiv:2306.06495 [pdf, other]

    eess.AS cs.SD

    Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction

    Authors: Tomoya Yoshinaga, Keitaro Tanaka, Shigeo Morishima

    Abstract: This paper describes an audio-visual speech enhancement (AV-SE) method that estimates from noisy input audio a mixture of the speech of the speaker appearing in an input video (on-screen target speech) and of a selected speaker not appearing in the video (off-screen target speech). Although conventional AV-SE methods have suppressed all off-screen sounds, it is necessary to listen to a specific pr…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: Accepted by EUSIPCO 2023

  3. Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning

    Authors: Sara Kashiwagi, Keitaro Tanaka, Qi Feng, Shigeo Morishima

    Abstract: This paper presents a novel metric learning approach to address the performance gap between normal and silent speech in visual speech recognition (VSR). The difference in lip movements between the two poses a challenge for existing VSR models, which exhibit degraded accuracy when applied to silent speech. To solve this issue and tackle the scarcity of training data for silent speech, we propose to…

    Submitted 16 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  4. arXiv:2203.15991 [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    The Sound of Bounding-Boxes

    Authors: Takashi Oya, Shohei Iwase, Shigeo Morishima

    Abstract: In the task of audio-visual sound source separation, which leverages visual information for sound source separation, identifying objects in an image is a crucial step prior to separating the sound source. However, existing methods that assign sound on detected bounding boxes suffer from a problem that their approach heavily relies on pre-trained object detectors. Specifically, when using these exi…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: 6 pages, 5 figures, ICPR (International Conference on Pattern Recognition) 2022

  5. arXiv:2007.05722 [pdf, other]

    cs.CV cs.SD eess.AS

    Do We Need Sound for Sound Source Localization?

    Authors: Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

    Abstract: During the performance of sound source localization which uses both visual and aural information, it presently remains unclear how much either image or sound modalities contribute to the result, i.e. do we need both image and sound for sound source localization? To address this question, we develop an unsupervised learning system that solves sound source localization by decomposing this task into…

    Submitted 11 July, 2020; originally announced July 2020.

    Comments: Paper: 14 pages, 6 figures. Supplementary Material: 6 pages, 3 figures. Videos and Codes will be released later