Showing 1–40 of 40 results for author: Takeuchi, D

  1. arXiv:2506.10676  [pdf, ps, other]

    cs.SD eess.AS

    Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes

    Authors: Masahiro Yasuda, Binh Thien Nguyen, Noboru Harada, Romain Serizel, Mayank Mishra, Marc Delcroix, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Yasunori Ohishi, Tomohiro Nakatani, Takao Kawamura, Nobutaka Ono

    Abstract: Spatial Semantic Segmentation of Sound Scenes (S5) aims to enhance technologies for sound event detection and separation from multi-channel input signals that mix multiple sound events with spatial information. This is a fundamental basis of immersive communication. The ultimate goal is to separate sound event signals with 6 Degrees of Freedom (6DoF) information into dry sound object signals and m…

    Submitted 12 June, 2025; originally announced June 2025.

  2. arXiv:2506.00800  [pdf, other]

    eess.AS cs.LG cs.SD

    CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer

    Authors: Daiki Takeuchi, Binh Thien Nguyen, Masahiro Yasuda, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada

    Abstract: Automated Audio Captioning (AAC) aims to describe the semantic contexts of general sounds, including acoustic events and scenes, by leveraging effective acoustic features. To enhance performance, an AAC method, EnCLAP, employed discrete tokens from EnCodec as an effective input for fine-tuning a language model BART. However, EnCodec is designed to reconstruct waveforms rather than capture the sema…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech2025

  3. arXiv:2505.22036  [pdf, ps, other]

    math.NT math.AG

    The $L$-polynomials of van der Geer--van der Vlugt curves in characteristic $2$

    Authors: Tetsushi Ito, Daichi Takeuchi, Takahiro Tsushima

    Abstract: The van der Geer--van der Vlugt curves form a class of Artin--Schreier coverings of the projective line over finite fields. We provide an explicit formula for their $L$-polynomials in characteristic $2$, expressed in terms of characters of maximal abelian subgroups of associated Heisenberg groups. For this purpose, we develop new methods specific to characteristic $2$ that exploit the structure of…

    Submitted 28 May, 2025; originally announced May 2025.

  4. arXiv:2505.15307  [pdf, ps, other]

    eess.AS cs.SD

    Towards Pre-training an Effective Respiratory Audio Foundation Model

    Authors: Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda, Binh Thien Nguyen, Yasunori Ohishi, Noboru Harada

    Abstract: Recent advancements in foundation models have sparked interest in respiratory audio foundation models. However, the effectiveness of applying conventional pre-training schemes to datasets that are small-sized and lack diversity has not been sufficiently verified. This study aims to explore better pre-training practices for respiratory sounds by comparing numerous pre-trained audio models. Our inve…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 5 pages, 2 figures, 4 tables, Accepted by Interspeech 2025

    MSC Class: 68T07; ACM Class: J.3

  5. arXiv:2504.18004  [pdf, other]

    eess.AS cs.SD

    Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis

    Authors: Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda, Binh Thien Nguyen, Yasunori Ohishi, Noboru Harada

    Abstract: Pre-trained deep learning models, known as foundation models, have become essential building blocks in machine learning domains such as natural language processing and image domains. This trend has extended to respiratory and heart sound models, which have demonstrated effectiveness as off-the-shelf feature extractors. However, their evaluation benchmarking has been limited, resulting in incompati…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 4 pages, 1 figure, and 4 tables. Accepted by IEEE EMBC 2025

    MSC Class: 68T07; ACM Class: J.3

  6. arXiv:2503.22104  [pdf, other]

    eess.AS

    M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP

    Authors: Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda, Binh Thien Nguyen, Yasunori Ohishi, Noboru Harada

    Abstract: Contrastive language-audio pre-training (CLAP) has addressed audio-language tasks such as audio-text retrieval by aligning audio and text in a common feature space. While CLAP addresses general audio-language tasks, its audio features do not generalize well in audio tasks. In contrast, self-supervised learning (SSL) models learn general-purpose audio features that perform well in diverse audio tas…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 15 pages, 7 figures, 13 tables, under review at an IEEE journal

    MSC Class: 68T07; ACM Class: I.2.4

  7. arXiv:2503.22088  [pdf, ps, other]

    eess.AS cs.SD

    Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes

    Authors: Binh Thien Nguyen, Masahiro Yasuda, Daiki Takeuchi, Daisuke Niizumi, Yasunori Ohishi, Noboru Harada

    Abstract: Immersive communication has made significant advancements, especially with the release of the codec for Immersive Voice and Audio Services. Aiming at its further realization, the DCASE 2025 Challenge has recently introduced a task for spatial semantic segmentation of sound scenes (S5), which focuses on detecting and separating sound events in spatial sound scenes. In this paper, we explore methods…

    Submitted 9 June, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted to EUSIPCO2025

  8. arXiv:2501.06762  [pdf]

    q-bio.NC cs.LG cs.NE

    Improving the adaptive and continuous learning capabilities of artificial neural networks: Lessons from multi-neuromodulatory dynamics

    Authors: Jie Mei, Alejandro Rodriguez-Garcia, Daigo Takeuchi, Gabriel Wainstein, Nina Hubig, Yalda Mohsenzadeh, Srikanth Ramaswamy

    Abstract: Continuous, adaptive learning, the ability to adapt to the environment and improve performance, is a hallmark of both natural and artificial intelligence. Biological organisms excel in acquiring, transferring, and retaining knowledge while adapting to dynamic environments, making them a rich source of inspiration for artificial neural networks (ANNs). This study explores how neuromodulation, a funda…

    Submitted 12 January, 2025; originally announced January 2025.

  9. arXiv:2406.02032  [pdf, other]

    eess.AS cs.MM cs.SD

    M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Masahiro Yasuda, Shunsuke Tsubaki, Keisuke Imoto

    Abstract: Contrastive language-audio pre-training (CLAP) enables zero-shot (ZS) inference of audio and exhibits promising performance in several classification tasks. However, conventional audio representations are still crucial for many tasks where ZS is not applicable (e.g., regression problems). Here, we explore a new representation, a general-purpose audio-language representation, that performs well in…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure, 5 tables. Accepted by Interspeech 2024

    MSC Class: 68T07

  10. arXiv:2404.17107  [pdf, other]

    eess.AS cs.SD

    Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: To reduce the need for skilled clinicians in heart sound interpretation, recent studies on automating cardiac auscultation have explored deep learning approaches. However, despite the demands for large data for deep learning, the size of the heart sound datasets is limited, and no pre-trained model is available. On the contrary, many pre-trained models for general audio tasks are available as gene…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 4 pages, 1 figure, and 4 tables. Accepted by IEEE EMBC 2024

    MSC Class: 68T07

  11. arXiv:2404.06095  [pdf, other]

    eess.AS cs.SD

    Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Self-supervised learning (SSL) using masked prediction has made great strides in general-purpose audio representation. This study proposes Masked Modeling Duo (M2D), an improved masked prediction SSL, which learns by predicting representations of masked input signals that serve as training signals. Unlike conventional methods, M2D obtains a training signal by encoding only the masked part, encoura…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 15 pages, 6 figures, 15 tables. Accepted by TASLP

    MSC Class: 68T07

  12. arXiv:2403.17427  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Atomic observation on diamond (001) surfaces with non-contact atomic force microscopy

    Authors: Runnan Zhang, Yuuki Yasui, Masahiro Fukuda, Taisuke Ozaki, Masahiko Ogura, Toshiharu Makino, Daisuke Takeuchi, Yoshiaki Sugimoto

    Abstract: To achieve atomic-level characterization of the diamond (001) surface, persistent efforts have been made over the past few decades. The motivation behind the pursuit extends beyond investigating surface defects and adsorbates; it also involves unraveling the mystery of the smooth growth of diamond. However, the inherently low conductivity and the short C-C bonds render atomic resolution imaging ex…

    Submitted 26 March, 2024; originally announced March 2024.

  13. arXiv:2403.10756  [pdf, other]

    eess.AS cs.SD

    Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

    Authors: Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

    Abstract: The aim of this research is to refine knowledge transfer on audio-image temporal agreement for audio-text cross retrieval. To address the limited availability of paired non-speech audio-text data, learning methods for transferring the knowledge acquired from a large amount of paired audio-image data to shared audio-text representation have been investigated, suggesting the importance of how audio-…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO2024

  14. arXiv:2402.08252  [pdf, other]

    eess.AS cs.SD

    Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN

    Authors: Shiqi Zhang, Zheng Qiu, Daiki Takeuchi, Noboru Harada, Shoji Makino

    Abstract: With the rapid development of neural networks in recent years, the ability of various networks to enhance the magnitude spectrum of noisy speech in the single-channel speech enhancement domain has become exceptionally outstanding. However, enhancing the phase spectrum using neural networks is often ineffective, which remains a challenging problem. In this paper, we found that the human ear cannot…

    Submitted 4 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted by ICASSP 2024. Updated on 2024/06/04 to add one more citation in the appendix

  15. arXiv:2308.11923  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

    Authors: Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

    Abstract: We propose Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips. The ADC solves the problem that conventional audio captioning sometimes generates similar captions for similar audio clips, failing to describe the difference in content. We also propose a cross-attentio…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to DCASE2023 Workshop

  16. arXiv:2305.15164  [pdf, ps, other]

    math.NT

    Quadratic $\ell$-adic sheaf and its Heisenberg group

    Authors: Daichi Takeuchi

    Abstract: In this paper, we introduce a new class of $\ell$-adic sheaves, which we call quadratic $\ell$-adic sheaves, on connected unipotent commutative algebraic groups over finite fields. They are sheaf-theoretic enhancements of quadratic forms on finite abelian groups in the spirit of the function-sheaf dictionary. We show that a certain finite Heisenberg group acts on a quadratic sheaf and that the coh…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 39 pages

    MSC Class: 11T24; 11G25 (Primary) 14R20 (Secondary)

  17. arXiv:2305.14079  [pdf, other]

    eess.AS cs.SD

    Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Self-supervised learning general-purpose audio representations have demonstrated high performance in a variety of tasks. Although they can be optimized for application by fine-tuning, even higher performance can be expected if they can be specialized to pre-train for an application. This paper explores the challenges and solutions in specializing general-purpose audio representations for a specifi…

    Submitted 3 August, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023; 5+2 pages, 2 figures, 6+6 tables, Code: https://github.com/nttcslab/m2d/tree/master/speech

    MSC Class: 68T07

  18. arXiv:2305.06117  [pdf, ps, other]

    math.AG math.NT

    Gauss sums and Van der Geer--Van der Vlugt curves

    Authors: Daichi Takeuchi, Takahiro Tsushima

    Abstract: We study Van der Geer--Van der Vlugt curves from a ramification-theoretic viewpoint. We give explicit formulae for the L-polynomials of these curves. As a result, we show that these curves are supersingular and give sufficient conditions for these curves to be maximal or minimal.

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: 12 pages

  19. arXiv:2304.14923  [pdf, ps, other]

    eess.SP cs.SD eess.AS eess.IV physics.optics

    Deep sound-field denoiser: optically-measured sound-field denoising using deep neural network

    Authors: Kenji Ishikawa, Daiki Takeuchi, Noboru Harada, Takehiro Moriya

    Abstract: This paper proposes a deep sound-field denoiser, a deep neural network (DNN) based denoising of optically measured sound-field images. Sound-field imaging using optical methods has gained considerable attention due to its ability to achieve high-spatial-resolution imaging of acoustic phenomena that conventional acoustic sensors cannot accomplish. However, the optically measured sound-field images…

    Submitted 21 September, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: 16 pages, 10 figures, 2 tables

  20. arXiv:2303.00455  [pdf, other]

    eess.AS cs.SD

    First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline

    Authors: Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda

    Abstract: This paper provides a baseline system for First-shot-compliant unsupervised anomaly detection (ASD) for machine condition monitoring. First-shot ASD does not allow systems to do machine-type dependent hyperparameter tuning or tool ensembling based on the performance metric calculated with the ground truth. To show benchmark performance for First-shot ASD, this paper proposes an anomaly sound detect…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: 5 pages, 2 figures

  21. arXiv:2210.14648  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD

    Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Masked Autoencoders is a simple yet powerful self-supervised learning method. However, it learns representations indirectly by reconstructing masked input patches. Several methods learn representations directly by predicting representations of masked patches; however, we think using all patches to encode training signal representations is suboptimal. We propose a new method, Masked Modeling Duo (M…

    Submitted 2 March, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 6 pages, 3 figures, and 6 tables. To appear at ICASSP2023

    MSC Class: 68T07

  22. arXiv:2209.10275  [pdf, other]

    cs.IT

    Tight Exponential Strong Converse for Source Coding Problem with Encoded Side Information

    Authors: Daisuke Takeuchi, Shun Watanabe

    Abstract: The source coding problem with encoded side information is considered. A lower bound on the strong converse exponent has been derived by Oohama, but its tightness has not been clarified. In this paper, we derive a tight strong converse exponent. For the special case such that the side-information does not exist, we demonstrate that our tight exponent of the WAK problem reduces to the known tight…

    Submitted 3 April, 2024; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 15 pages, 5 figures; v2 adds an analysis of full-side information case; v3 adds numerical experiment and an application to the privacy amplification

  23. arXiv:2207.11964  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    ConceptBeam: Concept Driven Target Speech Extraction

    Authors: Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino

    Abstract: We propose a novel framework for target speech extraction based on semantic information, called ConceptBeam. Target speech extraction means extracting the speech of a target speaker in a mixture. Typical approaches have been exploiting properties of audio signals, such as harmonic structure and direction of arrival. In contrast, ConceptBeam tackles the problem with semantic clues. Specifically, we…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted to ACM Multimedia 2022

  24. arXiv:2207.09732  [pdf, other]

    eess.AS cs.CL cs.IR cs.LG cs.SD

    Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

    Authors: Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

    Abstract: The amount of audio data available on public websites is growing rapidly, and an efficient mechanism for accessing the desired data is necessary. We propose a content-based audio retrieval method that can retrieve a target audio that is similar to but slightly different from the query audio by introducing auxiliary textual information which describes the difference between the query and target aud…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to Interspeech 2022

  25. arXiv:2205.08138  [pdf, ps, other]

    eess.AS cs.SD

    Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Many application studies rely on audio DNN models pre-trained on a large-scale dataset as essential feature extractors, and they extract features from the last layers. In this study, we focus on our finding that the middle layer features of existing supervised pre-trained models are more effective than the late layer features for some tasks. We propose a simple approach to compose features effecti…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 5 pages, 4 figures and 4 tables. Accepted by EUSIPCO 2022

    MSC Class: 68T07

  26. arXiv:2204.12260  [pdf, other]

    eess.AS cs.SD

    Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Recent general-purpose audio representations show state-of-the-art performance on various audio tasks. These representations are pre-trained by self-supervised learning methods that create training signals from the input. For example, typical audio contrastive learning uses temporal relationships among input sounds to create training signals, whereas some methods use a difference among input views…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: 22 pages, 8 figures. Under the review process

    MSC Class: 68T07

    Journal ref: HEAR: Holistic Evaluation of Audio Representations (NeurIPS 2021 Competition) PMLR 166 (2022) 1-24

  27. BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Pre-trained models are essential as feature extractors in modern machine learning systems in various domains. In this study, we hypothesize that representations effective for general audio tasks should provide multiple aspects of robust features of the input sound. For recognizing sounds regardless of perturbations such as varying pitch or timbre, features should be robust to these perturbations.…

    Submitted 16 June, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: 15 pages, 6 figures, and 15 tables. Under the review process

    MSC Class: 68T07

    Journal ref: IEEE/ACM Trans. Audio, Speech, Language Process. 31 (2023) 137-151

  28. arXiv:2106.02369  [pdf, other]

    eess.AS

    ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions

    Authors: Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, Shoichiro Saito

    Abstract: This paper proposes a new large-scale dataset called "ToyADMOS2" for anomaly detection in machine operating sounds (ADMOS). As we did for our previous ToyADMOS dataset, we collected a large number of operating sounds of miniature machines (toys) under normal and anomaly conditions by deliberately damaging them, but extended this by providing controlled depth of damage in anomaly samples. Since typical a…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: 5 pages, 4 figures

  29. arXiv:2103.06695  [pdf, other]

    eess.AS cs.LG cs.SD

    BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning approach. We propose learning general-purpose audio representation from a single audio segment without expecting relationships between different time segments of audio samples. To implement this principle…

    Submitted 20 April, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: IJCNN 2021, 8 pages, 4 figures

    MSC Class: 68T07

  30. arXiv:2012.07331  [pdf, other]

    eess.AS cs.CL cs.SD

    Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval

    Authors: Yuma Koizumi, Yasunori Ohishi, Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda

    Abstract: The goal of audio captioning is to translate input audio into its description using natural language. One of the problems in audio captioning is the lack of training data due to the difficulty in collecting audio-caption pairs by crawling the web. In this study, to overcome this problem, we propose to use a pre-trained large-scale language model. Since an audio input cannot be directly inputted in…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: Submitted to ICASSP 2021

  31. arXiv:2010.11022  [pdf, ps, other]

    math.AG math.NT

    Symmetric bilinear forms and local epsilon factors of isolated singularities in positive characteristic

    Authors: Daichi Takeuchi

    Abstract: Let $f\colon X\to\mathbb{A}^1_k$ be a morphism from a smooth variety to an affine line with an isolated singular point. For such a singularity, we have two invariants. One is a non-degenerate symmetric bilinear form (de Rham), and the other is the vanishing cycles complex (étale). In this article, we give a formula which expresses the local epsilon factor of the vanishing cycles complex in terms…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: 59 pages. Comments are welcome

  32. arXiv:2009.11436  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

    Authors: Daiki Takeuchi, Yuma Koizumi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning. The system received the highest evaluation scores, but which of the individual elements most fully contributed to its performance has not ye…

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: Accepted to DCASE2020 Workshop

  33. arXiv:2007.00225  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

    Authors: Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: This technical report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning. Our submission focuses on solving two indeterminacy problems in automated audio captioning: word selection indeterminacy and sentence length indeterminacy. We simultaneously solve the main caption generation and sub i…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: Technical Report of DCASE2020 Challenge Task 6

  34. arXiv:2003.06931  [pdf, ps, other]

    math.AG math.NT

    On continuity of local epsilon factors of $\ell$-adic sheaves

    Authors: Daichi Takeuchi

    Abstract: Let $S$ be a noetherian scheme and $f\colon X\to S$ be a smooth morphism of relative dimension 1. For a locally constant sheaf on the complement of a divisor in $X$ flat over $S$, Deligne and Laumon proved that the universal local acyclicity is equivalent to the local constancy of Swan conductors. In this article, assuming the universal local acyclicity, we show an analogous result of the continuity…

    Submitted 15 March, 2020; originally announced March 2020.

    Comments: This discusses a result separated from the first version of arXiv:1911.02269

  35. arXiv:2002.05873  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

    Authors: Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi

    Abstract: This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)-based speech enhancement mainly focus on building a speaker independent model. Meanwhile, in speech applications including speech recognition and s…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: 5 pages, to appear in IEEE ICASSP 2020

  36. arXiv:2002.05843  [pdf, other]

    eess.AS cs.SD

    Real-time speech enhancement using equilibriated RNN

    Authors: Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    Abstract: We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. DNN has been widely used for estimating a time-frequency (T-F) mask which enhances a speech signal. One popular DNN structure for that is a recurrent neural network (RNN) owing to its capability of effectively modelling time-sequential data like speech. In particular, the long short-term mem…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: To appear in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

  37. arXiv:1911.10764  [pdf, other]

    eess.AS cs.SD

    Invertible DNN-based nonlinear time-frequency transform for speech enhancement

    Authors: Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    Abstract: We propose an end-to-end speech enhancement method with trainable time-frequency (T-F) transform based on invertible deep neural network (DNN). The recent development of speech enhancement is brought by using DNN. The ordinary DNN-based speech enhancement employs T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask using DNN. On the other hand, some methods ha…

    Submitted 13 February, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: To appear in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

  38. arXiv:1911.02269  [pdf, ps, other]

    math.AG math.NT

    Characteristic Epsilon Cycles of $\ell$-adic Sheaves on Varieties

    Authors: Daichi Takeuchi

    Abstract: Let $X$ be a smooth variety over a finite field $\mathbb{F}_q$. Let $\ell$ be a rational prime number invertible in $\mathbb{F}_q$. For an $\ell$-adic sheaf $\mathcal{F}$ on $X$, we construct a cycle supported on the singular support of $\mathcal{F}$ whose coefficients are $\ell$-adic numbers modulo roots of unity. It is a refinement of the characteristic cycle $CC(\mathcal{F})$, in the sense that…

    Submitted 15 March, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: The previous version is divided into 2 parts and this contains the main parts of the previous one. arXiv admin note: text overlap with arXiv:1510.03018 by other authors

  39. arXiv:1903.08876  [pdf, other]

    eess.AS cs.SD

    Data-driven design of perfect reconstruction filterbank for DNN-based sound source enhancement

    Authors: Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    Abstract: We propose a data-driven design method of perfect-reconstruction filterbank (PRFB) for sound-source enhancement (SSE) based on deep neural network (DNN). DNNs have been used to estimate a time-frequency (T-F) mask in the short-time Fourier transform (STFT) domain. Their training is more stable when a simple cost function such as mean-squared error (MSE) is utilized, compared to some advanced cost such…

    Submitted 21 March, 2019; originally announced March 2019.

    Comments: 5 pages, to appear in IEEE ICASSP 2019 (Paper Code: AASP-P8.8, Session: Spatial Audio, Audio Enhancement and Bandwidth Extension)

  40. Blow-ups and the class field theory for curves

    Authors: Daichi Takeuchi

    Abstract: We propose another proof of the geometric class field theory for curves by considering blow-ups of symmetric products of curves.

    Submitted 6 April, 2018; originally announced April 2018.

    Comments: 19 pages

    Journal ref: Alg. Number Th. 13 (2019) 1327-1351