Showing 1–33 of 33 results for author: Koyama, Y

Searching in archive cs.

  1. arXiv:2503.11331

    cs.LG cs.AI cs.CV

    Cardiomyopathy Diagnosis Model from Endomyocardial Biopsy Specimens: Appropriate Feature Space and Class Boundary in Small Sample Size Data

    Authors: Masaya Mori, Yuto Omae, Yutaka Koyama, Kazuyuki Hara, Jun Toyotani, Yasuo Okumura, Hiroyuki Hao

    Abstract: As the number of patients with heart failure increases, machine learning (ML) has garnered attention in cardiomyopathy diagnosis, driven by the shortage of pathologists. However, endomyocardial biopsy specimen datasets often have small sample sizes and require techniques such as feature extraction and dimensionality reduction. This study aims to determine whether texture features are effective for feature e…

    Submitted 14 March, 2025; originally announced March 2025.

  2. FontCraft: Multimodal Font Design Using Interactive Bayesian Optimization

    Authors: Yuki Tatsukawa, I-Chao Shen, Mustafa Doga Dogan, Anran Qi, Yuki Koyama, Ariel Shamir, Takeo Igarashi

    Abstract: Creating new fonts requires a lot of human effort and professional typographic knowledge. Despite the rapid advancements of automatic font generation models, existing methods require users to prepare pre-designed characters with target styles using font-editing software, which poses a problem for non-expert users. To address this limitation, we propose FontCraft, a system that enables font generat…

    Submitted 16 February, 2025; originally announced February 2025.

    Comments: 14 pages

    Journal ref: CHI 2025

  3. arXiv:2411.01135

    cs.SD cs.IR cs.LG eess.AS

    Music Foundation Model as Generic Booster for Music Downstream Tasks

    Authors: WeiHsiang Liao, Yuhta Takida, Yukara Ikemiya, Zhi Zhong, Chieh-Hsin Lai, Giorgio Fabbro, Kazuki Shimada, Keisuke Toyama, Kinwai Cheuk, Marco A. Martínez-Ramírez, Shusuke Takahashi, Stefan Uhlich, Taketo Akama, Woosung Choi, Yuichiro Koyama, Yuki Mitsufuji

    Abstract: We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging hierarchical intermediate features, SoniDo constrains the information granularity, leading to improved performance across var…

    Submitted 5 November, 2024; v1 submitted 2 November, 2024; originally announced November 2024.

    Comments: 41 pages with 14 figures

  4. A Practical Style Transfer Pipeline for 3D Animation: Insights from Production R&D

    Authors: Hideki Todo, Yuki Koyama, Kunihiro Sakai, Akihiro Komiya, Jun Kato

    Abstract: Our animation studio has developed a practical style transfer pipeline for creating stylized 3D animation, which is suitable for complex real-world production. This paper presents the insights from our development process, where we explored various options to balance quality, artist control, and workload, leading to several key decisions. For example, we chose patch-based texture synthesis over ma…

    Submitted 31 October, 2024; originally announced October 2024.

    Journal ref: SIGGRAPH Asia 2024 Technical Communications

  5. FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications

    Authors: Yuki Tatsukawa, I-Chao Shen, Anran Qi, Yuki Koyama, Takeo Igarashi, Ariel Shamir

    Abstract: Acquiring the desired font for various design tasks can be challenging and requires professional typographic knowledge. While previous font retrieval or generation works have alleviated some of these difficulties, they often lack support for multiple languages and semantic attributes beyond the training data domains. To solve this problem, we present FontCLIP: a model that connects the semantic un…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 11 pages. Eurographics 2024. https://yukistavailable.github.io/fontclip.github.io/

  6. arXiv:2309.09223

    cs.SD eess.AS

    Zero- and Few-shot Sound Event Localization and Detection

    Authors: Kazuki Shimada, Kengo Uchida, Yuichiro Koyama, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji, Tatsuya Kawahara

    Abstract: Sound event localization and detection (SELD) systems estimate direction-of-arrival (DOA) and temporal activation for sets of target classes. Neural network (NN)-based SELD systems have performed well in various sets of target classes, but they only output the DOA and temporal activation of preset classes trained before inference. To customize target classes after training, we tackle zero- and few…

    Submitted 17 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures, accepted for publication in IEEE ICASSP 2024

  7. arXiv:2306.09126

    cs.SD cs.CV cs.MM eess.AS eess.IV

    STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

    Authors: Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

    Abstract: While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker. This paper proposes an audio-visual sound event localization and detection (SELD) task, which uses multichannel audio and video information…

    Submitted 14 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 27 pages, 9 figures, accepted for publication in NeurIPS 2023 Track on Datasets and Benchmarks

  8. arXiv:2305.10734

    cs.SD cs.CL eess.AS

    Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

    Authors: Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji

    Abstract: Diffusion-based generative speech enhancement (SE) has recently received attention, but reverse diffusion remains time-consuming. One solution is to initialize the reverse diffusion process with enhanced features estimated by a predictive SE system. However, the pipeline structure currently does not consider a combined use of generative and predictive decoders. The predictive decoder allows us…

    Submitted 28 February, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  9. arXiv:2305.05857

    eess.AS cs.SD

    Diffusion-based Signal Refiner for Speech Separation

    Authors: Masato Hirano, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: We have developed a diffusion-based speech refiner that improves the reference-free perceptual quality of the audio predicted by preceding single-channel speech separation models. Although modern deep neural network-based speech separation models have shown high performance in reference-based metrics, they often produce perceptually unnatural artifacts. The recent advancements made to diffusion mod…

    Submitted 12 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Under review

  10. Vitreoretinal Surgical Robotic System with Autonomous Orbital Manipulation using Vector-Field Inequalities

    Authors: Yuki Koyama, Murilo Marques Marinho, Kanako Harada

    Abstract: Vitreoretinal surgery pertains to the treatment of delicate tissues on the fundus of the eye using thin instruments. Surgeons frequently rotate the eye during surgery, which is called orbital manipulation, to observe regions around the fundus without moving the patient. In this paper, we propose the autonomous orbital manipulation of the eye in robot-assisted vitreoretinal surgery with our tele-op…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: 7 pages, 7 figures, accepted at ICRA 2023

    Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA), London, United Kingdom, 2023, pp. 4654-4660

  11. arXiv:2211.10437

    cs.CV

    A Structure-Guided Diffusion Model for Large-Hole Image Completion

    Authors: Daichi Horita, Jiaolong Yang, Dong Chen, Yuki Koyama, Kiyoharu Aizawa, Nicu Sebe

    Abstract: Image completion techniques have made significant progress in filling missing regions (i.e., holes) in images. However, large-hole completion remains challenging due to limited structural information. In this paper, we address this problem by integrating explicit structural guidance into diffusion-based image completion, forming our structure-guided diffusion model (SGDM). It consists of two casca…

    Submitted 6 September, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: BMVC2023. Code: https://github.com/UdonDa/Structure_Guided_Diffusion_Model

  12. arXiv:2206.01948

    eess.AS cs.SD

    STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

    Authors: Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

    Abstract: This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset for sound event localization and detection, comprised of spatial recordings of real scenes collected in various interiors of two different sites. The dataset is captured with a high-resolution spherical microphone array and delivered in two 4-channel formats, first-order Ambisonics and tetrahedral microphone arr…

    Submitted 2 September, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

  13. arXiv:2202.01664

    eess.AS cs.LG cs.SD

    Distortion Audio Effects: Learning How to Recover the Clean Signal

    Authors: Johannes Imort, Giorgio Fabbro, Marco A. Martínez Ramírez, Stefan Uhlich, Yuichiro Koyama, Yuki Mitsufuji

    Abstract: Given the recent advances in music source separation and automatic mixing, removing audio effects in music tracks is a meaningful step toward developing an automated remixing system. This paper focuses on removing distortion audio effects applied to guitar tracks in music production. We explore whether effect removal can be solved by neural networks designed for source separation and audio effect…

    Submitted 13 September, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: Audio examples available at https://joimort.github.io/distortionremoval/

  14. arXiv:2201.10424

    eess.IV cs.CV

    Improving segmentation of calcified and non-calcified plaques on CCTA-CPR scans via masking of the artery wall

    Authors: Antonio Tejero-de-Pablos, Hiroaki Yamane, Yusuke Kurose, Junichi Iho, Youji Tokunaga, Makoto Horie, Keisuke Nishizawa, Yusaku Hayashi, Yasushi Koyama, Tatsuya Harada

    Abstract: The presence of plaques in the coronary arteries is a major risk to patients' lives. In particular, non-calcified plaques pose a great challenge, as they are harder to detect and more likely to rupture than calcified plaques. While current deep learning techniques allow precise segmentation of real-life images, the performance in medical images is still low. This is caused mostly by blurriness…

    Submitted 10 April, 2023; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: Extended abstract (see SPIE for final published version)

    Journal ref: SPIE 12465, Medical Imaging 2023: Computer-Aided Diagnosis

  15. arXiv:2110.07124

    eess.AS cs.SD

    Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

    Authors: Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Naoya Takahashi, Emiru Tsunoo, Yuki Mitsufuji

    Abstract: Sound event localization and detection (SELD) involves identifying the direction-of-arrival (DOA) and the event class. The SELD methods with a class-wise output format make the model predict activities of all sound event classes and corresponding locations. The class-wise methods can output activity-coupled Cartesian DOA (ACCDOA) vectors, which enable us to solve a SELD task with a single target u…

    Submitted 27 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, accepted for publication in IEEE ICASSP 2022

  16. arXiv:2110.06501

    cs.SD eess.AS

    Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

    Authors: Yuichiro Koyama, Kazuhide Shigemi, Masafumi Takahashi, Kazuki Shimada, Naoya Takahashi, Emiru Tsunoo, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Recording and annotating real sound events for a sound event localization and detection (SELD) task is time-consuming, and data augmentation techniques are often favored when the amount of data is limited. However, how to augment the spatial information in a dataset, including unlabeled directional interference events, remains an open research question. Furthermore, directional interference events…

    Submitted 28 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 2 figures, accepted for publication in IEEE ICASSP 2022

  17. arXiv:2110.06494

    cs.SD eess.AS

    Music Source Separation with Deep Equilibrium Models

    Authors: Yuichiro Koyama, Naoki Murata, Stefan Uhlich, Giorgio Fabbro, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: While deep neural network-based music source separation (MSS) is very effective and achieves high performance, its model size is often a problem for practical deployment. Deep implicit architectures such as deep equilibrium models (DEQ) were recently proposed, which can achieve higher performance than their explicit counterparts with limited depth while keeping the number of parameters small. This…

    Submitted 28 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 4 figures, accepted for publication in IEEE ICASSP 2022

  18. Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

    Authors: Ricardo Falcon-Perez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Data augmentation methods have shown great importance in diverse supervised learning problems where labeled data is scarce or costly to obtain. For sound event localization and detection (SELD) tasks, several augmentation methods have been proposed, with most borrowing ideas from other domains such as images, speech, or monophonic audio. However, only a few exploit the spatial properties of a full…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 5 pages, 2 figures, 4 tables. Submitted to the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)

  19. Autonomous Coordinated Control of the Light Guide for Positioning in Vitreoretinal Surgery

    Authors: Yuki Koyama, Murilo M. Marinho, Mamoru Mitsuishi, Kanako Harada

    Abstract: Vitreoretinal surgery is challenging even for expert surgeons owing to the delicate target tissues and the diminutive workspace in the retina. In addition to improved dexterity and accuracy, robot assistance allows for (partial) task automation. In this work, we propose a strategy to automate the motion of the light guide with respect to the surgical instrument. This automation allows the instrume…

    Submitted 20 January, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: Accepted in T-MRB 2022, 16 pages

    Journal ref: IEEE Transactions on Medical Robotics and Bionics, vol. 4, no. 1, pp. 156-171, Feb. 2022

  20. arXiv:2106.10806

    eess.AS cs.SD

    Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

    Authors: Kazuki Shimada, Naoya Takahashi, Yuichiro Koyama, Shusuke Takahashi, Emiru Tsunoo, Masafumi Takahashi, Yuki Mitsufuji

    Abstract: This report describes our systems submitted to the DCASE2021 challenge task 3: sound event localization and detection (SELD) with directional interference. Our previous system based on activity-coupled Cartesian direction of arrival (ACCDOA) representation enables us to solve a SELD task with a single target. This ACCDOA-based system with efficient network architecture called RD3Net and data augme…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: 5 pages, 3 figures, submitted to DCASE2021 task3

  21. arXiv:2105.09207

    cs.LG cs.CV cs.HC

    Tool- and Domain-Agnostic Parameterization of Style Transfer Effects Leveraging Pretrained Perceptual Metrics

    Authors: Hiromu Yakura, Yuki Koyama, Masataka Goto

    Abstract: Current deep learning techniques for style transfer would not be optimal for design support since their "one-shot" transfer does not fit exploratory design processes. To overcome this gap, we propose parametric transcription, which transcribes an end-to-end style transfer effect into parameter values of specific transformations available in an existing content editing tool. With this approach, use…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: To appear in Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI 2021); Project page available at https://yumetaro.info/projects/parametric-transcription/

  22. arXiv:2010.15306

    eess.AS cs.SD

    ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

    Authors: Kazuki Shimada, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Neural-network (NN)-based methods show high performance in sound event localization and detection (SELD). Conventional NN-based methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target. The two-branch representation with a single network has to decide how to balance the two objectives during optimization. Using two networks dedicated to each task in…

    Submitted 14 February, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: 5 pages, 5 figures, accepted for publication in IEEE ICASSP 2021

  23. arXiv:2010.03190

    cs.SD cs.HC eess.AS

    Generative Melody Composition with Human-in-the-Loop Bayesian Optimization

    Authors: Yijun Zhou, Yuki Koyama, Masataka Goto, Takeo Igarashi

    Abstract: Deep generative models allow even novice composers to generate various melodies by sampling latent vectors. However, finding the desired melody is challenging since the latent space is unintuitive and high-dimensional. In this work, we present an interactive system that supports generative melody composition with human-in-the-loop Bayesian optimization (BO). This system takes a mixed-initiative ap…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: 10 pages, 2 figures, Proceedings of the 2020 Joint Conference on AI Music Creativity (CSMC-MuMe 2020)

    ACM Class: J.5; H.5.5; H.5.2

  24. arXiv:2005.12683

    eess.AS cs.SD

    Exploring Optimal DNN Architecture for End-to-End Beamformers Based on Time-frequency References

    Authors: Yuichiro Koyama, Bhiksha Raj

    Abstract: Acoustic beamformers have been widely used to enhance audio signals. Currently, the best methods are the deep neural network (DNN)-powered variants of the generalized eigenvalue and minimum-variance distortionless response beamformers and the DNN-based filter-estimation methods that are used to directly compute beamforming filters. Both approaches are effective; however, they have blind spots in t…

    Submitted 11 August, 2020; v1 submitted 23 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1910.14262

  25. arXiv:2005.11612

    eess.AS cs.SD

    Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation

    Authors: Yuichiro Koyama, Oluwafemi Azeez, Bhiksha Raj

    Abstract: Although deep-learning-based methods have markedly improved the performance of speech separation over the past few years, it remains an open question how to integrate multi-channel signals for speech separation. We propose two methods, namely, early-fusion and late-fusion methods, to integrate multi-channel information based on the time-domain audio separation network, which has been proven effect…

    Submitted 11 August, 2020; v1 submitted 23 May, 2020; originally announced May 2020.

  26. arXiv:2005.11611

    eess.AS cs.SD

    Exploring the Best Loss Function for DNN-Based Low-latency Speech Enhancement with Temporal Convolutional Networks

    Authors: Yuichiro Koyama, Tyler Vuong, Stefan Uhlich, Bhiksha Raj

    Abstract: Recently, deep neural networks (DNNs) have been successfully used for speech enhancement, and DNN-based speech enhancement is becoming an attractive research area. While time-frequency masking based on the short-time Fourier transform (STFT) has been widely used for DNN-based speech enhancement in recent years, time-domain methods such as the time-domain audio separation network (TasNet) have…

    Submitted 20 August, 2020; v1 submitted 23 May, 2020; originally announced May 2020.

  27. arXiv:2005.04107

    cs.GR cs.HC cs.LG

    Sequential Gallery for Interactive Visual Design Optimization

    Authors: Yuki Koyama, Issei Sato, Masataka Goto

    Abstract: Visual design tasks often involve tuning many design parameters. For example, color grading of a photograph involves many parameters, some of which non-expert users might be unfamiliar with. We propose a novel user-in-the-loop optimization method that allows users to efficiently find an appropriate parameter set by exploring such a high-dimensional design space through much easier two-dimensional…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: To be published at ACM Trans. Graph. (Proc. SIGGRAPH 2020); Project page available at https://koyama.xyz/project/sequential_gallery/

    Journal ref: ACM Trans. Graph. 39, 4 (July 2020), pp.88:1-88:12

  28. arXiv:2004.03811

    cs.CV

    MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images

    Authors: Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima

    Abstract: This paper proposes a statistical approach to 2D pose estimation from human images. The main problems with the standard supervised approach, which is based on a deep recognition (image-to-pose) model, are that it often yields anatomically implausible poses, and its performance is limited by the amount of paired data. To solve these problems, we propose a semi-supervised method that can make effect…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: 19 pages

  29. arXiv:2002.12263

    eess.IV cs.CV

    Coronary Wall Segmentation in CCTA Scans via a Hybrid Net with Contours Regularization

    Authors: Kaikai Huang, Antonio Tejero-de-Pablos, Hiroaki Yamane, Yusuke Kurose, Junichi Iho, Youji Tokunaga, Makoto Horie, Keisuke Nishizawa, Yusaku Hayashi, Yasushi Koyama, Tatsuya Harada

    Abstract: Providing closed and well-connected boundaries of the coronary artery is essential to assist cardiologists in the diagnosis of coronary artery disease (CAD). Recently, several deep learning-based methods have been proposed for boundary detection and segmentation in medical images. However, when applied to coronary wall detection, they tend to produce disconnected and inaccurate boundaries. In this pa…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: 5 pages, 2 figures, accepted by ISBI 2020

  30. Computational Design with Crowds

    Authors: Yuki Koyama, Takeo Igarashi

    Abstract: Computational design is aimed at supporting or automating design processes using computational techniques. However, some classes of design tasks involve criteria that are difficult to handle only with computers. For example, visual design tasks seeking to fulfill aesthetic goals are difficult to handle purely with computers. One promising approach is to leverage human computation; that is, to inco…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: This book chapter was originally published in Computational Interaction edited by Antti Oulasvirta, Per Ola Kristensson, Xiaojun Bi, and Andrew Howes

    Journal ref: Computational Interaction (Antti Oulasvirta, Per Ola Kristensson, Xiaojun Bi, and Andrew Howes (Eds.)), chapter 6, pages 153-184. Oxford University Press, 2018

  31. arXiv:1912.10708

    stat.ML cs.LG

    Recreation of the Periodic Table with an Unsupervised Machine Learning Algorithm

    Authors: Minoru Kusaba, Chang Liu, Yukinori Koyama, Kiyoyuki Terakura, Ryo Yoshida

    Abstract: In 1869, the first draft of the periodic table was published by Russian chemist Dmitri Mendeleev. In terms of data science, his achievement can be viewed as a successful example of feature embedding based on human cognition: chemical properties of all known elements at that time were compressed onto the two-dimensional grid system for tabular display. In this study, we seek to answer the question…

    Submitted 28 February, 2021; v1 submitted 23 December, 2019; originally announced December 2019.

    Comments: 28 pages, 14 figures, complete version of this paper is available at https://www.nature.com/articles/s41598-021-81850-z (Published: 26 February 2021)

  32. arXiv:1910.14262

    cs.SD eess.AS

    W-Net BF: DNN-based Beamformer Using Joint Training Approach

    Authors: Yuichiro Koyama, Bhiksha Raj

    Abstract: Acoustic beamformers have been widely used to enhance audio signals. The best current methods are DNN-powered variants of the generalized eigenvalue beamformer, and DNN-based filter-estimation methods that directly compute beamforming filters. Both approaches, while effective, have blind spots in their generalizability. We propose a novel approach that combines both approaches into a single framewor…

    Submitted 29 February, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

  33. arXiv:1910.13724

    eess.AS cs.LG cs.SD

    Metric Learning with Background Noise Class for Few-shot Detection of Rare Sound Events

    Authors: Kazuki Shimada, Yuichiro Koyama, Akira Inoue

    Abstract: Few-shot learning systems for sound event recognition have gained interest since they require only a few examples to adapt to new target classes without fine-tuning. However, such systems have only been applied to chunks of sounds for classification or verification. In this paper, we aim to achieve few-shot detection of rare sound events, from query sequences that contain not only the target event…

    Submitted 18 February, 2020; v1 submitted 30 October, 2019; originally announced October 2019.

    Comments: 5 pages, 5 figures, accepted for publication in IEEE ICASSP 2020