-
Zero-shot Sound Event Classification Using a Sound Attribute Vector with Global and Local Feature Learning
Authors:
Yi-Han Lin,
Xunquan Chen,
Ryoichi Takashima,
Tetsuya Takiguchi
Abstract:
This paper introduces a zero-shot sound event classification (ZS-SEC) method to identify sound events that have never occurred in training data. In our previous work, we proposed a ZS-SEC method using sound attribute vectors (SAVs), where a deep neural network model infers attribute information that describes the sound of an event class instead of inferring its class label directly. Our previous method showed that it could classify unseen events to some extent; however, the accuracy for unseen events was far inferior to that for seen events. In this paper, we propose a new ZS-SEC method that can learn discriminative global features and local features simultaneously to enhance SAV-based ZS-SEC. In the proposed method, while the global features are learned in order to discriminate the event classes in the training data, the spectro-temporal local features are learned in order to regress the attribute information using attribute prototypes. The experimental results show that our proposed method can improve the accuracy of SAV-based ZS-SEC and can visualize the region in the spectrogram related to each attribute.
Submitted 17 March, 2023;
originally announced March 2023.
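The abstract above says the network infers attribute information rather than a class label. One common way to turn inferred attributes into a decision for unseen classes is to pick the class whose attribute vector is most similar to the prediction. The sketch below illustrates only that matching step. The cosine-similarity rule, the function names, and the three-dimensional attribute vectors are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_attributes(predicted, class_attributes):
    """Return the class whose attribute vector best matches the prediction.

    `predicted` stands in for the attribute vector a trained model would
    output; `class_attributes` maps class names to attribute vectors.
    """
    return max(
        class_attributes,
        key=lambda name: cosine_similarity(predicted, class_attributes[name]),
    )


# Hypothetical attribute vectors for two classes never seen in training.
unseen_classes = {
    "siren": [1.0, 0.0, 1.0],
    "door_knock": [0.0, 1.0, 0.0],
}

# Hypothetical model output for one audio clip.
predicted = [0.9, 0.1, 0.7]

print(classify_by_attributes(predicted, unseen_classes))
```

Because the decision depends only on attribute vectors, a new class can be added by supplying its attribute description, without retraining on audio from that class.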
-
Learn to See Faster: Pushing the Limits of High-Speed Camera with Deep Underexposed Image Denoising
Authors:
Weihao Zhuang,
Tristan Hascoet,
Ryoichi Takashima,
Tetsuya Takiguchi
Abstract:
The ability to record high-fidelity videos at high acquisition rates is central to the study of fast-moving phenomena. The difficulty of imaging fast-moving scenes lies in a trade-off between motion blur and underexposure noise: on the one hand, recordings with long exposure times suffer from motion blur caused by movements in the recorded scene. On the other hand, the amount of light reaching the camera photosensors decreases with exposure time, so short-exposure recordings suffer from underexposure noise. In this paper, we propose to address this trade-off by treating high-speed imaging as an underexposed image denoising problem. We combine recent advances in underexposed image denoising using deep learning and adapt these methods to the specifics of the high-speed imaging problem. Leveraging large external datasets with a sensor-specific noise model, our method is able to speed up the acquisition rate of a high-speed camera by over one order of magnitude while maintaining similar image quality.
Submitted 29 November, 2022;
originally announced November 2022.
-
Optimal Retail Tariff Design with Prosumers: Pursuing Equity at the Expenses of Economic Efficiencies?
Authors:
Yihsu Chen,
Andrew L. Liu,
Makoto Tanaka,
Ryuta Takashima
Abstract:
Distributed renewable resources owned by prosumers can be an effective way of fortifying grid resilience and enhancing sustainability. However, prosumers serve their own interests, and their objectives are unlikely to align with those of society. This paper develops a bilevel model to study the optimal design of retail electricity tariffs, considering the balance between economic efficiency and energy equity. The retail tariff entails a fixed charge and a volumetric charge tied to electricity usage to recover utilities' fixed costs. We analyze solution properties of the bilevel problem and prove an optimal rate design, which is to use fixed charges to recover fixed costs and to balance energy equity among different income groups. This suggests that programs similar to CARE (California Alternate Rates for Energy), which offer lower retail rates to low-income households, are unlikely to be efficient, even if they are politically appealing.
Submitted 28 September, 2022;
originally announced September 2022.
-
Current Source Localization Using Deep Prior with Depth Weighting
Authors:
Rio Yamana,
Hajime Yano,
Ryoichi Takashima,
Tetsuya Takiguchi,
Seiji Nakagawa
Abstract:
This paper proposes a novel neuronal current source localization method based on Deep Prior, which represents a more complicated prior distribution of current sources using convolutional networks. Deep Prior has been proposed as an unsupervised approach that does not require training data: randomly initialized neural networks are used to update a source location from a single observation. In our previous work, we proposed a Deep-Prior-based method for current source localization in the brain, but its performance did not match that of conventional approaches such as sLORETA. To improve the Deep-Prior-based approach, this paper introduces a depth weight of the current source for Deep Prior, where depth weighting amounts to assigning a larger penalty to superficial currents. Its effectiveness is confirmed by current source estimation experiments on simulated MEG data.
Submitted 25 March, 2022;
originally announced March 2022.
-
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition
Authors:
Naoyuki Kanda,
Shota Horiguchi,
Ryoichi Takashima,
Yusuke Fujita,
Kenji Nagamatsu,
Shinji Watanabe
Abstract:
In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes the target speaker's utterances from a monaural mixture of multiple speakers' speech, given a short sample of the target speaker. The proposed auxiliary loss function additionally attempts to maximize interference-speaker ASR accuracy during training. This regularizes the network toward a better representation for speaker separation, and thus better accuracy on target-speaker ASR. We evaluated our proposed method using two-speaker-mixed speech in various signal-to-interference-ratio conditions. We first built a strong target-speaker ASR baseline based on state-of-the-art lattice-free maximum mutual information. This baseline achieved a word error rate (WER) of 18.06% on the test set, while a normal ASR trained with clean data produced a completely corrupted result (WER of 84.71%). Our proposed loss then further reduced the WER by 6.6% relative to this strong baseline, achieving a WER of 16.87%. In addition to the accuracy improvement, we also showed that the auxiliary output branch for the proposed loss can even be used as a secondary ASR for interference speakers' speech.
Submitted 26 June, 2019;
originally announced June 2019.
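The 6.6% figure in the abstract above is a relative reduction, not an absolute difference in WER points. A minimal check of that arithmetic, using only the two WER values quoted in the abstract (the function name is an illustrative choice):

```python
def relative_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


baseline_wer = 18.06  # target-speaker ASR baseline, as quoted in the abstract
proposed_wer = 16.87  # with the auxiliary interference speaker loss

# Absolute difference in WER percentage points.
print(round(baseline_wer - proposed_wer, 2))

# Relative reduction, which is what the abstract reports as 6.6%.
print(round(relative_reduction(baseline_wer, proposed_wer), 1))
```

The absolute gap is 1.19 percentage points, which corresponds to the roughly 6.6% relative improvement reported.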