-
Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes
Authors:
Masahiro Yasuda,
Binh Thien Nguyen,
Noboru Harada,
Romain Serizel,
Mayank Mishra,
Marc Delcroix,
Shoko Araki,
Daiki Takeuchi,
Daisuke Niizumi,
Yasunori Ohishi,
Tomohiro Nakatani,
Takao Kawamura,
Nobutaka Ono
Abstract:
Spatial Semantic Segmentation of Sound Scenes (S5) aims to enhance technologies for sound event detection and separation from multi-channel input signals that mix multiple sound events with spatial information. This is a fundamental basis of immersive communication. The ultimate goal is to separate sound event signals with 6 Degrees of Freedom (6DoF) information into dry sound object signals and metadata describing the object type (sound event class) and spatial information, including direction. However, because several existing challenge tasks already provide a subset of these functions, this year's task focuses on detecting and separating sound events from multi-channel spatial input signals. This paper outlines the S5 task setting of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2025 Challenge Task 4 and the DCASE2025 Task 4 Dataset, newly recorded and curated for this task. We also report experimental results for an S5 system trained and evaluated on this dataset. The full version of this paper will be published after the challenge results are made public.
Submitted 12 June, 2025;
originally announced June 2025.
-
Road Redesign Technique Achieving Enhanced Road Safety by Inpainting with a Diffusion Model
Authors:
Sumit Mishra,
Medhavi Mishra,
Taeyoung Kim,
Dongsoo Har
Abstract:
Road infrastructure can affect the occurrence of road accidents. Therefore, identifying roadway features with high accident probability is crucial. Here, we introduce an image inpainting technique that can assist authorities in achieving safe roadway design with minimal intervention in the current roadway structure. The technique uses a diffusion model to inpaint safe roadway elements in a roadway image, replacing accident-prone (AP) features. After object-level segmentation, the AP features identified by the properties of accident hotspots are masked by a human operator and safe roadway elements are inpainted. With an average inpainting time of only 2 min, the likelihood of an image being classified as an accident hotspot drops by an average of 11.85%. In addition, safe urban spaces can be designed considering human factors of commuters such as gaze saliency. Considering this, we introduce saliency enhancement that suggests chrominance alteration for a safe road view.
Submitted 14 February, 2023;
originally announced February 2023.
-
AVHYAS: A Free and Open Source QGIS Plugin for Advanced Hyperspectral Image Analysis
Authors:
Rosly Boy Lyngdoh,
Anand S Sahadevan,
Touseef Ahmad,
Pradyuman Singh Rathore,
Manoj Mishra,
Praveen Kumar Gupta,
Arundhati Misra
Abstract:
The Advanced Hyperspectral Data Analysis Software (AVHYAS) plugin is a Python 3-based Quantum GIS (QGIS) plugin designed to process and analyse hyperspectral (Hx) images. It is developed to guarantee full usage of present and future Hx airborne or spaceborne sensors and provides access to advanced algorithms for Hx data processing. The software is freely available and offers a range of basic and advanced tools such as atmospheric correction (for airborne AVIRIS-NG images), standard processing tools, as well as powerful machine learning and deep learning interfaces for Hx data analysis.
Submitted 24 June, 2021;
originally announced June 2021.
-
Recovery of coincident frequency domain multiplexed detector pulses using sequential deconvolution
Authors:
M. Mishra,
J. Mattingly
Abstract:
Multiplexing of radiation detector signals into a single channel significantly reduces the need for a large number of digitizer channels, which reduces the cost and the power consumption of a data acquisition system. We previously demonstrated frequency domain multiplexing by convolution using a prototype system that multiplexed the signals of two EJ-309 organic scintillators into a single channel. Each detector pulse was converted to a damped sinusoid which was then combined into a single channel. The combined signal was digitized and the original detector signal was recovered from the damped sinusoid by deconvolution. In this paper, we demonstrate the recovery of multiple detector signals that arrive during the same digitized record via a new sequential deconvolution method. When two detectors produce signals in the same digitized record and their pulses do not overlap in time, we found that the charge, arrival time, and particle type can be estimated fairly precisely for the first pulse, but the second pulse exhibits substantial degradation in the precision of the estimated charge and arrival time. When the pulses overlap in time, we demonstrate both theoretically and experimentally that the part of the first pulse that does not overlap with the second can be recovered accurately, so the arrival time and amplitude of the first pulse can be estimated fairly precisely, but not the charge or particle type. None of these quantities can be estimated precisely for the second pulse when the two pulses overlap.
Submitted 23 August, 2022; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Adversarial Approximate Inference for Speech to Electroglottograph Conversion
Authors:
Prathosh A. P.,
Varun Srivastava,
Mayank Mishra
Abstract:
Speech produced by the human vocal apparatus conveys substantial non-semantic information including the gender of the speaker, voice quality, affective state, abnormalities in the vocal apparatus, etc. Such information is attributed to the properties of the voice source signal, which is usually estimated from the speech signal. However, most of the source estimation techniques depend heavily on the goodness of the model assumptions and are prone to noise. A popular alternative is to indirectly obtain the source information through the Electroglottographic (EGG) signal that measures the electrical admittance around the vocal folds using dedicated hardware. In this paper, we address the problem of estimating the EGG signal directly from the speech signal, devoid of any hardware. Sampling from the intractable conditional distribution of the EGG signal given the speech signal is accomplished through optimization of an evidence lower bound. This is constructed via minimization of the KL-divergence between the true and the approximated posteriors of a latent variable learned using a deep neural auto-encoder that serves as an informative prior. We demonstrate the efficacy of the method at generating the EGG signal by conducting several experiments on datasets comprising multiple speakers, voice qualities, noise settings and speech pathologies. The proposed method is evaluated on many benchmark metrics and is found to agree with the gold standard while proving better than the state-of-the-art algorithms on a few tasks such as epoch extraction.
Submitted 7 September, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Application of deconvolution to recover frequency-domain multiplexed detector pulses
Authors:
M. Mishra,
J. Mattingly,
R. M. Kolbas
Abstract:
Multiplexing of radiation detectors reduces the number of readout channels, which in turn reduces the number of digitizer input channels for data acquisition. We recently demonstrated frequency domain multiplexing (FDM) of pulse mode radiation detectors using a resonator that converts the detector signal into a damped sinusoid by convolution. The detectors were given unique "tags" by the oscillation frequency of each resonator. The charge collected and the time-of-arrival of the detector pulse were estimated from the corresponding resonator output in the frequency domain.
In this paper, we demonstrate a new method to recover the detector pulse from the damped sinusoidal output by deconvolution. Deconvolution converts the frequency-encoded detector signal back to the original detector pulse. We have developed a new prototype FDM system to multiplex organic scintillators based on convolution and deconvolution. Using the new prototype, the charge collected under the anode pulse can be estimated from the recovered pulse with an uncertainty of about 4.4 keVee (keV electron equivalent). The time-of-arrival can be estimated from the recovered pulse with an uncertainty of about 102 ps. We also used a CeBr3 inorganic scintillator to measure the Cs-137 gamma spectrum using the recovered pulses and found a standard deviation of 13.8 keV at 662 keV compared to a standard deviation of 13.5 keV when the original pulses were used. Coincidence measurements with Na-22 using the deconvolved pulses resulted in a timing uncertainty of 617 ps compared to an uncertainty of 603 ps using the original pulses. Pulse shape discrimination was also performed using a Cf-252 source and EJ-309 organic scintillator pulses recovered by deconvolution. A figure of merit value of 1.08 was observed when the recovered pulses were used compared to 1.2 for the original pulses.
Submitted 22 February, 2019;
originally announced February 2019.