-
Status of the International Linear Collider
Authors:
Y. Abe,
S. Arai,
S. Araki,
H. Araki,
Y. Arimoto,
A. Aryshev,
S. Asai,
R. Bajpai,
T. Behnke,
S. Belomestnykh,
I. Bozovic,
J. E. Brau,
K. Buesser,
P. N. Burrows,
N. Catalan-Lasheras,
E. Cenni,
S. Chen,
J. Clark,
D. Delikaris,
M. Demarteau,
D. Denisov,
S. Doebert,
T. Dohmae,
R. Dowd,
G. Dugan
, et al. (127 additional authors not shown)
Abstract:
This paper is not a proposal for a CERN future project but provides information on the International Linear Collider (ILC) considered for Japan in order to facilitate the European Strategy discussion in a global context. It describes progress to date, ongoing engineering studies, updated cost estimate for the machine at $\sqrt{s}=250~\rm GeV$ and the situation in Japan. The physics of the ILC is not presented here, but jointly for all Linear Collider projects in a separate document ``A Linear Collider Vision for the Future of Particle Physics'' submitted for the forthcoming European Strategy deliberations.
Submitted 5 June, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
Authors:
Junyi Peng,
Takanori Ashihara,
Marc Delcroix,
Tsubasa Ochiai,
Oldrich Plchot,
Shoko Araki,
Jan Černocký
Abstract:
Self-supervised learning (SSL) models have significantly advanced speech processing tasks, and several benchmarks have been proposed to validate their effectiveness. However, previous benchmarks have primarily focused on single-speaker scenarios, with less exploration of target-speaker tasks in noisy, multi-talker conditions -- a more challenging yet practical case. In this paper, we introduce the Target-Speaker Speech Processing Universal Performance Benchmark (TS-SUPERB), which includes four widely recognized target-speaker processing tasks that require identifying the target speaker and extracting information from the speech mixture. In our benchmark, the speaker embedding extracted from enrollment speech is used as a clue to condition downstream models. The benchmark result reveals the importance of evaluating SSL models in target speaker scenarios, demonstrating that performance cannot be easily inferred from related single-speaker tasks. Moreover, by using a unified SSL-based target speech encoder, consisting of a speaker encoder and an extractor module, we also investigate joint optimization across TS tasks to leverage mutual information and demonstrate its effectiveness.
Submitted 10 May, 2025;
originally announced May 2025.
-
Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge
Authors:
Naoyuki Kamo,
Naohiro Tawara,
Atsushi Ando,
Takatomo Kano,
Hiroshi Sato,
Rintaro Ikeshita,
Takafumi Moriya,
Shota Horiguchi,
Kohei Matsuura,
Atsunori Ogawa,
Alexis Plaquet,
Takanori Ashihara,
Tsubasa Ochiai,
Masato Mimura,
Marc Delcroix,
Tomohiro Nakatani,
Taichi Asami,
Shoko Araki
Abstract:
In this paper, we introduce a multi-talker distant automatic speech recognition (DASR) system we designed for the DASR task 1 of the CHiME-8 challenge. Our system performs speaker counting, diarization, and ASR. It handles various recording conditions, from dinner parties to professional meetings and from two to eight speakers. We perform diarization first, followed by speech enhancement, and then ASR as the challenge baseline. However, we introduced several key refinements. First, we derived a powerful speaker diarization relying on end-to-end speaker diarization with vector clustering (EEND-VC), multi-channel speaker counting using enhanced embeddings from EEND-VC, and target-speaker voice activity detection (TS-VAD). For speech enhancement, we introduced a novel microphone selection rule to better select the most relevant microphones among the distributed microphones and investigated improvements to beamforming. Finally, for ASR, we developed several models exploiting Whisper and WavLM speech foundation models. We present the results we submitted to the challenge and updated results we obtained afterward. Our strongest system achieves a 63% relative macro tcpWER improvement over the baseline and outperforms the challenge best results on the NOTSOFAR-1 meeting evaluation data among geometry-independent systems.
Submitted 13 February, 2025;
originally announced February 2025.
-
30+ Years of Source Separation Research: Achievements and Future Challenges
Authors:
Shoko Araki,
Nobutaka Ito,
Reinhold Haeb-Umbach,
Gordon Wichern,
Zhong-Qiu Wang,
Yuki Mitsufuji
Abstract:
Source separation (SS) of acoustic signals is a research field that emerged in the mid-1990s and has flourished ever since. On the occasion of ICASSP's 50th anniversary, we review the major contributions and advancements in the past three decades in the speech, audio, and music SS research field. We will cover both single- and multi-channel SS approaches. We will also look back on key efforts to foster a culture of scientific evaluation in the research field, including challenges, performance metrics, and datasets. We will conclude by discussing current trends and future research directions.
Submitted 20 January, 2025;
originally announced January 2025.
-
Mamba-based Segmentation Model for Speaker Diarization
Authors:
Alexis Plaquet,
Naohiro Tawara,
Marc Delcroix,
Shota Horiguchi,
Atsushi Ando,
Shoko Araki
Abstract:
Mamba is a newly proposed architecture which behaves like a recurrent neural network (RNN) with attention-like capabilities. These properties are promising for speaker diarization, as attention-based models have unsuitable memory requirements for long-form audio, and traditional RNN capabilities are too limited. In this paper, we propose to assess the potential of Mamba for diarization by comparing the state-of-the-art neural segmentation of the pyannote pipeline with our proposed Mamba-based variant. Mamba's stronger processing capabilities allow usage of longer local windows, which significantly improve diarization quality by making the speaker embedding extraction more reliable. We find Mamba to be a superior alternative to both traditional RNN and the tested attention-based model. Our proposed Mamba-based system achieves state-of-the-art performance on three widely used diarization datasets.
Submitted 9 October, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Authors:
Carlos Hernandez-Olivan,
Marc Delcroix,
Tsubasa Ochiai,
Daisuke Niizumi,
Naohiro Tawara,
Tomohiro Nakatani,
Shoko Araki
Abstract:
Target sound extraction (TSE) consists of isolating a desired sound from a mixture of arbitrary sounds using clues to identify it. A TSE system requires solving two problems at once, identifying the target source and extracting the target signal from the mixture. For increased practicability, the same system should work with various types of sound. The duality of the problem and the wide variety of sounds make it challenging to train a powerful TSE system from scratch. In this paper, to tackle this problem, we explore using a pre-trained audio foundation model that can provide rich feature representations of sounds within a TSE system. We chose the masked-modeling duo (M2D) foundation model, which appears especially suited for the TSE task, as it is trained using a dual objective consisting of sound-label predictions and improved masked prediction. These objectives are related to sound identification and the signal extraction problems of TSE. We propose a new TSE system that integrates the feature representation from M2D into SoundBeam, which is a strong TSE system that can exploit both target sound class labels and pre-recorded enrollments (or audio queries) as clues. We show experimentally that using M2D can increase extraction performance, especially when employing enrollment clues.
Submitted 19 September, 2024;
originally announced September 2024.
-
NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge
Authors:
Naoyuki Kamo,
Naohiro Tawara,
Atsushi Ando,
Takatomo Kano,
Hiroshi Sato,
Rintaro Ikeshita,
Takafumi Moriya,
Shota Horiguchi,
Kohei Matsuura,
Atsunori Ogawa,
Alexis Plaquet,
Takanori Ashihara,
Tsubasa Ochiai,
Masato Mimura,
Marc Delcroix,
Tomohiro Nakatani,
Taichi Asami,
Shoko Araki
Abstract:
We present a distant automatic speech recognition (DASR) system developed for the CHiME-8 DASR track. It consists of a diarization first pipeline. For diarization, we use end-to-end diarization with vector clustering (EEND-VC) followed by target speaker voice activity detection (TS-VAD) refinement. To deal with various numbers of speakers, we developed a new multi-channel speaker counting approach. We then apply guided source separation (GSS) with several improvements to the baseline system. Finally, we perform ASR using a combination of systems built from strong pre-trained models. Our proposed system achieves a macro tcpWER of 21.3 % on the dev set, which is a 57 % relative improvement over the baseline.
Submitted 9 September, 2024;
originally announced September 2024.
-
Interaural time difference loss for binaural target sound extraction
Authors:
Carlos Hernandez-Olivan,
Marc Delcroix,
Tsubasa Ochiai,
Naohiro Tawara,
Tomohiro Nakatani,
Shoko Araki
Abstract:
Binaural target sound extraction (TSE) aims to extract a desired sound from a binaural mixture of arbitrary sounds while preserving the spatial cues of the desired sound. Indeed, for many applications, the target sound signal and its spatial cues carry important information about the sound source. Binaural TSE can be realized with a neural network trained to output only the desired sound given a binaural mixture and an embedding characterizing the desired sound class as inputs. Conventional TSE systems are trained using signal-level losses, which measure the difference between the extracted and reference signals for the left and right channels. In this paper, we propose adding explicit spatial losses to better preserve the spatial cues of the target sound. In particular, we explore losses aiming at preserving the interaural level (ILD), phase (IPD), and time differences (ITD). We show experimentally that adding such spatial losses, particularly our newly proposed ITD loss, helps preserve better spatial cues while maintaining the signal-level metrics.
Submitted 1 August, 2024;
originally announced August 2024.
-
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Authors:
Tsubasa Ochiai,
Kazuma Iwamoto,
Marc Delcroix,
Rintaro Ikeshita,
Hiroshi Sato,
Shoko Araki,
Shigeru Katagiri
Abstract:
It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with a single-channel speech enhancement (SE) front-end. This is generally attributed to the processing distortions caused by the nonlinear processing of single-channel SE front-ends. However, the causes of such degraded ASR performance have not been fully investigated. How to design single-channel SE front-ends in a way that significantly improves ASR performance remains an open research question. In this study, we investigate a signal-level numerical metric that can explain the cause of degradation in ASR performance. To this end, we propose a novel analysis scheme based on the orthogonal projection-based decomposition of SE errors. This scheme manually modifies the ratio of the decomposed interference, noise, and artifact errors, and it enables us to directly evaluate the impact of each error type on ASR performance. Our analysis reveals the particularly detrimental effect of artifact errors on ASR performance compared to the other types of errors. This provides us with a more principled definition of processing distortions that cause the ASR performance degradation. Then, we study two practical approaches for reducing the impact of artifact errors. First, we prove that the simple observation adding (OA) post-processing (i.e., interpolating the enhanced and observed signals) can monotonically improve the signal-to-artifact ratio. Second, we propose a novel training objective, called artifact-boosted signal-to-distortion ratio (AB-SDR), which forces the model to estimate the enhanced signals with fewer artifact errors. Through experiments, we confirm that both the OA and AB-SDR approaches are effective in decreasing artifact errors caused by single-channel SE front-ends, allowing them to significantly improve ASR performance.
Submitted 23 April, 2024;
originally announced April 2024.
-
Probing Self-supervised Learning Models with Target Speech Extraction
Authors:
Junyi Peng,
Marc Delcroix,
Tsubasa Ochiai,
Oldrich Plchot,
Takanori Ashihara,
Shoko Araki,
Jan Cernocky
Abstract:
Large-scale pre-trained self-supervised learning (SSL) models have shown remarkable advancements in speech-related tasks. However, the utilization of these models in complex multi-talker scenarios, such as extracting a target speaker in a mixture, is yet to be fully evaluated. In this paper, we introduce target speech extraction (TSE) as a novel downstream task to evaluate the feature extraction capabilities of pre-trained SSL models. TSE uniquely requires both speaker identification and speech separation, distinguishing it from other tasks in the Speech processing Universal PERformance Benchmark (SUPERB) evaluation. Specifically, we propose a TSE downstream model composed of two lightweight task-oriented modules based on the same frozen SSL model. One module functions as a speaker encoder to obtain target speaker information from an enrollment speech, while the other estimates the target speaker's mask to extract its speech from the mixture. Experimental results on the Libri2mix datasets reveal the relevance of the TSE downstream task to probe SSL models, as its performance cannot be simply deduced from other related tasks such as speaker verification and separation.
Submitted 17 February, 2024;
originally announced February 2024.
-
Target Speech Extraction with Pre-trained Self-supervised Learning Models
Authors:
Junyi Peng,
Marc Delcroix,
Tsubasa Ochiai,
Oldrich Plchot,
Shoko Araki,
Jan Cernocky
Abstract:
Pre-trained self-supervised learning (SSL) models have achieved remarkable success in various speech tasks. However, their potential in target speech extraction (TSE) has not been fully exploited. TSE aims to extract the speech of a target speaker in a mixture guided by enrollment utterances. We exploit pre-trained SSL models for two purposes within a TSE framework, i.e., to process the input mixture and to derive speaker embeddings from the enrollment. In this paper, we focus on how to effectively use SSL models for TSE. We first introduce a novel TSE downstream task following the SUPERB principles. This simple experiment shows the potential of SSL models for TSE, but extraction performance remains far behind the state-of-the-art. We then extend a powerful TSE architecture by incorporating two SSL-based modules: an Adaptive Input Enhancer (AIE) and a speaker encoder. Specifically, the proposed AIE utilizes intermediate representations from the CNN encoder by adjusting the time resolution of CNN encoder and transformer blocks through progressive upsampling, capturing both fine-grained and hierarchical features. Our method outperforms current TSE systems achieving a SI-SDR improvement of 14.0 dB on LibriMix. Moreover, we can further improve performance by 0.7 dB by fine-tuning the whole model including the SSL model parameters.
Submitted 17 February, 2024;
originally announced February 2024.
-
Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers
Authors:
Marvin Tammen,
Tsubasa Ochiai,
Marc Delcroix,
Tomohiro Nakatani,
Shoko Araki,
Simon Doclo
Abstract:
Although mask-based beamforming is a powerful speech enhancement approach, it often requires manual parameter tuning to handle moving speakers. Recently, this approach was augmented with an attention-based spatial covariance matrix aggregator (ASA) module, enabling accurate tracking of moving speakers without manual tuning. However, the deep neural network model used in this module is limited to specific microphone arrays, necessitating a different model for varying channel permutations, numbers, or geometries. To improve the robustness of the ASA module against such variations, in this paper we investigate three approaches: training with random channel configurations, employing the transform-average-concatenate method to process multi-channel input features, and utilizing robust input features. Our experiments on the CHiME-3 and DEMAND datasets show that these approaches enable the ASA-augmented beamformer to track moving speakers across different microphone arrays unseen in training.
Submitted 17 June, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models
Authors:
Atsunori Ogawa,
Naohiro Tawara,
Marc Delcroix,
Shoko Araki
Abstract:
We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. We combine these NLMs through iterative lattice generation. Since these NLMs work complementarily with each other, by combining them one by one at each rescoring iteration, language scores attached to given lattice arcs can be gradually refined. Consequently, errors of the ASR hypotheses can be gradually reduced. We also investigate the effectiveness of carrying over contextual information (previous rescoring results) across a lattice sequence of a long speech such as a lecture speech. In experiments using a lecture speech corpus, by combining the eight NLMs and using context carry-over, we obtained a 24.4% relative word error rate reduction from the ASR 1-best baseline. For further comparison, we performed simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using the large ensemble of NLMs, which confirmed the advantage of lattice rescoring with iterative NLM combination.
Submitted 19 December, 2023;
originally announced December 2023.
-
How does end-to-end speech recognition training impact speech enhancement artifacts?
Authors:
Kazuma Iwamoto,
Tsubasa Ochiai,
Marc Delcroix,
Rintaro Ikeshita,
Hiroshi Sato,
Shoko Araki,
Shigeru Katagiri
Abstract:
Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of processing distortion generated by single-channel SE on ASR. In this paper, we investigate the effect of such joint training on the signal-level characteristics of the enhanced signals from the viewpoint of the decomposed noise and artifact errors. The experimental analyses provide two novel findings: 1) ASR-level training of the SE front-end reduces the artifact errors while increasing the noise errors, and 2) simply interpolating the enhanced and observed signals, which achieves a similar effect of reducing artifacts and increasing noise, improves ASR performance without jointly modifying the SE and ASR modules, even for a strong ASR back-end using a WavLM feature extractor. Our findings provide a better understanding of the effect of joint training and a novel insight for designing an ASR agnostic SE front-end.
Submitted 20 November, 2023;
originally announced November 2023.
-
Neural network-based virtual microphone estimation with virtual microphone and beamformer-level multi-task loss
Authors:
Hanako Segawa,
Tsubasa Ochiai,
Marc Delcroix,
Tomohiro Nakatani,
Rintaro Ikeshita,
Shoko Araki,
Takeshi Yamada,
Shoji Makino
Abstract:
Array processing performance depends on the number of microphones available. Virtual microphone estimation (VME) has been proposed to increase the number of microphone signals artificially. Neural network-based VME (NN-VME) trains an NN with a VM-level loss to predict a signal at a microphone location that is available during training but not at inference. However, this training objective may not be optimal for a specific array processing back-end, such as beamforming. An alternative approach is to use a training objective considering the array-processing back-end, such as a loss on the beamformer output. This approach may generate signals optimal for beamforming but not physically grounded. To combine the advantages of both approaches, this paper proposes a multi-task loss for NN-VME that combines both VM-level and beamformer-level losses. We evaluate the proposed multi-task NN-VME on multi-talker underdetermined conditions and show that it achieves a 33.1 % relative WER improvement compared to using only real microphones and 10.8 % compared to using a prior NN-VME approach.
Submitted 20 November, 2023;
originally announced November 2023.
-
Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers
Authors:
Ning Guo,
Tomohiro Nakatani,
Shoko Araki,
Takehiro Moriya
Abstract:
This paper introduces a novel low-latency online beamforming (BF) algorithm, named Modified Parametric Multichannel Wiener Filter (Mod-PMWF), for enhancing speech mixtures with unknown and varying number of speakers. Although conventional BFs such as linearly constrained minimum variance BF (LCMV BF) can enhance a speech mixture, they typically require such attributes of the speech mixture as the number of speakers and the acoustic transfer functions (ATFs) from the speakers to the microphones. When the mixture attributes are unavailable, estimating them by low-latency processing is challenging, hindering the application of the BFs to the problem. In this paper, we overcome this problem by modifying a conventional Parametric Multichannel Wiener Filter (PMWF). The proposed Mod-PMWF can adaptively form a directivity pattern that enhances all the speakers in the mixture without explicitly estimating these attributes. Our experiments will show the proposed BF's effectiveness in interference reduction ratios and subjective listening tests.
Submitted 29 June, 2023;
originally announced June 2023.
-
Simulation Example of a Black Noise
Authors:
Takafumi Amaba,
Takahiro Aoyama,
Shota Araki,
Shu Eguchi
Abstract:
B. Tsirelson and A. M. Vershik (1998) introduced the notion of a mathematical noise, which possesses completely opposite properties to those of a white noise. Afterward, B. Tsirelson (2004) called this noise: `black noise.' In this paper, we provide a method to simulate black noise using a modified Bayesian convolutional neural network. Then we study the behavior of black noise both numerically and visually.
Submitted 16 June, 2023;
originally announced June 2023.
-
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Authors:
Marc Delcroix,
Naohiro Tawara,
Mireia Diez,
Federico Landini,
Anna Silnova,
Atsunori Ogawa,
Tomohiro Nakatani,
Lukas Burget,
Shoko Araki
Abstract:
Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. We can cluster these embeddings using constrained agglomerative hierarchical clustering (cAHC), ensuring embeddings from the same chunk belong to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance.
Submitted 22 May, 2023;
originally announced May 2023.
-
Input optics systems of the KAGRA detector during O3GK
Authors:
T. Akutsu,
M. Ando,
K. Arai,
Y. Arai,
S. Araki,
A. Araya,
N. Aritomi,
H. Asada,
Y. Aso,
S. Bae,
Y. Bae,
L. Baiotti,
R. Bajpai,
M. A. Barton,
K. Cannon,
Z. Cao,
E. Capocasa,
M. Chan,
C. Chen,
K. Chen,
Y. Chen,
C-I. Chiang,
H. Chu,
Y-K. Chu,
S. Eguchi
, et al. (228 additional authors not shown)
Abstract:
KAGRA, the underground and cryogenic gravitational-wave detector, was operated for its solo observation from February 25th to March 10th, 2020, and its first joint observation with the GEO 600 detector from April 7th -- 21st, 2020 (O3GK). This study presents an overview of the input optics systems of the KAGRA detector, which consist of various optical systems, such as a laser source, its intensity and frequency stabilization systems, modulators, a Faraday isolator, mode-matching telescopes, and a high-power beam dump. These optics were successfully delivered to the KAGRA interferometer and operated stably during the observations. The laser frequency noise was observed to limit the detector sensitivity above a few kHz, whereas the laser intensity did not significantly limit the detector sensitivity.
Submitted 12 October, 2022;
originally announced October 2022.
-
Site Split of Antiferromagnetic $α$-Mn Revealed by $^{55}$Mn Nuclear Magnetic Resonance
Authors:
Masahiro Manago,
Gaku Motoyama,
Shijo Nishigori,
Kenji Fujiwara,
Katsuki Kinjo,
Shunsaku Kitagawa,
Kenji Ishida,
Kazuto Akiba,
Shingo Araki,
Tatsuo C. Kobayashi,
Hisatomo Harima
Abstract:
The magnetic structure of antiferromagnetic $α$-Mn has remained unresolved for almost 70 years since its magnetism was discovered. We measured the zero-field nuclear magnetic resonance spectra of antiferromagnetic $α$-Mn to obtain further insight into magnetism below $T_{\text{N}} = 95$ K. The site II spectra split into two sites with five subpeaks owing to quadrupole interaction, and this shows that the ordered moments at site II are slightly tilted from the $[001]$ direction. The site III spectra revealed that this site splits into four sites below $T_{\text{N}}$. These findings clearly demonstrate that the antiferromagnetic $α$-Mn symmetry is lower than previously considered.
Submitted 6 October, 2022;
originally announced October 2022.
-
An Unstructured Mesh Approach to Nonlinear Noise Reduction for Coupled Systems
Authors:
Aaron Kirtland,
Jonah Botvinick-Greenhouse,
Marianne DeBrito,
Megan Osborne,
Casey Johnson,
Robert S. Martin,
Samuel J. Araki,
Daniel Q. Eckhardt
Abstract:
To address noise inherent in electronic data acquisition systems and real-world sources, Araki et al. [Physica D: Nonlinear Phenomena, 417 (2021) 132819] demonstrated a grid-based nonlinear technique to remove noise from a chaotic signal, leveraging a clean high-fidelity signal from the same dynamical system and ensemble averaging in multidimensional phase space. This method achieved denoising of time-series data with 100% added noise but suffered in regions of low data density. To improve this grid-based method, here an unstructured mesh based on triangulations and Voronoi diagrams is used to accomplish the same task. The unstructured mesh more uniformly distributes data samples over mesh cells to improve the accuracy of the reconstructed signal. By empirically balancing bias and variance errors in selecting the number of unstructured cells as a function of the number of available samples, the method achieves asymptotic statistical convergence with known test data and reduces synthetic noise on experimental signals from Hall Effect Thrusters (HETs) with greater success than the original grid-based strategy.
Submitted 7 September, 2022;
originally announced September 2022.
-
ConceptBeam: Concept Driven Target Speech Extraction
Authors:
Yasunori Ohishi,
Marc Delcroix,
Tsubasa Ochiai,
Shoko Araki,
Daiki Takeuchi,
Daisuke Niizumi,
Akisato Kimura,
Noboru Harada,
Kunio Kashino
Abstract:
We propose a novel framework for target speech extraction based on semantic information, called ConceptBeam. Target speech extraction means extracting the speech of a target speaker in a mixture. Typical approaches have been exploiting properties of audio signals, such as harmonic structure and direction of arrival. In contrast, ConceptBeam tackles the problem with semantic clues. Specifically, we extract the speech of speakers speaking about a concept, i.e., a topic of interest, using a concept specifier such as an image or speech. Solving this novel problem would open the door to innovative applications such as listening systems that focus on a particular topic discussed in a conversation. Unlike keywords, concepts are abstract notions, making it challenging to directly represent a target concept. In our scheme, a concept is encoded as a semantic embedding by mapping the concept specifier to a shared embedding space. This modality-independent space can be built by means of deep metric learning using paired data consisting of images and their spoken captions. We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept. As a proof of our scheme, we performed experiments using a set of images associated with spoken captions. That is, we generated speech mixtures from these spoken captions and used the images or speech signals as the concept specifiers. We then extracted the target speech using the acoustic characteristics of the identified segments. We compare ConceptBeam with two methods: one based on keywords obtained from recognition systems and another based on sound source separation. We show that ConceptBeam clearly outperforms the baseline methods and effectively extracts speech based on the semantic representation.
Submitted 25 July, 2022;
originally announced July 2022.
-
Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking
Authors:
Tsubasa Ochiai,
Marc Delcroix,
Tomohiro Nakatani,
Shoko Araki
Abstract:
Beamforming is a powerful tool designed to enhance speech signals from the direction of a target source. Computing the beamforming filter requires estimating spatial covariance matrices (SCMs) of the source and noise signals. Time-frequency masks are often used to compute these SCMs. Most studies of mask-based beamforming have assumed that the sources do not move. However, sources often move in practice, which causes performance degradation. In this paper, we address the problem of mask-based beamforming for moving sources. We first review classical approaches to tracking a moving source, which perform online or blockwise computation of the SCMs. We show that these approaches can be interpreted as computing a sum of instantaneous SCMs weighted by attention weights. These weights indicate which time frames of the signal to consider in the SCM computation. Online or blockwise computation assumes a heuristic and deterministic way of computing these attention weights that, although simple, may not result in optimal performance. We thus introduce a learning-based framework that computes optimal attention weights for beamforming. We achieve this using a neural network implemented with self-attention layers. We show experimentally that our proposed framework can greatly improve beamforming performance in moving source situations while maintaining high performance in non-moving situations, thus enabling the development of mask-based beamformers robust to source movements.
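The weighted-sum view of SCM estimation described in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `weighted_scm`, the list-of-frames layout, and the example weights are assumptions made here, and the self-attention network that would produce learned weights is not shown.

```python
def weighted_scm(frames, weights):
    """Return sum_t w_t * x_t x_t^H for one frequency bin.

    frames  -- list of T frames, each a list of M complex microphone values
    weights -- list of T real weights (hypothetical attention weights)
    """
    if len(frames) != len(weights):
        raise ValueError("need exactly one weight per frame")
    m = len(frames[0])
    scm = [[0j for _ in range(m)] for _ in range(m)]
    for x, w in zip(frames, weights):
        for i in range(m):
            for j in range(m):
                # Instantaneous SCM entry: x_i * conj(x_j), scaled by the frame weight.
                scm[i][j] += w * x[i] * x[j].conjugate()
    return scm


# Uniform weights 1/T give the ordinary time-averaged SCM.
frames = [[1 + 0j, 1j], [2 + 0j, 0j]]
uniform = weighted_scm(frames, [0.5, 0.5])

# Putting all weight on the latest frame mimics a very short block.
latest_only = weighted_scm(frames, [0.0, 1.0])
```

Under this view, online and blockwise estimators differ only in how the weights are chosen, which is the degree of freedom the abstract proposes to learn.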
Submitted 7 May, 2022;
originally announced May 2022.
-
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
Authors:
Marc Delcroix,
Jorge Bennasar Vázquez,
Tsubasa Ochiai,
Keisuke Kinoshita,
Yasunori Ohishi,
Shoko Araki
Abstract:
In many situations, we would like to hear desired sound events (SEs) while being able to ignore interference. Target sound extraction (TSE) tackles this problem by estimating the audio signal of the sounds of target SE classes in a mixture of sounds while suppressing all other sounds. We can achieve this with a neural network that extracts the target SEs by conditioning it on clues representing the target SE classes. Two types of clues have been proposed, i.e., target SE class labels and enrollment audio samples (or audio queries), which are pre-recorded audio samples of sounds from the target SE classes. Systems based on SE class labels can directly optimize embedding vectors representing the SE classes, resulting in high extraction performance. However, extending these systems to extract new SE classes not encountered during training is not easy. Enrollment-based approaches extract SEs by finding sounds in the mixtures that share similar characteristics to the enrollment audio samples. These approaches do not explicitly rely on SE class definitions and can thus handle new SE classes. In this paper, we introduce a TSE framework, SoundBeam, that combines the advantages of both approaches. We also perform an extensive evaluation of the different TSE schemes using synthesized and real mixtures, which shows the potential of SoundBeam.
Submitted 2 November, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Effective data screening technique for crowdsourced speech intelligibility experiments: Evaluation with IRM-based speech enhancement
Authors:
Ayako Yamamoto,
Toshio Irino,
Shoko Araki,
Kenichi Arai,
Atsunori Ogawa,
Keisuke Kinoshita,
Tomohiro Nakatani
Abstract:
It is essential to perform speech intelligibility (SI) experiments with human listeners in order to evaluate objective intelligibility measures for developing effective speech enhancement and noise reduction algorithms. Recently, crowdsourced remote testing has become a popular means for collecting a massive amount and variety of data at a relatively small cost and in a short time. However, careful data screening is essential for attaining reliable SI data. We performed SI experiments on speech enhanced by an "oracle" ideal ratio mask (IRM) in a well-controlled laboratory and in crowdsourced remote environments that could not be controlled directly. We introduced simple tone pip tests, in which participants were asked to report the number of audible tone pips, to estimate their listening levels above audible thresholds. The tone pip tests were very effective for data screening to reduce the variability of crowdsourced remote results so that the laboratory results would become similar. The results also demonstrated the SI of an oracle IRM, giving us the upper limit of the mask-based single-channel speech enhancement.
Submitted 19 August, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Search for Gravitational Waves Associated with Fast Radio Bursts Detected by CHIME/FRB During the LIGO--Virgo Observing Run O3a
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
the CHIME/FRB Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca
, et al. (1633 additional authors not shown)
Abstract:
We search for gravitational-wave transients associated with fast radio bursts (FRBs) detected by the Canadian Hydrogen Intensity Mapping Experiment Fast Radio Burst Project (CHIME/FRB), during the first part of the third observing run of Advanced LIGO and Advanced Virgo (1 April 2019 15:00 UTC-1 Oct 2019 15:00 UTC). Triggers from 22 FRBs were analyzed with a search that targets compact binary coalescences with at least one neutron star component. A targeted search for generic gravitational-wave transients was conducted on 40 FRBs. We find no significant evidence for a gravitational-wave association in either search. Given the large uncertainties in the distances of the FRBs inferred from the dispersion measures in our sample, however, this does not conclusively exclude any progenitor models that include emission of a gravitational wave of the types searched for from any of these FRB events. We report $90\%$ confidence lower bounds on the distance to each FRB for a range of gravitational-wave progenitor models. By combining the inferred maximum distance information for each FRB with the sensitivity of the gravitational-wave searches, we set upper limits on the energy emitted through gravitational waves for a range of emission scenarios. We find values of order $10^{51}$-$10^{57}$ erg for a range of different emission models with central gravitational wave frequencies in the range 70-3560 Hz. Finally, we also found no significant coincident detection of gravitational waves with the repeater, FRB 20200120E, which is the closest known extragalactic FRB.
Submitted 22 March, 2022;
originally announced March 2022.
-
Chirality-Controlled Enantiopure Crystal Growth of a Transition Metal Monosilicide by a Floating Zone Method
Authors:
Yusuke Kousaka,
Satoshi Iwasaki,
Taisei Sayo,
Hiroshi Tanida,
Takeshi Matsumura,
Shingo Araki,
Jun Akimitsu,
Yoshihiko Togawa
Abstract:
We performed crystal growth to obtain chirality-controlled enantiopure crystals using a laser-diode-heated floating zone (LDFZ) method with a composition-gradient feed rod. It has been argued that the crystal handedness of $T$Si ($T$: transition metal) is fixed depending on $T$ for crystals grown by conventional methods. We found that right-handed single crystals of CoSi and MnSi were grown from the composition-gradient feed rods that consist of FeSi--CoSi and FeSi--MnSi, respectively. The obtained CoSi and MnSi crystals inherit the chirality from the seed part of FeSi, which grows in a right-handed structure, and thus have the chirality opposite to that for the crystals in the literature. The LDFZ method with feed rods made of various combinations of $T$Si compounds enables flexible control of the chirality of $T$Si and will be useful for clarifying the interplay between the crystalline chirality and chirality-induced physical responses.
Submitted 23 January, 2022;
originally announced January 2022.
-
How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR
Authors:
Kazuma Iwamoto,
Tsubasa Ochiai,
Marc Delcroix,
Rintaro Ikeshita,
Hiroshi Sato,
Shoko Araki,
Shigeru Katagiri
Abstract:
It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as the SE error signal that cannot be represented as a linear combination of speech and noise sources. We propose manually scaling the error components to analyze their impact on ASR. We experimentally identify the artifact component as the main cause of performance degradation, and we find that mitigating the artifact can greatly improve ASR performance. Furthermore, we demonstrate that the simple observation adding (OA) technique (i.e., adding a scaled version of the observed signal to the enhanced speech) can monotonically increase the signal-to-artifact ratio under a mild condition. Accordingly, we experimentally confirm that OA improves ASR performance for both simulated and real recordings. The findings of this paper provide a better understanding of the influence of SE errors on ASR and open the door to future research on novel approaches for designing effective single-channel SE front-ends for ASR.
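The observation adding (OA) step mentioned in this abstract, adding a scaled version of the observed signal to the enhanced speech, can be sketched as below. This is only an illustration under stated assumptions: the function name `observation_adding`, the parameter name `scale`, and the sample values are hypothetical, and the abstract does not say how the scale is chosen.

```python
def observation_adding(enhanced, observed, scale):
    """Add a scaled copy of the observed (noisy) signal to the enhanced signal.

    enhanced -- list of samples produced by a speech enhancement front-end
    observed -- list of samples of the original noisy observation
    scale    -- non-negative scaling factor (hypothetical tuning parameter)
    """
    if len(enhanced) != len(observed):
        raise ValueError("signals must have the same length")
    if scale < 0:
        raise ValueError("scale must be non-negative")
    return [e + scale * o for e, o in zip(enhanced, observed)]


# scale = 0 leaves the enhanced signal unchanged; larger values mix in
# more of the original observation.
mixed = observation_adding([1.0, 2.0], [2.0, -4.0], 0.5)
```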
Submitted 30 March, 2022; v1 submitted 17 January, 2022;
originally announced January 2022.
-
Narrowband searches for continuous and long-duration transient gravitational waves from known pulsars in the LIGO-Virgo third observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato
, et al. (1636 additional authors not shown)
Abstract:
Isolated neutron stars that are asymmetric with respect to their spin axis are possible sources of detectable continuous gravitational waves. This paper presents a fully-coherent search for such signals from eighteen pulsars in data from LIGO and Virgo's third observing run (O3). For known pulsars, efficient and sensitive matched-filter searches can be carried out if one assumes the gravitational radiation is phase-locked to the electromagnetic emission. In the search presented here, we relax this assumption and allow the frequency and frequency time-derivative of the gravitational waves to vary in a small range around those inferred from electromagnetic observations. We find no evidence for continuous gravitational waves, and set upper limits on the strain amplitude for each target. These limits are more constraining for seven of the targets than the spin-down limit defined by ascribing all rotational energy loss to gravitational radiation. In an additional search we look in O3 data for long-duration (hours-months) transient gravitational waves in the aftermath of pulsar glitches for six targets with a total of nine glitches. We report two marginal outliers from this search, but find no clear evidence for such emission either. The resulting duration-dependent strain upper limits do not surpass indirect energy constraints for any of these targets.
Submitted 27 June, 2022; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms
Authors:
Tomohiro Nakatani,
Rintaro Ikeshita,
Keisuke Kinoshita,
Hiroshi Sawada,
Naoyuki Kamo,
Shoko Araki
Abstract:
This paper develops a framework that can perform denoising, dereverberation, and source separation accurately by using a relatively small number of microphones. It has been empirically confirmed that Independent Vector Analysis (IVA) can blindly separate N sources from their sound mixture even with diffuse noise when a sufficiently large number (=M) of microphones are available (i.e., M>>N). However, the estimation accuracy seriously degrades as the number of microphones, or more specifically M-N (>=0), decreases. To overcome this limitation of IVA, we propose switching IVA (swIVA) in this paper. With swIVA, time frames of an observed signal with time-varying characteristics are clustered into several groups, each of which can be well handled by IVA using a small number of microphones, and thus accurate estimation can be achieved by applying IVA individually to each of the groups. Conventionally, a switching mechanism was introduced into a beamformer; however, no blind source separation algorithms with a switching mechanism have been successfully developed until this paper. In order to incorporate dereverberation capability, this paper further extends swIVA to blind Convolutional beamforming algorithm (swCIVA). It integrates swIVA and switching Weighted Prediction Error-based dereverberation (swWPE) in a jointly optimal way. We show that both swIVA and swCIVA can be optimized effectively based on blind signal processing, and that their performance can be further improved using a spatial guide for the initialization. Experiments show that both proposed methods largely outperform conventional IVA and its Convolutional beamforming extension (CIVA) in terms of objective signal quality and automatic speech recognition scores when using a relatively small number of microphones.
Submitted 24 February, 2022; v1 submitted 20 November, 2021;
originally announced November 2021.
-
The population of merging compact binaries inferred using gravitational waves through GWTC-3
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato
, et al. (1612 additional authors not shown)
Abstract:
We report on the population properties of compact binary mergers inferred from gravitational-wave observations of these systems during the first three LIGO-Virgo observing runs. The Gravitational-Wave Transient Catalog 3 contains signals consistent with three classes of binary mergers: binary black hole, binary neutron star, and neutron star-black hole mergers. We infer the binary neutron star merger rate to be between 10 and 1700 Gpc$^{-3}$ yr$^{-1}$ and the neutron star-black hole merger rate to be between 7.8 and 140 Gpc$^{-3}$ yr$^{-1}$, assuming a constant rate density in the comoving frame and taking the union of 90% credible intervals for methods used in this work. We infer the binary black hole merger rate, allowing for evolution with redshift, to be between 17.9 and 44 Gpc$^{-3}$ yr$^{-1}$ at a fiducial redshift (z=0.2). The rate of binary black hole mergers is observed to increase with redshift at a rate proportional to $(1+z)^κ$ with $κ=2.9^{+1.7}_{-1.8}$ for $z\lesssim1$. Using both binary neutron star and neutron star-black hole binaries, we obtain a broad, relatively flat neutron star mass distribution extending from $1.2^{+0.1}_{-0.2}$ to $2.0^{+0.3}_{-0.3}\,M_\odot$. We confidently determine that the merger rate as a function of mass sharply declines after the expected maximum neutron star mass, but cannot yet confirm or rule out the existence of a lower mass gap between neutron stars and black holes. We also find the binary black hole mass distribution has localized over- and underdensities relative to a power-law distribution, with peaks emerging at chirp masses of $8.3^{+0.3}_{-0.5}$ and $27.9^{+1.9}_{-1.8}\,M_\odot$. While we continue to find that the mass distribution of a binary's more massive component strongly decreases as a function of primary mass, we observe no evidence of a strongly suppressed merger rate above approximately $60\,M_\odot$ [abridged]
Submitted 30 January, 2025; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Search for Gravitational Waves Associated with Gamma-Ray Bursts Detected by Fermi and Swift During the LIGO-Virgo Run O3b
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato
, et al. (1610 additional authors not shown)
Abstract:
We search for gravitational-wave signals associated with gamma-ray bursts detected by the Fermi and Swift satellites during the second half of the third observing run of Advanced LIGO and Advanced Virgo (1 November 2019 15:00 UTC-27 March 2020 17:00 UTC). We conduct two independent searches: a generic gravitational-wave transients search to analyze 86 gamma-ray bursts and an analysis to target binary mergers with at least one neutron star as short gamma-ray burst progenitors for 17 events. We find no significant evidence for gravitational-wave signals associated with any of these gamma-ray bursts. A weighted binomial test of the combined results finds no evidence for sub-threshold gravitational wave signals associated with this GRB ensemble either. We use several source types and signal morphologies during the searches, resulting in lower bounds on the estimated distance to each gamma-ray burst. Finally, we constrain the population of low luminosity short gamma-ray bursts using results from the first to the third observing runs of Advanced LIGO and Advanced Virgo. The resulting population is in accordance with the local binary neutron star merger rate.
Submitted 5 November, 2021;
originally announced November 2021.
-
GWTC-3: Compact Binary Coalescences Observed by LIGO and Virgo During the Second Part of the Third Observing Run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
S. Akcay,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin
, et al. (1637 additional authors not shown)
Abstract:
The third Gravitational-Wave Transient Catalog (GWTC-3) describes signals detected with Advanced LIGO and Advanced Virgo up to the end of their third observing run. Updating the previous GWTC-2.1, we present candidate gravitational waves from compact binary coalescences during the second half of the third observing run (O3b) between 1 November 2019, 15:00 UTC and 27 March 2020, 17:00 UTC. There are 35 compact binary coalescence candidates identified by at least one of our search algorithms with a probability of astrophysical origin $p_\mathrm{astro} > 0.5$. Of these, 18 were previously reported as low-latency public alerts, and 17 are reported here for the first time. Based upon estimates for the component masses, our O3b candidates with $p_\mathrm{astro} > 0.5$ are consistent with gravitational-wave signals from binary black holes or neutron star-black hole binaries, and we identify none from binary neutron stars. However, from the gravitational-wave data alone, we are not able to measure matter effects that distinguish whether the binary components are neutron stars or black holes. The range of inferred component masses is similar to that found with previous catalogs, but the O3b candidates include the first confident observations of neutron star-black hole binaries. Including the 35 candidates from O3b in addition to those from GWTC-2.1, GWTC-3 contains 90 candidates found by our analysis with $p_\mathrm{astro} > 0.5$ across the first three observing runs. These observations of compact binary coalescences present an unprecedented view of the properties of black holes and neutron stars.
Submitted 23 October, 2023; v1 submitted 5 November, 2021;
originally announced November 2021.
-
All-sky, all-frequency directional search for persistent gravitational-waves from Advanced LIGO's and Advanced Virgo's first three observing runs
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato
, et al. (1605 additional authors not shown)
Abstract:
We present the first results from an all-sky all-frequency (ASAF) search for an anisotropic stochastic gravitational-wave background using the data from the first three observing runs of the Advanced LIGO and Advanced Virgo detectors. Upper limit maps on broadband anisotropies of a persistent stochastic background were published for all observing runs of the LIGO-Virgo detectors. However, a broadband analysis is likely to miss narrowband signals as the signal-to-noise ratio of a narrowband signal can be significantly reduced when combined with detector output from other frequencies. Data folding and the computationally efficient analysis pipeline, {\tt PyStoch}, enable us to perform the radiometer map-making at every frequency bin. We perform the search at 3072 {\tt{HEALPix}} equal area pixels uniformly tiling the sky and in every frequency bin of width $1/32$~Hz in the range $20-1726$~Hz, except for bins that are likely to contain instrumental artefacts and hence are notched. We do not find any statistically significant evidence for the existence of narrowband gravitational-wave signals in the analyzed frequency bins. Therefore, we place $95\%$ confidence upper limits on the gravitational-wave strain for each pixel-frequency pair; the limits are in the range $(0.030 - 9.6) \times10^{-24}$. In addition, we outline a method to identify candidate pixel-frequency pairs that could be followed up by a more sensitive (and potentially computationally expensive) search, e.g., a matched-filtering-based analysis, to look for fainter nearly monochromatic coherent signals. The ASAF analysis is inherently independent of models describing any spectral or spatial distribution of power. We demonstrate that the ASAF results can be appropriately combined over frequencies and sky directions to successfully recover the broadband directional and isotropic results.
Submitted 19 October, 2021;
originally announced October 2021.
-
Tailoring collective motion of kinesin-driven microtubules via topographic landscapes
Authors:
Shunya Araki,
Kazusa Beppu,
Arif Md. Rashedul Kabir,
Akira Kakugo,
Yusuke T. Maeda
Abstract:
Biomolecular motor proteins that generate forces by consuming chemical energy obtained from ATP hydrolysis are pivotal for organizing broad cytoskeletal structures in living cells. The control of such cytoskeletal structures benefits programmable protein patterning; however, our current knowledge is limited owing to the underdevelopment of an engineering approach for controlling pattern formation. Here, we demonstrate the tailoring of assembled patterns of microtubules (MTs) driven by kinesin motors by designing the boundary shape in fabricated microwells. We found an MT bundle structure along the microwell wall and a bridging structure perpendicular to the wall. Corroborated by the theory of self-propelled rods, we further showed that the alignment of MTs defined by the boundary shape determined the transition of the assembled patterns, providing a blueprint to reconstruct bridge structures in microchannels. Our findings provide a geometric rule to tailor the self-organization of cytoskeletons and motor proteins for nanotechnological applications.
Submitted 12 October, 2021;
originally announced October 2021.
-
Search for subsolar-mass binaries in the first half of Advanced LIGO and Virgo's third observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato
, et al. (1612 additional authors not shown)
Abstract:
We report on a search for compact binary coalescences where at least one binary component has a mass between 0.2 $M_\odot$ and 1.0 $M_\odot$ in Advanced LIGO and Advanced Virgo data collected between 1 April 2019 1500 UTC and 1 October 2019 1500 UTC. We extend previous analyses in two main ways: we include data from the Virgo detector and we allow for more unequal mass systems, with mass ratio $q \geq 0.1$. We do not report any gravitational-wave candidates. The most significant trigger has a false alarm rate of 0.14 $\mathrm{yr}^{-1}$. This implies an upper limit on the merger rate of subsolar binaries in the range $[220-24200] \mathrm{Gpc}^{-3} \mathrm{yr}^{-1}$, depending on the chirp mass of the binary. We use this upper limit to derive astrophysical constraints on two phenomenological models that could produce subsolar-mass compact objects. One is an isotropic distribution of equal-mass primordial black holes. Using this model, we find that the fraction of dark matter in primordial black holes is $f_\mathrm{PBH} \equiv Ω_\mathrm{PBH} / Ω_\mathrm{DM} \lesssim 6\%$. The other is a dissipative dark matter model, in which fermionic dark matter can collapse and form black holes. The upper limit on the fraction of dark matter black holes depends on the minimum mass of the black holes that can be formed: the most constraining result is obtained at $M_\mathrm{min}=1 M_\odot$, where $f_\mathrm{DBH} \equiv Ω_\mathrm{DBH} / Ω_\mathrm{DM} \lesssim 0.003\%$. These are the tightest limits on spinning subsolar-mass binaries to date.
Submitted 24 September, 2021;
originally announced September 2021.
-
Search for continuous gravitational waves from 20 accreting millisecond X-ray pulsars in O3 LIGO data
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato,
C. Anand
, et al. (1612 additional authors not shown)
Abstract:
Results are presented of searches for continuous gravitational waves from 20 accreting millisecond X-ray pulsars with accurately measured spin frequencies and orbital parameters, using data from the third observing run of the Advanced LIGO and Advanced Virgo detectors. The search algorithm uses a hidden Markov model, where the transition probabilities allow the frequency to wander according to an unbiased random walk, while the $\mathcal{J}$-statistic maximum-likelihood matched filter tracks the binary orbital phase. Three narrow sub-bands are searched for each target, centered on harmonics of the measured spin frequency. The search yields 16 candidates, consistent with a false alarm probability of 30% per sub-band and target searched. These candidates, along with one candidate from an additional target-of-opportunity search done for SAX J1808.4$-$3658, which was in outburst during one month of the observing run, cannot be confidently associated with a known noise source. Additional follow-up does not provide convincing evidence that any are a true astrophysical signal. When all candidates are assumed non-astrophysical, upper limits are set on the maximum wave strain detectable at 95% confidence, $h_0^{95\%}$. The strictest constraint is $h_0^{95\%} = 4.7\times 10^{-26}$ from IGR J17062$-$6143. Constraints on the detectable wave strain from each target lead to constraints on neutron star ellipticity and $r$-mode amplitude, the strictest of which are $ε^{95\%} = 3.1\times 10^{-7}$ and $α^{95\%} = 1.8\times 10^{-5}$ respectively. This analysis is the most comprehensive and sensitive search of continuous gravitational waves from accreting millisecond X-ray pulsars to date.
Submitted 21 January, 2022; v1 submitted 19 September, 2021;
originally announced September 2021.
-
Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation
Authors:
Tomohiro Nakatani,
Rintaro Ikeshita,
Keisuke Kinoshita,
Hiroshi Sawada,
Shoko Araki
Abstract:
This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that can jointly perform denoising (DN), dereverberation (DR), and source separation (SS). First, we develop a blind CBF optimization algorithm that requires no prior information on the sources or the room acoustics, by extending a conventional joint DR and SS method. To make the optimization computationally tractable, we incorporate two techniques into the approach: the Source-Wise Factorization (SW-Fact) of a CBF and the Independent Vector Extraction (IVE). To further improve the performance, we develop a method that integrates a neural network (NN)-based source power spectra estimation with CBF optimization by an inverse-Gamma prior. Experiments using noisy reverberant mixtures reveal that our proposed method, in both the blind and NN-guided scenarios, greatly outperforms the conventional state-of-the-art NN-supported mask-based CBF in terms of the improvement in automatic speech recognition and signal distortion reduction performance.
Submitted 4 August, 2021;
originally announced August 2021.
-
All-sky search for long-duration gravitational-wave bursts in the third Advanced LIGO and Advanced Virgo run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato
, et al. (1605 additional authors not shown)
Abstract:
After the detection of gravitational waves from compact binary coalescences, the search for transient gravitational-wave signals with less well-defined waveforms for which matched filtering is not well-suited is one of the frontiers for gravitational-wave astronomy. Broadly classified into "short" ($\lesssim 1$\,s) and "long" ($\gtrsim 1$\,s) duration signals, these signals are expected from a variety of astrophysical processes, including non-axisymmetric deformations in magnetars or eccentric binary black hole coalescences. In this work, we present a search for long-duration gravitational-wave transients from Advanced LIGO and Advanced Virgo's third observing run from April 2019 to March 2020. For this search, we use minimal assumptions for the sky location, event time, waveform morphology, and duration of the source. The search covers the range of $2$--$500$~s in duration and a frequency band of $24$--$2048$~Hz. We find no significant triggers within this parameter space; we report sensitivity limits on the signal strength of gravitational waves characterized by the root-sum-square amplitude $h_{\mathrm{rss}}$ as a function of waveform morphology. These $h_{\mathrm{rss}}$ limits improve upon the results from the second observing run by an average factor of 1.8.
Submitted 29 July, 2021;
originally announced July 2021.
-
All-sky search for short gravitational-wave bursts in the third Advanced LIGO and Advanced Virgo run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato
, et al. (1608 additional authors not shown)
Abstract:
This paper presents the results of a search for generic short-duration gravitational-wave transients in data from the third observing run of Advanced LIGO and Advanced Virgo. Transients with durations of milliseconds to a few seconds in the 24--4096 Hz frequency band are targeted by the search, with no assumptions made regarding the incoming signal direction, polarization or morphology. Gravitational waves from compact binary coalescences that have been identified by other targeted analyses are detected, but no statistically significant evidence for other gravitational wave bursts is found. Sensitivities to a variety of signals are presented. These include updated upper limits on the source rate-density as a function of the characteristic frequency of the signal, which are roughly an order of magnitude better than previous upper limits. This search is sensitive to sources radiating as little as $\sim$10$^{-10} M_{\odot} c^2$ in gravitational waves at $\sim$70 Hz from a distance of 10~kpc, with 50\% detection efficiency at a false alarm rate of one per century. The sensitivity of this search to two plausible astrophysical sources is estimated: neutron star f-modes, which may be excited by pulsar glitches, as well as selected core-collapse supernova models.
Submitted 8 July, 2021;
originally announced July 2021.
-
All-sky Search for Continuous Gravitational Waves from Isolated Neutron Stars in the Early O3 LIGO Data
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
S. Abraham,
F. Acernese,
K. Ackley,
A. Adams,
C. Adams,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
K. M. Aleman,
G. Allen,
A. Allocca
, et al. (1566 additional authors not shown)
Abstract:
We report on an all-sky search for continuous gravitational waves in the frequency band 20-2000\,Hz and with a frequency time derivative in the range of $[-1.0, +0.1]\times10^{-8}$\,Hz/s. Such a signal could be produced by a nearby, spinning and slightly non-axisymmetric isolated neutron star in our galaxy. This search uses the LIGO data from the first six months of Advanced LIGO's and Advanced Virgo's third observational run, O3. No periodic gravitational wave signals are observed, and 95\%\ confidence-level (CL) frequentist upper limits are placed on their strengths. The lowest upper limits on worst-case (linearly polarized) strain amplitude $h_0$ are $\sim1.7\times10^{-25}$ near 200\,Hz. For a circularly polarized source (most favorable orientation), the lowest upper limits are $\sim6.3\times10^{-26}$. These strict frequentist upper limits refer to all sky locations and the entire range of frequency derivative values. For a population-averaged ensemble of sky locations and stellar orientations, the lowest 95\%\ CL upper limits on the strain amplitude are $\sim1.\times10^{-25}$. These upper limits improve upon our previously published all-sky results, with the greatest improvement (factor of $\sim$2) seen at higher frequencies, in part because quantum squeezing has dramatically improved the detector noise level relative to the second observational run, O2. These limits are the most constraining to date over most of the parameter space searched.
Submitted 8 October, 2021; v1 submitted 1 July, 2021;
originally announced July 2021.
-
Observation of gravitational waves from two neutron star-black hole coalescences
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
S. Abraham,
F. Acernese,
K. Ackley,
A. Adams,
C. Adams,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
K. M. Aleman,
G. Allen,
A. Allocca
, et al. (1577 additional authors not shown)
Abstract:
We report the observation of gravitational waves from two compact binary coalescences in LIGO's and Virgo's third observing run with properties consistent with neutron star-black hole (NSBH) binaries. The two events are named GW200105_162426 and GW200115_042309, abbreviated as GW200105 and GW200115; the first was observed by LIGO Livingston and Virgo, and the second by all three LIGO-Virgo detectors. The source of GW200105 has component masses $8.9^{+1.2}_{-1.5}\,M_\odot$ and $1.9^{+0.3}_{-0.2}\,M_\odot$, whereas the source of GW200115 has component masses $5.7^{+1.8}_{-2.1}\,M_\odot$ and $1.5^{+0.7}_{-0.3}\,M_\odot$ (all measurements quoted at the 90% credible level). The probability that the secondary's mass is below the maximal mass of a neutron star is 89%-96% and 87%-98%, respectively, for GW200105 and GW200115, with the ranges arising from different astrophysical assumptions. The source luminosity distances are $280^{+110}_{-110}$ Mpc and $300^{+150}_{-100}$ Mpc, respectively. The magnitude of the primary spin of GW200105 is less than 0.23 at the 90% credible level, and its orientation is unconstrained. For GW200115, the primary spin has a negative spin projection onto the orbital angular momentum at 88% probability. We are unable to constrain spin or tidal deformation of the secondary component for either event. We infer a NSBH merger rate density of $45^{+75}_{-33}\,\mathrm{Gpc}^{-3} \mathrm{yr}^{-1}$ when assuming GW200105 and GW200115 are representative of the NSBH population, or $130^{+112}_{-69}\,\mathrm{Gpc}^{-3} \mathrm{yr}^{-1}$ under the assumption of a broader distribution of component masses.
Submitted 29 June, 2021;
originally announced June 2021.
-
Few-shot learning of new sound classes for target sound extraction
Authors:
Marc Delcroix,
Jorge Bennasar Vázquez,
Tsubasa Ochiai,
Keisuke Kinoshita,
Shoko Araki
Abstract:
Target sound extraction consists of extracting the sound of a target acoustic event (AE) class from a mixture of AE sounds. It can be realized using a neural network that extracts the target sound conditioned on a 1-hot vector that represents the desired AE class. With this approach, embedding vectors associated with the AE classes are directly optimized for the extraction of sound classes seen during training. However, it is not easy to extend this framework to new AE classes, i.e., classes unseen during training. Recently, speech, music, or AE sound extraction based on enrollment audio of the desired sound has offered the potential of extracting any target sound in a mixture given only a short audio signal of a similar sound. In this work, we propose combining 1-hot- and enrollment-based target sound extraction, allowing optimal performance for seen AE classes and simple extension to new classes. In experiments with synthesized sound mixtures generated with the Freesound Dataset (FSD) datasets, we demonstrate the benefit of the combined framework for both seen and new AE classes. We also propose adapting the embedding vectors obtained from a few enrollment audio samples (few-shot) to further improve performance on new classes.
Submitted 13 June, 2021;
originally announced June 2021.
-
PILOT: Introducing Transformers for Probabilistic Sound Event Localization
Authors:
Christopher Schymura,
Benedikt Bönninghoff,
Tsubasa Ochiai,
Marc Delcroix,
Keisuke Kinoshita,
Tomohiro Nakatani,
Shoko Araki,
Dorothea Kolossa
Abstract:
Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e.g. a microphone array). Recent advances in this domain have most prominently focused on utilizing deep recurrent neural networks. Inspired by the success of transformer architectures as a suitable alternative to classical recurrent neural networks, this paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms. Additionally, the estimated sound event positions are represented as multivariate Gaussian variables, yielding an additional notion of uncertainty, which many previously proposed deep learning-based systems designed for this application do not provide. The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy. It outperforms all competing systems on all datasets with statistically significant differences in performance.
Submitted 7 June, 2021;
originally announced June 2021.
-
Constraints on dark photon dark matter using data from LIGO's and Virgo's third observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
N. Adhikari,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
A. Allocca,
P. A. Altin,
A. Amato
, et al. (1605 additional authors not shown)
Abstract:
We present a search for dark photon dark matter that could couple to gravitational-wave interferometers using data from Advanced LIGO and Virgo's third observing run. To perform this analysis, we use two methods, one based on cross-correlation of the strain channels in the two nearly aligned LIGO detectors, and one that looks for excess power in the strain channels of the LIGO and Virgo detectors. The excess power method optimizes the Fourier Transform coherence time as a function of frequency, to account for the expected signal width due to Doppler modulations. We do not find any evidence of dark photon dark matter with a mass between $m_{\rm A} \sim 10^{-14}-10^{-11}$ eV/$c^2$, which corresponds to frequencies between 10-2000 Hz, and therefore provide upper limits on the square of the minimum coupling of dark photons to baryons, i.e. $U(1)_{\rm B}$ dark matter. For the cross-correlation method, the best median constraint on the squared coupling is $\sim2.65\times10^{-46}$ at $m_{\rm A}\sim4.31\times10^{-13}$ eV/$c^2$; for the other analysis, the best constraint is $\sim 2.4\times 10^{-47}$ at $m_{\rm A}\sim 5.7\times 10^{-13}$ eV/$c^2$. These limits improve upon those obtained in direct dark matter detection experiments by a factor of $\sim100$ for $m_{\rm A}\sim [2-4]\times 10^{-13}$ eV/$c^2$, and are, in absolute terms, the most stringent constraint so far in a large mass range $m_A\sim$ $2\times 10^{-13}-8\times 10^{-12}$ eV/$c^2$.
Submitted 6 May, 2024; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Searches for continuous gravitational waves from young supernova remnants in the early third observing run of Advanced LIGO and Virgo
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
S. Abraham,
F. Acernese,
K. Ackley,
A. Adams,
C. Adams,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
K. M. Aleman,
G. Allen,
A. Allocca
, et al. (1567 additional authors not shown)
Abstract:
We present results of three wide-band directed searches for continuous gravitational waves from 15 young supernova remnants in the first half of the third Advanced LIGO and Virgo observing run. We use three search pipelines with distinct signal models and methods of identifying noise artifacts. Without ephemerides of these sources, the searches are conducted over a frequency band spanning from 10~Hz to 2~kHz. We find no evidence of continuous gravitational radiation from these sources. We set upper limits on the intrinsic signal strain at 95\% confidence level in sample sub-bands, estimate the sensitivity in the full band, and derive the corresponding constraints on the fiducial neutron star ellipticity and $r$-mode amplitude. The best 95\% confidence constraints placed on the signal strain are $7.7\times 10^{-26}$ and $7.8\times 10^{-26}$ near 200~Hz for the supernova remnants G39.2--0.3 and G65.7+1.2, respectively. The most stringent constraints on the ellipticity and $r$-mode amplitude reach $\lesssim 10^{-7}$ and $ \lesssim 10^{-5}$, respectively, at frequencies above $\sim 400$~Hz for the closest supernova remnant G266.2--1.2/Vela Jr.
Submitted 14 July, 2021; v1 submitted 24 May, 2021;
originally announced May 2021.
-
Constraints from LIGO O3 data on gravitational-wave emission due to r-modes in the glitching pulsar PSR J0537-6910
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
S. Abraham,
F. Acernese,
K. Ackley,
A. Adams,
C. Adams,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
K. M. Aleman,
G. Allen,
A. Allocca
, et al. (1574 additional authors not shown)
Abstract:
We present a search for continuous gravitational-wave emission due to r-modes in the pulsar PSR J0537-6910 using data from the LIGO-Virgo Collaboration observing run O3. PSR J0537-6910 is a young energetic X-ray pulsar and is the most frequent glitcher known. The inter-glitch braking index of the pulsar suggests that gravitational-wave emission due to r-mode oscillations may play an important role in the spin evolution of this pulsar. Theoretical models confirm this possibility and predict emission at a level that can be probed by ground-based detectors. In order to explore this scenario, we search for r-mode emission in the epochs between glitches by using a contemporaneous timing ephemeris obtained from NICER data. We do not detect any signals in the theoretically expected band of 86-97 Hz, and report upper limits on the amplitude of the gravitational waves. Our results improve on previous amplitude upper limits from r-modes in J0537-6910 by a factor of up to 3 and place stringent constraints on theoretical models for r-mode driven spin-down in PSR J0537-6910, especially for higher frequencies at which our results reach below the spin-down limit defined by energy conservation.
Submitted 7 January, 2022; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Comparison of remote experiments using crowdsourcing and laboratory experiments on speech intelligibility
Authors:
Ayako Yamamoto,
Toshio Irino,
Kenichi Arai,
Shoko Araki,
Atsunori Ogawa,
Keisuke Kinoshita,
Tomohiro Nakatani
Abstract:
Many subjective experiments have been performed to develop objective speech intelligibility measures, but the novel coronavirus outbreak has made it very difficult to conduct experiments in a laboratory. One solution is to perform remote testing using crowdsourcing; however, because we cannot control the listening conditions, it is unclear whether the results are entirely reliable. In this study, we compared speech intelligibility scores obtained in remote and laboratory experiments. The results showed that the mean and standard deviation (SD) of the remote experiments' speech reception threshold (SRT) were higher than those of the laboratory experiments. However, the variance in the SRTs across the speech-enhancement conditions revealed similarities, implying that remote testing results may be as useful as laboratory experiments to develop an objective measure. We also show that the practice session scores correlate with the SRT values. This is a priori information before performing the main tests and would be useful for data screening to reduce the variability of the SRT distribution.
Submitted 16 April, 2021;
originally announced April 2021.
-
Search for anisotropic gravitational-wave backgrounds using data from Advanced LIGO and Advanced Virgo's first three observing runs
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
R. Abbott,
T. D. Abbott,
S. Abraham,
F. Acernese,
K. Ackley,
A. Adams,
C. Adams,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
K. M. Aleman,
G. Allen,
A. Allocca
, et al. (1568 additional authors not shown)
Abstract:
We report results from searches for anisotropic stochastic gravitational-wave backgrounds using data from the first three observing runs of the Advanced LIGO and Advanced Virgo detectors. For the first time, we include Virgo data in our analysis and run our search with a new efficient pipeline called {\tt PyStoch} on data folded over one sidereal day. We use gravitational-wave radiometry (broadband and narrowband) to produce sky maps of stochastic gravitational-wave backgrounds and to search for gravitational waves from point sources. A spherical harmonic decomposition method is employed to look for gravitational-wave emission from spatially-extended sources. Neither technique found evidence of gravitational-wave signals. Hence we derive 95\% confidence-level upper limit sky maps on the gravitational-wave energy flux from broadband point sources, ranging from $F_{α, Θ} < {\rm (0.013 - 7.6)} \times 10^{-8} {\rm erg \, cm^{-2} \, s^{-1} \, Hz^{-1}},$ and on the (normalized) gravitational-wave energy density spectrum from extended sources, ranging from $Ω_{α, Θ} < {\rm (0.57 - 9.3)} \times 10^{-9} \, {\rm sr^{-1}}$, depending on direction ($Θ$) and spectral index ($α$). These limits improve upon previous limits by factors of $2.9 - 3.5$. We also set 95\% confidence-level upper limits on the frequency-dependent strain amplitudes of quasimonochromatic gravitational waves coming from three interesting targets, Scorpius X-1, SN 1987A and the Galactic Center, with the best upper limits in the range $h_0 < {\rm (1.7-2.1)} \times 10^{-25},$ a factor of $\geq 2.0$ improvement compared to previous stochastic radiometer searches.
Submitted 2 February, 2022; v1 submitted 15 March, 2021;
originally announced March 2021.
-
Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization
Authors:
Christopher Schymura,
Tsubasa Ochiai,
Marc Delcroix,
Keisuke Kinoshita,
Tomohiro Nakatani,
Shoko Araki,
Dorothea Kolossa
Abstract:
Sound event localization frameworks based on deep neural networks have shown increased robustness with respect to reverberation and noise in comparison to classical parametric approaches. In particular, recurrent architectures that incorporate temporal context into the estimation process seem to be well-suited for this task. This paper proposes a novel approach to sound event localization by utilizing an attention-based sequence-to-sequence model. These types of models have been successfully applied to problems in natural language processing and automatic speech recognition. In this work, a multi-channel audio signal is encoded to a latent representation, which is subsequently decoded to a sequence of estimated directions-of-arrival. Here, attention mechanisms capture temporal dependencies in the audio signal by focusing on specific frames that are relevant for estimating the activity and direction-of-arrival of sound events at the current time-step. The framework is evaluated on three publicly available datasets for sound event localization. It yields superior localization performance compared to state-of-the-art methods in both anechoic and reverberant conditions.
Submitted 28 February, 2021;
originally announced March 2021.