Showing 1–30 of 30 results for author: Ewert, S

Searching in archive cs.

  1. arXiv:2408.14198 [pdf]

    eess.AS cs.SD

    Combined assessment of auditory distance perception and externalization

    Authors: Henning Hoppe, Steven van de Par, Virginia Flanagin, Stephan D. Ewert

    Abstract: This study investigates frontal auditory distance perception (ADP) and externalization in virtual audio-visual environments, considering effects of headphone rendering method, room size, reverberation, and visual representation of the room. Either head-related impulse responses from an artificial head or a spherical head model were used for diotic (monophonic) and binaural auralizations with and w…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to The Journal of the Acoustical Society of America for possible publication

  2. arXiv:2408.13904 [pdf]

    cs.SD cs.HC eess.AS

    The effect of self-motion and room familiarity on sound source localization in virtual environments

    Authors: Niklas Isserstedt, Stephan D. Ewert, Virginia Flanagin, Steven van de Par

    Abstract: This paper investigates the influence of lateral horizontal self-motion of participants during signal presentation on distance and azimuth perception for frontal sound sources in a rectangular room. Additionally, the effect of deviating room acoustics for a single sound presentation embedded in a sequence of presentations using a baseline room acoustics for familiarization is analyzed. For this pu…

    Submitted 25 August, 2024; originally announced August 2024.

  3. Evaluation of Virtual Acoustic Environments with Different Acoustic Level of Detail

    Authors: Stefan Fichna, Steven van de Par, Stephan D. Ewert

    Abstract: Virtual acoustic environments enable the creation and simulation of realistic and ecologically valid daily-life situations with applications in hearing research and audiology. Hereby, reverberant indoor environments play an important role. For real-time applications, simplifications in the room acoustics simulation are required, however, it remains unclear what acoustic level of detail (ALOD) is n…

    Submitted 10 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: This work has been submitted to the I3DA 2023 International Conference on Immersive and 3D Audio for possible publication. Revised version after review

  4. arXiv:2306.16967 [pdf, other]

    cs.SD eess.AS physics.med-ph

    On the relevance of acoustic measurements for creating realistic virtual acoustic environments

    Authors: Siegfried Gündert, Stephan D. Ewert, Steven van de Par

    Abstract: Geometrical approaches for room acoustics simulation have the advantage of requiring limited computational resources while still achieving a high perceptual plausibility. A common approach is using the image source model for direct and early reflections in connection with further simplified models such as a feedback delay network for the diffuse reverberant tail. When recreating real spaces as vir…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: This work has been submitted to the I3DA 2023 International Conference (IEEE Xplore Digital Library) for possible publication

  5. Computationally-efficient and perceptually-motivated rendering of diffuse reflections in room acoustics simulation

    Authors: Stephan D. Ewert, Nico Gößling, Oliver Buttler, Steven van de Par, Hongmei Hu

    Abstract: Geometrical acoustics is well suited for simulating room reverberation in interactive real-time applications. While the image source model (ISM) is exceptionally fast, the restriction to specular reflections impacts its perceptual plausibility. To account for diffuse late reverberation, hybrid approaches have been proposed, e.g., using a feedback delay network (FDN) in combination with the ISM. He…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: This work has been submitted to Forum Acusticum 2023 for publication

  6. Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages

    Authors: Simon Durand, Daniel Stoller, Sebastian Ewert

    Abstract: Lyrics alignment gained considerable attention in recent years. State-of-the-art systems either re-use established speech recognition toolkits, or design end-to-end solutions involving a Connectionist Temporal Classification (CTC) loss. However, both approaches suffer from specific weaknesses: toolkits are known for their complexity, and CTC systems use a loss designed for transcription which can…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5

  7. arXiv:2205.05871 [pdf, other]

    cs.SD cs.LG eess.AS

    Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio

    Authors: Yin-Jyun Luo, Sebastian Ewert, Simon Dixon

    Abstract: Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models that describes an observed sequence with dynamic latent variables and a static latent variable. The former encode information at a frame rate identical to the observation, while the latter globally governs the entire sequence. This introduces an inductive bias and facilitates unsupervised disentangleme…

    Submitted 14 June, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: The paper is accepted to IJCAI 2022

  8. arXiv:2203.09893 [pdf, other]

    cs.SD cs.LG eess.AS

    A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation

    Authors: Rachel M. Bittner, Juan José Bosch, David Rubinstein, Gabriel Meseguer-Brocal, Sebastian Ewert

    Abstract: Automatic Music Transcription (AMT) has been recognized as a key enabling technology with a wide range of applications. Given the task's complexity, best results have typically been reported for systems focusing on specific settings, e.g. instrument-specific systems tend to yield improved results over instrument-agnostic methods. Similarly, higher accuracy can be obtained when only estimating fram…

    Submitted 12 May, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

  9. arXiv:2202.01646 [pdf, other]

    cs.SD eess.AS eess.SP

    Improving Lyrics Alignment through Joint Pitch Detection

    Authors: Jiawen Huang, Emmanouil Benetos, Sebastian Ewert

    Abstract: In recent years, the accuracy of automatic lyrics alignment methods has increased considerably. Yet, many current approaches employ frameworks designed for automatic speech recognition (ASR) and do not exploit properties specific to music. Pitch is one important musical attribute of singing voice but it is often ignored by current systems as the lyrics content is considered independent of the pitc…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: To appear in Proc. ICASSP 2022

  10. arXiv:2110.02695 [pdf, other]

    eess.AS cs.SD

    Lower Interaural Coherence in Off-Signal Bands Impairs Binaural Detection

    Authors: Bernhard Eurich, Jörg Encke, Stephan D. Ewert, Mathias Dietz

    Abstract: Differences in interaural phase configuration between a target and a masker can lead to substantial binaural unmasking. This effect is decreased for masking noises with an interaural time difference (ITD). Adding a second noise with an opposing ITD in most cases further reduces binaural unmasking. Thus far, modeling of these detection thresholds required both a mechanism for internal ITD compensat…

    Submitted 7 March, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: 14 pages, 5 figures

    Journal ref: J. Acoust. Soc. Am. 151(6), 2022, 3927-3936

  11. Prediction of tone detection thresholds in interaurally delayed noise based on interaural phase difference fluctuations

    Authors: Mathias Dietz, Jörg Encke, Kristin I. Bracklo, Stephan D. Ewert

    Abstract: Differences between the interaural phase of a noise and a target tone improve detection thresholds. The maximum masking release is obtained for detecting an antiphasic tone (Sπ) in diotic noise (N0). It has been shown in several studies that this benefit gradually declines as an interaural delay is applied to the N0Sπ complex. This decline has been attributed to the reduced interaural coherenc…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: This work has been submitted to Acta Acustica for possible publication

    Journal ref: Acta Acustica, 5, 60 (2021)

  12. Computationally efficient spatial rendering of late reverberation in virtual acoustic environments

    Authors: Christoph Kirsch, Josef Poppitz, Torben Wendt, Steven van de Par, Stephan D. Ewert

    Abstract: For 6-DOF (degrees of freedom) interactive virtual acoustic environments (VAEs), the spatial rendering of diffuse late reverberation in addition to early (specular) reflections is important. In the interest of computational efficiency, the acoustic simulation of the late reverberation can be simplified by using a limited number of spatially distributed virtual reverb sources (VRS) each radiating i…

    Submitted 30 June, 2021; originally announced July 2021.

    Comments: submitted to the I3DA 2021 International Conference (IEEE Xplore Digital Library). arXiv admin note: text overlap with arXiv:2106.15888

    Journal ref: 2021 Immersive and 3D Audio: from Architecture to Automotive (I3DA)

  13. arXiv:2106.15916 [pdf]

    cs.SD eess.AS

    Communication conditions in virtual acoustic scenes in an underground station

    Authors: Ľuboš Hládek, Stephan D. Ewert, Bernhard U. Seeber

    Abstract: Underground stations are a common communication situation in towns: we talk with friends or colleagues, listen to announcements or shop for titbits while background noise and reverberation are challenging communication. Here, we perform an acoustical analysis of two communication scenes in an underground station in Munich and test speech intelligibility. The acoustical conditions were measured in…

    Submitted 2 November, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: I3DA conference paper, 8 figures, 9 pages

  14. Effect of acoustic scene complexity and visual scene representation on auditory perception in virtual audio-visual environments

    Authors: Stefan Fichna, Thomas Biberger, Bernhard U. Seeber, Stephan D. Ewert

    Abstract: In daily life, social interaction and acoustic communication often take place in complex acoustic environments (CAE) with a variety of interfering sounds and reverberation. For hearing research and the evaluation of hearing systems, simulated CAEs using virtual reality techniques have gained interest in the context of ecological validity. In the current study, the effect of scene complexity and vi…

    Submitted 7 November, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: Accepted for publication in Proceedings of the I3DA 2021 International Conference on Immersive and 3D Audio

  15. arXiv:2106.15888 [pdf]

    eess.AS cs.SD

    Spatial resolution of late reverberation in virtual acoustic environments

    Authors: Christoph Kirsch, Josef Poppitz, Torben Wendt, Steven van de Par, Stephan D. Ewert

    Abstract: Late reverberation involves the superposition of many sound reflections resulting in a diffuse sound field. Since the spatially resolved perception of individual diffuse reflections is impossible, simplifications can potentially be made for modelling late reverberation in room acoustics simulations with reduced spatial resolution. Such simplifications are desired for interactive, real-time virtual…

    Submitted 30 June, 2021; originally announced June 2021.

    Comments: Submitted to Trends in Hearing

  16. arXiv:2106.15659 [pdf]

    eess.AS cs.SD

    Towards a generalized monaural and binaural auditory model for psychoacoustics and speech intelligibility

    Authors: Thomas Biberger, Stephan D. Ewert

    Abstract: Auditory perception involves cues in the monaural auditory pathways as well as binaural cues based on differences between the ears. So far auditory models have often focused on either monaural or binaural experiments in isolation. Although binaural models typically build upon stages of (existing) monaural models, only a few attempts have been made to extend a monaural model by a binaural stage usi…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: submitted to Acta Acustica

  17. arXiv:1911.06393 [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling

    Authors: Daniel Stoller, Mi Tian, Sebastian Ewert, Simon Dixon

    Abstract: Convolutional neural networks (CNNs) with dilated filters such as the Wavenet or the Temporal Convolutional Network (TCN) have shown good results in a variety of sequence modelling tasks. However, efficiently modelling long-term dependencies in these sequences is still challenging. Although the receptive field of these models grows exponentially with the number of layers, computing the convolution…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: Code available at https://github.com/f90/Seq-U-Net

  18. arXiv:1905.12660 [pdf, other]

    cs.LG stat.ML

    Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators

    Authors: Daniel Stoller, Sebastian Ewert, Simon Dixon

    Abstract: Generative adversarial networks (GANs) have shown great success in applications such as image generation and inpainting. However, they typically require large datasets, which are often not available, especially in the context of prediction tasks such as image segmentation that require labels. Therefore, methods such as the CycleGAN use more easily available unlabelled data, but do not offer a way…

    Submitted 30 January, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: 10 pages plus 14 pages appendix. Accepted at the International Conference on Learning Representations (ICLR) 2020. Camera-ready submission. Implementation available at https://github.com/f90/FactorGAN

  19. arXiv:1902.06797 [pdf, other]

    cs.SD cs.LG eess.AS

    End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model

    Authors: Daniel Stoller, Simon Durand, Sebastian Ewert

    Abstract: Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval and intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to combine numerous sub-modules including vocal separation and detection in an effort to break down the problem. Furthermore, training…

    Submitted 18 February, 2019; originally announced February 2019.

    Comments: 5 pages (1 for references), 2 figures, 2 tables. Camera-ready version, accepted at the International Conference on Acoustics, Speech, and Signal Processing 2019 (ICASSP)

  20. arXiv:1806.03185 [pdf, other]

    cs.SD eess.AS stat.ML

    Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

    Authors: Daniel Stoller, Sebastian Ewert, Simon Dixon

    Abstract: Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end. Therefore, we investigate end-to-end source separation in the time-domain, which allows modelling phase information and avoids fixed spectral transformations. Due to high sampling rates for audio, em…

    Submitted 8 June, 2018; originally announced June 2018.

    Comments: 7 pages (1 for references), 4 figures, 3 tables. Appearing in the proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR 2018) (camera-ready version). Implementation available at https://github.com/f90/Wave-U-Net

    Journal ref: 19th International Society for Music Information Retrieval Conference (ISMIR 2018)

  21. arXiv:1804.01650 [pdf, other]

    cs.SD cs.LG eess.AS

    Jointly Detecting and Separating Singing Voice: A Multi-Task Approach

    Authors: Daniel Stoller, Sebastian Ewert, Simon Dixon

    Abstract: A main challenge in applying deep learning to music processing is the availability of training data. One potential solution is Multi-task Learning, in which the model also learns to solve related auxiliary tasks on additional datasets to exploit their correlation. While intuitive in principle, it can be challenging to identify related tasks and construct the model to optimally share information be…

    Submitted 4 April, 2018; originally announced April 2018.

    Comments: 10 pages, 2 figures, accepted for the 14th International Conference on Latent Variable Analysis and Signal Separation

  22. arXiv:1711.00351 [pdf, other]

    cs.SD eess.AS

    Shift-Invariant Kernel Additive Modelling for Audio Source Separation

    Authors: Delia Fano Yela, Sebastian Ewert, Ken O'Hanlon, Mark B. Sandler

    Abstract: A major goal in blind source separation to identify and separate sources is to model their inherent characteristics. While most state-of-the-art approaches are supervised methods trained on large datasets, interest in non-data-driven approaches such as Kernel Additive Modelling (KAM) remains high due to their interpretability and adaptability. KAM performs the separation of a given source applying…

    Submitted 16 February, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: Feedback is welcome

    ACM Class: H.5.5; I.5.1; I.5.4

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, 2018

  23. arXiv:1711.00048 [pdf, other]

    cs.LG cs.SD

    Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction

    Authors: Daniel Stoller, Sebastian Ewert, Simon Dixon

    Abstract: The state of the art in music source separation employs neural networks trained in a supervised fashion on multi-track databases to estimate the sources from a given mixture. With only few datasets available, often extensive data augmentation is used to combat overfitting. Mixing random tracks, however, can even reduce separation performance as instruments in real music are strongly correlated. Th…

    Submitted 6 April, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: 5 pages, 2 figures, 1 table. Final version of manuscript accepted for 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Implementation available at https://github.com/f90/AdversarialAudioSeparation

    ACM Class: H.5.5; I.2.6

  24. arXiv:1707.00160 [pdf, other]

    cs.SD cs.NE

    An Augmented Lagrangian Method for Piano Transcription using Equal Loudness Thresholding and LSTM-based Decoding

    Authors: Sebastian Ewert, Mark B. Sandler

    Abstract: A central goal in automatic music transcription is to detect individual note events in music recordings. An important variant is instrument-dependent music transcription where methods can use calibration data for the instruments in use. However, despite the additional information, results rarely exceed an f-measure of 80%. As a potential explanation, the transcription problem can be shown to be ba…

    Submitted 30 July, 2017; v1 submitted 1 July, 2017; originally announced July 2017.

    ACM Class: H.5.5; I.2.6

    Journal ref: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, pp. 146-150, 2017

  25. arXiv:1702.02130 [pdf, other]

    cs.SD

    On the Importance of Temporal Context in Proximity Kernels: A Vocal Separation Case Study

    Authors: Delia Fano Yela, Sebastian Ewert, Derry FitzGerald, Mark Sandler

    Abstract: Musical source separation methods exploit source-specific spectral characteristics to facilitate the decomposition process. Kernel Additive Modelling (KAM) models a source applying robust statistics to time-frequency bins as specified by a source-specific kernel, a function defining similarity between bins. Kernels in existing approaches are typically defined using metrics between single time fram…

    Submitted 11 April, 2017; v1 submitted 7 February, 2017; originally announced February 2017.

    Comments: 2017 AES International Conference on Semantic Audio

    ACM Class: H.5.5

    Journal ref: Proceedings of the AES International Conference on Semantic Audio, Erlangen, Germany, pp. 13-20, 2017

  26. arXiv:1609.06210 [pdf, other]

    cs.SD

    Interference Reduction in Music Recordings Combining Kernel Additive Modelling and Non-Negative Matrix Factorization

    Authors: Delia Fano Yela, Sebastian Ewert, Derry FitzGerald, Mark Sandler

    Abstract: In live and studio recordings unexpected sound events often lead to interferences in the signal. For non-stationary interferences, sound source separation techniques can be used to reduce the interference level in the recording. In this context, we present a novel approach combining the strengths of two algorithmic families: NMF and KAM. The recent KAM approach applies robust statistics on frames…

    Submitted 8 February, 2017; v1 submitted 20 September, 2016; originally announced September 2016.

    Comments: International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    ACM Class: H.5.5

    Journal ref: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, USA, pp. 51-55, 2017

  27. arXiv:1609.04557 [pdf, other]

    cs.LG cs.SD

    Structured Dropout for Weak Label and Multi-Instance Learning and Its Application to Score-Informed Source Separation

    Authors: Sebastian Ewert, Mark B. Sandler

    Abstract: Many success stories involving deep neural networks are instances of supervised learning, where available labels power gradient-based learning methods. Creating such labels, however, can be expensive and thus there is increasing interest in weak labels which only provide coarse information, with uncertainty regarding time, location or value. Using such labels often leads to considerable challenges…

    Submitted 26 December, 2016; v1 submitted 15 September, 2016; originally announced September 2016.

    ACM Class: I.2.6; H.5.5

    Journal ref: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, USA, pp. 2277-2281, 2017

  28. Piano Transcription in the Studio Using an Extensible Alternating Directions Framework

    Authors: Sebastian Ewert, Mark Sandler

    Abstract: Given a musical audio recording, the goal of automatic music transcription is to determine a score-like representation of the piece underlying the recording. Despite significant interest within the research community, several studies have reported on a 'glass ceiling' effect, an apparent limit on the transcription accuracy that current methods seem incapable of overcoming. In this paper, we explor…

    Submitted 27 July, 2016; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

    ACM Class: H.5.5; G.1.6

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 11, pp. 1983-1997, 2016

  29. arXiv:1604.08516 [pdf, other]

    cs.SD

    Robust Joint Alignment of Multiple Versions of a Piece of Music

    Authors: Siying Wang, Sebastian Ewert, Simon Dixon

    Abstract: Large music content libraries often comprise multiple versions of a piece of music. To establish a link between different versions, automatic music alignment methods map each position in one version to a corresponding position in another version. Due to the leeway in interpreting a piece, any two versions can differ significantly, for example, in terms of local tempo, articulation, or playing styl…

    Submitted 28 April, 2016; originally announced April 2016.

    Comments: International Society for Music Information Retrieval Conference (ISMIR)

    Journal ref: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan, pp. 83-88, 2014

  30. Evaluation of spatial audio reproduction schemes for application in hearing aid research

    Authors: Giso Grimm, Stephan Ewert, Volker Hohmann

    Abstract: Loudspeaker-based spatial audio reproduction schemes are increasingly used for evaluating hearing aids in complex acoustic conditions. To further establish the feasibility of this approach, this study investigated the interaction between spatial resolution of different reproduction methods and technical and perceptual hearing aid performance measures using computer simulations. Three spatial audio…

    Submitted 3 August, 2015; v1 submitted 2 March, 2015; originally announced March 2015.

    Comments: The archived file is not the final published version of the article "Evaluation of spatial audio reproduction schemes for application in hearing aid research", in Acta Acustica united with Acustica, volume 101, 2015, pp. 842-854(13). The definitive publisher-authenticated version is available online at http://www.ingentaconnect.com/content/dav/aaua. http://dx.doi.org/10.3813/AAA.918878
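The Seq-U-Net abstract (entry 17) states that the receptive field of dilated CNNs such as WaveNet or TCN grows exponentially with the number of layers. The minimal Python sketch below illustrates only that arithmetic and makes several assumptions: stride-1 1-D convolutions, a kernel size of 2, and a hypothetical dilation schedule that doubles at each layer. It is a generic illustration and is not taken from the Seq-U-Net paper or its code.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field in samples of stacked stride-1 1-D convolutions.

    A layer with kernel size k and dilation d widens the receptive field
    by (k - 1) * d samples; the initial 1 is the current sample itself.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def doubling_dilations(num_layers: int) -> list[int]:
    # Hypothetical WaveNet/TCN-style schedule (an assumption): 1, 2, 4, 8, ...
    return [2 ** i for i in range(num_layers)]


if __name__ == "__main__":
    for n in (1, 4, 8, 12):
        dilated = receptive_field(2, doubling_dilations(n))  # grows as 2**n
        plain = receptive_field(2, [1] * n)                  # grows as n + 1
        print(f"{n:2d} layers: dilated {dilated:5d} samples, undilated {plain:3d} samples")
```

With kernel size 2, the sum telescopes to 2^n samples for n dilated layers, compared with n + 1 samples for n undilated layers. This is the exponential-versus-linear growth that the abstract refers to.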