Showing 1–7 of 7 results for author: Slaney, M

  1. arXiv:2404.17490  [pdf, other]

    eess.AS cs.SD eess.SP

    The CARFAC v2 Cochlear Model in Matlab, NumPy, and JAX

    Authors: Richard F. Lyon, Rob Schonberger, Malcolm Slaney, Mihajlo Velimirović, Honglin Yu

    Abstract: The open-source CARFAC (Cascade of Asymmetric Resonators with Fast-Acting Compression) cochlear model is upgraded to version 2, with improvements to the Matlab implementation, and with new Python/NumPy and JAX implementations -- but C++ version changes are still pending. One change addresses the DC (direct current, or zero frequency) quadratic distortion anomaly previously reported; another reduce…

    Submitted 26 April, 2024; originally announced April 2024.

  2. arXiv:2203.15578  [pdf, other]

    cs.SD cs.LG eess.AS

    Disentangling speech from surroundings with neural embeddings

    Authors: Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, Malcolm Slaney, Marco Tagliasacchi

    Abstract: We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms given by embedding vectors, where one part of the embedding vector represents the speech signal, and the rest represent the environment. We achieve this by partitioning t…

    Submitted 4 June, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted at ICASSP 2023

  3. arXiv:2202.08793  [pdf, other]

    eess.AS cs.SD eess.SP

    Multi-Channel Speech Denoising for Machine Ears

    Authors: Cong Han, E. Merve Kaya, Kyle Hoefer, Malcolm Slaney, Simon Carlile

    Abstract: This work describes a speech denoising system for machine ears that aims to improve speech intelligibility and the overall listening experience in noisy environments. We recorded approximately 100 hours of audio data with reverberation and moderate environmental noise using a pair of microphone arrays placed around each of the two ears and then mixed sound recordings to simulate adverse acoustic s…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: Accepted to ICASSP 2022

  4. arXiv:2202.05397  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Neural Architecture Search for Energy Efficient Always-on Audio Models

    Authors: Daniel T. Speckhard, Karolis Misiunas, Sagi Perel, Tenghui Zhu, Simon Carlile, Malcolm Slaney

    Abstract: Mobile and edge computing devices for always-on classification tasks require energy-efficient neural network architectures. In this paper we present several changes to neural architecture searches (NAS) that improve the chance of success in practical situations. Our search simultaneously optimizes for network accuracy, energy efficiency and memory usage. We benchmark the performance of our search…

    Submitted 1 June, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

  5. arXiv:1706.00079  [pdf, other]

    cs.MM cs.CV cs.SD

    Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers

    Authors: Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Malcolm Slaney, Ian Sturdy

    Abstract: In this paper, we present a system that associates faces with voices in a video by fusing information from the audio and visual signals. The thesis underlying our work is that an extremely simple approach to generating (weak) speech clusters can be combined with visual signals to effectively associate faces and voices by aggregating statistics across a video. This approach does not need any traini…

    Submitted 31 May, 2017; originally announced June 2017.

  6. arXiv:1609.09430  [pdf, other]

    cs.SD cs.LG stat.ML

    CNN Architectures for Large-Scale Audio Classification

    Authors: Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson

    Abstract: Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying th…

    Submitted 10 January, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

    Comments: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes in the latest Audio Set revision. Changed wording to fit the 4-page limit with the new additions.

  7. arXiv:1206.5267  [pdf]

    cs.LG cs.IR stat.ML

    Collaborative Filtering and the Missing at Random Assumption

    Authors: Benjamin Marlin, Richard S. Zemel, Sam Roweis, Malcolm Slaney

    Abstract: Rating prediction is an important application, and a popular research topic in collaborative filtering. However, both the validity of learning algorithms, and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-267-275