-
Spiking Neural Network Nonlinear Demapping on Neuromorphic Hardware for IM/DD Optical Communication
Authors:
Elias Arnold,
Georg Böcherer,
Florian Strasser,
Eric Müller,
Philipp Spilger,
Sebastian Billaudelle,
Johannes Weis,
Johannes Schemmel,
Stefano Calabrò,
Maxim Kuschnerov
Abstract:
Neuromorphic computing implementing spiking neural networks (SNN) is a promising technology for reducing the footprint of optical transceivers, as required by the fast-paced growth of data center traffic. In this work, an SNN nonlinear demapper is designed and evaluated on a simulated intensity-modulation direct-detection link with chromatic dispersion. The SNN demapper is implemented in software…
▽ More
Neuromorphic computing implementing spiking neural networks (SNN) is a promising technology for reducing the footprint of optical transceivers, as required by the fast-paced growth of data center traffic. In this work, an SNN nonlinear demapper is designed and evaluated on a simulated intensity-modulation direct-detection link with chromatic dispersion. The SNN demapper is implemented in software and on the analog neuromorphic hardware system BrainScaleS-2 (BSS-2). For comparison, linear equalization (LE), Volterra nonlinear equalization (VNLE), and nonlinear demapping by an artificial neural network (ANN) implemented in software are considered. At a pre-forward error correction bit error rate of 2e-3, the software SNN outperforms LE by 1.5 dB, VNLE by 0.3 dB and the ANN by 0.5 dB. The hardware penalty of the SNN on BSS-2 is only 0.2 dB, i.e., also on hardware, the SNN performs better than all software implementations of the reference approaches. Hence, this work demonstrates that SNN demappers implemented on electrical analog hardware can realize powerful and accurate signal processing fulfilling the strict requirements of optical communications.
△ Less
Submitted 28 February, 2023;
originally announced February 2023.
-
Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts
Authors:
Alice Baird,
Panagiotis Tzirakis,
Gauthier Gidel,
Marco Jiralerspong,
Eilif B. Muller,
Kory Mathewson,
Björn Schuller,
Erik Cambria,
Dacher Keltner,
Alan Cowen
Abstract:
This is the Proceedings of the ICML Expressive Vocalization (ExVo) Competition. The ExVo competition focuses on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022, included three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first,…
▽ More
This is the Proceedings of the ICML Expressive Vocalization (ExVo) Competition. The ExVo competition focuses on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022, included three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of 10 emotions conveyed by vocal bursts.
△ Less
Submitted 16 August, 2022; v1 submitted 14 July, 2022;
originally announced July 2022.
-
Spiking Neural Network Equalization on Neuromorphic Hardware for IM/DD Optical Communication
Authors:
Elias Arnold,
Georg Böcherer,
Eric Müller,
Philipp Spilger,
Johannes Schemmel,
Stefano Calabrò,
Maxim Kuschnerov
Abstract:
A spiking neural network (SNN) non-linear equalizer model is implemented on the mixed-signal neuromorphic hardware system BrainScaleS-2 and evaluated for an IM/DD link. The BER 2e-3 is achieved with a hardware penalty less than 1 dB, outperforming numeric linear equalization.
A spiking neural network (SNN) non-linear equalizer model is implemented on the mixed-signal neuromorphic hardware system BrainScaleS-2 and evaluated for an IM/DD link. The BER 2e-3 is achieved with a hardware penalty less than 1 dB, outperforming numeric linear equalization.
△ Less
Submitted 1 June, 2022;
originally announced June 2022.
-
Spiking Neural Network Equalization for IM/DD Optical Communication
Authors:
Elias Arnold,
Georg Böcherer,
Eric Müller,
Philipp Spilger,
Johannes Schemmel,
Stefano Calabrò,
Maxim Kuschnerov
Abstract:
A spiking neural network (SNN) equalizer model suitable for electronic neuromorphic hardware is designed for an IM/DD link. The SNN achieves the same bit-error-rate as an artificial neural network, outperforming linear equalization.
A spiking neural network (SNN) equalizer model suitable for electronic neuromorphic hardware is designed for an IM/DD link. The SNN achieves the same bit-error-rate as an artificial neural network, outperforming linear equalization.
△ Less
Submitted 1 June, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
The ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts
Authors:
Alice Baird,
Panagiotis Tzirakis,
Gauthier Gidel,
Marco Jiralerspong,
Eilif B. Muller,
Kory Mathewson,
Björn Schuller,
Erik Cambria,
Dacher Keltner,
Alan Cowen
Abstract:
The ICML Expressive Vocalization (ExVo) Competition is focused on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022, includes three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to trai…
▽ More
The ICML Expressive Vocalization (ExVo) Competition is focused on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022, includes three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of 10 emotions conveyed by vocal bursts. This paper describes the three tracks and provides performance measures for baseline models using state-of-the-art machine learning strategies. The baseline for each track is as follows, for ExVo-MultiTask, a combined score, computing the harmonic mean of Concordance Correlation Coefficient (CCC), Unweighted Average Recall (UAR), and inverted Mean Absolute Error (MAE) ($S_{MTL}$) is at best, 0.335 $S_{MTL}$; for ExVo-Generate, we report Fréchet inception distance (FID) scores ranging from 4.81 to 8.27 (depending on the emotion) between the training set and generated samples. We then combine the inverted FID with perceptual ratings of the generated samples ($S_{Gen}$) and obtain 0.174 $S_{Gen}$; and for ExVo-FewShot, a mean CCC of 0.444 is obtained.
△ Less
Submitted 12 July, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Generating Diverse Realistic Laughter for Interactive Art
Authors:
M. Mehdi Afsar,
Eric Park,
Étienne Paquette,
Gauthier Gidel,
Kory W. Mathewson,
Eilif Muller
Abstract:
We propose an interactive art project to make those rendered invisible by the COVID-19 crisis and its concomitant solitude reappear through the welcome melody of laughter, and connections created and explored through advanced laughter synthesis approaches. However, the unconditional generation of the diversity of human emotional responses in high-quality auditory synthesis remains an open problem,…
▽ More
We propose an interactive art project to make those rendered invisible by the COVID-19 crisis and its concomitant solitude reappear through the welcome melody of laughter, and connections created and explored through advanced laughter synthesis approaches. However, the unconditional generation of the diversity of human emotional responses in high-quality auditory synthesis remains an open problem, with important implications for the application of these approaches in artistic settings. We developed LaughGANter, an approach to reproduce the diversity of human laughter using generative adversarial networks (GANs). When trained on a dataset of diverse laughter samples, LaughGANter generates diverse, high quality laughter samples, and learns a latent space suitable for emotional analysis and novel artistic applications such as latent mixing/interpolation and emotional transfer.
△ Less
Submitted 29 July, 2022; v1 submitted 4 November, 2021;
originally announced November 2021.
-
Telephonetic: Making Neural Language Models Robust to ASR and Semantic Noise
Authors:
Chris Larson,
Tarek Lahlou,
Diana Mingels,
Zachary Kulis,
Erik Mueller
Abstract:
Speech processing systems rely on robust feature extraction to handle phonetic and semantic variations found in natural language. While techniques exist for desensitizing features to common noise patterns produced by Speech-to-Text (STT) and Text-to-Speech (TTS) systems, the question remains how to best leverage state-of-the-art language models (which capture rich semantic features, but are traine…
▽ More
Speech processing systems rely on robust feature extraction to handle phonetic and semantic variations found in natural language. While techniques exist for desensitizing features to common noise patterns produced by Speech-to-Text (STT) and Text-to-Speech (TTS) systems, the question remains how to best leverage state-of-the-art language models (which capture rich semantic features, but are trained on only written text) on inputs with ASR errors. In this paper, we present Telephonetic, a data augmentation framework that helps robustify language model features to ASR corrupted inputs. To capture phonetic alterations, we employ a character-level language model trained using probabilistic masking. Phonetic augmentations are generated in two stages: a TTS encoder (Tacotron 2, WaveGlow) and a STT decoder (DeepSpeech). Similarly, semantic perturbations are produced by sampling from nearby words in an embedding space, which is computed using the BERT language model. Words are selected for augmentation according to a hierarchical grammar sampling strategy. Telephonetic is evaluated on the Penn Treebank (PTB) corpus, and demonstrates its effectiveness as a bootstrapping technique for transferring neural language models to the speech domain. Notably, our language model achieves a test perplexity of 37.49 on PTB, which to our knowledge is state-of-the-art among models trained only on PTB.
△ Less
Submitted 13 June, 2019;
originally announced June 2019.