-
Heterogeneous quantization regularizes spiking neural network activity
Authors:
Roy Moyal,
Kyrus R. Mama,
Matthew Einhorn,
Ayon Borthakur,
Thomas A. Cleland
Abstract:
The learning and recognition of object features from unregulated input has been a longstanding challenge for artificial intelligence systems. Brains are adept at learning stable representations given small samples of noisy observations; across sensory modalities, this capacity is aided by a cascade of signal conditioning steps informed by domain knowledge. The olfactory system, in particular, solv…
▽ More
The learning and recognition of object features from unregulated input has been a longstanding challenge for artificial intelligence systems. Brains are adept at learning stable representations given small samples of noisy observations; across sensory modalities, this capacity is aided by a cascade of signal conditioning steps informed by domain knowledge. The olfactory system, in particular, solves a source separation and denoising problem compounded by concentration variability, environmental interference, and unpredictably correlated sensor affinities. To function optimally, its plastic network requires statistically well-behaved input. We present a data-blind neuromorphic signal conditioning strategy whereby analog data are normalized and quantized into spike phase representations. Input is delivered to a column of duplicated spiking principal neurons via heterogeneous synaptic weights; this regularizes layer utilization, yoking total activity to the network's operating range and rendering internal representations robust to uncontrolled open-set stimulus variance. We extend this mechanism by adding a data-aware calibration step whereby the range and density of the quantization weights adapt to accumulated input statistics, optimizing resource utilization by balancing activity regularization and information retention.
△ Less
Submitted 26 September, 2024;
originally announced September 2024.
-
Efficient-VDVAE: Less is more
Authors:
Louay Hazami,
Rayhane Mama,
Ragavan Thurairatnam
Abstract:
Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE to make it converge up to $2.6\times$ faster, save up to $20\times$ in memory load and improve stability during training. Despite…
▽ More
Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE to make it converge up to $2.6\times$ faster, save up to $20\times$ in memory load and improve stability during training. Despite these changes, our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models on all $7$ commonly used image datasets we evaluated on. We also make an argument against using 5-bit benchmarks as a way to measure hierarchical VAE's performance due to undesirable biases caused by the 5-bit quantization. Additionally, we empirically demonstrate that roughly $3\%$ of the hierarchical VAE's latent space dimensions is sufficient to encode most of the image information, without loss of performance, opening up the doors to efficiently leverage the hierarchical VAEs' latent space in downstream tasks. We release our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE .
△ Less
Submitted 28 April, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
NWT: Towards natural audio-to-video generation with representation learning
Authors:
Rayhane Mama,
Marc S. Tyndel,
Hashiam Kadhim,
Cole Clifford,
Ragavan Thurairatnam
Abstract:
In this work we introduce NWT, an expressive speech-to-video model. Unlike approaches that use domain-specific intermediate representations such as pose keypoints, NWT learns its own latent representations, with minimal assumptions about the audio and video content. To this end, we propose a novel discrete variational autoencoder with adversarial loss, dVAE-Adv, which learns a new discrete latent…
▽ More
In this work we introduce NWT, an expressive speech-to-video model. Unlike approaches that use domain-specific intermediate representations such as pose keypoints, NWT learns its own latent representations, with minimal assumptions about the audio and video content. To this end, we propose a novel discrete variational autoencoder with adversarial loss, dVAE-Adv, which learns a new discrete latent representation we call Memcodes. Memcodes are straightforward to implement, require no additional loss terms, are stable to train compared with other approaches, and show evidence of interpretability. To predict on the Memcode space, we use an autoregressive encoder-decoder model conditioned on audio. Additionally, our model can control latent attributes in the generated video that are not annotated in the data. We train NWT on clips from HBO's Last Week Tonight with John Oliver. NWT consistently scores above other approaches in Mean Opinion Score (MOS) on tests of overall video naturalness, facial naturalness and expressiveness, and lipsync quality. This work sets a strong baseline for generalized audio-to-video synthesis. Samples are available at https://next-week-tonight.github.io/NWT/.
△ Less
Submitted 8 June, 2021;
originally announced June 2021.